The process for collecting logs using Flume is as follows:
- Install the Flume client.
- Configure the Flume server and client parameters.
- Collect and query logs using the Flume client.
- Stop and uninstall the Flume client.
A Flume client consists of a source, a channel, and a sink. The source writes data to the channel, and the sink then reads the data from the channel and transmits it to an external destination.
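As a minimal sketch of this layout, the following properties file wires one source to one sink through one channel. The agent name `client`, the component names `r1`, `c1`, and `k1`, the spool directory, and the host and port are assumptions chosen for illustration. The property keys follow the standard Apache Flume configuration format.

```properties
# Declare the components of the agent (the agent name "client" is an assumption).
client.sources = r1
client.channels = c1
client.sinks = k1

# Source: watches a directory for new log files (hypothetical path).
client.sources.r1.type = spooldir
client.sources.r1.spoolDir = /var/log/app/spool
client.sources.r1.channels = c1

# Channel: buffers events in memory between the source and the sink.
client.channels.c1.type = memory
client.channels.c1.capacity = 10000
client.channels.c1.transactionCapacity = 1000

# Sink: forwards events to an external Avro endpoint (hypothetical host and port).
client.sinks.k1.type = avro
client.sinks.k1.hostname = 192.0.2.10
client.sinks.k1.port = 4141
client.sinks.k1.channel = c1
```

Note that a source lists its channels under the plural key `channels`, while a sink names its channel under the singular key `channel`.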
A source receives or generates data and sends the data to one or more channels. Sources can work in either data-driven or polling mode.
Typical sources include:
A source must be associated with at least one channel.
A channel buffers data between a source and a sink. After the sink delivers the data to the next hop or the final destination, the buffered data is removed from the channel automatically.
The persistency of the channels varies with the channel types:
Channels support transactions to ensure simple sequential operations. A channel can work with any number of sources and sinks.
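Persistence is selected through the channel type. The sketch below assumes the standard Apache Flume memory and file channels: a memory channel keeps events in RAM and loses them if the process stops, while a file channel writes events to local disk so that they survive a restart. The agent name `client`, the channel names, the directories, and the sizes are placeholders, not recommended values.

```properties
client.channels = c1 c2

# Memory channel: fast, but buffered events are lost if the process stops.
client.channels.c1.type = memory
client.channels.c1.capacity = 10000
# Maximum number of events handled in a single transaction.
client.channels.c1.transactionCapacity = 1000

# File channel: events are persisted on local disk (hypothetical directories).
client.channels.c2.type = file
client.channels.c2.checkpointDir = /data/flume/checkpoint
client.channels.c2.dataDirs = /data/flume/data
client.channels.c2.transactionCapacity = 1000
```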
A sink transmits data to the next hop or destination. After the transmission is complete, it deletes the data from the channel.
Typical sinks include:
A sink must be associated with exactly one channel.
A Flume client can have multiple sources, channels, and sinks. A source can send data to multiple channels, and then multiple sinks send the data out of the client.
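The fan-out described above might be configured as in the following sketch, where one source copies each event to two channels and each channel is drained by its own sink. The agent and component names, the spool directory, and the Avro endpoints are assumptions for illustration. `replicating` is the default Apache Flume channel selector and is written out here only for clarity.

```properties
client.sources = r1
client.channels = c1 c2
client.sinks = k1 k2

# One source writes every event to both channels.
client.sources.r1.type = spooldir
client.sources.r1.spoolDir = /var/log/app/spool
client.sources.r1.channels = c1 c2
client.sources.r1.selector.type = replicating

client.channels.c1.type = memory
client.channels.c2.type = memory

# Each sink reads from exactly one channel (hypothetical endpoints).
client.sinks.k1.type = avro
client.sinks.k1.hostname = 192.0.2.10
client.sinks.k1.port = 4141
client.sinks.k1.channel = c1

client.sinks.k2.type = avro
client.sinks.k2.hostname = 192.0.2.11
client.sinks.k2.port = 4141
client.sinks.k2.channel = c2
```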
Multiple Flume clients can be cascaded. That is, a sink can send data to the source of another client.
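A cascade of two hops could look like the following sketch, in which the Avro sink of the first agent points at the Avro source of the second. The two parts belong in separate properties files on separate hosts. The agent names `client` and `server`, the addresses, the port, and the HDFS path are assumptions for illustration.

```properties
# --- File on the upstream host (agent "client") ---
client.sources = r1
client.channels = c1
client.sinks = k1

client.sources.r1.type = spooldir
client.sources.r1.spoolDir = /var/log/app/spool
client.sources.r1.channels = c1

client.channels.c1.type = memory

# The Avro sink sends events to the downstream agent (hypothetical address).
client.sinks.k1.type = avro
client.sinks.k1.hostname = 192.0.2.20
client.sinks.k1.port = 4141
client.sinks.k1.channel = c1

# --- File on the downstream host (agent "server") ---
server.sources = r1
server.channels = c1
server.sinks = k1

# The Avro source listens on the port that the upstream sink connects to.
server.sources.r1.type = avro
server.sources.r1.bind = 0.0.0.0
server.sources.r1.port = 4141
server.sources.r1.channels = c1

server.channels.c1.type = memory

# Final destination (hypothetical HDFS path).
server.sinks.k1.type = hdfs
server.sinks.k1.hdfs.path = /flume/events
server.sinks.k1.channel = c1
```

The sink type on the upstream side and the source type on the downstream side must use the same protocol, so replacing `avro` with `thrift` has to be done on both ends.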
- What are the reliability measures of Flume?
- The transaction mechanism is implemented between sources and channels, and between channels and sinks.
- The sink processor supports failover and load balancing.
The following is an example of the load balancing configuration:
    server.sinkgroups=g1
    server.sinkgroups.g1.sinks=k1 k2
    server.sinkgroups.g1.processor.type=load_balance
    server.sinkgroups.g1.processor.backoff=true
    server.sinkgroups.g1.processor.selector=random
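For comparison, a failover configuration for the same sink group might look like the following sketch. It assumes the standard Apache Flume failover sink processor. The priority values and the penalty limit are illustrative, not recommended settings.

```properties
server.sinkgroups=g1
server.sinkgroups.g1.sinks=k1 k2
server.sinkgroups.g1.processor.type=failover
# The sink with the higher priority is used first; k2 takes over if k1 fails.
server.sinkgroups.g1.processor.priority.k1=10
server.sinkgroups.g1.processor.priority.k2=5
# Upper limit, in milliseconds, of the backoff period for a failed sink.
server.sinkgroups.g1.processor.maxpenalty=10000
```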
- What are the precautions for the aggregation and cascading of multiple Flume clients?
- Use the Avro or Thrift protocol for cascading.
- When the aggregation end contains multiple nodes, evenly distribute the clients to these nodes. Do not connect all the clients to a single node.
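One possible way to avoid binding a client to a single aggregation node is to give the client one Avro sink per node and group the sinks under a load-balancing processor. This is only a sketch under stated assumptions: the agent name `client`, the channel `c1` (assumed to be defined elsewhere in the same file), and the two node addresses are hypothetical.

```properties
client.sinks = k1 k2
client.sinkgroups = g1

# One Avro sink per aggregation node (hypothetical addresses).
client.sinks.k1.type = avro
client.sinks.k1.hostname = 192.0.2.21
client.sinks.k1.port = 4141
client.sinks.k1.channel = c1

client.sinks.k2.type = avro
client.sinks.k2.hostname = 192.0.2.22
client.sinks.k2.port = 4141
client.sinks.k2.channel = c1

# Alternate between the two nodes and temporarily skip a node that fails.
client.sinkgroups.g1.sinks = k1 k2
client.sinkgroups.g1.processor.type = load_balance
client.sinkgroups.g1.processor.backoff = true
client.sinkgroups.g1.processor.selector = round_robin
```

A simpler alternative is to keep one sink per client and assign different clients to different aggregation nodes.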