Using Flume: Introduction


Updated at: Sep 12, 2019 GMT+08:00


The process for collecting logs using Flume is as follows:

  1. Install the Flume client.
  2. Configure the Flume server and client parameters.
  3. Collect and query logs using the Flume client.
  4. Stop and uninstall the Flume client.

Flume Client

A Flume client consists of three modules: source, channel, and sink. The source sends data to the channel, and the sink transmits the data from the channel to the external destination.

Table 1 Module description

  Module   Description
  Source   Receives or generates data and sends it to one or more channels.
  Channel  Buffers data between a source and a sink.
  Sink     Transmits data from the channel to the destination or the next agent.

Source
A source receives or generates data and sends the data to one or more channels. Sources can work in either data-driven or polling mode.

Typical sources include:

  • Syslog and Netcat sources, which are integrated in the system to receive data
  • Exec and SEQ sources, which generate event data automatically
  • Avro sources, which are used for communication between Flume agents

A source must be associated with at least one channel.
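As a minimal sketch, a Netcat source wired to one channel can be declared in the agent's properties file as follows (the agent and component names a1, r1, and c1, and the address and port, are illustrative):

```properties
# Illustrative names: a1 (agent), r1 (source), c1 (channel)
a1.sources = r1
a1.channels = c1

# A Netcat source listens on a TCP port and turns each received line into an event
a1.sources.r1.type = netcat
a1.sources.r1.bind = 127.0.0.1
a1.sources.r1.port = 44444

# A source must be wired to at least one channel
a1.sources.r1.channels = c1
```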


Channel

A channel buffers data between a source and a sink. After the sink transmits the data to the next hop or the final destination, the buffered data is deleted from the channel automatically.

The persistence guarantees vary with the channel type:

  • Memory channel: no persistence
  • File channel: persistence implemented based on write-ahead logging (WAL)
  • JDBC channel: persistence implemented based on an embedded Derby database

Channels support transactions to guarantee simple sequential operations. A channel can work with any number of sources and sinks.
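The channel types above can be sketched in the agent configuration as follows (agent name a1, channel names c1 and c2, and directory paths are illustrative):

```properties
a1.channels = c1 c2

# Memory channel: fast, but buffered events are lost if the agent restarts
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000
a1.channels.c1.transactionCapacity = 100

# File channel: durable, backed by write-ahead logs on local disk
a1.channels.c2.type = file
a1.channels.c2.checkpointDir = /var/flume/checkpoint
a1.channels.c2.dataDirs = /var/flume/data
```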


Sink

A sink transmits data to the next hop or the final destination. After the transmission succeeds, the sink deletes the data from the channel.

Typical sinks include:

  • HDFS and Kafka sinks, which store data at the destination
  • Null sink, which discards all events it receives
  • Avro sinks, which are used for communication between Flume agents

A sink must be associated with exactly one channel.
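A sketch of an HDFS sink draining one channel might look as follows (agent name a1, sink name k1, channel name c1, and the HDFS path are illustrative):

```properties
a1.sinks = k1

# Write events to time-partitioned directories in HDFS
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://hacluster/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.useLocalTimeStamp = true

# A sink drains exactly one channel
a1.sinks.k1.channel = c1
```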

A Flume client can have multiple sources, channels, and sinks. A source can send data to multiple channels, and then multiple sinks send the data out of the client.
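Such a fan-out topology can be sketched with a replicating channel selector, which copies every event from the source into each of its channels (all names are illustrative):

```properties
a1.sources = r1
a1.channels = c1 c2
a1.sinks = k1 k2

# Replicate every event from the source into both channels
a1.sources.r1.channels = c1 c2
a1.sources.r1.selector.type = replicating

# Each sink drains its own channel
a1.sinks.k1.channel = c1
a1.sinks.k2.channel = c2
```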

Multiple Flume clients can be cascaded. That is, a sink can send data to the source of another client.
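Cascading is typically sketched with an Avro sink on the upstream agent pointing at an Avro source on the downstream agent (the host name, port, and agent names a1 and a2 are illustrative):

```properties
# Upstream agent a1: an Avro sink pointing at the downstream host
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = collector.example.com
a1.sinks.k1.port = 4141
a1.sinks.k1.channel = c1

# Downstream agent a2: an Avro source listening on the same port
a2.sources.r1.type = avro
a2.sources.r1.bind = 0.0.0.0
a2.sources.r1.port = 4141
a2.sources.r1.channels = c1
```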

Supplementary Information

  1. What are the reliability measures of Flume?
    • The transaction mechanism is implemented between sources and channels, and between channels and sinks.
    • The sink processor supports failover and load balancing.
      The following is an example of the load balancing configuration:
      server.sinkgroups.g1.sinks=k1 k2
  2. What are the precautions for the aggregation and cascading of multiple Flume clients?
    • Use the Avro or Thrift protocol for cascading.
    • When the aggregation end contains multiple nodes, evenly distribute the clients to these nodes. Do not connect all the clients to a single node.
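The one-line load balancing example above can be expanded into a complete sink group definition. A sketch, assuming an agent named server with two sinks k1 and k2 already defined:

```properties
server.sinkgroups = g1
server.sinkgroups.g1.sinks = k1 k2

# Distribute events across both sinks in round-robin fashion,
# backing off from a sink temporarily after it fails
server.sinkgroups.g1.processor.type = load_balance
server.sinkgroups.g1.processor.backoff = true
server.sinkgroups.g1.processor.selector = round_robin

# For failover instead of load balancing, rank the sinks by priority:
# server.sinkgroups.g1.processor.type = failover
# server.sinkgroups.g1.processor.priority.k1 = 10
# server.sinkgroups.g1.processor.priority.k2 = 5
```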
