Updated on 2024-03-01 GMT+08:00

ALM-24005 Exception Occurs When Flume Transmits Data

Alarm Description

The alarm module monitors the capacity status of the Flume Channel. The alarm is generated immediately when the duration for which the Channel is fully occupied exceeds the threshold, or when the number of times the Source fails to send data to the Channel exceeds the threshold.

The default threshold is 10. You can change the threshold by modifying the channelfullcount parameter of the related channel in the properties.properties configuration file in the conf directory.
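As a rough illustration of where this parameter sits, the following sketch reads channelfullcount for each channel from the text of properties.properties and falls back to the default of 10 when it is not set. The key layout `<agent>.channels.<channel>.channelfullcount`, the agent name `server`, and the channel names `ch1` and `ch2` are assumptions based on standard Flume property naming, not values confirmed by this page.

```python
import re

# Assumed key layout: <agent>.channels.<channel>.channelfullcount
# (standard Flume property naming; not confirmed by this alarm reference).
CHANNEL_FULL_COUNT = re.compile(
    r"^\s*([\w.-]+)\.channels\.([\w-]+)\.channelfullcount\s*[=:]\s*(\d+)\s*$"
)

DEFAULT_THRESHOLD = 10  # default stated in this alarm reference


def channel_thresholds(properties_text, channel_names):
    """Return {channel: threshold}, using the default when the key is absent."""
    thresholds = {name: DEFAULT_THRESHOLD for name in channel_names}
    for line in properties_text.splitlines():
        match = CHANNEL_FULL_COUNT.match(line)
        if match and match.group(2) in thresholds:
            thresholds[match.group(2)] = int(match.group(3))
    return thresholds


# Hypothetical file content: agent "server" with channels "ch1" and "ch2".
sample = """
server.channels = ch1 ch2
server.channels.ch1.type = memory
server.channels.ch1.channelfullcount = 20
server.channels.ch2.type = file
"""
print(channel_thresholds(sample, ["ch1", "ch2"]))
```

In this sample, ch1 carries an explicit threshold and ch2 keeps the default.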

The alarm is cleared when the space of Flume Channel is released and the alarm handling is complete.

Alarm Attributes

Alarm ID    Alarm Severity    Auto Cleared
24005       Major             Yes

Alarm Parameters

Parameter        Description
Source           Specifies the cluster for which the alarm was generated.
ServiceName      Specifies the service for which the alarm was generated.
HostName         Specifies the host for which the alarm was generated.
AgentId          Specifies the ID of the agent for which the alarm was generated.
ComponentType    Specifies the type of the component for which the alarm was generated.
ComponentName    Specifies the name of the component for which the alarm was generated.

Impact on the System

If the disk usage of the Flume Channel increases continuously, importing data to the specified destination takes longer. When the disk usage of the Flume Channel reaches 100%, the Flume agent process pauses.

Possible Causes

  • Flume Sink is faulty, so the data cannot be sent.
  • The network is faulty, so the data cannot be sent.

Handling Procedure

Check whether Flume Sink is faulty.

  1. Open the properties.properties configuration file on the local PC, search for type = hdfs in the file, and check whether the Flume sink type is HDFS.

    • If yes, go to 2.
    • If no, go to 3.

  2. On FusionInsight Manager, check whether the HDFS Service Unavailable alarm is generated in the alarm list and whether the HDFS service is stopped in the service list.

    • If the alarm is reported, clear it according to the handling suggestions of ALM-14000 HDFS Service Unavailable; if the HDFS service is stopped, start it. Then, go to 7.
    • If no, go to 7.

  3. Open the properties.properties configuration file on the local PC, search for type = hbase in the file, and check whether the Flume sink type is HBase.

    • If yes, go to 4.
    • If no, go to 5.

  4. On FusionInsight Manager, check whether the HBase Service Unavailable alarm is generated in the alarm list and whether the HBase service is stopped in the service list.

    • If the alarm is reported, clear it according to the handling suggestions of ALM-19000 HBase Service Unavailable; if the HBase service is stopped, start it. Then, go to 7.
    • If no, go to 7.

  5. Open the properties.properties configuration file on the local PC, search for org.apache.flume.sink.kafka.KafkaSink in the file, and check whether the Flume sink type is Kafka.

    • If yes, go to 6.
    • If no, go to 9.

  6. On FusionInsight Manager, check whether the Kafka Service Unavailable alarm is generated in the alarm list and whether the Kafka service is stopped in the service list.

    • If the alarm is reported, clear it according to the handling suggestions of ALM-38000 Kafka Service Unavailable; if the Kafka service is stopped, start it. Then, go to 7.
    • If no, go to 7.

  7. On FusionInsight Manager, choose Cluster > Name of the desired cluster > Services > Flume > Instance.
  8. Go to the Flume instance page of the faulty node and check whether the Sink Speed Metrics indicator is 0.

    • If yes, go to 13.
    • If no, go to 9.
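The search strings used in the sink checks of this procedure can be applied in one pass. The sketch below is a minimal illustration, assuming the content of properties.properties is available as text; the function name and the sample agent and sink names (`server`, `s1`, `s2`, `s3`) are hypothetical. It mirrors the text search literally, so a source or channel line containing the same string would also match, and the result should be confirmed by reading the file.

```python
import re

# Search strings from the handling procedure, mapped to the sink type and to
# the service alarm this page says to check (None where no alarm is named).
SINK_CHECKS = [
    (r"type\s*=\s*hdfs\b", "HDFS", "ALM-14000 HDFS Service Unavailable"),
    (r"type\s*=\s*hbase\b", "HBase", "ALM-19000 HBase Service Unavailable"),
    (r"org\.apache\.flume\.sink\.kafka\.KafkaSink", "Kafka",
     "ALM-38000 Kafka Service Unavailable"),
    (r"type\s*=\s*avro\b", "Avro", None),
]


def detect_sinks(properties_text):
    """Return (sink_type, related_alarm) pairs found in non-comment lines."""
    found = []
    for line in properties_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        for pattern, sink_type, alarm in SINK_CHECKS:
            if re.search(pattern, stripped) and (sink_type, alarm) not in found:
                found.append((sink_type, alarm))
    return found


# Hypothetical file content; the commented-out Avro sink is ignored.
sample_config = """
server.sinks.s1.type = hdfs
server.sinks.s2.type = org.apache.flume.sink.kafka.KafkaSink
# server.sinks.s3.type = avro
"""
for sink_type, alarm in detect_sinks(sample_config):
    print(sink_type, "->", alarm or "check the network connection")
```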

Check the network connection between the faulty node and the node that corresponds to the Flume Sink IP address.

  9. Open the properties.properties configuration file on the local PC, search for type = avro in the file, and check whether the Flume sink type is Avro.

    • If yes, go to 10.
    • If no, go to 13.

  10. Log in to the faulty node as user root, and run the ping command with the IP address of the Flume sink as the target to check whether the peer host can be reached.

    • If yes, go to 13.
    • If no, go to 11.

  11. Contact the network administrator to restore the network.
  12. In the alarm list, check whether the alarm is cleared after a period.

    • If yes, no further action is required.
    • If no, go to 13.
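The ping check in this procedure could be scripted roughly as follows. This is a sketch under assumptions: it relies on the Linux `ping -c` option, the function names are hypothetical, and 192.0.2.10 is a documentation placeholder rather than a real sink address.

```python
import ipaddress
import subprocess


def build_ping_command(ip_address, count=3):
    """Build a Linux ping command after validating the address."""
    ipaddress.ip_address(ip_address)  # raises ValueError on a malformed address
    return ["ping", "-c", str(count), ip_address]


def peer_reachable(ip_address, count=3, timeout=15):
    """Return True if ping exits with status 0, False otherwise."""
    try:
        result = subprocess.run(
            build_ping_command(ip_address, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False  # ping hung or is not installed on this node
    return result.returncode == 0


# Placeholder address; replace it with the Flume sink IP address from the file.
print(build_ping_command("192.0.2.10"))
```

Calling `peer_reachable` with the sink address on the faulty node gives the yes/no answer needed for the ping step.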

Collect the fault information.

  13. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  14. Expand the Service drop-down list, and select Flume for the target cluster.
  15. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
  16. Contact O&M personnel and provide the collected logs.
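The log collection window described above is the alarm generation time minus and plus one hour. A small sketch of that arithmetic, with a hypothetical function name and a made-up alarm time:

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin=timedelta(hours=1)):
    """Return (start, end): one margin before and after the alarm time."""
    return alarm_time - margin, alarm_time + margin


# Made-up alarm generation time, for illustration only.
start, end = log_collection_window(datetime(2024, 3, 1, 14, 30))
print("Start Date:", start)
print("End Date:", end)
```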

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None