ALM-24005 Exception Occurs When Flume Transmits Data

Updated at: Mar 25, 2021 GMT+08:00

Description

The alarm module monitors the capacity status of Flume Channel. The alarm is generated immediately when Channel has been fully occupied for longer than the threshold, or when the number of times that Source fails to send data to Channel exceeds the threshold.

The default threshold is 10. You can change the threshold by modifying the properties.properties file.

The alarm is cleared when the space of Flume Channel is released and the alarm handling is complete.
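
The trigger rule described above can be sketched as a small predicate. This only illustrates the documented condition and is not the implementation used by Flume or FusionInsight Manager. The function name, the parameter names, and the choice to apply the single default threshold of 10 to both checks are assumptions made for this sketch.

```python
def should_raise_alm_24005(channel_full_duration, put_failure_count, threshold=10):
    """Return True when either documented trigger condition is met.

    channel_full_duration: how long Channel has been fully occupied
        (unit assumed to match the configured threshold).
    put_failure_count: number of times Source failed to send data to Channel.
    threshold: 10 is the documented default; using it for both checks
        is an assumption of this sketch.
    """
    # "Exceeds the threshold" is read here as strictly greater than.
    return channel_full_duration > threshold or put_failure_count > threshold


print(should_raise_alm_24005(0, 11))   # failure count above the threshold
print(should_raise_alm_24005(10, 10))  # values equal to the threshold
```

Reading "exceeds" as a strict comparison means that a value equal to the threshold does not raise the alarm in this sketch.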

Attribute

Alarm ID | Alarm Severity | Automatically Cleared
24005 | Major | Yes

Parameters

Name | Meaning
Source | Specifies the cluster for which the alarm is generated.
ServiceName | Specifies the service name for which the alarm is generated.
HostName | Specifies the host name for which the alarm is generated.
AgentName | Specifies the agent name for which the alarm is generated.
ComponentType | Specifies the component type for which the alarm is generated.
ComponentName | Specifies the component name for which the alarm is generated.

Impact on the System

If the disk usage of Flume Channel increases continuously, importing data to the specified destination takes longer. When the disk usage of Flume Channel reaches 100%, the Flume Agent process pauses.

Possible Causes

  • Flume Sink is faulty, so the data cannot be sent.
  • The network is faulty, so the data cannot be sent.

Procedure

Check whether Flume Sink is faulty.

  1. Check whether the type of Flume Sink is HDFS.

    • If yes, go to 2.
    • If no, go to 3.

  2. On FusionInsight Manager, check whether the HDFS Service Unavailable alarm is generated in the alarm list and whether the HDFS service is stopped in the service list.

    • If yes, rectify the fault by following the steps provided in ALM-14000 HDFS Service Unavailable. If the HDFS service has been stopped, start the HDFS service and go to 7.
    • If no, go to 7.

  3. Check whether the type of Flume Sink is HBase.

    • If yes, go to 4.
    • If no, go to 5.

  4. On FusionInsight Manager, check whether the HBase Service Unavailable alarm is generated in the alarm list and whether the HBase service is stopped in the service list.

    • If yes, rectify the fault by following the steps provided in ALM-19000 HBase Service Unavailable. If the HBase service has been stopped, start the HBase service and go to 7.
    • If no, go to 7.

  5. Check whether the type of Flume Sink is Kafka.

    • If yes, go to 6.
    • If no, go to 9.

  6. On FusionInsight Manager, check whether the Kafka Service Unavailable alarm is generated in the alarm list and whether the Kafka service is stopped in the service list.

    • If yes, rectify the fault by following the steps provided in ALM-38000 Kafka Service Unavailable. If the Kafka service has been stopped, start the Kafka service and go to 7.
    • If no, go to 7.

  7. On the FusionInsight Manager portal, choose Cluster > Name of the desired cluster > Services > Flume > Instance.
  8. Go to the Flume instance page of the faulty node to check whether the indicator Sink Speed Metrics is 0.

    • If yes, go to 13.
    • If no, no further operation is required.
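
The branching in steps 1 to 6 amounts to a lookup from sink type to the related service alarm. The following is a minimal sketch of that lookup for use in a runbook script. The alarm names are taken from the steps above, while the dictionary name, the function name, and the returned wording are hypothetical.

```python
# Sink types and related alarms as listed in steps 1 to 6 of this procedure.
RELATED_ALARM_BY_SINK_TYPE = {
    "hdfs": "ALM-14000 HDFS Service Unavailable",
    "hbase": "ALM-19000 HBase Service Unavailable",
    "kafka": "ALM-38000 Kafka Service Unavailable",
}


def next_check(sink_type):
    """Return a short description of what to check first for a sink type."""
    alarm = RELATED_ALARM_BY_SINK_TYPE.get(sink_type.strip().lower())
    if alarm is not None:
        return f"Check for {alarm} and whether the service is stopped"
    # Any other sink type falls through to the network check (step 9).
    return "Go to step 9 and check the network connection"


print(next_check("HDFS"))
print(next_check("avro"))
```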

Check the network connection between the faulty node and the node that corresponds to the Flume Sink IP address.

  9. Confirm whether the type of Flume Sink is avro.

    • If yes, go to 10.
    • If no, go to 13.

  10. Log in to the faulty node as user root, and run the ping command with the IP address of Flume Source to check whether the peer host can be pinged successfully.

    • If yes, go to 13.
    • If no, go to 11.

  11. Contact the network administrator to restore the network.
  12. In the alarm list, check whether the alarm is cleared after a period.

    • If yes, no further action is required.
    • If no, go to 13.
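
Where the ping check needs to be scripted, a minimal sketch using the Python standard library could look like the following. The function names, the packet count, and the timeout are assumptions, and the address passed in must be the real peer host address from the Flume configuration.

```python
import platform
import subprocess


def build_ping_command(host, count=4):
    """Build a ping command that sends a fixed number of echo requests."""
    # Windows uses -n for the packet count; Linux and macOS use -c.
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    return ["ping", count_flag, str(count), host]


def host_reachable(host, count=4, timeout=30):
    """Return True if ping exits with status 0, False otherwise."""
    try:
        result = subprocess.run(
            build_ping_command(host, count),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (subprocess.TimeoutExpired, OSError):
        # A timeout or a missing ping binary is treated as "not reachable".
        return False
    return result.returncode == 0


# 192.0.2.10 is a documentation-only placeholder address (RFC 5737).
print(build_ping_command("192.0.2.10", count=2))
```

Calling host_reachable with the peer host address returns False when the host does not answer, which corresponds to the "If no" branch of the ping step.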

Collect fault information.

  13. On the FusionInsight Manager portal, choose O&M > Log > Download.
  14. Select Flume in the required cluster from the Service drop-down list box.
  15. Click the icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
  16. Contact the O&M personnel and send the collected fault logs.
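
The log collection window in the steps above is simple date arithmetic. As a quick sketch, the Start Date and End Date can be derived from the alarm generation time as follows. The function name and the sample alarm time are hypothetical.

```python
from datetime import datetime, timedelta


def log_collection_window(alarm_time, margin=timedelta(hours=1)):
    """Return (start, end) spanning one margin before and after alarm_time."""
    return alarm_time - margin, alarm_time + margin


# Hypothetical alarm generation time, used only for illustration.
alarm_time = datetime(2021, 3, 25, 14, 30)
start, end = log_collection_window(alarm_time)
print("Start Date:", start)  # 2021-03-25 13:30:00
print("End Date:", end)      # 2021-03-25 15:30:00
```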

Alarm Clearing

After the fault is rectified, the system automatically clears this alarm.

Related Information

None
