Updated on 2024-04-11 GMT+08:00

ALM-24004 Flume Fails to Read Data (For MRS 2.x or Earlier)

Description

The alarm module monitors the Flume source status. This alarm is generated if the Flume source fails to read data for longer than the threshold.

Users can modify the threshold as required.

This alarm is cleared if the source reads data successfully.

Attribute

Alarm ID    Alarm Severity    Auto Clear
24004       Major             Yes

Parameters

Parameter        Description
ServiceName      Specifies the service for which the alarm is generated.
HostName         Specifies the host for which the alarm is generated.
ComponentType    Specifies the component type for which the alarm is generated.
ComponentName    Specifies the component name for which the alarm is generated.

Impact on the System

Data collection is stopped.

Possible Causes

  • The Flume source is faulty.
  • The network is faulty.

Procedure

  1. Check whether the Flume source is normal.

    a. Check whether the Flume source is the spoolDir type.
      • If yes, go to 1.b.
      • If no, go to 1.c.
    b. Query the spoolDir directory and check whether all files have been sent.
      • If yes, no further action is required.
      • If no, go to 1.e.
    c. Check whether the Flume source is the Kafka type.
      • If yes, go to 1.d.
      • If no, go to 1.e.
    d. Log in to the Kafka client and run the following commands to check whether all topic data configured for the Kafka source has been consumed.

      cd /opt/client/Kafka/kafka/bin

      ./kafka-consumer-groups.sh --bootstrap-server Kafka cluster IP address:21007 --new-consumer --describe --group example-group1 --command-config ../config/consumer.properties

      • If yes, no further action is required.
      • If no, go to 1.e.
    e. Go to the cluster details page and click Components.
    f. Choose Flume > Instances.
    g. Click the Flume instance of the faulty node and check whether the value of the Source Speed Metrics is 0.
      • If yes, go to 2.a.
      • If no, no further action is required.
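
The checks in 1.b and 1.d can be scripted roughly as follows. This is a minimal sketch, not part of the product. The function names count_pending_spool_files and total_lag are hypothetical. The first function assumes the spooling directory source keeps Flume's default .COMPLETED suffix for files it has finished sending. The second assumes the --describe output contains a header row with a LAG column; because the column order differs between Kafka versions, the column is located by name.

```shell
#!/bin/sh
# Hypothetical helpers for sub-steps 1.b and 1.d; adapt paths and names to your cluster.

# Count files in a spooling directory that Flume has not yet marked as sent.
# $1: spooling directory
# $2: completed-file suffix (assumed to be Flume's default, .COMPLETED)
count_pending_spool_files() {
    find "$1" -maxdepth 1 -type f ! -name "*${2:-.COMPLETED}" | wc -l | tr -d ' '
}

# Sum the LAG column of "kafka-consumer-groups.sh --describe" output read from stdin.
# Prints the total lag; exits with status 2 if no header containing LAG is found.
total_lag() {
    awk '
        # Skip lines until the header row is seen, then remember the LAG column index.
        lag_col == 0 {
            for (i = 1; i <= NF; i++) if ($i == "LAG") lag_col = i
            next
        }
        # Add up numeric LAG values; entries such as "-" are ignored.
        $lag_col ~ /^[0-9]+$/ { sum += $lag_col }
        END {
            if (lag_col == 0) exit 2
            print sum + 0
        }
    '
}

# Example usage (the directory is a placeholder):
#   count_pending_spool_files /path/to/spoolDir
#   ./kafka-consumer-groups.sh ... --describe --group example-group1 ... | total_lag
```

A result of 0 from either function suggests that all files have been sent or all topic data has been consumed; any other value points to 1.e.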

  2. Check the status of the network between the Flume source and the faulty node.

    a. Check whether the Flume source is the avro type.
      • If yes, go to 2.c.
      • If no, go to 3.
    b. Log in to the host where the faulty node resides. Run the following command to switch to user root:

      sudo su - root

    c. Run the ping Flume source IP address command to check whether the Flume source can be pinged.
      • If yes, go to 3.
      • If no, go to 2.d.
    d. Contact the network administrator to repair the network.
    e. Wait for a while and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, go to 3.
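
The ping check in 2.c and its two outcomes can be sketched as a small shell helper. The names next_step_for_source and PING_BIN are hypothetical, and the -c and -W options are assumed to be those of the Linux iputils ping; other ping implementations may use different flags.

```shell
#!/bin/sh
# Hypothetical helper for sub-step 2.c.
# PING_BIN may be overridden (for example, in a dry run); it defaults to the system ping.
PING_BIN="${PING_BIN:-ping}"

# $1: IP address of the Flume source (supply the real address)
next_step_for_source() {
    # Send 3 echo requests and wait at most 2 seconds for each reply (iputils options).
    if "$PING_BIN" -c 3 -W 2 "$1" >/dev/null 2>&1; then
        echo "reachable: go to 3 (collect fault information)"
    else
        echo "unreachable: go to 2.d (contact the network administrator)"
    fi
}

# Example usage (192.0.2.10 is a documentation-range placeholder address):
#   next_step_for_source 192.0.2.10
```

A successful ping only shows that the host responds to ICMP; it does not confirm that the avro source port is accepting connections.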

  3. Collect fault information.

    a. On MRS Manager, choose System > Export Log.
    b. Contact the O&M engineers and send the collected logs.

Related Information

N/A