Updated on 2024-04-11 GMT+08:00

ALM-24004 Exception Occurs When Flume Reads Data

Alarm Description

The alarm module monitors the Flume source status. This alarm is generated when the source has been unable to read data for longer than the threshold.

The default threshold is 0, indicating that this function is disabled. You can change the threshold by modifying the properties.properties file in the conf directory. Specifically, modify the NoDatatime parameter of the required source.

The alarm is cleared when the source reads the data and the alarm handling is complete.
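To see which sources currently have the check enabled, the file can be scanned for NoDatatime entries. The following is a minimal sketch, not an official tool. It assumes the keys follow the usual Flume layout <agent>.sources.<source>.NoDatatime; the agent and source names in the sample are hypothetical.

```python
def parse_properties(text):
    """Parse Java-style key=value lines, skipping blank lines and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def no_data_thresholds(props):
    """Map each source name to its NoDatatime value (0 means the check is disabled)."""
    thresholds = {}
    for key, value in props.items():
        parts = key.split(".")
        # Assumed key layout: <agent>.sources.<source>.NoDatatime
        if len(parts) == 4 and parts[1] == "sources" and parts[3] == "NoDatatime":
            thresholds[parts[2]] = int(value)
    return thresholds


# Hypothetical sample; the agent and source names are made up for illustration.
SAMPLE = """
server.sources = spool_src avro_src
server.sources.spool_src.type = spooldir
server.sources.spool_src.NoDatatime = 0
server.sources.avro_src.type = avro
server.sources.avro_src.NoDatatime = 300
"""

if __name__ == "__main__":
    for name, value in no_data_thresholds(parse_properties(SAMPLE)).items():
        state = "disabled" if value == 0 else f"threshold {value}"
        print(f"{name}: {state}")
```

In practice, the text of the real properties.properties file would be passed to parse_properties instead of the sample string.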

Alarm Attributes

Alarm ID    Alarm Severity    Auto Cleared
24004       Major             Yes

Alarm Parameters

Parameter        Description
Source           Specifies the cluster for which the alarm was generated.
ServiceName      Specifies the service for which the alarm was generated.
HostName         Specifies the host for which the alarm was generated.
AgentId          Specifies the ID of the agent for which the alarm was generated.
ComponentType    Specifies the type of the component for which the alarm was generated.
ComponentName    Specifies the name of the component for which the alarm was generated.

Impact on the System

If the data source contains data but the Flume source continuously fails to read it, data collection stops.

Possible Causes

  • The Flume source is faulty, so data cannot be sent.
  • The network is faulty, so the data cannot be sent.

Handling Procedure

Check whether the Flume source is faulty.

  1. Open the properties.properties configuration file on the local PC, search for the keyword type = spooldir in the file, and check whether the Flume source type is spoolDir.

    • If yes, go to 2.
    • If no, go to 3.

  2. View the spoolDir monitoring directory to check whether all files are already transferred.

    • If yes, no further action is required.
    • If no, go to 5.

      The monitoring directory of spoolDir is specified by the .spoolDir parameter in the properties.properties configuration file. If all files in the monitoring directory have been transferred, every file in that directory has the .COMPLETED file name extension.

  3. Open the properties.properties configuration file on the local PC, search for org.apache.flume.source.kafka.KafkaSource in the file, and check whether the Flume source type is Kafka.

    • If yes, go to 4.
    • If no, go to 7.

  4. Check whether all data in the topic configured for the Kafka source has been consumed.

    • If yes, no further action is required.
    • If no, go to 5.

  5. On FusionInsight Manager, choose Cluster, click the name of the desired cluster, and choose Services > Flume > Instances.
  6. Go to the Flume instance page of the faulty node to check whether the Source Speed Metrics in the alarm is 0.

    • If yes, go to 11.
    • If no, go to 7.
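The source-type lookup and the spoolDir completion check in the steps above can be sketched in Python. This is an illustrative sketch under stated assumptions, not part of the product: it assumes keys of the form <agent>.sources.<source>.type, and the helper names source_types and pending_files, as well as the sample agent and source names, are hypothetical. The .COMPLETED suffix is the one described in the procedure.

```python
import os
import tempfile


def source_types(props_text):
    """Return {source_name: type} for keys shaped like <agent>.sources.<source>.type."""
    types = {}
    for raw in props_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        parts = key.strip().split(".")
        if sep and len(parts) == 4 and parts[1] == "sources" and parts[3] == "type":
            types[parts[2]] = value.strip()
    return types


def pending_files(spool_dir, suffix=".COMPLETED"):
    """List regular files in spool_dir that do not yet carry the completed suffix."""
    return sorted(
        name
        for name in os.listdir(spool_dir)
        if os.path.isfile(os.path.join(spool_dir, name)) and not name.endswith(suffix)
    )


if __name__ == "__main__":
    # Hypothetical configuration line; real names come from properties.properties.
    sample = "server.sources.spool_src.type = spooldir\n"
    print(source_types(sample))

    # Simulate a monitoring directory with one transferred and one pending file.
    with tempfile.TemporaryDirectory() as spool_dir:
        for name in ("a.log.COMPLETED", "b.log"):
            open(os.path.join(spool_dir, name), "w").close()
        print(pending_files(spool_dir))
```

An empty list from pending_files corresponds to the "all files are already transferred" branch; any remaining names are files the source has not finished reading.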

Check the network connectivity between the node with the IP address configured for the Flume source and the faulty node.

  7. Open the properties.properties configuration file on the local PC, search for type = avro in the file, and check whether the Flume source type is Avro.

    • If yes, go to 8.
    • If no, go to 11.

  8. Log in to the faulty node as user root, and run the ping command with the IP address of the Flume source to check whether the peer host can be pinged successfully.

    • If yes, go to 11.
    • If no, go to 9.

  9. Contact the network administrator to restore the network.
  10. Wait for a while and check whether the alarm is cleared in the alarm list.

    • If yes, no further action is required.
    • If no, go to 11.
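As a complement to the ping check above, a TCP connection attempt can confirm that the Avro source port itself is reachable, which ping alone does not show. The sketch below swaps ICMP ping for a plain TCP connect using the Python standard library. The function name tcp_reachable is hypothetical, and the host and port in the usage example are placeholders for the address and port configured for the Avro source.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Placeholder values; replace with the address and port of the Avro source.
    host, port = "127.0.0.1", 21154
    status = "reachable" if tcp_reachable(host, port) else "not reachable"
    print(f"{host}:{port} is {status}")
```

A False result can mean either a network fault or that nothing is listening on the port, so it narrows the problem down rather than replacing the remaining steps.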

Collect fault information.

  11. On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
  12. Expand the Service drop-down list, and select Flume for the target cluster.
  13. Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
  14. Contact O&M personnel and provide the collected logs.
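The log collection window described above is simple date arithmetic. As a small illustration, the following sketch computes Start Date and End Date from an alarm generation time. The function name log_window and the sample timestamp are hypothetical.

```python
from datetime import datetime, timedelta


def log_window(alarm_time, margin=timedelta(hours=1)):
    """Return (start, end): one margin before and one margin after the alarm time."""
    return alarm_time - margin, alarm_time + margin


if __name__ == "__main__":
    # Hypothetical alarm generation time, in the time zone shown on the Manager page.
    alarm_time = datetime(2024, 4, 11, 10, 30)
    start, end = log_window(alarm_time)
    print("Start Date:", start.isoformat(sep=" "))
    print("End Date:  ", end.isoformat(sep=" "))
```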

Alarm Clearance

This alarm is automatically cleared after the fault is rectified.

Related Information

None