ALM-24001 Flume Agent Exception
Alarm Description
The Flume agent monitoring module monitors the Flume agent status. This alarm is generated when the Flume agent process is faulty (checked every 5 seconds) or the Flume agent fails to start (an alarm is reported immediately).
This alarm is cleared when the Flume agent process recovers, the Flume agent starts successfully, and the alarm handling is completed.
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared |
---|---|---|
24001 | Major | Yes |
Alarm Parameters
Parameter | Description |
---|---|
Source | Specifies the cluster for which the alarm was generated. |
ServiceName | Specifies the service for which the alarm was generated. |
AgentId | Specifies the ID of the agent for which the alarm was generated. |
RoleName | Specifies the role for which the alarm was generated. |
HostName | Specifies the host for which the alarm was generated. |
Impact on the System
The Flume agent instance for which the alarm is generated cannot provide services properly, and the data transmission tasks of the instance are temporarily interrupted. Data that is being transmitted in real time during the interruption is lost.
Possible Causes
- The JAVA_HOME directory does not exist, or the Java permission is incorrect.
- The Flume agent directory permission is incorrect.
- The Flume agent fails to start.
Handling Procedure
Check whether the JAVA_HOME directory exists and whether the Java permission is correct.
- Log in to the host for which the alarm is generated as user root.
- Obtain the installation directory of the Flume client for which the alarm is generated. (The value of AgentId can be obtained from Location of the alarm.)
ps -ef|grep AgentId | grep -v grep | awk -F 'conf-file ' '{print $2}' | awk -F 'fusioninsight' '{print $1}'
- Run the su - Flume installation user command to switch to the Flume installation user and run the cd Flume client installation directory/fusioninsight-flume-1.9.0/conf/ command to go to the Flume configuration directory.
- Run the cat ENV_VARS | grep JAVA_HOME command.
- Check whether the JAVA_HOME directory exists. If neither the output of the command in 4 nor the output of ll $JAVA_HOME/ is empty, the JAVA_HOME directory exists.
- Specify a correct JAVA_HOME directory, for example, export JAVA_HOME=${BIGDATA_HOME}/common/runtime0/jdkVersion number.
- Run the $JAVA_HOME/bin/java -version command to check whether the Flume agent running user has the Java execution permission. If the Java version is displayed in the command output, the Java permission meets the requirement. Otherwise, the Java permission does not meet the requirement.
- Run the chmod 750 $JAVA_HOME/bin/java command to grant the Java execution permission to the Flume agent running user.
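The directory lookup and Java checks above can be sketched as two small shell functions. This is a minimal sketch under stated assumptions, not part of the product: the function names `extract_install_dir` and `check_java_home` are hypothetical, and the process line is assumed to contain `conf-file ` followed by a path under a `fusioninsight-flume-...` directory, as the awk pipeline in the procedure implies.

```shell
#!/bin/sh
# Hypothetical helper: reads `ps -ef` style lines on stdin and prints the
# text between "conf-file " and "fusioninsight", using the same awk
# pipeline as the procedure above.
extract_install_dir() {
    awk -F 'conf-file ' '{print $2}' | awk -F 'fusioninsight' '{print $1}'
}

# Hypothetical helper: prints "missing" if the directory does not exist,
# "not-executable" if bin/java under it cannot be executed by the current
# user, and "ok" otherwise.
check_java_home() {
    java_home="$1"
    if [ -z "$java_home" ] || [ ! -d "$java_home" ]; then
        echo "missing"
        return 1
    fi
    if [ ! -x "$java_home/bin/java" ]; then
        echo "not-executable"
        return 2
    fi
    echo "ok"
}
```

For example, `ps -ef | grep AgentId | grep -v grep | extract_install_dir` prints the client installation directory, and `check_java_home "$JAVA_HOME"`, run as the Flume agent running user, reports which of the two conditions in this check fails.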
Check the directory permission of the Flume agent.
- Log in to the host for which the alarm is generated as user root.
- Run the following command to switch to the Flume agent installation directory:
cd Flume client installation directory/fusioninsight-flume-1.9.0/conf/
- Run the ls -al * -R command to check whether every file is owned by the user running the Flume agent.
- If yes, go to 12.
- If no, run the chown command to change the file owner to the user who runs the Flume agent.
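Reading recursive ls output by eye is error-prone in a large directory. As an alternative, the following minimal sketch lists only the entries whose owner differs from the expected user. The function name `list_foreign_owned` is hypothetical; `find ... ! -user` is standard POSIX find.

```shell
#!/bin/sh
# Hypothetical helper: prints every entry under a directory that is NOT
# owned by the given user (name or numeric UID). Empty output means all
# entries are owned by that user.
list_foreign_owned() {
    dir="$1"
    owner="$2"
    find "$dir" ! -user "$owner" -print
}
```

Run it from the configuration directory as `list_foreign_owned . <user running the Flume agent>`. Any path it prints is a candidate for the chown command in the step above.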
Check the Flume agent configuration.
- Run the cat properties.properties | grep spooldir and cat properties.properties | grep TAILDIR commands to check whether the Flume source type is spoolDir or tailDir. If any command output is displayed, the Flume source type is spoolDir or tailDir.
- Check whether the data monitoring directory exists.
- Specify a correct data monitoring directory.
- Check whether the Flume agent user has the read, write, and execute permissions on the monitoring directory specified in 13.
- Run the chmod 777 Flume monitoring directory command to grant the Flume agent running user the read, write, and execute permissions on the monitoring directory specified in 13.
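The source-type and monitoring-directory checks can be sketched as follows. This is an illustrative sketch only: the function names `has_dir_source` and `check_monitor_dir` are hypothetical, and the pattern assumes the source type appears in properties.properties on a line of the form `<name>.type = spooldir` or `<name>.type = TAILDIR`.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the configuration file declares a
# spooldir or TAILDIR source (matched case-insensitively).
has_dir_source() {
    grep -Eiq '\.type[[:space:]]*=[[:space:]]*(spooldir|taildir)[[:space:]]*$' "$1"
}

# Hypothetical helper: prints "missing" if the directory does not exist,
# "no-permission" if the current user lacks read, write, or execute
# permission on it, and "ok" otherwise.
check_monitor_dir() {
    dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing"
        return 1
    fi
    if [ -r "$dir" ] && [ -w "$dir" ] && [ -x "$dir" ]; then
        echo "ok"
    else
        echo "no-permission"
        return 2
    fi
}
```

Run `check_monitor_dir` as the Flume agent running user, not as root, because root passes the permission tests regardless of the directory mode.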
- Check whether the components connected to the Flume sink are in safe mode.
- If yes, go to 18.
- If no, go to 23.
If the sinks in the properties.properties configuration file are the HDFS sink and HBase sink, and the configuration file contains a keytab file, the components connected to the Flume sink are in safe mode.
If the sink in the properties.properties configuration file is the Kafka sink and *.security.protocol is set to SASL_PLAINTEXT or SASL_SSL, Kafka connected to the Flume sink is in safe mode.
- Run the ll keytab path command to check whether the keytab authentication path specified by the *.kerberosKeytab parameter in the configuration file exists.
- Change the value of kerberosKeytab in 18 to the custom keytab path and go to 21.
- Go to 18 and check whether the Flume agent running user has the permission to access the keytab authentication file. If the keytab path is returned, the user has the permission. Otherwise, the user does not have the permission.
- Run the chmod 755 keytab file command to grant the read permission on the keytab file specified in 19, and restart the Flume process.
- Check whether the alarm is cleared.
- If yes, no further action is required.
- If no, go to 23.
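The safe mode rules and the keytab checks above can be approximated with a short script. This is a simplified sketch, not an official tool: the function names `is_secure_sink_config` and `keytab_status` are hypothetical, and the first function treats any `kerberosKeytab` setting as the indicator for HDFS and HBase sinks without verifying the sink type.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the configuration file contains a
# kerberosKeytab setting, or a security.protocol value of SASL_PLAINTEXT
# or SASL_SSL (the Kafka sink case).
is_secure_sink_config() {
    conf="$1"
    if grep -Eq 'kerberosKeytab[[:space:]]*=' "$conf"; then
        return 0
    fi
    grep -Eq 'security\.protocol[[:space:]]*=[[:space:]]*SASL_(PLAINTEXT|SSL)' "$conf"
}

# Hypothetical helper: prints "missing" if the keytab path cannot be found
# by the current user, "unreadable" if it exists but cannot be read, and
# "ok" otherwise.
keytab_status() {
    keytab="$1"
    if [ ! -e "$keytab" ]; then
        echo "missing"
        return 1
    fi
    if [ ! -r "$keytab" ]; then
        echo "unreadable"
        return 2
    fi
    echo "ok"
}
```

Run `keytab_status` as the Flume agent running user so that the result reflects that user's access rather than root's.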
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Expand the Service drop-down list, and select Flume for the target cluster.
- Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 1 hour before and 1 hour after the alarm generation time, respectively. Then, click Download.
- Contact O&M personnel and provide the collected logs.
Alarm Clearance
This alarm is automatically cleared after the fault is rectified.
Related Information
None