ALM-14037 DataNodes Outside the Cluster

Alarm Description

Every 8 hours, the NameNode checks whether any DataNodes are not managed by the cluster. This alarm is generated when a DataNode outside the cluster is detected, and is cleared when no DataNode is outside the cluster.

This alarm applies only to MRS 3.3.1 or later.

Alarm Attributes

Alarm ID	Alarm Severity	Auto Cleared
14037	Major	Yes

Alarm Parameters

Type	Parameter	Description
Location Information	Source	Specifies the cluster for which the alarm was generated.
	ServiceName	Specifies the service for which the alarm was generated.
	NameServiceName	Specifies the NameService for which the alarm was generated.
Additional Information	Trigger Condition	Specifies the condition that triggered the alarm, that is, the IP address and port of the DataNode detected outside the cluster.

Impact on the System

Data may be lost.

Possible Causes

After a host is forcibly deleted, the host is powered on again, and the process is restarted.

Handling Procedure

Log in to FusionInsight Manager, click O&M, and choose Alarm > Alarms to view the alarm details. In the additional information area, check the IP address of the host for which the alarm is generated.
Stop the DataNode process on the host for which the alarm is reported.

If the alarm reports multiple host IP addresses, stop only one DataNode process at a time. Stop the next DataNode process only after Number of Blocks to Be Replicated changes to 0.
1. Log in as user root to the host for which the alarm is generated, and remove all permissions on the hadoop directory under the installation directory ${BIGDATA_HOME}/FusionInsight_HD_*/install. The version numbers in the following command are examples; replace them with the directory names that exist on the host.
  chmod 000 ${BIGDATA_HOME}/FusionInsight_HD_8.1.0.1/install/FusionInsight-Hadoop-3.3.1
2. Run the following commands to obtain the PID of the DataNode process and stop it on the host:
  ps -ef | grep Dproc_datanode
  
  kill -15 PID
3. Choose Cluster > Services > HDFS. Check the Basic Information area in the Dashboard tab (or the NameService Summary area in the Dashboard tab of HDFS), and wait until the value of Blocks to be Replicated changes to 0.
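The PID lookup and graceful stop in the sub-steps above can be sketched as two small shell functions. This is a minimal illustration, not part of the product: the function names datanode_pids and stop_gracefully and the 60-second default timeout are assumptions made for this example.

```shell
# Read `ps -ef` output on stdin and print the PID (second column) of every
# process whose command line contains Dproc_datanode. Lines containing
# "grep" are skipped so that the search process itself is not reported.
datanode_pids() {
  awk '/Dproc_datanode/ && !/grep/ { print $2 }'
}

# Send SIGTERM (signal 15) to PID $1, then wait up to $2 seconds
# (assumed default: 60) for the process to exit.
# Returns 0 if the process is gone, non-zero otherwise.
stop_gracefully() {
  sg_pid=$1
  sg_left=${2:-60}
  kill -15 "$sg_pid" || return 1
  while kill -0 "$sg_pid" 2>/dev/null && [ "$sg_left" -gt 0 ]; do
    sleep 1
    sg_left=$((sg_left - 1))
  done
  ! kill -0 "$sg_pid" 2>/dev/null
}

# Usage, as root on the host for which the alarm is generated:
#   for pid in $(ps -ef | datanode_pids); do stop_gracefully "$pid"; done
```

Review the PIDs printed by datanode_pids before stopping anything, because the match is a plain text search on the command line.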
Wait for 8 hours and check whether the alarm is cleared and whether the number of blocks to be replicated is 0.
- If yes, no further action is required.
- If no, collect fault information as described below.
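As a command-line cross-check of the replication backlog, the "Under replicated blocks" line printed by hdfs dfsadmin -report can be extracted as sketched below. This rests on several assumptions: that an HDFS client is installed and authenticated on the node where you run it, that the report covers the NameService named in the alarm, and that this counter corresponds to the value shown on FusionInsight Manager. The function name under_replicated is made up for this example. Treat the Manager page as the authoritative source.

```shell
# Read `hdfs dfsadmin -report` output on stdin and print the value of the
# first "Under replicated blocks" line.
under_replicated() {
  awk -F': *' '/Under replicated blocks/ { print $2; exit }'
}

# Usage (assumes a configured and authenticated HDFS client):
#   hdfs dfsadmin -report | under_replicated
```

A printed value of 0 suggests that no blocks are waiting to be replicated.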

Collect fault information.

On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
Expand the drop-down list next to the Service field. In the Services dialog box that is displayed, select HDFS for the target cluster.
Click the edit icon in the upper right corner, and set Start Date to 10 minutes before the alarm generation time and End Date to 10 minutes after it. Then, click Download.
Contact O&M engineers and provide the collected logs.
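The Start Date and End Date for the log collection window can be worked out from the alarm generation time as sketched below. This is only a convenience sketch: it assumes GNU date (as found on typical Linux hosts), and the function name log_window is hypothetical.

```shell
# Print the start and end of the log collection window, one per line.
# $1 = alarm generation time, for example "2024-05-01 12:00:00"
# $2 = margin in minutes on each side (default 10, as in the procedure)
log_window() {
  lw_epoch=$(date -d "$1" +%s) || return 1
  lw_margin=$(( ${2:-10} * 60 ))
  date -d "@$((lw_epoch - lw_margin))" '+%Y-%m-%d %H:%M:%S'
  date -d "@$((lw_epoch + lw_margin))" '+%Y-%m-%d %H:%M:%S'
}

# Usage:
#   log_window "2024-05-01 12:00:00"
```

For an alarm generated at 2024-05-01 12:00:00, the function prints 2024-05-01 11:50:00 and 2024-05-01 12:10:00, which are the values to enter as Start Date and End Date.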
