ALM-14037 DataNodes Outside the Cluster
Alarm Description
The NameNode checks every 8 hours for DataNodes that are not managed by the cluster. This alarm is generated when such a DataNode is detected, and it is cleared when no DataNode outside the cluster is detected.
This alarm applies only to MRS 3.3.1 or later.
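The check described above amounts to comparing the DataNodes that report to the NameNode with the hosts the cluster manages. The following is a minimal sketch of a manual cross-check, not the NameNode's own implementation. It assumes you have a file of DataNode "IP:port" entries (for example, taken from the "Name:" lines of `hdfs dfsadmin -report`, run by an account allowed to use dfsadmin) and a hypothetical file `managed_hosts.txt` listing the managed host IP addresses, one per line. The function name `find_unmanaged_datanodes` is illustrative only.

```shell
#!/usr/bin/env bash
# Sketch: print DataNodes that report to the NameNode but whose IP address is
# not in a list of hosts managed by the cluster.
#
# $1: file with one "IP:port" entry per line (DataNodes seen by the NameNode)
# $2: file with one managed host IP address per line (hypothetical export)
find_unmanaged_datanodes() {
  local reported_file=$1 managed_file=$2
  awk '
    NF == 0 { next }                               # skip blank lines
    FILENAME == ARGV[1] { managed[$1] = 1; next }  # first file: managed IPs
    {
      ip = $1
      sub(/:[0-9]+$/, "", ip)                      # drop the port suffix
      if (!(ip in managed)) print $1               # DataNode outside the cluster
    }
  ' "$managed_file" "$reported_file"
}

# Example (assumes the hdfs client is on PATH):
#   hdfs dfsadmin -report | awk '/^Name:/ {print $2}' > reported.txt
#   find_unmanaged_datanodes reported.txt managed_hosts.txt
```

Any line it prints corresponds to the kind of "IP:port" value that appears in the Trigger Condition of this alarm.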
Alarm Attributes
Alarm ID | Alarm Severity | Auto Cleared
---|---|---
14037 | Major | Yes
Alarm Parameters
Type | Parameter | Description
---|---|---
Location Information | Source | Specifies the cluster for which the alarm was generated.
Location Information | ServiceName | Specifies the service for which the alarm was generated.
Location Information | NameServiceName | Specifies the NameService for which the alarm was generated.
Additional Information | Trigger Condition | Specifies the alarm trigger condition, that is, the IP address and port of the DataNode detected outside the cluster.
Impact on the System
Data may be lost.
Possible Causes
A host was forcibly deleted from the cluster, then powered on again, and its DataNode process was restarted.
Handling Procedure
- Log in to FusionInsight Manager, click O&M, and choose Alarm > Alarms to view the alarm details. In the additional information area, check the IP address of the host for which the alarm is generated.
- Stop the DataNode process on the host for which the alarm is generated.
  If the alarm is reported for multiple IP addresses, stop only one DataNode process at a time, and stop the next one only after the number of blocks to be replicated changes to 0.
  - Log in to the host for which the alarm is generated as the root user and set the permission on the hadoop directory in the installation directory ${BIGDATA_HOME}/FusionInsight_HD_*/install to 000. For example:
    chmod 000 ${BIGDATA_HOME}/FusionInsight_HD_8.1.0.1/install/FusionInsight-Hadoop-3.3.1
  - Obtain the PID of the DataNode process on the host and stop the process, replacing PID with the process ID:
    kill -15 PID
  - Choose Cluster > Services > HDFS. On the Dashboard tab, check the Basic Information area (or the NameService Summary area), and wait until the value of Blocks to be Replicated changes to 0.
  - Log in to the host for which the alarm is generated as the root user and restore the permission on the hadoop directory in the installation directory ${BIGDATA_HOME}/FusionInsight_HD_*/install to its original value.
- Wait for 8 hours (the NameNode check interval) and check whether the alarm is cleared and whether the number of blocks to be replicated is 0.
  - If yes, no further action is required.
  - If no, go to Collect fault information.
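The stop-and-wait steps above can be sketched as shell functions. This is a hedged illustration under assumptions not stated in this document: that the DataNode runs the standard Apache Hadoop main class `org.apache.hadoop.hdfs.server.datanode.DataNode`, and that the "Under replicated blocks" value printed by `hdfs dfsadmin -report` tracks the Blocks to be Replicated value shown on FusionInsight Manager. Manager remains the authoritative place to check. The permission changes are intentionally not scripted; run the chmod commands as described. All function names are hypothetical. When several hosts are affected, run the stop function on one host, wait until the count reaches 0, and only then move to the next host.

```shell
#!/usr/bin/env bash
# Sketch of the DataNode stop-and-wait steps. Assumptions, not from the
# documentation: the DataNode runs the Apache Hadoop main class below, and
# "hdfs dfsadmin -report" reports the under-replicated block count.

DATANODE_CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'

# Print the PID(s) of DataNode processes on this host (nothing if none).
datanode_pids() {
  pgrep -f "$DATANODE_CLASS"
}

# Send SIGTERM (kill -15) to each DataNode process found on this host.
stop_datanode() {
  local pid
  for pid in $(datanode_pids); do
    echo "Sending SIGTERM to DataNode process $pid"
    kill -15 "$pid"
  done
}

# Read "hdfs dfsadmin -report" output on stdin and print the value on the
# first "Under replicated blocks:" line.
under_replicated_blocks() {
  awk -F': *' '/Under replicated blocks:/ { print $2; exit }'
}

# Poll every $1 seconds (default 60) until the count reaches 0.
wait_until_replicated() {
  local interval=${1:-60} count
  while :; do
    count=$(hdfs dfsadmin -report | under_replicated_blocks)
    if [ -z "$count" ]; then
      echo "Could not read the under-replicated block count" >&2
      return 1
    fi
    echo "Under replicated blocks: $count"
    [ "$count" = "0" ] && return 0
    sleep "$interval"
  done
}
```

The report parsing is kept in its own function so it can be checked against saved report text; the exact label in the report may differ between Hadoop versions.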
Collect fault information.
- On FusionInsight Manager, choose O&M. In the navigation pane on the left, choose Log > Download.
- Expand the drop-down list next to the Service field. In the Services dialog box that is displayed, select HDFS for the target cluster.
- Click the edit icon in the upper right corner, and set Start Date and End Date for log collection to 10 minutes before and 10 minutes after the alarm generation time, respectively. Then, click Download.
- Contact O&M engineers and provide the collected logs.
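The Start Date and End Date values are simple time arithmetic around the alarm generation time. A minimal sketch, assuming GNU date (which accepts -d and "@epoch" input) and times expressed in the local time zone; the function name `log_collection_window` is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: compute the log collection window, 10 minutes (600 s) before and
# after the alarm generation time. Assumes GNU date; uses the local time zone.
log_collection_window() {
  local alarm_time=$1 margin=${2:-600} epoch
  # Convert to seconds since the epoch so the offset arithmetic is exact.
  epoch=$(date -d "$alarm_time" +%s) || return 1
  echo "Start Date: $(date -d "@$((epoch - margin))" '+%Y-%m-%d %H:%M:%S')"
  echo "End Date:   $(date -d "@$((epoch + margin))" '+%Y-%m-%d %H:%M:%S')"
}
```

For example, `log_collection_window "2024-05-01 12:00:00"` prints a window from 11:50:00 to 12:10:00 on the same day. Converting to epoch seconds first avoids ambiguities in relative date strings.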