ALM-14012 HDFS JournalNode Data Is Not Synchronized (For MRS 2.x or Earlier)
Description
On the active NameNode, the system checks data synchronization on all JournalNodes in the cluster every 5 minutes. This alarm is generated when data on a JournalNode is not synchronized with that on other JournalNodes.
This alarm is cleared 5 minutes after data on the JournalNodes is synchronized.
Attribute
Alarm ID | Alarm Severity | Auto Clear |
---|---|---|
14012 | Major | Yes |
Parameters
Parameter | Description |
---|---|
ServiceName | Specifies the service for which the alarm is generated. |
RoleName | Specifies the role for which the alarm is generated. |
IP | Specifies the service IP address of the JournalNode instance for which the alarm is generated. |
Impact on the System
When a JournalNode is working incorrectly, data on the node is not synchronized with that on other JournalNodes. If data on more than half of JournalNodes is not synchronized, the NameNode cannot work correctly, making the HDFS service unavailable.
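The threshold in the paragraph above can be written as a small check. This is a minimal sketch of the stated rule only ("more than half of JournalNodes is not synchronized"). The function name `hdfs_unavailable` and its parameters are hypothetical and are not part of MRS or HDFS.

```python
def hdfs_unavailable(total_journalnodes: int, unsynced_journalnodes: int) -> bool:
    """Return True when more than half of the JournalNodes are out of sync.

    Hypothetical helper: it only encodes the rule stated in this section.
    """
    if total_journalnodes <= 0:
        raise ValueError("total_journalnodes must be positive")
    if not 0 <= unsynced_journalnodes <= total_journalnodes:
        raise ValueError("unsynced_journalnodes must be between 0 and the total")
    # Integer form of: unsynced > total / 2 (avoids floating-point division).
    return 2 * unsynced_journalnodes > total_journalnodes
```

For example, in a cluster with three JournalNodes, `hdfs_unavailable(3, 1)` is `False` and `hdfs_unavailable(3, 2)` is `True`, so a single unsynchronized JournalNode raises this alarm without yet making HDFS unavailable.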
Possible Causes
- The JournalNode instance has not been started or has been stopped.
- The JournalNode instance is working incorrectly.
- The network of the JournalNode is unreachable.
Procedure
1. Check whether the JournalNode instance has been started.
   - On the MRS cluster details page, click Alarms. In the alarm list, click the alarm.
   - In the Alarm Details area, check Location and obtain the IP address of the JournalNode for which the alarm is generated.
   - Choose Components > HDFS > Instances. In the instance list, click the JournalNode for which the alarm is generated and check whether Operating Status of the node is Started.
   - If the instance is not started, select the JournalNode instance and choose More > Start Instance to start it.
   - Wait 5 minutes and check whether the alarm is cleared.
     - If yes, no further action is required.
     - If no, go to step 4.
2. Check whether the JournalNode instance is working correctly.
3. Check whether the network of the JournalNode is reachable.
   - On the MRS cluster details page, choose Components > HDFS > Instances to check the service IP address of the active NameNode.
   - Log in to the active NameNode.
   - Run the ping command to check whether a timeout occurs or the network between the active NameNode and the JournalNode is unreachable.
     ping service IP address of the JournalNode
   - If the network is faulty, contact O&M personnel to rectify the network fault. Wait 5 minutes and check whether the alarm is cleared.
     - If yes, no further action is required.
     - If no, go to step 4.
4. Collect fault information.
   - On MRS Manager, choose .
   - Contact the O&M engineers and send the collected logs.
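
The ping check in the network step can also be scripted when several JournalNodes need to be tested from the active NameNode. The following is a minimal sketch, assuming a Linux host with the iputils `ping` command (`-c` for the packet count, `-W` for the per-reply timeout in seconds). The function names, the default count and timeout, and the example address are illustrative assumptions and are not part of MRS.

```python
from __future__ import annotations

import ipaddress
import subprocess


def build_ping_command(ip: str, count: int = 4, timeout_s: int = 2) -> list[str]:
    """Build the ping command line for one JournalNode service IP address."""
    # Raises ValueError unless `ip` is a literal IP address, so that
    # arbitrary strings are never passed on to the command line.
    ipaddress.ip_address(ip)
    return ["ping", "-c", str(count), "-W", str(timeout_s), ip]


def journalnode_reachable(ip: str, runner=subprocess.run) -> bool:
    """Return True if ping exits with status 0 (at least one reply received).

    `runner` is injectable so the logic can be exercised without a network.
    """
    result = runner(build_ping_command(ip), capture_output=True, text=True)
    return result.returncode == 0
```

Calling `journalnode_reachable("192.0.2.10")` on the active NameNode, with the placeholder replaced by the IP address obtained from the alarm's Location field, returns `False` when ping times out or the network is unreachable. In that case, continue with the network fault handling described in the procedure.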
Reference
None