The Standby NameNode Fails to Start Because It Has Been Stopped for a Long Time
Symptom
The standby NameNode has been stopped for a long time. Meanwhile, the edits files it still needs have been deleted automatically by the aging policy, so they cannot be found when the standby NameNode is restarted, and the following error is reported:
There appears to be a gap in the edit log. We expected txid XXX, but got txid XXX.
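The error means that the sequence of transaction IDs (txids) the NameNode has to replay is not contiguous. Finalized edit log segments are normally named edits_<firstTxId>-<lastTxId> with zero-padded txids, so a gap between segments can be spotted from the file names alone. The following is a minimal sketch that assumes this naming scheme; find_edit_log_gaps is a hypothetical helper, not an HDFS tool, and it only checks contiguity between the segments that are present in one directory.

```shell
#!/bin/sh
# Hypothetical helper (not part of HDFS): report gaps between finalized edit
# log segments in a NameNode data directory. Assumes the file naming
# edits_<firstTxId>-<lastTxId> with zero-padded decimal txids.
# The function body runs in a subshell, so its variables do not leak.
find_edit_log_gaps() (
  dir=$1
  prev_end=""
  # The glob matches finalized segments only: edits_inprogress_* and
  # fsimage_* names contain no hyphen. Zero padding makes the lexical
  # glob order equal to numeric txid order.
  for path in "$dir"/edits_*-*; do
    [ -e "$path" ] || continue          # no segments: glob stayed literal
    name=${path##*/}
    range=${name#edits_}
    # Strip leading zeros so the shell does not parse the txids as octal.
    start=$(echo "${range%-*}" | sed 's/^0*//'); start=${start:-0}
    end=$(echo "${range#*-}" | sed 's/^0*//'); end=${end:-0}
    if [ -n "$prev_end" ] && [ "$start" -ne $((prev_end + 1)) ]; then
      echo "gap: expected txid $((prev_end + 1)), but got txid $start"
    fi
    prev_end=$end
  done
)
```

For example, find_edit_log_gaps /srv/BigData/namenode/current prints one line per hole it finds and prints nothing when the segments are contiguous.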
Solution
- Go to the All Configurations page of HDFS by referring to Modifying Cluster Service Configuration Parameters, search for dfs.namenode.name.dir, and record its value to obtain the NameNode data directory, for example, /srv/BigData/namenode/current.
- On the HDFS service page, click the Instances tab to view and record the service IP addresses of the active and standby NameNodes.
- Log in to the faulty standby NameNode as user root and back up the data directory obtained in the first step, which contains the fsimage files, by moving it to another directory, for example, /srv/BigData/namenode/current.bak:
mv /srv/BigData/namenode/current/ /srv/BigData/namenode/current.bak
- Log in to the active NameNode as user root and run the following command to copy its data directory, including the fsimage files, to the standby NameNode:
scp -rp /srv/BigData/namenode/current/ {IP address of the standby NameNode}:/srv/BigData/namenode/
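- Log in to the standby NameNode as user root and run the following command to change the owner of the copied directory to omm:wheel:
chown omm:wheel /srv/BigData/namenode/current -R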
- Restart the standby NameNode and check whether it restarts successfully. If it still fails to start, contact technical support.
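The backup, copy, and ownership steps above can be sketched as one script run on the active NameNode. This is a minimal sketch under assumptions this guide does not state: user root can SSH from the active to the standby NameNode without a password, and the data directory is /srv/BigData/namenode as in the examples. recover_standby_namenode is a hypothetical helper; by default (DRY_RUN=1) it only prints the commands so that they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helper: replays the manual recovery steps from the ACTIVE
# NameNode. Assumes passwordless root SSH to the standby NameNode.
# Commands are only printed unless DRY_RUN=0 is set explicitly.
recover_standby_namenode() (
  standby_ip=$1
  data_dir=${2:-/srv/BigData/namenode}   # parent of the "current" directory
  run() { if [ "${DRY_RUN:-1}" = 1 ]; then echo "$*"; else "$@"; fi; }

  # On the standby: keep the old metadata as current.bak.
  run ssh "root@$standby_ip" "mv $data_dir/current $data_dir/current.bak"
  # From the active: copy the metadata directory to the standby.
  run scp -rp "$data_dir/current/" "root@$standby_ip:$data_dir/"
  # On the standby: files copied by root must be owned by omm:wheel again.
  run ssh "root@$standby_ip" "chown -R omm:wheel $data_dir/current"
)
```

Set DRY_RUN=0 only after checking the printed commands, and then restart the standby NameNode as described in the last step.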