
The Standby NameNode Fails to Start Because It Has Been Stopped for a Long Time

Symptom

The standby NameNode has been stopped for a long time. During this period, the edits files it needs are automatically deleted by the aging policy, so they cannot be found when the NameNode is restarted and the following error is reported:

There appears to be a gap in the edit log. We expected txid XXX, but got txid XXX.
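
You can confirm the symptom by searching the standby NameNode run log for this message. The log path below is an assumption based on a default MRS installation; adjust it to your deployment.

grep "There appears to be a gap in the edit log" /var/log/Bigdata/hdfs/nn/*namenode*.log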

Procedure

  1. Log in to MRS Manager.
  2. Choose Cluster > Service > HDFS > Configuration > All Configurations.
  3. Search for dfs.namenode.name.dir and check the value to obtain the NameNode data directory, for example, /srv/BigData/namenode/current.

    dfs.namenode.name.dir: Directory on the local filesystem where the DFS NameNode stores the name table (fsimage). If this is a comma-delimited list of directories, the name table is replicated in all of the directories for redundancy. The default value is "${BIGDATA_DATADIR}/namenode".
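
    Alternatively, if an HDFS client is installed and sourced on the node, you can read the effective value directly as a cross-check of the Manager configuration page:

    hdfs getconf -confKey dfs.namenode.name.dir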

  4. On the HDFS service page, click the Instances tab to view and record the service IP addresses of the active and standby NameNodes.
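
    As a cross-check, you can also query the HA state of each NameNode from a node where the HDFS client is installed; the output lists every NameNode with its address and its active or standby state:

    hdfs haadmin -getAllServiceState
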
  5. Log in to the faulty standby NameNode as user root and back up the NameNode data directory obtained in 3 (which contains the fsimage file). For example, move the data to the /srv/BigData/namenode/current.bak directory:

    mv /srv/BigData/namenode/current/ /srv/BigData/namenode/current.bak
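
    Before continuing, you can verify that the backup contains the expected metadata files (fsimage_* and edits_* are the standard HDFS file name patterns):

    ls -l /srv/BigData/namenode/current.bak/ | grep -E "fsimage|edits"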

  6. Log in to the active NameNode as user root and run the following command to copy its data directory (including the fsimage file) to the standby NameNode:

    scp -rp /srv/BigData/namenode/current/ {IP address of the standby NameNode}:/srv/BigData/namenode/

    Then, on the standby NameNode, change the owner of the copied directory to user omm:

    chown omm:wheel /srv/BigData/namenode/current -R
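
    You can then confirm that the ownership of the copied directory is omm:wheel and that the metadata files are present before restarting:

    ls -ld /srv/BigData/namenode/current
    ls -l /srv/BigData/namenode/current | grep -E "fsimage|edits"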

  7. Restart the standby NameNode and check whether the restart is successful. If it fails, contact technical support.
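
    If the restart succeeds, you can additionally verify the HA state of the restarted NameNode from an HDFS client. The service ID nn2 below is a placeholder; use the IDs defined by dfs.ha.namenodes.<nameservice> in your cluster:

    hdfs haadmin -getServiceState nn2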