Updated on 2024-12-13 GMT+08:00

The Standby NameNode Fails to Be Started Because It Is Not Started for a Long Time

Symptom

If the standby NameNode remains stopped for a long time, the edits files it has not yet applied are automatically deleted by the aging policy. When the standby NameNode is later restarted, it cannot find the required edits files and reports an error similar to the following:

There appears to be a gap in the edit log. We expected txid XXX, but got txid XXX.
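Before making any changes, you can confirm the gap by comparing the highest checkpointed transaction ID with the first remaining edits segment in the standby NameNode's data directory. The following is a minimal sketch: the fsimage_&lt;txid&gt; and edits_&lt;start&gt;-&lt;end&gt; file-name layout is standard HDFS, but the directory path is only an example and must match your dfs.namenode.name.dir value, and the helper name find_gap is illustrative.

```shell
# find_gap <dir>: report the last checkpointed txid and the first
# available edits segment in a NameNode data directory. A gap exists
# when the first edits start txid is larger than (last fsimage txid + 1).
find_gap() {
  dir="$1"
  # Highest txid covered by a checkpoint image (ignores .md5 companions).
  last_img=$(ls "$dir" | sed -n 's/^fsimage_0*\([0-9][0-9]*\)$/\1/p' | sort -n | tail -1)
  # Lowest starting txid among the finalized edits segments
  # (edits_inprogress_* files do not match this pattern).
  first_edit=$(ls "$dir" | sed -n 's/^edits_0*\([0-9][0-9]*\)-.*/\1/p' | sort -n | head -1)
  echo "last fsimage txid: ${last_img:-none}"
  echo "first edits start txid: ${first_edit:-none}"
}

# Example (use your dfs.namenode.name.dir value):
# find_gap /srv/BigData/namenode/current
```

If the first edits start txid is greater than the last fsimage txid plus one, the intermediate edits have been aged out, which matches the symptom above.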

Solution

  1. Go to the All Configurations page of HDFS by referring to Modifying Cluster Service Configuration Parameters, search for dfs.namenode.name.dir and check the value to obtain the NameNode data directory, for example, /srv/BigData/namenode/current.
  2. On the HDFS service page, click the Instances tab to view and record the service IP addresses of the active and standby NameNodes.
  3. Log in to the faulty standby NameNode as user root and back up the metadata in the data directory obtained in 1 by renaming the directory. For example, move it to /srv/BigData/namenode/current.bak:

    mv /srv/BigData/namenode/current/ /srv/BigData/namenode/current.bak

  4. Log in to the active NameNode as user root and run the following command to copy the metadata directory (including the fsimage file) to the standby NameNode:

    scp -rp /srv/BigData/namenode/current/ {IP address of the standby NameNode}:/srv/BigData/namenode/

    Then log in to the standby NameNode and restore the ownership of the copied files:

    chown -R omm:wheel /srv/BigData/namenode/current

  5. Restart the standby NameNode and check whether the restart is successful. If it fails, contact technical support.
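Before restarting the standby NameNode in 5, it can help to confirm that every metadata file present on the active NameNode actually arrived on the standby. The sketch below assumes you first capture a listing on the active node (for example, run ls /srv/BigData/namenode/current > /tmp/active.list there and copy that file to the standby); the function name verify_copy and the listing-file approach are illustrative, not part of the product.

```shell
# verify_copy <listing_file> <dir>: every file name in listing_file must
# exist in dir. Prints the names missing from dir; returns non-zero if
# any file is missing, zero when the copy is complete.
verify_copy() {
  missing=0
  while IFS= read -r f; do
    [ -n "$f" ] || continue                    # skip blank lines
    [ -e "$2/$f" ] || { echo "missing on standby: $f"; missing=1; }
  done < "$1"
  return $missing
}

# Example, run on the standby NameNode:
# verify_copy /tmp/active.list /srv/BigData/namenode/current
```

If any file is reported missing, repeat the scp in 4 before restarting.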