Updated on 2025-08-19 GMT+08:00

Manually Performing Checkpoints When a NameNode Is Faulty for a Long Time

Symptom

If the standby NameNode is faulty for a long time, a large number of edit logs accumulate. If HDFS or the active NameNode is then restarted, the active NameNode must read a large number of unmerged edit logs. As a result, HDFS or the active NameNode takes a long time to restart and may even fail to restart.

Cause Analysis

The standby NameNode periodically merges editlog files into a new fsimage file. This process is called a checkpoint. After the fsimage file is generated, the standby NameNode transfers it to the active NameNode.

Because checkpoints are performed by the standby NameNode, no editlog files are merged while the standby NameNode is abnormal. As a result, the active NameNode must load many editlog files at its next startup, which consumes a large amount of memory and takes a long time.

The checkpoint interval is determined by the following parameters. With the default values, a checkpoint is triggered when 30 minutes have passed since the last checkpoint or when one million operations have been performed on HDFS, whichever comes first.

  • dfs.namenode.checkpoint.period: specifies the interval between checkpoints. The default value is 1800s.
  • dfs.namenode.checkpoint.txns: specifies the number of operations that triggers a checkpoint. The default value is 1000000.
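
The interaction of the two thresholds can be illustrated with a minimal shell sketch. The function name checkpoint_due and its arguments are hypothetical and exist only for illustration; the defaults mirror the values listed above. It is not how the NameNode itself is implemented, only a model of the "either threshold" rule.

```shell
# checkpoint_due ELAPSED_SECONDS PENDING_TXNS [PERIOD] [MAX_TXNS]
# Succeeds (exit status 0) when either threshold has been reached.
# PERIOD and MAX_TXNS default to the documented defaults (1800 s, 1000000).
checkpoint_due() {
    cd_elapsed=$1
    cd_txns=$2
    cd_period=${3:-1800}
    cd_max_txns=${4:-1000000}
    [ "$cd_elapsed" -ge "$cd_period" ] || [ "$cd_txns" -ge "$cd_max_txns" ]
}

# On a configured client, the effective values can be read with the hdfs CLI:
#   hdfs getconf -confKey dfs.namenode.checkpoint.period
#   hdfs getconf -confKey dfs.namenode.checkpoint.txns
```

For example, checkpoint_due 600 1200000 succeeds because the operation count has passed the threshold even though only 10 minutes have elapsed.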

Procedure

Before restarting HDFS or the active NameNode, manually perform a checkpoint to merge the metadata of the active NameNode.

  1. Stop workloads.
  2. Obtain the hostname of the active NameNode.
  3. Run the following commands on the client:

    source /opt/client/bigdata_env

    kinit Component user

    Note: Replace /opt/client with the actual installation path of the client, and replace Component user with the actual component user name.

  4. Run the following command to make the active NameNode enter safe mode (replace Hostname of the active NameNode with the actual hostname obtained in step 2):

    hdfs dfsadmin -fs Hostname of the active NameNode:25000 -safemode enter

  5. Run the following command to merge edit logs on the active NameNode:

    hdfs dfsadmin -fs Hostname of the active NameNode:25000 -saveNamespace

  6. Run the following command to make the active NameNode leave safe mode:

    hdfs dfsadmin -fs Hostname of the active NameNode:25000 -safemode leave

  7. Check whether the merging is complete.

    cd /srv/BigData/namenode/current

    Check whether the timestamp of the most recently generated fsimage file is close to the current time. If yes, the merge is complete.
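
Steps 4 to 7 can be wrapped in a small script so that each command runs only if the previous one succeeded. The following is a sketch under stated assumptions, not an official tool: the function names manual_checkpoint and newest_fsimage and the HDFS_CMD variable are hypothetical, port 25000 and the directory /srv/BigData/namenode/current are taken from the commands above, and the client environment is assumed to have been prepared as in step 3.

```shell
# Command used to invoke the HDFS CLI; overridable, for example for a dry run.
HDFS_CMD=${HDFS_CMD:-hdfs}

# manual_checkpoint HOSTNAME [PORT]
# Enters safe mode, saves the namespace, then leaves safe mode.
# Stops at the first failing command, so after a failure the NameNode
# may still be in safe mode and should be checked by hand.
manual_checkpoint() {
    mc_addr="$1:${2:-25000}"
    "$HDFS_CMD" dfsadmin -fs "$mc_addr" -safemode enter &&
    "$HDFS_CMD" dfsadmin -fs "$mc_addr" -saveNamespace &&
    "$HDFS_CMD" dfsadmin -fs "$mc_addr" -safemode leave
}

# newest_fsimage DIR
# Prints the most recently modified fsimage_* file in DIR, skipping .md5 files.
newest_fsimage() {
    ls -t "$1"/fsimage_* 2>/dev/null | grep -v '\.md5$' | head -n 1
}

# Example usage on the cluster (hostname is a placeholder):
#   manual_checkpoint <hostname-of-active-namenode>
#   ls -l "$(newest_fsimage /srv/BigData/namenode/current)"
```

Setting HDFS_CMD=echo before calling manual_checkpoint prints the three commands instead of running them, which is a convenient way to review them first.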