Updated on 2023-12-22 GMT+08:00

HDFS NameNode Failed to Start Due to Insufficient Memory

Symptom

Scenario 1: After the HDFS service is restarted, HDFS is in the Bad state, the NameNode instance status is abnormal, and the NameNode remains in safe mode for a long time.

Scenario 2: The NameNode startup times out and fails, and the native web UI cannot be opened.

Cause Analysis

  1. Search for WARN in the NameNode run log (/var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log). The log shows a GC pause of about 63 seconds.
    2017-01-22 14:52:32,641 | WARN  | org.apache.hadoop.util.JvmPauseMonitor$Monitor@1b39fd82 | Detected pause in JVM or host machine (eg GC): pause of approximately 63750ms
    GC pool 'ParNew' had collection(s): count=1 time=0ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=63924ms | JvmPauseMonitor.java:189
  2. Check the safe mode messages in the NameNode log (/var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log). The NameNode is waiting for block reports, and the total number of blocks is too large. In the following example, the total number of blocks is about 36.29 million.
    2017-01-22 14:52:32,641 | INFO  | IPC Server handler 8 on 25000 | STATE* Safe mode ON. 
    The reported blocks 29715437 needs additional 6542184 blocks to reach the threshold 0.9990 of total blocks 36293915.
  3. On Manager, check the GC_OPTS parameter of the NameNode:
    Figure 1 Checking the GC_OPTS parameter of the NameNode
  4. For details about the mapping between the NameNode memory configuration and data volume, see Table 1.
    Table 1 Mapping between NameNode memory configuration and data volume

    Number of File Objects    Reference Value
    10,000,000                -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M
    20,000,000                -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G
    50,000,000                -Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G
    100,000,000               -Xms64G -Xmx64G -XX:NewSize=4G -XX:MaxNewSize=6G
    200,000,000               -Xms96G -Xmx96G -XX:NewSize=8G -XX:MaxNewSize=9G
    300,000,000               -Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G
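
The log checks in steps 1 and 2 can be scripted when the log is large. The following is a minimal Python sketch, not an MRS tool: the function names, the 10-second pause cutoff, and the regular expressions are illustrative assumptions derived only from the two log excerpts quoted above, so they may need adjusting for other Hadoop versions.

```python
import re

# Patterns modeled on the log excerpts in steps 1 and 2 (assumed stable wording).
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")
SAFE_MODE_RE = re.compile(
    r"The reported blocks (\d+) needs additional (\d+) blocks "
    r"to reach the threshold ([\d.]+) of total blocks (\d+)"
)

# Sample text copied from the excerpts above.
SAMPLE_LOG = """\
2017-01-22 14:52:32,641 | WARN  | org.apache.hadoop.util.JvmPauseMonitor$Monitor@1b39fd82 | Detected pause in JVM or host machine (eg GC): pause of approximately 63750ms
GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=63924ms | JvmPauseMonitor.java:189
2017-01-22 14:52:32,641 | INFO  | IPC Server handler 8 on 25000 | STATE* Safe mode ON.
The reported blocks 29715437 needs additional 6542184 blocks to reach the threshold 0.9990 of total blocks 36293915.
"""


def find_long_pauses(log_text, min_ms=10000):
    """Return JVM pause durations (ms) that are at least min_ms.

    The 10000 ms default is an arbitrary cutoff chosen for this sketch.
    """
    return [int(ms) for ms in PAUSE_RE.findall(log_text) if int(ms) >= min_ms]


def safe_mode_progress(log_text):
    """Parse the first safe mode status line, or return None if absent."""
    match = SAFE_MODE_RE.search(log_text)
    if match is None:
        return None
    reported = int(match.group(1))
    additional = int(match.group(2))
    threshold = float(match.group(3))
    total = int(match.group(4))
    return {
        "reported": reported,
        "additional": additional,
        "threshold": threshold,
        "total": total,
        "reported_fraction": reported / total,
    }


if __name__ == "__main__":
    print("Long pauses (ms):", find_long_pauses(SAMPLE_LOG))
    print("Safe mode progress:", safe_mode_progress(SAMPLE_LOG))
```

In the sample, roughly 82% of the blocks have been reported, well short of the 0.9990 threshold, which matches a NameNode that cannot leave safe mode. The `total` value is the figure to compare against Table 1.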

Solution

  1. Modify the NameNode memory parameters (GC_OPTS) based on the reference values in Table 1. If the number of blocks is about 36 million, which is between the 20,000,000 and 50,000,000 rows, use the 50,000,000 reference value: -Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G.
  2. Restart one NameNode and check that it starts normally.
  3. Restart the other NameNode and check that the status displayed on the page returns to normal.
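
The row selection in step 1 can be expressed as a small lookup. The following Python sketch is illustrative only: the `recommend_gc_opts` name is hypothetical, and rounding up to the next row of Table 1 is an assumption inferred from the 36 million example in step 1, not a documented rule.

```python
# Reference values copied from Table 1: (maximum object count, JVM options).
HEAP_REFERENCE = [
    (10_000_000, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
    (20_000_000, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
    (50_000_000, "-Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G"),
    (100_000_000, "-Xms64G -Xmx64G -XX:NewSize=4G -XX:MaxNewSize=6G"),
    (200_000_000, "-Xms96G -Xmx96G -XX:NewSize=8G -XX:MaxNewSize=9G"),
    (300_000_000, "-Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G"),
]


def recommend_gc_opts(object_count):
    """Return the options of the first Table 1 row that covers object_count.

    Assumption: counts between two rows round up to the larger row.
    """
    if object_count < 0:
        raise ValueError("object_count must be non-negative")
    for limit, opts in HEAP_REFERENCE:
        if object_count <= limit:
            return opts
    raise ValueError("object_count exceeds the largest row in Table 1")


if __name__ == "__main__":
    # Total block count taken from the safe mode log line in Cause Analysis.
    print(recommend_gc_opts(36_293_915))
```

Counts above 300,000,000 raise an error rather than extrapolating, because Table 1 gives no reference value beyond that row.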