Updated on 2022-12-14 GMT+08:00

HDFS Failed to Start Due to Insufficient Memory

Symptom

After the HDFS service is restarted, HDFS is in the Bad state, the NameNode instance status is abnormal, and the NameNode remains in safe mode for a long time.

Cause Analysis

  1. In the NameNode run log (/var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log), search for WARN. The log shows a GC pause of more than 63 seconds.
    2017-01-22 14:52:32,641 | WARN  | org.apache.hadoop.util.JvmPauseMonitor$Monitor@1b39fd82 | Detected pause in JVM or host machine (eg GC): pause of approximately 63750ms
    GC pool 'ParNew' had collection(s): count=1 time=0ms
    GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=63924ms | JvmPauseMonitor.java:189
  2. Analyze the NameNode log /var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log. The NameNode is still waiting for block reports, and the total number of blocks is very large. In the following example, the total number of blocks is 36.29 million.
    2017-01-22 14:52:32,641 | INFO  | IPC Server handler 8 on 25000 | STATE* Safe mode ON. 
    The reported blocks 29715437 needs additional 6542184 blocks to reach the threshold 0.9990 of total blocks 36293915.
  3. On Manager, check the GC_OPTS parameter of the NameNode:
    Figure 1 Checking the GC_OPTS parameter of the NameNode
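  4. Compare the configured values with the reference values in Table 1, which maps the NameNode memory configuration to data volume. If the configured memory is lower than the reference value for the current data volume, the NameNode does not have enough memory.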
    Table 1 Mapping between NameNode memory configuration and data volume

    Number of File Objects    Reference Value
    10,000,000                -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M
    20,000,000                -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G
    50,000,000                -Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G
    100,000,000               -Xms64G -Xmx64G -XX:NewSize=4G -XX:MaxNewSize=6G
    200,000,000               -Xms96G -Xmx96G -XX:NewSize=8G -XX:MaxNewSize=9G
    300,000,000               -Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G
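
The two log checks in steps 1 and 2 can be sketched as a small Python script. This is a minimal illustration, not part of the product: the function names pause_seconds and safemode_progress are hypothetical, and the regular expressions assume the log lines look exactly like the excerpts above. The script extracts the pause length and confirms that the reported blocks plus the additional blocks equal the threshold share of the total.

```python
import re

# Sample lines copied from the log excerpts in steps 1 and 2.
PAUSE_LINE = (
    "2017-01-22 14:52:32,641 | WARN  | "
    "org.apache.hadoop.util.JvmPauseMonitor$Monitor@1b39fd82 | "
    "Detected pause in JVM or host machine (eg GC): "
    "pause of approximately 63750ms"
)
SAFEMODE_LINE = (
    "The reported blocks 29715437 needs additional 6542184 blocks "
    "to reach the threshold 0.9990 of total blocks 36293915."
)

# Patterns are assumptions based only on the two excerpts above.
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")
SAFEMODE_RE = re.compile(
    r"reported blocks (\d+) needs additional (\d+) blocks "
    r"to reach the threshold ([\d.]+) of total blocks (\d+)"
)


def pause_seconds(line):
    """Return the JVM pause length in seconds, or None if the line does not match."""
    match = PAUSE_RE.search(line)
    return int(match.group(1)) / 1000 if match else None


def safemode_progress(line):
    """Return block-report progress parsed from a safe mode message, or None."""
    match = SAFEMODE_RE.search(line)
    if not match:
        return None
    reported = int(match.group(1))
    additional = int(match.group(2))
    threshold = float(match.group(3))
    total = int(match.group(4))
    return {
        "reported": reported,
        "total": total,
        "threshold": threshold,
        # Blocks that must be reported before safe mode can be left.
        "required": reported + additional,
        # Share of all blocks reported so far.
        "reported_fraction": reported / total,
    }


if __name__ == "__main__":
    print("GC pause (s):", pause_seconds(PAUSE_LINE))
    progress = safemode_progress(SAFEMODE_LINE)
    print("Blocks required:", progress["required"])
    print("Reported so far: {:.1%}".format(progress["reported_fraction"]))
```

For the sample lines, the required count (29715437 + 6542184 = 36257621) matches 0.9990 of the 36293915 total blocks, which is why the NameNode stays in safe mode until the remaining block reports arrive.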

Solution

  1. Modify the NameNode memory parameters based on the reference values in Table 1. In this example, the cluster has about 36 million blocks, so use the reference value for 50,000,000: change the memory settings in GC_OPTS to -Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G.
  2. Restart one NameNode and check that it starts normally.
  3. Restart the other NameNode and check that the service status displayed on the page returns to normal.
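
The lookup in step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name reference_opts is hypothetical, each row of Table 1 is treated as an upper bound (so a count between two rows takes the next larger row, as in the 36 million example), and the block count is used as the object count, as the example in step 1 does. The flag strings are copied from Table 1 unchanged.

```python
# Rows copied from Table 1: (number of file objects, reference value).
TABLE_1 = [
    (10_000_000, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
    (20_000_000, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
    (50_000_000, "-Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G"),
    (100_000_000, "-Xms64G -Xmx64G -XX:NewSize=4G -XX:MaxNewSize=6G"),
    (200_000_000, "-Xms96G -Xmx96G -XX:NewSize=8G -XX:MaxNewSize=9G"),
    (300_000_000, "-Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G"),
]


def reference_opts(object_count):
    """Return the reference value of the smallest Table 1 row covering object_count.

    Assumption: each row is an upper bound, so counts between rows round up.
    """
    for limit, opts in TABLE_1:
        if object_count <= limit:
            return opts
    # Table 1 gives no reference value beyond its last row.
    raise ValueError("object count exceeds the largest row in Table 1")


if __name__ == "__main__":
    # Total block count taken from the safe mode log message in Cause Analysis.
    print(reference_opts(36_293_915))
```

Rounding up rather than interpolating keeps the result to values that Table 1 actually lists.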