HDFS Failed to Start Due to Insufficient Memory
Symptom
After the HDFS service is restarted, HDFS is in the Bad state, the NameNode instance status is abnormal, and the NameNode cannot exit safe mode for a long time.
Cause Analysis
- Search for WARN in the NameNode run log (/var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log). The log shows a GC pause of about 63 seconds.
2017-01-22 14:52:32,641 | WARN | org.apache.hadoop.util.JvmPauseMonitor$Monitor@1b39fd82 | Detected pause in JVM or host machine (eg GC): pause of approximately 63750ms GC pool 'ParNew' had collection(s): count=1 time=0ms GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=63924ms | JvmPauseMonitor.java:189
- Analyze the NameNode log /var/log/Bigdata/hdfs/nn/hadoop-omm-namenode-XXX.log. The log shows that the NameNode is still waiting for block reports and that the total number of blocks is too large. In the following example, the total number of blocks is 36.29 million.
2017-01-22 14:52:32,641 | INFO | IPC Server handler 8 on 25000 | STATE* Safe mode ON. The reported blocks 29715437 needs additional 6542184 blocks to reach the threshold 0.9990 of total blocks 36293915.
- On Manager, check the GC_OPTS parameter of the NameNode:
Figure 1 Checking the GC_OPTS parameter of the NameNode
- For details about the mapping between the NameNode memory configuration and data volume, see Table 1.
Table 1 Mapping between NameNode memory configuration and data volume
Number of File Objects    Reference Value
10,000,000                -Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M
20,000,000                -Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G
50,000,000                -Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G
100,000,000               -Xms64G -Xmx64G -XX:NewSize=4G -XX:MaxNewSize=6G
200,000,000               -Xms96G -Xmx96G -XX:NewSize=8G -XX:MaxNewSize=9G
300,000,000               -Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G
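The two log checks in this analysis can be sketched in Python. The sketch below parses the JvmPauseMonitor warning and the safe mode message quoted above, then recomputes how many blocks are still missing. The regular expressions and the function names (parse_pause, parse_safe_mode) are illustrative assumptions based only on the two sample lines. They are not part of Hadoop or Manager, and other Hadoop versions may word these messages differently.

```python
import re

# Sample lines copied from the NameNode log excerpts in this section.
PAUSE_LINE = (
    "2017-01-22 14:52:32,641 | WARN | "
    "org.apache.hadoop.util.JvmPauseMonitor$Monitor@1b39fd82 | "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 63750ms "
    "GC pool 'ParNew' had collection(s): count=1 time=0ms "
    "GC pool 'ConcurrentMarkSweep' had collection(s): count=1 time=63924ms | "
    "JvmPauseMonitor.java:189"
)
SAFE_MODE_LINE = (
    "2017-01-22 14:52:32,641 | INFO | IPC Server handler 8 on 25000 | "
    "STATE* Safe mode ON. The reported blocks 29715437 needs additional 6542184 "
    "blocks to reach the threshold 0.9990 of total blocks 36293915."
)

# Hypothetical patterns, written to match the sample lines only.
PAUSE_RE = re.compile(r"pause of approximately (\d+)ms")
POOL_RE = re.compile(r"GC pool '([^']+)' had collection\(s\): count=(\d+) time=(\d+)ms")
SAFE_RE = re.compile(
    r"The reported blocks (\d+) needs additional (\d+) blocks "
    r"to reach the threshold ([\d.]+) of total blocks (\d+)"
)


def parse_pause(line):
    """Return the total pause and per-pool GC time in milliseconds, or None."""
    match = PAUSE_RE.search(line)
    if match is None:
        return None
    pools = {name: int(ms) for name, _count, ms in POOL_RE.findall(line)}
    return {"pause_ms": int(match.group(1)), "pools_ms": pools}


def parse_safe_mode(line):
    """Return the block counters from a safe mode status line, or None."""
    match = SAFE_RE.search(line)
    if match is None:
        return None
    reported, additional, threshold, total = match.groups()
    return {
        "reported": int(reported),
        "additional": int(additional),
        "threshold": float(threshold),
        "total": int(total),
    }


pause = parse_pause(PAUSE_LINE)
safe = parse_safe_mode(SAFE_MODE_LINE)

# Blocks that must be reported before the threshold is reached.
required = int(safe["threshold"] * safe["total"])
missing = required - safe["reported"]

print("JVM pause (s):", pause["pause_ms"] / 1000)
print("GC time per pool (ms):", pause["pools_ms"])
print("Blocks required by threshold:", required)
print("Blocks still missing:", missing)
```

For the sample line, the recomputed number of missing blocks equals the "needs additional" value in the log (6542184). In other words, the NameNode stays in safe mode until 99.9% of the 36,293,915 blocks have been reported. Almost all of the pause time falls in the ConcurrentMarkSweep pool, which is consistent with a heap that is too small for this number of blocks.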
Solution
- Modify the NameNode memory parameter (GC_OPTS) based on the specifications in Table 1. For example, if the number of blocks is 36 million, use the values for the next larger tier (50,000,000): -Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G.
- Restart a NameNode and check that the NameNode can be started normally.
- Restart the other NameNode and check that the status displayed on the page is restored.
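The tier selection in the first step can be sketched as a lookup against Table 1. The function name recommend_gc_opts is hypothetical. The rule it applies, choosing the smallest tier that is not below the observed count, is an assumption inferred from the 36 million example in this section and is not a documented Manager rule.

```python
# Reference values copied from Table 1: (maximum count, JVM options).
TABLE_1 = [
    (10_000_000, "-Xms6G -Xmx6G -XX:NewSize=512M -XX:MaxNewSize=512M"),
    (20_000_000, "-Xms12G -Xmx12G -XX:NewSize=1G -XX:MaxNewSize=1G"),
    (50_000_000, "-Xms32G -Xmx32G -XX:NewSize=2G -XX:MaxNewSize=3G"),
    (100_000_000, "-Xms64G -Xmx64G -XX:NewSize=4G -XX:MaxNewSize=6G"),
    (200_000_000, "-Xms96G -Xmx96G -XX:NewSize=8G -XX:MaxNewSize=9G"),
    (300_000_000, "-Xms164G -Xmx164G -XX:NewSize=12G -XX:MaxNewSize=12G"),
]


def recommend_gc_opts(object_count):
    """Return the Table 1 options for the smallest tier >= object_count.

    Assumption: rounding up to the next tier, as in the 36 million example.
    """
    if object_count < 0:
        raise ValueError("object_count must not be negative")
    for limit, opts in TABLE_1:
        if object_count <= limit:
            return opts
    raise ValueError("count exceeds the largest tier listed in Table 1")


# Total block count taken from the safe mode log line in this section.
print(recommend_gc_opts(36_293_915))
```

With the block total from the sample log, the lookup returns the 50,000,000 row, which matches the value recommended in the first step.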