Updated on 2023-09-05 GMT+08:00

CPU Usage of a DataNode Reaches 100% Occasionally, Causing Node Loss (SSH Connection Is Slow or Fails)

Symptom

CPU usage on a DataNode occasionally approaches 100%. The node is then lost, and SSH connections to it become slow or fail.

Figure 1 DataNode CPU usage close to 100%

Cause Analysis

  1. The DataNode logs contain a large number of write failure entries.
    Figure 2 DataNode write failure log
  2. A large number of files are written within a short period, which leaves the DataNode with insufficient memory.
    Figure 3 Insufficient DataNode memory
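
One way to confirm that write failures cluster in a short period is to count matching log lines per minute. The following is a minimal sketch, not an MRS tool. It assumes that each log line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, and the function name `count_matches_per_minute` is hypothetical. The `keyword` argument must be taken from the failure message that actually appears in your DataNode log.

```python
from collections import Counter
from typing import Iterable


def count_matches_per_minute(lines: Iterable[str], keyword: str) -> Counter:
    """Count lines that contain `keyword`, grouped by minute.

    Assumption: every line starts with a 'YYYY-MM-DD HH:MM:SS' timestamp,
    so its first 16 characters ('YYYY-MM-DD HH:MM') identify the minute.
    """
    per_minute: Counter = Counter()
    for line in lines:
        # Skip lines too short to carry a timestamp (for example, blank lines).
        if keyword in line and len(line) >= 16:
            per_minute[line[:16]] += 1
    return per_minute
```

Passing an open log file as `lines` and calling `most_common(5)` on the result lists the five busiest minutes. If those minutes match the CPU spikes, the write burst is a likely trigger.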

Solution

  1. Check the DataNode memory configuration and verify that the server has enough remaining memory.
  2. Increase the DataNode memory, then restart the DataNode for the change to take effect.
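
The check in step 1 can be sketched as follows. The sketch rests on two assumptions: the DataNode heap is set with the standard JVM `-Xmx` flag in its JVM options string, and the node runs a Linux kernel that reports `MemAvailable` in `/proc/meminfo`. The helper names are hypothetical. The comparison is rough because it ignores the off-heap memory that the JVM also uses.

```python
import re
from typing import Optional

# JVM size suffixes; a value without a suffix is in bytes.
_UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_xmx_bytes(jvm_opts: str) -> Optional[int]:
    """Return the maximum heap size in bytes from a JVM options string.

    If -Xmx appears more than once, the last occurrence is used here.
    Returns None when no -Xmx flag is present.
    """
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)", jvm_opts)
    if not matches:
        return None
    number, unit = matches[-1]
    return int(number) * _UNITS.get(unit.lower(), 1)


def mem_available_bytes(meminfo_text: str) -> Optional[int]:
    """Return MemAvailable in bytes from the text of /proc/meminfo."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) * 1024 if match else None


def extra_heap_fits(current_xmx: int, proposed_xmx: int, available: int) -> bool:
    """Rough check: does the additional heap fit in the available memory?"""
    return proposed_xmx - current_xmx <= available
```

To use it, read `/proc/meminfo` on the DataNode host and pass its text to `mem_available_bytes`. Pass the configured JVM options to `parse_xmx_bytes`, then call `extra_heap_fits` with the heap size you plan to set before you change the configuration and restart.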