Updated on 2023-11-30 GMT+08:00

CPU Usage of DataNodes Is Close to 100% Occasionally, Causing Node Loss

Symptom

The CPU usage of DataNodes occasionally approaches 100%. When this happens, the affected nodes may be reported as lost, and SSH connections to them become slow or fail.

Figure 1 DataNode CPU usage close to 100%

Cause Analysis

  1. The DataNode logs contain a large number of write failure entries.
    Figure 2 DataNode write failure log
  2. A large number of files were written in a short period, leaving the DataNode with insufficient memory.
    Figure 3 Insufficient DataNode memory

Solution

  1. Check the memory configured for the DataNode and whether the server has enough free memory left.
  2. If the server has enough free memory, increase the DataNode memory and restart the DataNode for the change to take effect.
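
The check in step 1 can be sketched in Python as follows. This is only an illustration under stated assumptions, not an MRS tool: it assumes the DataNode heap is set with the standard JVM `-Xmx` flag in the DataNode JVM options string, and that free server memory is read from the `MemAvailable` line of `/proc/meminfo` on Linux. The function names and the 2048 MiB reserve are hypothetical and chosen for illustration.

```python
import re
from typing import Optional

# Multipliers that convert a JVM size suffix to MiB.
_UNIT_TO_MIB = {"k": 1 / 1024, "m": 1, "g": 1024}


def parse_xmx_mib(jvm_opts: str) -> Optional[float]:
    """Return the maximum heap size (-Xmx) in MiB, or None if the flag is absent."""
    match = re.search(r"-Xmx(\d+)([kKmMgG]?)", jvm_opts)
    if match is None:
        return None
    value, unit = int(match.group(1)), match.group(2).lower()
    if unit == "":
        return value / (1024 * 1024)  # no suffix: the JVM reads the value as bytes
    return value * _UNIT_TO_MIB[unit]


def mem_available_mib(meminfo_text: str) -> Optional[float]:
    """Return MemAvailable in MiB from /proc/meminfo-style text, or None if missing."""
    match = re.search(r"^MemAvailable:\s+(\d+)\s+kB", meminfo_text, re.MULTILINE)
    return int(match.group(1)) / 1024 if match else None


def can_raise_heap(jvm_opts: str, meminfo_text: str,
                   new_heap_mib: float, reserve_mib: float = 2048) -> bool:
    """Check whether the server can absorb a larger DataNode heap.

    reserve_mib is a hypothetical safety margin kept free for the OS and
    other services; tune it for the actual node.
    """
    current = parse_xmx_mib(jvm_opts)
    available = mem_available_mib(meminfo_text)
    if current is None or available is None:
        return False  # not enough information to decide safely
    extra = new_heap_mib - current
    return extra <= available - reserve_mib
```

On a DataNode host, pass the contents of `/proc/meminfo` as `meminfo_text` and the DataNode JVM options string as `jvm_opts`. For example, with `-Xmx4G` configured and 8192 MiB available, `can_raise_heap(opts, meminfo, 8192)` returns `True`, because the extra 4096 MiB fits within the 6144 MiB left after the reserve.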