Updated on 2023-01-11 GMT+08:00

Uneven Data Distribution Due to the Client Installation on the DataNode

Symptom

Data is unevenly distributed across HDFS DataNodes. Disk usage on one node is high, or even reaches 100%, while disks on other nodes have plenty of free space.

Cause Analysis

Under the HDFS replica placement mechanism, when the client runs on a DataNode, the first replica of each block is written to that local DataNode. As a result, the disks of the node where the client is installed fill up while disks on other nodes still have plenty of free space.

Solution

  1. To rebalance existing data that is unevenly distributed, run the following command:

    /opt/client/HDFS/hadoop/sbin/start-balancer.sh -threshold 10

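    Replace /opt/client with the actual client installation directory. The -threshold 10 option tells the balancer to stop once the disk usage of every DataNode is within 10 percentage points of the overall cluster usage.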

  2. For new data, install the client on a node that does not run a DataNode.
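
To get a feel for what the balancer threshold in step 1 means, the following minimal sketch classifies nodes against a threshold. It is an illustration only, not part of the HDFS tooling: the function name check_balance and the input format (one "<node> <used> <capacity>" line per node, in any consistent unit) are assumptions made for this example. A node is treated as balanced when its usage percentage is within the threshold of the overall cluster usage, which mirrors the rule the balancer applies.

```shell
#!/bin/sh
# Hypothetical helper (not an HDFS command): classify each node by how far
# its disk usage deviates from the overall cluster usage.
# Usage: check_balance THRESHOLD < node_list
# Each input line: <node> <used> <capacity>  (same unit for both numbers)
check_balance() {
    threshold="$1"
    awk -v t="$threshold" '
        { name[NR] = $1; used[NR] = $2; cap[NR] = $3; tu += $2; tc += $3 }
        END {
            if (tc == 0) exit 1            # no capacity reported
            avg = 100 * tu / tc            # overall cluster usage in percent
            for (i = 1; i <= NR; i++) {
                if (cap[i] == 0) continue  # skip nodes with no capacity
                u = 100 * used[i] / cap[i]
                d = u - avg
                state = "balanced"
                if (d > t) state = "over-utilized"
                else if (d < -t) state = "under-utilized"
                printf "%s %.1f%% %s\n", name[i], u, state
            }
        }'
}
```

For example, piping three nodes with 95, 40, and 45 units used out of 100 each into check_balance 10 gives an overall usage of 60%, so the first node is reported as over-utilized and the other two as under-utilized. Real usage figures would have to come from the cluster itself, for instance from the output of hdfs dfsadmin -report.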