Uneven Data Distribution Due to the Client Installation on the DataNode
Symptom
Data is unevenly distributed across HDFS DataNodes. Disk usage on one node is high, or even reaches 100%, while disks on other nodes still have sufficient free space.
Cause Analysis
In the HDFS replica placement mechanism, when the client that writes the data runs on a DataNode, the first replica of each block is stored on that local node. As a result, the disks of the node where the client is installed fill up while disks on other nodes still have sufficient free space.
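The effect of this placement rule can be illustrated with a small simulation. This is a minimal sketch, not the actual HDFS placement code: it ignores rack awareness and free-space checks, and the function names (`place_block`, `simulate`) and node names (`dn1`, `edge`) are hypothetical.

```python
import random
from collections import Counter


def place_block(client_node, datanodes, replication=3, rng=random):
    """Choose target nodes for one block (simplified model, no rack awareness)."""
    if client_node in datanodes:
        # Client runs on a DataNode: the first replica stays on the local node.
        first = client_node
    else:
        # Client runs elsewhere: the first replica goes to a randomly chosen DataNode.
        first = rng.choice(datanodes)
    others = [node for node in datanodes if node != first]
    # Remaining replicas are spread over distinct other nodes.
    return [first] + rng.sample(others, min(replication - 1, len(others)))


def simulate(client_node, datanodes, blocks=1000, seed=0):
    """Count how many replicas each DataNode receives after writing `blocks` blocks."""
    rng = random.Random(seed)
    usage = Counter({node: 0 for node in datanodes})
    for _ in range(blocks):
        usage.update(place_block(client_node, datanodes, rng=rng))
    return usage


if __name__ == "__main__":
    nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
    print("client on dn1: ", dict(simulate("dn1", nodes)))
    print("client off-DataNode:", dict(simulate("edge", nodes)))
```

In this model, a client on `dn1` places one replica of every block on `dn1`, so that node receives far more data than any other, whereas a client on a node without a DataNode spreads the replicas roughly evenly.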
Solution
- For existing data that is unevenly distributed, run the following command to balance it:
/opt/client/HDFS/hadoop/sbin/start-balancer.sh -threshold 10
In the command, /opt/client is an example path. Replace it with the actual client installation directory.
- For new data, install the client on a node where no DataNode is deployed.
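The -threshold 10 option in the balancer command sets the allowed deviation, in percentage points, between each DataNode's disk usage and the overall cluster usage. The following sketch shows that check under simple assumptions: the helper name `classify_nodes` and the sample figures are hypothetical, and the real balancer works from DataNode reports rather than hand-entered numbers.

```python
def classify_nodes(used, capacity, threshold=10.0):
    """Return (cluster usage %, over-utilized nodes, under-utilized nodes).

    `used` and `capacity` map a node name to an amount of storage in the same unit.
    A node counts as balanced when its usage percentage is within `threshold`
    percentage points of the cluster-wide usage percentage.
    """
    cluster_pct = 100.0 * sum(used.values()) / sum(capacity.values())
    over, under = [], []
    for node in used:
        node_pct = 100.0 * used[node] / capacity[node]
        if node_pct > cluster_pct + threshold:
            over.append(node)   # data should be moved away from this node
        elif node_pct < cluster_pct - threshold:
            under.append(node)  # this node can receive data
    return cluster_pct, over, under


if __name__ == "__main__":
    # Hypothetical figures: dn1 hosts the client and is almost full.
    used = {"dn1": 95, "dn2": 40, "dn3": 45}
    capacity = {"dn1": 100, "dn2": 100, "dn3": 100}
    print(classify_nodes(used, capacity, threshold=10.0))
```

A smaller threshold gives a more even distribution but makes the balancer move more data and run longer.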