Handling Unbalanced DataNode Disk Usage on Nodes
Symptom
The disk usage of each DataNode on a node is uneven.
Example:
189-39-235-71:~ # df -h Filesystem Size Used Avail Use% Mounted on /dev/xvda 360G 92G 250G 28% / /dev/xvdb 700G 900G 200G 78% /srv/BigData/hadoop/data1 /dev/xvdc 700G 900G 200G 78% /srv/BigData/hadoop/data2 /dev/xvdd 700G 900G 200G 78% /srv/BigData/hadoop/data3 /dev/xvde 700G 900G 200G 78% /srv/BigData/hadoop/data4 /dev/xvdf 10G 900G 890G 2% /srv/BigData/hadoop/data5 189-39-235-71:~ #
Possible Causes
Some disks are faulty and are replaced with new ones. The new disk usage is low.
Disks are added. For example, the original four data disks are expanded to five disks.
Cause Analysis
There are two policies for writing data to Block disks on DataNodes: 1. Round Robin (default value) and 2. Preferentially writing data to the disk with the more available space.
Description of the dfs.datanode.fsdataset.volume.choosing.policy parameter
Possible values:
- Polling: org.apache.hadoop.hdfs.server.datanode.fsdataset.RoundRobinVolumeChoosingPolicy
- Preferentially writing data to the disk with more available space: org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy
Solution
Change the value of dfs.datanode.fsdataset.volume.choosing.policy to org.apache.hadoop.hdfs.server.datanode.fsdataset.AvailableSpaceVolumeChoosingPolicy, save the settings, and restart the affected services or instances.
- Data written to the DataNode will be preferentially written to the disk with more available disk space.
- The high usage of some disks can be relieved with the gradual deletion of aging data from the HDFS.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.