Failed to Calculate the Capacity of a DataNode when Multiple data.dir Directories Are Configured in a Disk Partition
Question
The capacity of a DataNode cannot be calculated correctly when multiple data.dir directories are configured in a disk partition.
Answer
Currently, the capacity is calculated per disk, similar to the way the df command in Linux reports it. Do not configure multiple data.dir directories in the same disk partition. If you do, all data for those directories is written to the same disk, which greatly degrades performance.
You are advised to configure one data.dir directory per disk partition, as shown below.
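To illustrate what "calculated per disk" means, here is a minimal Python sketch that queries capacity the way df does: for the file system that holds a path, not for the directory itself. The helper name partition_capacity and the temporary directories datadir1 and datadir2 are hypothetical and not part of HDFS. The sketch only shows the behavior of the operating system call, not the DataNode's own code.

```python
import os
import shutil
import tempfile


def partition_capacity(path):
    """Return the total size in bytes of the file system that holds path."""
    # shutil.disk_usage reports figures for the whole file system,
    # not for the individual directory that is passed in.
    return shutil.disk_usage(path).total


def demo():
    """Create two sibling directories and return the capacity reported for each."""
    # Hypothetical data directories created side by side, so both
    # live on the same partition.
    with tempfile.TemporaryDirectory() as root:
        dir_a = os.path.join(root, "datadir1")
        dir_b = os.path.join(root, "datadir2")
        os.mkdir(dir_a)
        os.mkdir(dir_b)
        return partition_capacity(dir_a), partition_capacity(dir_b)


if __name__ == "__main__":
    cap_a, cap_b = demo()
    print("same capacity reported for both directories:", cap_a == cap_b)
```

Both directories report the total of the partition they share, so the two values cannot be added as if they were independent disks.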
For example, if a node contains the following disks:
host-4:~ # df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       352G   11G  324G   4% /
udev            190G  252K  190G   1% /dev
tmpfs           190G   72K  190G   1% /dev/shm
/dev/sdb1       2.7T   74G  2.5T   3% /data1
/dev/sdc1       2.7T   75G  2.5T   3% /data2
/dev/sdd1       2.7T   73G  2.5T   3% /da
Recommended configuration:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/datadir/,/data2/datadir,/data3/datadir</value>
</property>
Configuration that is not recommended:
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/data1/datadir1/,/data2/datadir1,/data3/datadir1,/data1/datadir2,/data1/datadir3,/data2/datadir2,/data2/datadir3,/data3/datadir2,/data3/datadir3</value>
</property>
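A configuration can be checked for this problem before it is deployed. The following is a minimal Python sketch, not an HDFS tool: the function names mount_point_for and shared_partitions are hypothetical, the list of mount points is supplied by the caller (for example, copied from the "Mounted on" column of df), and the value is assumed to contain plain local paths without storage-type prefixes or file:// URIs.

```python
from collections import defaultdict
import posixpath


def mount_point_for(path, mount_points):
    """Return the longest mount point that is a path prefix of path, or None."""
    path = posixpath.normpath(path)
    best = None
    for mp in mount_points:
        mp = posixpath.normpath(mp)
        # Compare on whole path components so /data1 does not match /data10.
        prefix = mp if mp.endswith("/") else mp + "/"
        if path == mp or path.startswith(prefix):
            if best is None or len(mp) > len(best):
                best = mp
    return best


def shared_partitions(data_dir_value, mount_points):
    """Map each mount point to its data directories when it holds more than one."""
    groups = defaultdict(list)
    for entry in data_dir_value.split(","):
        entry = entry.strip()
        if entry:
            groups[mount_point_for(entry, mount_points)].append(entry)
    return {mp: dirs for mp, dirs in groups.items() if len(dirs) > 1}


if __name__ == "__main__":
    # Hypothetical mount points, chosen to match the directories used in the
    # dfs.datanode.data.dir values of the example configurations.
    mounts = ["/", "/data1", "/data2", "/data3"]
    value = "/data1/datadir1,/data1/datadir2,/data2/datadir1"
    for mp, dirs in shared_partitions(value, mounts).items():
        print(mp, "holds", len(dirs), "data directories:", ", ".join(dirs))
```

An empty result means that every configured directory is on its own partition, which is the recommended layout.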