Help Center/ MapReduce Service/ Component Operation Guide (LTS)/ Using HDFS/ FAQ/ Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition?
Updated on 2022-11-18 GMT+08:00

Why Does an Error Occur During DataNode Capacity Calculation When Multiple data.dir Are Configured in a Partition?

Question

DataNode capacity count incorrect if several data.dir configured in one disk partition.

Answer

Currently calculation will be done based on the disk like df command in linux. Ideally user should not configure multiple directories for same disk which will be huge impact on performance where all data will go to one disk.

Hence it is always better to configure like below:

For example:

if the machine is having disks like following:

host-4:~ # df -h
Filesystem      Size    Used    Avail    Use%   Mounted on
/dev/sda1       352G   11G      324G    4%      /
udev              190G    252K   190G    1%      /dev
tmpfs             190G   72K      190G    1%     /dev/shm
/dev/sdb1       2.7T    74G      2.5T      3%    /data1
/dev/sdc1       2.7T    75G      2.5T      3%    /data2
/dev/sdd1       2.7T    73G      2.5T      3%    /data

Suggested way of configuration:

<property>
<name>dfs.datanode.data.dir</name>
<value>/data1/datadir/,/data2/datadir,/data3/datadir</value>
</property>

Following is not recommended:

<property>
<name>dfs.datanode.data.dir</name>
<value>/data1/datadir1/,/data2/datadir1,/data3/datadir1,/data1/datadir2/data1/datadir3,/data2/datadir2,/data2/datadir3,/data3/datadir2,/data3/datadir3</value>
</property>