
Why Does DataNode Fail to Start When the Number of Disks Specified by dfs.datanode.data.dir Equals dfs.datanode.failed.volumes.tolerated?

Question

If the number of disks specified by dfs.datanode.data.dir is equal to the value of dfs.datanode.failed.volumes.tolerated, DataNode startup will fail.

Answer

By default, the failure of a single disk causes the HDFS DataNode process to shut down, which results in the NameNode scheduling additional replicas for each block stored on that DataNode. This causes needless replication of blocks that reside on disks that have not failed.

To prevent this, you can configure DataNodes to tolerate the failure of a certain number of dfs.datanode.data.dir directories. Log in to FusionInsight Manager, choose Cluster > Services > HDFS > Configuration > All Configurations, and search for dfs.datanode.failed.volumes.tolerated. For example, if this parameter is set to 3, the DataNode shuts down only after four or more data directories have failed. This value is checked on DataNode startup.
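
For illustration, the following hdfs-site.xml excerpt (with hypothetical mount points) shows a valid combination: four data directories and a tolerance of 3, so the DataNode keeps running until all four volumes fail. This is only a sketch; in an MRS cluster these parameters are normally changed through FusionInsight Manager rather than by editing the file directly.

```xml
<!-- Illustrative hdfs-site.xml excerpt; the mount points below are hypothetical. -->
<property>
  <!-- Four data directories (volumes) on this DataNode -->
  <name>dfs.datanode.data.dir</name>
  <value>/srv/hadoop/data1,/srv/hadoop/data2,/srv/hadoop/data3,/srv/hadoop/data4</value>
</property>
<property>
  <!-- Tolerate up to 3 failed volumes: valid because 3 is less than the 4 configured volumes.
       The DataNode shuts down only when a fourth volume fails. -->
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>3</value>
</property>
```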

The value of dfs.datanode.failed.volumes.tolerated must always be less than the number of configured volumes (the directories listed in dfs.datanode.data.dir); otherwise, the DataNode fails to start. Alternatively, you can set it to -1, which is equivalent to n-1 (where n is the number of disks), so the DataNode is not shut down as long as at least one volume is still working.
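
As a further illustration (again with hypothetical mount points), the snippet below reproduces the situation described in the question: the tolerance equals the number of configured directories, so the DataNode refuses to start. The inline comment notes the valid alternatives.

```xml
<!-- Illustrative misconfiguration (hypothetical paths): three data directories with a
     tolerance of 3. Because the tolerance is not less than the number of configured
     volumes, the DataNode fails to start. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/srv/hadoop/data1,/srv/hadoop/data2,/srv/hadoop/data3</value>
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <!-- Invalid: must be less than 3. Use a value from 0 to 2, or -1 (equivalent to n-1, here 2). -->
  <value>3</value>
</property>
```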