Updated on 2024-10-08 GMT+08:00

DataNode Fails to Be Started When the Number of Disks Defined in dfs.datanode.data.dir Equals the Value of dfs.datanode.failed.volumes.tolerated

Question

When the number of disks defined in dfs.datanode.data.dir equals the value of dfs.datanode.failed.volumes.tolerated, DataNode fails to be started.

Answer

By default, the HDFS DataNode process stops when even a single disk fails. The NameNode then schedules extra replicas for each block stored on that DataNode, triggering unnecessary block replication even though the node's other disks are still healthy.

To prevent this problem, you can configure how many failed volumes (directories listed in dfs.datanode.data.dir) a DataNode tolerates. Log in to FusionInsight Manager, choose Cluster > Services > HDFS. On the displayed page, click Configurations > All Configurations, and search for dfs.datanode.failed.volumes.tolerated. For example, if this parameter is set to 3, the DataNode fails to start only when four or more directories are faulty.
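In hdfs-site.xml, the two parameters look like the sketch below. The directory paths are placeholders for illustration; use the data directories actually configured on your cluster.

```xml
<!-- Example only: four data volumes, tolerating up to three failed volumes.
     Replace the paths with the data directories used on your DataNodes. -->
<property>
  <name>dfs.datanode.data.dir</name>
  <value>/srv/BigData/data1/dn,/srv/BigData/data2/dn,/srv/BigData/data3/dn,/srv/BigData/data4/dn</value>
</property>
<property>
  <name>dfs.datanode.failed.volumes.tolerated</name>
  <value>3</value>
</property>
```

With this configuration, the DataNode keeps running while one, two, or three of the four directories are faulty, and fails only when all four are lost.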

This explains the startup failure in the question: dfs.datanode.failed.volumes.tolerated must be strictly less than the number of volumes configured in dfs.datanode.data.dir. Alternatively, you can set dfs.datanode.failed.volumes.tolerated to -1, which is equivalent to n-1 (where n is the number of configured volumes). With this setting, the DataNode starts normally as long as at least one volume is healthy.
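The startup rule above can be summarized as a small sketch. This is not Hadoop's actual code, just a hedged illustration of the constraint: the tolerance must be strictly less than the number of volumes, and -1 is shorthand for n-1.

```python
def can_start_datanode(num_volumes: int, tolerated: int, num_failed: int = 0) -> bool:
    """Sketch of the dfs.datanode.failed.volumes.tolerated startup rule.

    num_volumes: directories configured in dfs.datanode.data.dir
    tolerated:   dfs.datanode.failed.volumes.tolerated (-1 means n - 1)
    num_failed:  volumes currently faulty
    """
    # -1 is treated as "tolerate all but one volume", i.e. n - 1.
    if tolerated == -1:
        tolerated = num_volumes - 1
    # The tolerance must be strictly less than the number of volumes,
    # otherwise the DataNode refuses to start at all.
    if tolerated >= num_volumes:
        return False
    # Otherwise the DataNode runs while failures stay within the tolerance.
    return num_failed <= tolerated


# The scenario from the question: tolerance equals the volume count.
print(can_start_datanode(num_volumes=3, tolerated=3))      # False: startup fails
print(can_start_datanode(num_volumes=4, tolerated=3))      # True
print(can_start_datanode(num_volumes=4, tolerated=-1))     # True: -1 acts as n - 1
```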