DataNode Fails to Start When the Number of Disks Defined in dfs.datanode.data.dir Equals the Value of dfs.datanode.failed.volumes.tolerated
Question
When the number of disks defined in dfs.datanode.data.dir equals the value of dfs.datanode.failed.volumes.tolerated, the DataNode fails to start.
Answer
By default, if a single disk fails, the HDFS DataNode process stops. The NameNode then schedules additional replicas for every block stored on that DataNode, which triggers block replication to healthy disks.
To avoid this, you can configure how many failed dfs.data.dir (dfs.datanode.data.dir) volumes a DataNode tolerates. Log in to FusionInsight Manager and choose Cluster > Services > HDFS. On the displayed page, click Configurations > All Configurations and search for dfs.datanode.failed.volumes.tolerated. For example, if this parameter is set to 3, the DataNode stops or fails to start only when four or more directories are faulty.
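The tolerance rule in the example can be expressed as a simple comparison. The following is a minimal Python sketch, not HDFS source code: datanode_keeps_running is a hypothetical helper that assumes a non-negative tolerance and uses 0 as the default, matching the default behavior in which a single failed disk stops the DataNode.

```python
def datanode_keeps_running(failed_volumes: int, tolerated: int = 0) -> bool:
    """Hypothetical helper (not an HDFS API).

    Returns True while the number of failed volumes does not exceed
    dfs.datanode.failed.volumes.tolerated. Assumes tolerated >= 0;
    the special value -1 is not modeled here.
    """
    if tolerated < 0:
        raise ValueError("this sketch only models non-negative tolerance values")
    return failed_volumes <= tolerated


# Default tolerance (0): one failed disk is enough to stop the DataNode.
print(datanode_keeps_running(1))  # False

# Tolerance set to 3, as in the example above: up to three failures are tolerated,
# and the fourth failure stops the DataNode.
for failed in range(6):
    print(failed, datanode_keeps_running(failed, tolerated=3))
```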
To prevent DataNode startup failures, the value of dfs.datanode.failed.volumes.tolerated must be less than the number of configured volumes. Alternatively, set dfs.datanode.failed.volumes.tolerated to -1, which is equivalent to n-1 (where n is the number of configured volumes). With either setting, the DataNode starts normally.
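The startup constraint can be checked before changing the configuration. The sketch below is a hypothetical Python pre-check, not part of FusionInsight Manager or Hadoop: parse_data_dirs and validate_tolerated are illustrative names, the directory paths in the usage example are made up, and it assumes dfs.datanode.data.dir is a comma-separated list of directories. It accepts -1 (mapped to n-1) or any value from 0 to n-1, and rejects everything else, including a value equal to the number of volumes.

```python
from typing import List


def parse_data_dirs(value: str) -> List[str]:
    # dfs.datanode.data.dir holds a comma-separated list of directories;
    # empty entries are ignored.
    return [d.strip() for d in value.split(",") if d.strip()]


def validate_tolerated(configured_volumes: int, tolerated: int) -> int:
    """Hypothetical pre-check: return the effective number of tolerated failures.

    Raises ValueError for values that would prevent the DataNode from starting.
    """
    if configured_volumes < 1:
        raise ValueError("at least one data directory must be configured")
    if tolerated == -1:
        # -1 means "tolerate all but one volume", i.e. n - 1.
        return configured_volumes - 1
    if tolerated < -1 or tolerated >= configured_volumes:
        raise ValueError(
            f"dfs.datanode.failed.volumes.tolerated={tolerated} must be -1 "
            f"or between 0 and {configured_volumes - 1}"
        )
    return tolerated


# Hypothetical configuration with three data directories.
dirs = parse_data_dirs("/data1/dn,/data2/dn,/data3/dn")
print(validate_tolerated(len(dirs), 2))   # 2
print(validate_tolerated(len(dirs), -1))  # 2 (n - 1)
try:
    validate_tolerated(len(dirs), 3)      # equals the number of disks
except ValueError as err:
    print("DataNode would fail to start:", err)
```

The last call reproduces the situation described in the question: a tolerance equal to the number of configured disks is rejected.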