Failed to Decommission a DataNode Due to HDFS Block Loss
Symptom
A DataNode fails to be decommissioned.
Cause Analysis
- Check the decommissioning log. It shows that the DataNode holds 1564 blocks, but one block cannot be replicated to another node.
- Log in to the master node of the cluster, go to the HDFS client, run the hdfs fsck / command to locate the corrupted block, and record its file path (a sketch of the check is shown after this list).
Example: /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar
The HDFS status reported by fsck is CORRUPT.
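The corrupted files can also be listed directly with fsck. The following is a minimal sketch, assuming the HDFS client environment variables have already been sourced and the user has been authenticated; the file path is the example path recorded above:
# List every file that contains at least one corrupt block
hdfs fsck / -list-corruptfileblocks
# Show the blocks and replica locations of the affected file
hdfs fsck /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar -files -blocks -locations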
Procedure
- Check whether the corrupted block can be deleted.
- If yes, go to 2.
- If no, contact technical support.
- Run the following commands to log in to the HDFS client:
cd <HDFS client installation directory>
source bigdata_env
kinit <service user>
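For example, on a cluster where the client is installed in /opt/client and the service user is named hdfsuser (both values are assumptions; substitute the values used in your environment), the commands would be:
cd /opt/client
source bigdata_env
# kinit prompts for the password of the service user
kinit hdfsuser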
- Run the following command to delete the file that contains the corrupted block:
hdfs dfs -rm -skipTrash /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar
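Note that -skipTrash deletes the file immediately instead of moving it to the recycle bin. To confirm that the file is gone (the path is the example path from above), you can run:
hdfs dfs -ls /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar
# Expected result: ls reports "No such file or directory"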
- Run the following command to check whether the HDFS status has been restored to HEALTHY:
hdfs fsck /
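When the file system is healthy again, the fsck summary typically ends with lines similar to the following (the exact wording may vary between Hadoop versions):
Status: HEALTHY
...
The filesystem under path '/' is HEALTHY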
- Decommission the DataNode again.