Failed to Decommission a DataNode Due to HDFS Block Loss
Symptom
A DataNode fails to be decommissioned.
Cause Analysis
- Check the decommissioning log. It shows that one of the 1564 blocks fails to be backed up.
- Log in to the master node of the cluster, go to the HDFS client, and run the hdfs fsck / command to locate the damaged block. Record the file path (see the example commands after this list).
Example: /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar
The HDFS status is CORRUPT.
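For reference, a minimal sketch of how the corrupt file can be located, assuming the HDFS client environment has already been loaded as described in the Procedure below:
# Full file system check; the summary reports CORRUPT if any block has no healthy replica
hdfs fsck /
# List only the files that contain corrupt blocks, to identify the path to delete
hdfs fsck / -list-corruptfileblocks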
Procedure
- Check whether the damaged block can be deleted.
- If yes, go to the next step.
- If no, contact technical support.
- Run the following commands to log in to the HDFS client (a worked example is given after this procedure):
cd HDFS client installation directory
source bigdata_env
kinit Service user
- Run the following command to delete the damaged block:
hdfs dfs -rm -skipTrash /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar
- Run the following command to check whether the HDFS status is restored to HEALTHY:
hdfs fsck /
- Decommission the DataNode again.
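The following is a minimal end-to-end sketch of the procedure. The client installation directory /opt/client, the service user hdfs, and the file path are assumptions for illustration; replace them with the values from your own environment and your fsck output:
# Load the HDFS client environment (assumed installation directory)
cd /opt/client
source bigdata_env
# Authenticate as a user with HDFS administrator rights (assumed user name)
kinit hdfs
# Permanently delete the corrupt file reported by fsck (path taken from the fsck output)
hdfs dfs -rm -skipTrash /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar
# Re-run fsck; the summary should now report that the file system is HEALTHY
hdfs fsck /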