Updated on 2022-12-09 GMT+08:00

Failed to Decommission a DataNode Due to HDFS Block Loss

Symptom

A DataNode fails to be decommissioned.

Cause Analysis

  1. Check the decommissioning log. It shows that the DataNode holds 1564 blocks, but one block cannot be replicated to another DataNode, so the decommissioning cannot complete.

  2. Log in to the master node of the cluster, load the HDFS client environment, run the hdfs fsck / command to locate the corrupted block, and record the path of the affected file.

    Example: /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar

    The HDFS status is CORRUPT.
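
    If the full fsck output is hard to scan, the corrupted files can also be listed directly. The following commands are a sketch using standard HDFS fsck options; the /tmp/hive-scratch path is only the example path above, not a fixed value.

    hdfs fsck / -list-corruptfileblocks

    hdfs fsck /tmp/hive-scratch -files -blocks -locations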

Procedure

  1. Check whether the file that contains the corrupted block can be deleted (for example, a temporary file that can be regenerated).

    • If yes, go to 2.
    • If no, contact technical support.

  2. Run the following commands to log in to the HDFS client:

    cd HDFS client installation directory

    source bigdata_env

    kinit Service user
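
    Example (assuming the client is installed in /opt/client and the service user is named hdfsuser; replace both with the actual values in your cluster):

    cd /opt/client

    source bigdata_env

    kinit hdfsuser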

  3. Run the following command to delete the file that contains the corrupted block:

    hdfs dfs -rm -skipTrash /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar
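
    To confirm which block of the file is corrupted before deleting it, the same fsck options can be run against that specific path (a sketch reusing the example path above):

    hdfs fsck /tmp/hive-scratch/omm/_tez_session_dir/xxx-resources/xxx.jar -files -blocks -locations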

  4. Run the following command to check whether the HDFS status is restored to HEALTHY:

    hdfs fsck /
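
    In a healthy cluster, the fsck summary typically reports Corrupt blocks: 0 and ends with a line similar to "The filesystem under path '/' is HEALTHY". To view only the relevant summary fields, the output can be filtered, for example:

    hdfs fsck / | grep -E "Status|Corrupt blocks"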

  5. Decommission the DataNode again.
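
    In MRS, the DataNode is normally decommissioned again from the cluster management interface (MRS console or Manager). On a self-managed HDFS, the usual open-source flow is to list the node in the file referenced by dfs.hosts.exclude and then refresh the node list (a sketch, not specific to MRS):

    hdfs dfsadmin -refreshNodes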