Updated on 2022-12-08 GMT+08:00

Locating Common Balance Problems

Problem 1: Lack of Permission to Execute the balance Task (Access denied).

Problem details: After the start-balancer.sh command is executed, the "hadoop-root-balancer-hostname.out" log displays "Access denied for user test1. Superuser privilege is required."

 
cat /opt/client/HDFS/hadoop/logs/hadoop-root-balancer-host2.out
Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved
INFO: Watching file:/opt/client/HDFS/hadoop/etc/hadoop/log4j.properties for changes with interval : 60000
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException): Access denied for user test1. Superuser privilege is required
at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkSuperuserPrivilege(FSPermissionChecker.java:122)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkSuperuserPrivilege(FSNamesystem.java:5916)

Cause analysis:

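The balance task requires HDFS superuser privileges. In this example, the task was started as user test1, which is not an administrator account, so the NameNode rejected the request.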
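
To confirm which account the balancer actually ran as, the log can be searched for the message shown in the excerpt above. The following is a minimal sketch, not an official tool: the function name denied_balancer_user is hypothetical, and it assumes the log contains the message text exactly as shown in this section.

```shell
# Hypothetical helper: print the user named in the "Access denied" message
# of a balancer log. Returns 1 if the message is not present.
denied_balancer_user() {
    local log_file="$1"
    local line user
    # Take the first occurrence of the message, up to the end of the user name.
    line=$(grep -m 1 -o 'Access denied for user [^ ]*' "$log_file") || return 1
    user="${line##* }"      # keep the last word, for example "test1."
    echo "${user%.}"        # drop the trailing period of the sentence
}
```

For example, running denied_balancer_user /opt/client/HDFS/hadoop/logs/hadoop-root-balancer-host2.out against the log above would print the user that was denied.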

Solution:

  • Secure version

    Authenticate as user hdfs or as a user in the supergroup group, and then execute the balance task.

  • General version

    Run the su - hdfs command on the client before running the balance command on HDFS.
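
The two cases above can be sketched as a small wrapper. This is an illustration under assumptions, not a definitive procedure: the function name run_balancer and the variables BALANCER_CMD, SUPERUSER, and DRY_RUN are hypothetical, the secure case assumes Kerberos authentication with kinit, and start-balancer.sh is assumed to be on the PATH of the target user. By default the function only prints the command it would run.

```shell
# Assumed defaults for illustration; adjust them to the cluster.
BALANCER_CMD="${BALANCER_CMD:-start-balancer.sh}"
SUPERUSER="${SUPERUSER:-hdfs}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print the command, 0 = execute it

# Run (or print) the balance command for a "secure" or "general" cluster.
run_balancer() {
    case "$1" in
        secure)
            # Secure version: authenticate first, then start the balancer.
            if [ "$DRY_RUN" = 1 ]; then
                echo "kinit $SUPERUSER && $BALANCER_CMD"
            else
                kinit "$SUPERUSER" && "$BALANCER_CMD"
            fi
            ;;
        general)
            # General version: run the balancer as the superuser account.
            if [ "$DRY_RUN" = 1 ]; then
                echo "su - $SUPERUSER -c $BALANCER_CMD"
            else
                su - "$SUPERUSER" -c "$BALANCER_CMD"
            fi
            ;;
        *)
            echo "usage: run_balancer secure|general" >&2
            return 2
            ;;
    esac
}
```

Calling run_balancer general prints the command for review; setting DRY_RUN=0 executes it instead.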

Problem 2: The balance command fails to be executed, and the /system/balancer.id file is abnormal.

Problem details:

A user starts a balance process on the HDFS client. If the process stops unexpectedly and the user performs the balance operation again, the operation fails with the following error:

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.RecoveryInProgressException): Failed to APPEND_FILE /system/balancer.id for DFSClient because lease recovery is in progress. Try again later.

Cause analysis:

Generally, after a balance operation completes in HDFS, the /system/balancer.id file is released automatically and the balance operation can be performed again.

In the preceding scenario, the first balance operation stopped abnormally, so the /system/balancer.id file still exists when the balance operation is performed a second time. The second operation therefore tries to append to /system/balancer.id while the lease held by the first client is still being recovered, and the balance operation fails.

Solution:

Method 1: Wait until the hard lease period (one hour) expires so that the lease held by the original client is released, and then perform the balance operation again.

Method 2: Delete the /system/balancer.id file from HDFS and perform the balance operation again.
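
Method 2 can be sketched with the standard hdfs dfs -test and hdfs dfs -rm commands. This is a minimal example under assumptions: the function name remove_stale_balancer_id and the variable HDFS_CMD are hypothetical, and the commands must be run by a user with permission to delete the file. As a precaution, confirm that no balance process is still running before deleting the file.

```shell
# HDFS_CMD can be overridden (an assumption for illustration and testing).
HDFS_CMD="${HDFS_CMD:-hdfs}"
BALANCER_ID="/system/balancer.id"

# Delete the leftover balancer ID file if it exists in HDFS.
remove_stale_balancer_id() {
    if "$HDFS_CMD" dfs -test -e "$BALANCER_ID"; then
        # The file is left over from the interrupted run; remove it.
        "$HDFS_CMD" dfs -rm "$BALANCER_ID" && echo "removed $BALANCER_ID"
    else
        echo "$BALANCER_ID not found; nothing to remove"
    fi
}
```

After remove_stale_balancer_id reports that the file was removed, the balance operation can be started again.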