Configuring HDFS DataNode Data Balancing
Scenario
In an HDFS cluster, disk usage may become unbalanced across DataNodes, for example, after new DataNodes are added to the cluster. Unbalanced disk usage can cause multiple problems: MapReduce applications cannot take full advantage of data locality, network bandwidth between DataNodes cannot be used optimally, and the disks on some nodes cannot be fully utilized. Therefore, the MRS cluster administrator needs to periodically check and maintain DataNode data balance.
HDFS provides a capacity balancing program, Balancer. By running Balancer, you can balance the HDFS cluster and ensure that the difference between the disk usage of each DataNode and the average disk usage of the HDFS cluster does not exceed a configured threshold. DataNode disk usage before and after balancing is shown in Figure 1 and Figure 2, respectively.
The time of the balancing operation is affected by the following two factors:
- Total amount of data to be migrated:
The data volume on each DataNode should be greater than (Average usage - Threshold) x Average data volume and less than (Average usage + Threshold) x Average data volume. If the actual data volume on a node falls below the minimum or above the maximum, that node is unbalanced. The system takes the largest deviation volume across all DataNodes as the total data volume to be migrated.
- Iterative migration: Balancer performs the migration in sequential iterations. The amount of data migrated in each iteration does not exceed 10 GB, and the cluster usage is recalculated after each iteration.
Therefore, for a cluster, you can estimate the total task execution time by dividing the total data volume to be migrated by 10 GB and multiplying the result by the time consumed per iteration (recorded in the Balancer logs).
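As a rough illustration of this estimate, the sketch below uses hypothetical figures for the total data volume and the per-iteration time (both are site-specific and must be taken from your own Balancer logs):

```shell
# Rough duration estimate: each Balancer iteration moves at most 10 GB.
total_to_move_gb=500     # hypothetical total data volume to migrate
secs_per_iteration=120   # hypothetical per-iteration time from the Balancer log

iterations=$(( (total_to_move_gb + 9) / 10 ))   # round up to whole iterations
estimated_secs=$(( iterations * secs_per_iteration ))
echo "about ${iterations} iterations, roughly ${estimated_secs} seconds"
```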
The balancer can be started or stopped at any time.
Notes and Constraints
- This section applies to MRS 3.x or later.
- The balancing operation occupies the network bandwidth of DataNodes. Perform it during a maintenance window based on service requirements.
- The balancing operation may affect running services if the bandwidth limit (20 MB/s by default) is increased or the amount of data to be migrated increases.
Prerequisites
You have installed the HDFS client. For example, the installation path is /opt/client.
Configuring a Balancing Task
- Log in, as the client installation user, to the node where the client is installed. Run the following command to switch to the client installation directory, for example, /opt/client:
If the cluster is in normal mode, run the su - omm command to switch to user omm.
cd /opt/client
- Run the following command to configure environment variables:
source bigdata_env
- If the cluster is in security mode, run the following command to authenticate the user hdfs. The default password of user hdfs is Hdfs@123.
kinit hdfs
- Determine whether to adjust the bandwidth control.
- Run the following command to change the maximum bandwidth of Balancer, and then go to 6.
hdfs dfsadmin -setBalancerBandwidth <bandwidth in bytes per second>
<bandwidth in bytes per second> indicates the bandwidth limit, in bytes per second. For example, to set the bandwidth limit to 20 MB/s (the corresponding value is 20971520), run the following command:
hdfs dfsadmin -setBalancerBandwidth 20971520
- The maximum bandwidth of Balancer is 20 MB/s by default, which is adequate for clusters using 10GE networks with ongoing services. If the service idle window is too short for balancing maintenance, you can increase the value of this parameter to shorten the balancing time, for example, to 209715200 (200 MB/s).
- You need to adjust the value of this parameter based on the actual networking. If the cluster is heavily loaded, change the value to 209715200 (200 MB/s). If the cluster is idle, change the value to 1073741824 (1 GB/s).
- If the bandwidth of the DataNode cannot reach the specified maximum bandwidth, you can increase the value of the HDFS parameter dfs.datanode.balance.max.concurrent.moves on FusionInsight Manager and restart the HDFS service.
dfs.datanode.balance.max.concurrent.moves indicates the maximum number of threads allowed for load balancing on DataNodes. The value ranges from 5 to 1000.
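As a sanity check for the bandwidth values above, the byte value passed to -setBalancerBandwidth can be derived from a target in MB/s; a minimal sketch (the 200 MB/s target is an example):

```shell
# Convert a target bandwidth in MB/s into the bytes-per-second value
# that "hdfs dfsadmin -setBalancerBandwidth" expects.
mb_per_s=200                              # hypothetical target bandwidth
bytes_per_s=$(( mb_per_s * 1024 * 1024 ))
echo "$bytes_per_s"                       # prints 209715200

# On a node with the client environment sourced, you would then run
# (not executed here):
#   hdfs dfsadmin -setBalancerBandwidth "$bytes_per_s"
```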
- Check whether the upper limit of the HDFS client memory needs to be adjusted.
If the HDFS client memory has a small upper limit, the error message "OutOfMemoryError" may be displayed during data balancing.
- If yes, perform the following operations to adjust the memory upper limit and then go to 7.
- Run the following command to check the available memory size:
free -h
The command output is as follows:
              total        used        free      shared  buff/cache   available
Mem:           56Gi        36Gi       3.2Gi       6.2Gi        17Gi        10Gi
Swap:            0B          0B          0B
- Run the following command to modify the file:
vim Client installation path/HDFS/component_env
Adjust the memory size based on site requirements. For example, change the maximum heap memory size to 1 GB.
CLIENT_GC_OPTS="-Xmx1G"
- Then run the following command to apply the modifications.
source Client installation path/bigdata_env
- If no, go to 7.
- Run the following command to start the balance task:
bash /opt/client/HDFS/hadoop/sbin/start-balancer.sh -threshold <threshold of balancer>
-threshold specifies the deviation value of the DataNode disk usage, which is used for determining whether the HDFS data is balanced. When the difference between the disk usage of each DataNode and the average disk usage of the entire HDFS cluster is less than this threshold, the system considers that the HDFS cluster has been balanced and ends the balance task.
For example, to set the deviation threshold to 5%, run the following command:
bash /opt/client/HDFS/hadoop/sbin/start-balancer.sh -threshold 5
- /opt/client indicates the client installation directory. Replace it with the directory you use.
- The preceding command executes the task in the background. You can query related logs in the hadoop-root-balancer-Host name.out log file in the /opt/client/HDFS/hadoop/logs directory of the host.
- To stop the balance task, run the following command:
bash /opt/client/HDFS/hadoop/sbin/stop-balancer.sh
- If only data on some nodes needs to be balanced, add the -include parameter to the script to specify the nodes to be balanced. You can view the usage of the other supported parameters in the Balancer help information.
For example:
bash /opt/client/HDFS/hadoop/sbin/start-balancer.sh -threshold 5 -include IP1,IP2,IP3
- If the command fails to be executed and the error information "Failed to APPEND_FILE /system/balancer.id" is displayed in the log, run the following command to forcibly delete /system/balancer.id and run the start-balancer.sh script again:
hdfs dfs -rm -f /system/balancer.id
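To illustrate the -threshold semantics described above, the sketch below checks hypothetical per-node usage figures against a cluster average (all numbers are examples, not values from a real cluster):

```shell
# A DataNode counts as balanced when the absolute difference between its
# disk usage and the cluster average is below the threshold (in percent).
avg_usage=62   # hypothetical cluster-average disk usage, percent
threshold=5    # the -threshold value passed to start-balancer.sh

for usage in 58 65 70; do
  diff=$(( usage > avg_usage ? usage - avg_usage : avg_usage - usage ))
  if [ "$diff" -lt "$threshold" ]; then
    echo "node at ${usage}%: balanced (deviation ${diff}%)"
  else
    echo "node at ${usage}%: needs migration (deviation ${diff}%)"
  fi
done
```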
- After you run the script in 7, the hadoop-root-balancer-Host name.out log file is generated in the /opt/client/HDFS/hadoop/logs directory under the client installation directory. You can view the following fields in the log:
- Time Stamp
- Bytes Already Moved
- Bytes Left To Move
- Bytes Being Moved
If the message "Balance took xxx seconds" is displayed in the log, the balancing operation is complete.
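One way to spot completion programmatically is to search the .out log for the final message; a minimal sketch (the sample line stands in for real log output, and the log path shown in the comment is the default described above):

```shell
# Detect the Balancer completion message and extract the duration.
log_line="The cluster is balanced. Exiting... Balance took 3605 seconds"

if echo "$log_line" | grep -q "Balance took"; then
  secs=$(echo "$log_line" | grep -o '[0-9][0-9]* seconds' | grep -o '[0-9][0-9]*')
  echo "balancing finished in ${secs} seconds"
fi

# Against the real log (not run here):
#   grep "Balance took" /opt/client/HDFS/hadoop/logs/hadoop-root-balancer-*.out
```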
Setting Automatic Execution of the Balancing Task
- Log in to FusionInsight Manager.
For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.
- Choose Cluster > Services > HDFS. Click Configurations then All Configurations, search for the following parameters, and change the parameter values.
Table 1 Parameter description

| Parameter | Description | Default Value |
| --- | --- | --- |
| dfs.balancer.auto.enable | Whether to enable automatic execution of balancing tasks. false: the function is disabled. true: the function is enabled. | false |
| dfs.balancer.auto.cron.expression | Cron expression of the HDFS balancing operation, which controls the start time of the balancing operation. This parameter is valid only when dfs.balancer.auto.enable is set to true. The default value 0 1 * * 6 indicates that the balancing task is executed at 01:00 every Saturday. For details about the expression format, see Table 2. * indicates consecutive time segments. | 0 1 * * 6 |
| dfs.balancer.auto.stop.cron.expression | Cron expression for stopping the HDFS balancing operation, which controls the end time of the balancing operation. This parameter is valid only when dfs.balancer.auto.enable is set to true. For example, if the parameter is set to 0 5 * * 6, the balancing task ends at 05:00 every Saturday. For details about the expression format, see Table 2. * indicates consecutive time segments. | - |
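The start and stop expressions use the standard five-field cron order (minute, hour, day of month, month, day of week); a small sketch decoding the default start expression:

```shell
# Decode a five-field cron expression such as the default "0 1 * * 6".
set -f                       # disable globbing so "*" stays literal
cron_expr="0 1 * * 6"
set -- $cron_expr            # split into the five positional fields
echo "minute=$1 hour=$2 day-of-month=$3 month=$4 day-of-week=$5"
set +f
# "0 1 * * 6" -> 01:00 every Saturday (day-of-week 6).
```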
- Running parameters of the balance task that is automatically executed are shown in Table 3.
Table 3 Running parameters of the automatic balancer

| Parameter | Description | Default Value |
| --- | --- | --- |
| dfs.balancer.auto.threshold | Balancing threshold of the disk capacity percentage. This parameter is valid only when dfs.balancer.auto.enable is set to true. | 10 |
| dfs.balancer.auto.exclude.datanodes | List of DataNodes on which automatic disk balancing is not required. Use commas (,) to separate DataNodes. This parameter is valid only when dfs.balancer.auto.enable is set to true. | Left blank |
| dfs.balancer.auto.bandwidthPerSec | Maximum bandwidth (MB/s) of each DataNode for load balancing. | 20 |
| dfs.balancer.auto.maxIdleIterations | Maximum number of consecutive idle iterations (iterations without moving blocks) of Balancer. When this number is reached, the balancing task ends. The value -1 indicates infinity. | 5 |
| dfs.balancer.auto.maxDataNodesNum | Number of DataNodes that perform automatic balancing. If the value N is greater than 0, data is balanced between the N DataNodes with the highest percentage of remaining space and the N DataNodes with the lowest percentage of remaining space. If N is 0, data is balanced among all DataNodes in the cluster. | 5 |
- Click Save to make configurations take effect. You do not need to restart the HDFS service.
You can view the task execution logs in the /var/log/Bigdata/hdfs/nn/hadoop-omm-balancer-Host name.log file on the active NameNode.
Helpful Links
- If the hadoop-root-balancer-Hostname.out log file contains "Access denied for user test1. Superuser privilege is required" after you run the start-balancer.sh command, see Locating Common Balance Problems.
- If a balance process fails to restart after it is stopped unexpectedly on the HDFS client, see Why Does a Balance Process in HDFS Shut Down and Fail to Be Executed Again?.
- If data is unevenly distributed, that is, a disk is exhausted but other disks are not, see Uneven Data Distribution Due to Non-HDFS Data Residuals.
- If the disk usage of DataNodes on a single node is unbalanced, see Unbalanced DataNode Disk Usages of a Node.
- To balance disk data on a running DataNode, see Configuring HDFS Disk Balancing.