Configuring Replica Replacement Policy for DataNodes with Inconsistent Capacity
Scenario
By default, the NameNode randomly selects a DataNode when writing file blocks. If disk capacities vary among the DataNodes in a cluster, the nodes with smaller disks fill up first. To resolve this problem, change the default disk selection policy for data written to DataNodes to the available-space block placement policy. This policy increases the probability of writing data blocks to nodes with more available disk space. It keeps node usage balanced when DataNode disk capacities are inconsistent, improving HDFS data reliability and read/write performance.
The default replica storage policy of the NameNode is as follows:
- First replica: stored on the node where the client resides.
- Second replica: stored on a DataNode in a remote rack.
- Third replica: stored on a different node in the same rack as the second replica.
- Any additional replicas: stored randomly on other DataNodes.
The replica selection mechanism of the available disk space block storage policy is as follows:
- First replica: stored on the DataNode where the client resides (the same as the default storage policy).
- Second replica:
- When selecting a storage node, choose two candidate DataNodes that meet the requirements.
- Compare the disk usage of the two DataNodes. If the difference is smaller than 5%, store the replica on the first node.
- If the difference exceeds 5%, there is a 60% probability (specified by dfs.namenode.available-space-block-placement-policy.balanced-space-preference-fraction, whose default value is 0.6) that the replica is written to the node with lower disk usage.
- Third and subsequent replicas: selected in the same way as the second replica.
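The selection between two candidate DataNodes described above can be sketched as follows. This is a minimal illustration of the decision rule, not Hadoop's actual implementation; the function and constant names are hypothetical, and only the 5% threshold and the 0.6 preference fraction come from the policy described in this section.

```python
import random

# Hypothetical constants mirroring the policy described above.
BALANCED_SPACE_TOLERANCE = 0.05   # 5% usage-difference threshold
BALANCED_SPACE_PREFERENCE = 0.6   # dfs.namenode.available-space-block-placement-policy.balanced-space-preference-fraction

def choose_datanode(node_a, node_b, rand=None):
    """Pick one of two candidate DataNodes by disk usage.

    node_a, node_b: (name, disk_usage_ratio) tuples, usage in [0, 1].
    rand: a draw in [0, 1); injectable here for deterministic testing.
    """
    if rand is None:
        rand = random.random()
    usage_a, usage_b = node_a[1], node_b[1]
    # Usage difference within tolerance: keep the first candidate.
    if abs(usage_a - usage_b) < BALANCED_SPACE_TOLERANCE:
        return node_a
    less_used = node_a if usage_a < usage_b else node_b
    more_used = node_b if less_used is node_a else node_a
    # With probability 0.6, prefer the node with more free space.
    return less_used if rand < BALANCED_SPACE_PREFERENCE else more_used

# 40% vs 80% usage differs by more than 5%, so a draw below 0.6
# selects the emptier node; 50% vs 52% is within tolerance.
print(choose_datanode(("dn1", 0.80), ("dn2", 0.40), rand=0.3))  # ('dn2', 0.4)
print(choose_datanode(("dn1", 0.50), ("dn2", 0.52)))            # ('dn1', 0.5)
```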
Impact on the System
Adjusting the disk selection policy of HDFS data may affect the HDFS write performance.
Prerequisites
The total disk capacity deviation of DataNodes in the MRS cluster cannot exceed 100%.
Procedure
- Log in to FusionInsight Manager.
For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.
- Choose Cluster > Services > HDFS > Configurations > All Configurations.
- Modify the disk selection policy parameters of HDFS data writing.
Table 1 Disk selection policy parameters

Parameter: dfs.block.replicator.classname

Description: Specifies the DataNode replica placement policy. The following values are supported:
- org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyDefault: Standard Hadoop replica storage logic. It uses the rack awareness policy to balance reliability, network efficiency, and read/write performance.
- org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithRackGroup: Rack group-based replica storage policy. Based on the standard rack awareness, multiple racks can be divided into a logical group to provide fine-grained replica storage control.
To enable the rack group-based storage policy, you must also set dfs.use.dfs.network.topology to false and net.topology.impl to org.apache.hadoop.net.NetworkTopologyWithRackGroup.
- org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeLabel: Node label-based replica storage policy for fine-grained control of data block storage locations. This policy allows you to label nodes based on node hardware features (such as SSDs and HDDs), locations (racks and data centers), or roles, and define storage policies based on the labels to optimize data distribution and resource utilization.
- org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNodeGroup: Node group-based replica storage policy. Based on standard rack awareness, it allows more flexible definition of fault domains and data redundancy rules. This policy applies to complex network topologies (such as multiple data centers and hybrid clouds) or scenarios where more refined fault isolation is required.
- org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyWithNonAffinityNodeGroup: Based on the node group-based replica storage policy, allocates replicas to different node groups to provide stronger fault isolation. This policy applies to high availability (HA) scenarios that require cross-data center and cross-region deployment. It ensures data availability even if an entire node group is faulty.
- org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy (default value): Preferentially allocates data blocks to nodes with sufficient storage space, based on the standard replica storage policy of Hadoop. This prevents write failures and data skew caused by insufficient space on some nodes.
- org.apache.hadoop.hdfs.server.blockmanagement.BlockPlacementPolicyRackFaultTolerant: Optimizes the distribution of replicas across racks to ensure data availability even if multiple racks are faulty. This policy applies to large-scale clusters that require high reliability.
Example value: org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy
- Save the settings. Restart the expired service or instance for the configuration to take effect.
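For reference, the resulting configuration corresponds to the following hdfs-site.xml fragment. In an MRS cluster these values are normally applied through FusionInsight Manager as described above, not by editing the file directly; the second property is shown at its default value of 0.6 as stated in this section.

```xml
<!-- Select the available-space block placement policy -->
<property>
  <name>dfs.block.replicator.classname</name>
  <value>org.apache.hadoop.hdfs.server.blockmanagement.AvailableSpaceBlockPlacementPolicy</value>
</property>
<!-- Probability of preferring the less-used DataNode when usage differs by more than 5% -->
<property>
  <name>dfs.namenode.available-space-block-placement-policy.balanced-space-preference-fraction</name>
  <value>0.6</value>
</property>
```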