Updated on 2025-10-11 GMT+08:00

Optimizing HDFS DataNode RPC QoS

Scenario

When a client writes data to HDFS faster than the DataNode disks can absorb it, the disk bandwidth is fully occupied and the DataNode stops responding. The client can then back off only by canceling or recovering the write pipeline, which results in write failures and unnecessary pipeline recovery operations.

MRS provides the dfs.pipeline.ecn parameter to address this. When this parameter is enabled, DataNodes send a congestion signal to the client when the write pipeline is overloaded and blocked. The client can back off based on this signal, which prevents the system from being overloaded, makes the pipeline more stable, and reduces unnecessary cancellation and recovery operations. After receiving the signal, the client first backs off for 5,000 ms and then adjusts the backoff time based on the related filter, up to a maximum of 50,000 ms.
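The backoff behavior described above can be sketched as follows. This is a minimal illustration, not the client implementation: only the 5,000 ms initial backoff and the 50,000 ms cap come from this section. The function name next_backoff_ms and the growth rule (random growth to at most three times the previous backoff) are assumptions made for the example.

```python
import random

INITIAL_BACKOFF_MS = 5_000   # first backoff, as stated in this section
MAX_BACKOFF_MS = 50_000      # upper bound, as stated in this section


def next_backoff_ms(last_backoff_ms, rng=random.random):
    """Return how long the client waits after a congestion signal.

    Hypothetical helper: the growth rule below is an assumption for
    illustration, not the documented adjustment logic.
    """
    if last_backoff_ms <= 0:
        # No earlier backoff: start with the initial value.
        return INITIAL_BACKOFF_MS
    # Assumed rule: grow by a random amount, up to 3x the previous value.
    grown = last_backoff_ms + rng() * 2 * last_backoff_ms
    # Never exceed the documented maximum.
    return min(MAX_BACKOFF_MS, int(grown))


# Simulate a client that keeps receiving congestion signals.
backoff = 0
for signal in range(1, 6):
    backoff = next_backoff_ms(backoff)
    print(f"signal {signal}: back off for {backoff} ms")
```

Whatever growth rule is used, the cap keeps a client that is signaled repeatedly from waiting longer than 50,000 ms between write attempts.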

Notes and Constraints

This section applies to MRS 3.x or later.

Procedure

  1. Log in to FusionInsight Manager.

    For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.

  2. Choose Cluster > Services > HDFS > Configurations > All Configurations.
  3. Search for the following parameter and change its value as required.

    Table 1 DataNode ECN configuration

    Parameter | Description | Default Value
    dfs.pipeline.ecn | Whether to enable the congestion signaling capability on the DataNode. After this function is enabled, DataNodes can send signals to clients when blocking occurs. | false

  4. Save the settings. Restart the services or instances whose configuration has expired for the change to take effect.
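After the restart, one way to confirm the setting is to inspect an hdfs-site.xml file that reflects the cluster configuration. The following is a minimal sketch under stated assumptions: the helper name is_pipeline_ecn_enabled and the sample XML are illustrative, and the location of hdfs-site.xml in your environment is not covered here. The file layout used is the standard Hadoop property format.

```python
import xml.etree.ElementTree as ET


def is_pipeline_ecn_enabled(hdfs_site_xml: str) -> bool:
    """Return True if dfs.pipeline.ecn is set to true in the given XML text.

    Hypothetical helper for illustration. A missing property is treated
    as false, which matches the default value listed in Table 1.
    """
    root = ET.fromstring(hdfs_site_xml)
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        if name == "dfs.pipeline.ecn":
            value = (prop.findtext("value") or "").strip().lower()
            return value == "true"
    return False


# Sample content for illustration only; read your own hdfs-site.xml instead.
SAMPLE = """<configuration>
  <property>
    <name>dfs.pipeline.ecn</name>
    <value>true</value>
  </property>
</configuration>"""

print(is_pipeline_ecn_enabled(SAMPLE))  # prints True
```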