Updated on 2025-10-11 GMT+08:00

Improving the HDFS Client Connection Performance with Active NameNode Caching

Scenario

When HDFS is deployed in high availability (HA) mode with multiple NameNode instances, the HDFS client needs to connect to each NameNode in sequence to determine which is the active NameNode and perform client operations on it.

Once the active NameNode has been identified, its details can be cached and shared with all clients running on the same client host. Each new client then first attempts to load the active NameNode details from the cache, which saves the RPC call to the standby NameNode. This mechanism is particularly beneficial in abnormal situations, for example, when the standby NameNode does not respond for a long time. When a failover occurs and another NameNode is switched to the active state, the cached information is updated to point to the current active NameNode.
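The cache-first lookup described above can be sketched as follows. This is a minimal illustration, not the HDFS client's actual implementation: the function names, the JSON cache format, and the `is_active` probe callback are all assumptions made for the example.

```python
import json
import os
import tempfile
from typing import Callable, Iterable, Optional


def read_cached_active(cache_file: str) -> Optional[str]:
    """Return the cached active NameNode address, or None on a cache miss."""
    try:
        with open(cache_file, "r", encoding="utf-8") as f:
            return json.load(f).get("active")
    except (OSError, ValueError, AttributeError):
        # Missing, unreadable, or malformed cache file: treat as a miss.
        return None


def write_cached_active(cache_file: str, address: str) -> None:
    """Publish the active NameNode address for other clients on this host."""
    directory = os.path.dirname(cache_file) or "."
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump({"active": address}, f)
    # Replace the old file in one step so readers never see a partial write.
    os.replace(tmp_path, cache_file)


def find_active(
    namenodes: Iterable[str],
    is_active: Callable[[str], bool],
    cache_file: str,
) -> str:
    """Probe the cached NameNode first, then the others in configured order."""
    cached = read_cached_active(cache_file)
    candidates = list(namenodes)
    if cached in candidates:
        candidates.remove(cached)
        candidates.insert(0, cached)
    for address in candidates:
        if is_active(address):  # hypothetical probe, e.g. one RPC round trip
            if address != cached:
                write_cached_active(cache_file, address)
            return address
    raise RuntimeError("no active NameNode found")
```

With a warm cache, the first probe goes straight to the last known active NameNode, so a slow or unresponsive standby is never contacted. After a failover, the stale entry fails its probe, the remaining NameNodes are tried in order, and the cache is rewritten with the new active NameNode.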

Notes and Constraints

  • This section applies to MRS 3.x or later.
  • Cache files created by an HDFS client are reused by other clients, so they are not deleted from the local file system automatically. If you disable this function, you may need to delete the cache files manually.

Procedure

  1. Log in to FusionInsight Manager.

    For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.

  2. Choose Cluster > Services > HDFS > Configurations > All Configurations.
  3. Search for the following parameters and change their values as required.

    Table 1 Configuration parameters

    Parameter

    Description

    Example Value

    dfs.client.failover.proxy.provider.[nameservice ID]

    Specifies the client failover proxy provider class, which uses the authenticated protocol to create the NameNode proxy.

    • org.apache.hadoop.hdfs.server.namenode.ha.BlackListingFailoverProxyProvider: uses the NameNode blacklist feature on the HDFS client.
    • org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider: enables read requests to be served by Observer NameNodes.
    • org.apache.hadoop.hdfs.server.namenode.ha.AdaptiveFailoverProxyProvider: uses the dynamic adjustment policy to select the optimal node for read and write operations based on the health status and load of NameNodes.
    • org.apache.hadoop.hdfs.server.namenode.ha.RequestHedgingProxyProvider: sends requests to multiple NameNodes at the same time. The NameNode that responds successfully first is treated as the active NameNode.
    • org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider: determines the failover policy based on the settings in the configuration file. You can customize the failover policy.

    Set the parameter as required.

    dfs.client.failover.activeinfo.share.flag

    Specifies whether to enable the cache function and share the detailed information about the current active NameNode with other clients. The default value is false, indicating that the cache function is disabled.

    true

    dfs.client.failover.activeinfo.share.path

    Specifies the local directory that stores the shared cache files created by all clients on the host. If the cache is to be shared by different users, every such user must have permission to create, read, and write cache files in this directory.

    /tmp

    dfs.client.failover.activeinfo.share.io.timeout.sec

    (Optional) Specifies the timeout for locking the cache file. The cache file is locked while it is being read or written. If the lock cannot be acquired within the specified time, the attempt to read or update the cache is abandoned. Unit: seconds.

    The value ranges from 1 to 3600. The default value is 5.

    5

  4. Save the settings. Then restart the services or instances whose configurations have expired for the changes to take effect.
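For reference, the parameters in Table 1 could be assembled and checked as in the sketch below. It assumes the settings are rendered as Hadoop-style `<property>` entries; on MRS they are set through FusionInsight Manager as described above. The helper names and the nameservice ID used in the usage note are hypothetical.

```python
from typing import Dict
from xml.sax.saxutils import escape


def build_client_properties(
    nameservice_id: str,
    provider_class: str,
    share_path: str,
    io_timeout_sec: int = 5,
) -> Dict[str, str]:
    """Collect the active NameNode caching settings listed in Table 1."""
    # Table 1 gives a valid range of 1 to 3600 seconds (default 5).
    if not 1 <= io_timeout_sec <= 3600:
        raise ValueError("io timeout must be between 1 and 3600 seconds")
    return {
        f"dfs.client.failover.proxy.provider.{nameservice_id}": provider_class,
        "dfs.client.failover.activeinfo.share.flag": "true",
        "dfs.client.failover.activeinfo.share.path": share_path,
        "dfs.client.failover.activeinfo.share.io.timeout.sec": str(io_timeout_sec),
    }


def render_properties(props: Dict[str, str]) -> str:
    """Render the settings as a Hadoop-style configuration fragment."""
    lines = ["<configuration>"]
    for name, value in props.items():
        lines.append("  <property>")
        lines.append(f"    <name>{escape(name)}</name>")
        lines.append(f"    <value>{escape(value)}</value>")
        lines.append("  </property>")
    lines.append("</configuration>")
    return "\n".join(lines)
```

For example, `render_properties(build_client_properties("hacluster", "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider", "/tmp"))` produces a fragment with four properties, where `hacluster` stands in for your own nameservice ID. Validating the timeout before rendering catches out-of-range values early instead of leaving them to be discovered at client startup.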