Help Center/ MapReduce Service/ Component Operation Guide (Normal)/ Using HDFS/ HDFS Performance Tuning/ Improving the HDFS Client Connection Performance with Active NameNode Caching
Updated on 2024-10-08 GMT+08:00

Improving the HDFS Client Connection Performance with Active NameNode Caching

Scenario

When HDFS is deployed in high availability (HA) mode with multiple NameNode instances, the HDFS client needs to connect to each NameNode in sequence to determine which is the active NameNode and use it for client operations.

Once the active NameNode is identified, its details can be cached and shared to all clients running on the client host. In this way, each new client first tries to load the details of the active Name Node from the cache and save the RPC call to the standby NameNode, which can help a lot in abnormal scenarios, for example, when the standby NameNode cannot be connected for a long time.

When a fault occurs and the other NameNode is switched to the active state, the cached details are updated to the information about the current active NameNode.

This section applies to MRS 3.x or later.

Procedure

Navigation path for setting parameters:

On FusionInsight Manager, choose Cluster > Services > HDFS, click Configurations then All Configurations, and enter the parameter name in the search box.

Table 1 Configuration parameters

Parameter

Description

Default Value

dfs.client.failover.proxy.provider.[nameservice ID]

Client Failover proxy provider class which creates the NameNode proxy using the authenticated protocol. If this parameter is set to org.apache.hadoop.hdfs.server.namenode.ha.BlackListingFailoverProxyProvider, you can use the NameNode blacklist feature on the HDFS client. If this parameter is set to org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider, you can configure the observer NameNode to process read requests.

org.apache.hadoop.hdfs.server.namenode.ha.AdaptiveFailoverProxyProvider

dfs.client.failover.activeinfo.share.flag

Specifies whether to enable the cache function and share the detailed information about the current active NameNode with other clients. Set it to true to enable the cache function.

false

dfs.client.failover.activeinfo.share.path

Specifies the local directory for storing the shared files created by all clients in the host. If a cache area is to be shared by different users, the directory must have required permissions (for example, creating, reading, and writing cache files in the specified directory).

/tmp

dfs.client.failover.activeinfo.share.io.timeout.sec

(Optional) Used to control timeout. The cache file is locked when it is being read or written, and if the file cannot be locked within the specified time, the attempt to read or update the caches will be abandoned. The unit is second.

5

The cache files created by the HDFS client are reused by other clients, and thus these files will not be deleted from the local system. If this function is disabled, you may need to manually clear the data.