Improving HDFS Client Connection Performance with Active NameNode Caching
Scenario
When HDFS is deployed in high availability (HA) mode with multiple NameNode instances, the HDFS client connects to each NameNode in sequence to determine which one is active, and then uses it for client operations.
Once the active NameNode is identified, its details can be cached and shared with all clients running on the same host. Each new client then first tries to load the details of the active NameNode from the cache, saving the RPC call to the standby NameNode. This helps considerably in abnormal scenarios, for example, when the standby NameNode cannot be connected for a long time.
When a fault occurs and the other NameNode switches to the active state, the cached details are updated with the information about the new active NameNode.
This section applies to MRS 3.x or later.
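The sketch below illustrates, in simplified form, how such a host-local cache can work: the active NameNode address is kept in a shared file, reads and writes take a file lock, and the attempt is abandoned if the lock cannot be acquired within the configured timeout. This is only an illustrative sketch under those assumptions; it is not the actual AdaptiveFailoverProxyProvider implementation, and the class name, file name, and locking details are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/**
 * Illustrative sketch only: caches the active NameNode address in a shared
 * local file guarded by a file lock, so that new clients on the same host
 * can skip probing the standby NameNode. Not the real provider code.
 */
public class ActiveNameNodeCacheSketch {

    private final Path cacheFile;          // lives under the shared cache directory
    private final long lockTimeoutMillis;  // analogous to the share.io.timeout.sec setting

    public ActiveNameNodeCacheSketch(String shareDir, long lockTimeoutSec) {
        this.cacheFile = Paths.get(shareDir, "active_namenode_cache"); // hypothetical file name
        this.lockTimeoutMillis = lockTimeoutSec * 1000L;
    }

    /** Returns the cached active NameNode address, or null if it cannot be read in time. */
    public String readCachedActive() {
        try (FileChannel channel = FileChannel.open(cacheFile, StandardOpenOption.READ)) {
            FileLock lock = tryLockWithTimeout(channel, true);
            if (lock == null) {
                return null; // lock not acquired in time: give up and probe the NameNodes
            }
            try {
                ByteBuffer buf = ByteBuffer.allocate((int) channel.size());
                channel.read(buf);
                return new String(buf.array(), StandardCharsets.UTF_8).trim();
            } finally {
                lock.release();
            }
        } catch (IOException e) {
            return null; // missing or unreadable cache: fall back to probing
        }
    }

    /** Called after a failover to record the address of the new active NameNode. */
    public void updateCachedActive(String activeAddress) {
        try (FileChannel channel = FileChannel.open(cacheFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE,
                StandardOpenOption.TRUNCATE_EXISTING)) {
            FileLock lock = tryLockWithTimeout(channel, false);
            if (lock == null) {
                return; // another client holds the lock; skip this update attempt
            }
            try {
                channel.write(ByteBuffer.wrap(activeAddress.getBytes(StandardCharsets.UTF_8)));
            } finally {
                lock.release();
            }
        } catch (IOException e) {
            // best effort only: failing to update the cache must not fail the client
        }
    }

    /** Polls for the file lock until the configured timeout expires. */
    private FileLock tryLockWithTimeout(FileChannel channel, boolean shared) throws IOException {
        long deadline = System.currentTimeMillis() + lockTimeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            FileLock lock = channel.tryLock(0L, Long.MAX_VALUE, shared);
            if (lock != null) {
                return lock;
            }
            try {
                Thread.sleep(100L);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }
}
```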
Procedure
Navigation path for setting parameters:
On FusionInsight Manager, choose Cluster > Services > HDFS, click Configurations and then All Configurations, and enter the parameter name in the search box.
| Parameter | Description | Default Value |
|---|---|---|
| dfs.client.failover.proxy.provider.[nameservice ID] | Client failover proxy provider class that creates the NameNode proxy using the authenticated protocol. If this parameter is set to org.apache.hadoop.hdfs.server.namenode.ha.BlackListingFailoverProxyProvider, the NameNode blacklist feature can be used on the HDFS client. If it is set to org.apache.hadoop.hdfs.server.namenode.ha.ObserverReadProxyProvider, the observer NameNode can be configured to process read requests. | org.apache.hadoop.hdfs.server.namenode.ha.AdaptiveFailoverProxyProvider |
| dfs.client.failover.activeinfo.share.flag | Specifies whether to enable the cache and share the details of the current active NameNode with other clients. Set this parameter to true to enable the cache. | false |
| dfs.client.failover.activeinfo.share.path | Specifies the local directory for storing the shared files created by all clients on the host. If the cache area is to be shared by different users, the directory must grant the required permissions (for example, to create, read, and write cache files in it). | /tmp |
| dfs.client.failover.activeinfo.share.io.timeout.sec | (Optional) Controls the lock timeout, in seconds. The cache file is locked while it is being read or written; if the lock cannot be acquired within the specified time, the attempt to read or update the cache is abandoned. | 5 |
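These parameters are normally set through the navigation path above, but they can also be supplied on the client side. The following Java snippet is a minimal sketch of a client that sets them programmatically on a Hadoop Configuration; the nameservice ID hacluster and the hdfs://hacluster URI are placeholders that must be replaced with the values used by your cluster, and the cluster client configuration files are assumed to be on the classpath.

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ActiveInfoShareClient {
    public static void main(String[] args) throws Exception {
        // Client-side configuration; hdfs-site.xml/core-site.xml from the
        // cluster client package are normally picked up from the classpath.
        Configuration conf = new Configuration();

        // "hacluster" is a placeholder nameservice ID; replace it with the
        // actual nameservice ID of your cluster.
        conf.set("dfs.client.failover.proxy.provider.hacluster",
                "org.apache.hadoop.hdfs.server.namenode.ha.AdaptiveFailoverProxyProvider");

        // Enable caching and sharing of the active NameNode details.
        conf.setBoolean("dfs.client.failover.activeinfo.share.flag", true);
        // Local directory in which the shared cache files are created.
        conf.set("dfs.client.failover.activeinfo.share.path", "/tmp");
        // Abandon reading or updating the cache if the file lock cannot be
        // acquired within this many seconds.
        conf.setInt("dfs.client.failover.activeinfo.share.io.timeout.sec", 5);

        try (FileSystem fs = FileSystem.get(URI.create("hdfs://hacluster"), conf)) {
            // Any subsequent operation tries the cached active NameNode first.
            System.out.println(fs.exists(new Path("/")));
        }
    }
}
```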
The cache files created by the HDFS client are reused by other clients and are therefore not deleted from the local file system. If this function is disabled, you may need to clear the data manually.