Improving Read Performance By HDFS Client Metadata Caching
Scenario
The HDFS client can cache block location metadata. This reduces the number of metadata queries sent to the NameNode, lowers network overhead, and improves client response time and HDFS read performance.
Notes and Constraints
- This section applies to MRS 3.x or later.
- Use this function only for reading files that are not modified frequently. Modifications made by other clients on the server are invisible to the caching client, so the metadata obtained from the cache may be outdated.
Procedure
- Log in to FusionInsight Manager.
For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.
- Choose Cluster > Services > HDFS > Configurations > All Configurations.
- Search for the following parameters and change their values as required.
Table 1 Parameter configuration
Columns: Parameter, Description, Default Value
dfs.client.metadata.cache.enabled
Indicates whether to enable the client to cache the metadata of block locations.
- true: The function is enabled.
- false: The function is disabled.
Default value: false
dfs.client.metadata.cache.pattern
Specifies a regular expression that is matched against file paths. The block location metadata of matching files is cached until it expires.
This parameter is valid only when dfs.client.metadata.cache.enabled is set to true.
Example: /test.* matches all files whose paths start with /test, so the block location metadata of those files is cached.
- To ensure consistency, configure a specific pattern so that only files that are not frequently modified by other clients are cached.
- The regular expression is checked only against the path component of the URI. For a fully qualified path, the scheme and authority are not checked.
Default value: -
dfs.client.metadata.cache.expiry.sec
Specifies how long a metadata entry stays in the cache. An entry becomes invalid once it has been cached for longer than this duration, even if it is used frequently during that time.
Time suffixes s/m/h can be used to indicate second, minute, and hour, respectively.
If this parameter is set to 0s, the cache function is disabled.
Default value: 60s
dfs.client.metadata.cache.max.entries
Indicates the maximum number of non-expired data items that can be cached at a time.
Value range: 0 to 65536.
Default value: 65536
- Save the settings. Restart the service or instances whose configuration has expired for the change to take effect.
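
The path-matching rule described for dfs.client.metadata.cache.pattern can be illustrated with a small standalone sketch. It is not the HDFS client's code. It assumes that the pattern must match the whole path (which is consistent with the /test.* example in Table 1), and the class name, method name, and the hacluster authority are hypothetical.

```java
import java.net.URI;
import java.util.regex.Pattern;

// Illustrative only: mimics the documented matching rule, not the HDFS client's code.
final class CachePatternDemo {

    // Returns true if the path component of the given file URI matches the pattern.
    // The scheme and authority (for example hdfs://hacluster) are ignored.
    // Assumption: the pattern must match the entire path.
    static boolean isCacheCandidate(String pattern, String fileUri) {
        String path = URI.create(fileUri).getPath();
        return Pattern.matches(pattern, path);
    }

    public static void main(String[] args) {
        String pattern = "/test.*"; // example value from Table 1

        // Plain path that starts with /test.
        System.out.println(isCacheCandidate(pattern, "/test/data/part-0000"));                 // true
        // Fully qualified path: only /test/data/part-0000 is checked.
        System.out.println(isCacheCandidate(pattern, "hdfs://hacluster/test/data/part-0000")); // true
        // Path that contains /test but does not start with it.
        System.out.println(isCacheCandidate(pattern, "/tmp/test/part-0000"));                  // false
    }
}
```

Under the same assumption, /test.* also matches paths such as /testing/file, so a narrower pattern (for example /test/.*) may be preferable when only one directory should be cached.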

Call DFSClient#clearLocatedBlockCache() to completely clear the client cache before it expires.
The sample usage is as follows:
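// conf must point at the HDFS cluster (fs.defaultFS is an hdfs:// URI).
Configuration conf = new Configuration();
FileSystem fs = FileSystem.get(conf);
// The cast is valid only when the default file system is HDFS.
DistributedFileSystem dfs = (DistributedFileSystem) fs;
DFSClient dfsClient = dfs.getClient();
// Drop all cached block locations immediately.
dfsClient.clearLocatedBlockCache();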
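
The expiry, size limit, and manual clearing described on this page can be pictured with the following conceptual sketch. It is not the HDFS client's implementation: the class ExpiringCacheSketch and all of its members are hypothetical. The behavior when the cache is full is an assumption (the new entry is simply not cached), because this page does not specify it.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Conceptual sketch only; not the HDFS client's implementation.
final class ExpiringCacheSketch<K, V> {

    private static final class Entry<V> {
        final V value;
        final long insertedAtMs;

        Entry(V value, long insertedAtMs) {
            this.value = value;
            this.insertedAtMs = insertedAtMs;
        }
    }

    private final Map<K, Entry<V>> entries = new HashMap<>();
    private final long expiryMs;        // plays the role of dfs.client.metadata.cache.expiry.sec
    private final int maxEntries;       // plays the role of dfs.client.metadata.cache.max.entries
    private final LongSupplier clockMs; // injectable clock, so the sketch can be tested without sleeping

    ExpiringCacheSketch(long expiryMs, int maxEntries, LongSupplier clockMs) {
        this.expiryMs = expiryMs;
        this.maxEntries = maxEntries;
        this.clockMs = clockMs;
    }

    // Assumption: when the cache already holds maxEntries non-expired entries,
    // a new key is not cached.
    void put(K key, V value) {
        long now = clockMs.getAsLong();
        entries.values().removeIf(e -> now - e.insertedAtMs >= expiryMs);
        if (entries.containsKey(key) || entries.size() < maxEntries) {
            entries.put(key, new Entry<>(value, now));
        }
    }

    // Returns the cached value, or null if it is absent or expired.
    // Reading an entry never extends its lifetime; an expiry of 0 disables caching.
    V get(K key) {
        Entry<V> e = entries.get(key);
        if (e == null) {
            return null;
        }
        if (clockMs.getAsLong() - e.insertedAtMs >= expiryMs) {
            entries.remove(key);
            return null;
        }
        return e.value;
    }

    // Comparable in spirit to clearing the client cache before entries expire.
    void clear() {
        entries.clear();
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        long[] nowMs = {0}; // fake clock
        ExpiringCacheSketch<String, String> cache =
                new ExpiringCacheSketch<>(60_000, 65536, () -> nowMs[0]);
        cache.put("/test/file", "block locations");
        nowMs[0] = 59_999;
        System.out.println(cache.get("/test/file")); // still cached
        nowMs[0] = 60_000;
        System.out.println(cache.get("/test/file")); // null: expired despite the earlier read
    }
}
```

The sketch shows why cached metadata can be stale for up to the configured expiry time, and why an explicit clear is useful when a file is known to have changed.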