Updated on 2025-10-11 GMT+08:00

Improving Read Performance By HDFS Client Metadata Caching

Scenario

The HDFS client caches the metadata of block locations to reduce the number of metadata queries sent to the NameNode. This lowers network overhead, shortens client response time, and improves HDFS read performance.

Notes and Constraints

  • This section applies to MRS 3.x or later.
  • This function is recommended only for reading files that are not frequently modified. Modifications that other clients make to data on the server are not visible to a client that reads from its cache, so the metadata obtained from the cache may be outdated.
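The benefit and the trade-off described above can be illustrated with a small, self-contained sketch. It assumes a simplified model: a hypothetical `BlockLocationCache` keyed by file path, a `Function` standing in for the NameNode query, and an injectable clock. It is not the MRS client's implementation; it only shows why repeated reads skip the NameNode and why a cached entry can be stale until it expires. The `clear()` method loosely mirrors what `DFSClient#clearLocatedBlockCache()` is described as doing at the end of this section.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Hypothetical model of a client-side block-location cache; not the MRS implementation.
class BlockLocationCache<V> {
    private static final class Cached<V> {
        final V value;
        final long cachedAt; // time the entry was stored, in milliseconds
        Cached(V value, long cachedAt) {
            this.value = value;
            this.cachedAt = cachedAt;
        }
    }

    private final Map<String, Cached<V>> entries = new HashMap<>();
    private final Function<String, V> nameNodeLookup; // stands in for a NameNode query
    private final long expiryMillis;
    private final LongSupplier clock;
    private int nameNodeCalls = 0;

    BlockLocationCache(Function<String, V> nameNodeLookup, long expiryMillis, LongSupplier clock) {
        this.nameNodeLookup = nameNodeLookup;
        this.expiryMillis = expiryMillis;
        this.clock = clock;
    }

    // Returns cached metadata while it is younger than the expiry; otherwise queries and re-caches.
    V get(String path) {
        long now = clock.getAsLong();
        Cached<V> c = entries.get(path);
        if (c != null && now - c.cachedAt < expiryMillis) {
            return c.value; // may be stale if another client changed the file meanwhile
        }
        V fresh = nameNodeLookup.apply(path);
        nameNodeCalls++;
        entries.put(path, new Cached<>(fresh, now));
        return fresh;
    }

    // Drops every cached entry so that the next read goes to the NameNode again.
    void clear() {
        entries.clear();
    }

    int nameNodeCalls() {
        return nameNodeCalls;
    }
}

class BlockLocationCacheDemo {
    public static void main(String[] args) {
        long[] now = {0};                      // manual clock, in milliseconds
        String[] serverState = {"blocks-v1"};  // what the NameNode would currently return
        BlockLocationCache<String> cache =
                new BlockLocationCache<>(p -> serverState[0], 60_000, () -> now[0]);

        cache.get("/test/a.txt");              // miss: one NameNode call
        cache.get("/test/a.txt");              // hit: no NameNode call
        serverState[0] = "blocks-v2";          // another client rewrites the file
        now[0] = 30_000;
        System.out.println(cache.get("/test/a.txt")); // prints "blocks-v1" (stale)
        now[0] = 60_000;
        System.out.println(cache.get("/test/a.txt")); // expired, prints "blocks-v2"
        System.out.println(cache.nameNodeCalls());    // prints 2
    }
}
```

In this sketch, an entry's age is measured from the time it was stored, not from its last access, which matches the rule that even frequently used metadata expires. An expiry of 0 makes every read a miss, which corresponds to the 0s setting disabling the cache.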

Procedure

  1. Log in to FusionInsight Manager.

    For details about how to log in to FusionInsight Manager, see Accessing MRS Manager.

  2. Choose Cluster > Services > HDFS > Configurations > All Configurations.
  3. Search for the following parameters and change their values as required.

    Table 1 Parameter configuration

    Parameter

    Description

    Default Value

    dfs.client.metadata.cache.enabled

    Indicates whether to enable the client to cache the metadata of block locations.

    • true: The function is enabled.
    • false: The function is disabled.

    false

    dfs.client.metadata.cache.pattern

    Indicates the regular expression that the paths of files must match for their metadata to be cached. The block location metadata of matching files is cached until it expires.

    This parameter is valid only when dfs.client.metadata.cache.enabled is set to true.

    Example: /test.* indicates that the metadata of all files whose paths start with /test is cached.

    • To ensure consistency, configure a specific pattern that matches only files that are not frequently modified by other clients.
    • For a fully qualified path, the pattern is matched only against the path component of the URI, not against the scheme or authority.

    -

    dfs.client.metadata.cache.expiry.sec

    Indicates how long metadata is cached. A cache entry becomes invalid once it has been cached for longer than this duration, even if it is accessed frequently during that time.

    The time suffixes s, m, and h indicate seconds, minutes, and hours, respectively.

    If this parameter is set to 0s, the cache function is disabled.

    60s

    dfs.client.metadata.cache.max.entries

    Indicates the maximum number of unexpired entries that can be cached at any one time.

    Value range: 0 to 65536.

    65536

  4. Save the settings. Restart the services or instances whose configurations have expired for the settings to take effect.
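To make the semantics of dfs.client.metadata.cache.pattern and dfs.client.metadata.cache.expiry.sec concrete, the sketch below models two rules from Table 1: the regular expression is tested against the path component of a URI only, and the expiry accepts an optional s, m, or h suffix, with 0s disabling the cache. The class and method names are hypothetical, and two details are assumptions not stated in the table: the pattern must match the whole path (as `Matcher.matches()` requires), and a bare number is read as seconds.

```java
import java.net.URI;
import java.util.Locale;
import java.util.concurrent.TimeUnit;
import java.util.regex.Pattern;

// Hypothetical helpers modeling the rules in Table 1; not the MRS client code.
class MetadataCacheSettings {
    private final Pattern pathPattern;
    private final long expirySeconds;

    MetadataCacheSettings(String pattern, String expiry) {
        this.pathPattern = Pattern.compile(pattern);
        this.expirySeconds = parseExpirySeconds(expiry);
    }

    // Parses values such as "60s", "5m", or "1h" into seconds.
    static long parseExpirySeconds(String value) {
        String v = value.trim().toLowerCase(Locale.ROOT);
        char last = v.charAt(v.length() - 1);
        String digits = Character.isDigit(last) ? v : v.substring(0, v.length() - 1);
        long amount = Long.parseLong(digits);
        switch (last) {
            case 'm': return TimeUnit.MINUTES.toSeconds(amount);
            case 'h': return TimeUnit.HOURS.toSeconds(amount);
            case 's': return amount;
            default:
                if (Character.isDigit(last)) {
                    return amount; // assumption: a bare number means seconds
                }
                throw new IllegalArgumentException("Unsupported time suffix: " + value);
        }
    }

    // Tests only the URI path, so the scheme and authority of a fully qualified path are ignored.
    boolean shouldCache(String pathOrUri) {
        if (expirySeconds == 0) {
            return false; // "0s" disables the cache
        }
        String path = URI.create(pathOrUri).getPath();
        return pathPattern.matcher(path).matches();
    }
}

class MetadataCacheSettingsDemo {
    public static void main(String[] args) {
        MetadataCacheSettings s = new MetadataCacheSettings("/test.*", "1m");
        System.out.println(MetadataCacheSettings.parseExpirySeconds("1m")); // prints 60
        System.out.println(s.shouldCache("/test/data/part-0000"));          // prints true
        System.out.println(s.shouldCache("hdfs://hacluster/test/a.txt"));   // prints true
        System.out.println(s.shouldCache("/user/test/a.txt"));              // prints false
    }
}
```

Here `hdfs://hacluster/...` is only an illustrative fully qualified path. Extracting the path with `URI.getPath()` is one simple way to ignore the scheme and authority in this sketch; it does not handle paths containing characters that are illegal in a URI, such as unescaped spaces.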

To clear the client cache completely before the cached entries expire, call DFSClient#clearLocatedBlockCache().

The sample usage is as follows:

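    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.hdfs.DFSClient;
    import org.apache.hadoop.hdfs.DistributedFileSystem;

    // conf: the client's org.apache.hadoop.conf.Configuration for the cluster
    DistributedFileSystem dfs = (DistributedFileSystem) FileSystem.get(conf);
    DFSClient dfsClient = dfs.getClient();
    dfsClient.clearLocatedBlockCache(); // drop all cached block location metadata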