Configuring the Number of Files in a Single HDFS Directory
Scenario
A cluster usually runs multiple services, and most of them store their data in HDFS. While the cluster is running, components such as Spark and Yarn, as well as clients, may continuously write files to the same HDFS directory. However, HDFS limits the number of items in a single directory. Plan data storage so that no single directory accumulates too many files; otherwise, tasks may fail.
You can set the maximum number of items allowed in a single directory using the dfs.namenode.fs-limits.max-directory-items parameter in HDFS.
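To see how close a directory is to the limit, you can compare its direct item count with the configured value. The following Python snippet is a minimal sketch, not an official tool: it assumes an HDFS client is installed and reads the "Found N items" header printed by hdfs dfs -ls. The function names and the DEFAULT_MAX_DIRECTORY_ITEMS constant are hypothetical, and the constant simply mirrors the default value of dfs.namenode.fs-limits.max-directory-items.

```python
import re
import subprocess

# Assumption: the cluster uses the default value of
# dfs.namenode.fs-limits.max-directory-items.
DEFAULT_MAX_DIRECTORY_ITEMS = 1048576


def list_directory(path: str) -> str:
    """Return the raw output of `hdfs dfs -ls <path>` (requires an HDFS client)."""
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def count_direct_items(ls_output: str) -> int:
    """Read the item count from the 'Found N items' header of `hdfs dfs -ls`.

    An empty directory prints no header, so 0 is returned in that case.
    """
    match = re.search(r"^Found (\d+) items?\s*$", ls_output, flags=re.MULTILINE)
    return int(match.group(1)) if match else 0


def usage_ratio(item_count: int, limit: int = DEFAULT_MAX_DIRECTORY_ITEMS) -> float:
    """Return the fraction of the per-directory item limit that is in use."""
    return item_count / limit
```

For example, usage_ratio(count_direct_items(list_directory("/tmp/logs"))) returns a value between 0 and 1, where /tmp/logs is a placeholder path. Listing a directory that already holds a very large number of items can be slow, so run such checks sparingly.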
Procedure
- Go to the All Configurations page of HDFS by referring to Modifying Cluster Service Configuration Parameters.
- Search for the configuration item dfs.namenode.fs-limits.max-directory-items.
Table 1 Parameter description

| Parameter | Description | Default Value |
| --- | --- | --- |
| dfs.namenode.fs-limits.max-directory-items | Maximum number of items in a directory. Value range: 1 to 6,400,000 | 1048576 |
- Set the maximum number of items that can be stored in a single HDFS directory, and save the modified configuration. Restart the services or instances whose configuration has expired for the change to take effect.
Plan data storage in advance, for example by organizing directories by time and service type, to prevent excessive files in a single directory. You are advised to keep the default value, which allows about 1 million items in a single directory.
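One way to follow this planning advice is to have writers derive the target directory from the service type and the date, so that each day's files land in their own directory. The following Python snippet is a minimal sketch under that assumption. The partitioned_dir function, the /data base path, and the service/year/month/day layout are illustrative choices, not a layout required by HDFS.

```python
from datetime import date
from pathlib import PurePosixPath


def partitioned_dir(base: str, service: str, day: date) -> str:
    """Build a directory path of the form <base>/<service>/<YYYY>/<MM>/<DD>.

    Splitting by service type and date keeps the number of items in any
    single directory bounded by one service's output for one day.
    """
    return str(
        PurePosixPath(base) / service / f"{day:%Y}" / f"{day:%m}" / f"{day:%d}"
    )


# Example: files written by a Spark job on 5 March 2024.
print(partitioned_dir("/data", "spark-logs", date(2024, 3, 5)))
# prints "/data/spark-logs/2024/03/05"
```

If one service still produces too many files per day, the same idea extends to an additional hour-level directory.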