
Overview of HDFS File System Directories

Hadoop Distributed File System (HDFS) provides reliable, distributed read/write access to massive amounts of data. HDFS suits "write once, read many" scenarios. Writes are sequential only: a file can be written when it is created, or data can be appended to the end of an existing file. HDFS ensures that only one caller can write to a file at a time, while multiple callers can read the file concurrently.
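As a minimal illustration of this access model (the local files and the HDFS path below are hypothetical examples, not MRS defaults), an HDFS client can create a file or append to its end, but cannot rewrite its middle:

    hdfs dfs -put localdata.txt /tmp/demo.txt             # write once: create the file
    hdfs dfs -appendToFile moredata.txt /tmp/demo.txt     # sequential write: append to the end
    hdfs dfs -cat /tmp/demo.txt                           # read many: concurrent reads are allowed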

Table 1 describes the directory structure in HDFS.

Table 1 HDFS file system directories

| Path | Type | Function | Can Be Deleted | Deletion Consequence |
|------|------|----------|----------------|----------------------|
| /tmp/spark2x/sparkhive-scratch, /tmp/spark/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions in Spark JDBCServer. | No | Tasks fail to run. |
| /tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions executed through the Spark CLI. | No | Tasks fail to run. |
| /tmp/logs/ | Fixed directory | Stores container log files. | Yes | Container logs cannot be viewed. |
| /tmp/carbon/ | Fixed directory | Stores abnormal CarbonData data detected during data import. | Yes | Error data is lost. |
| /tmp/Loader-${Job name}_${MR job ID} | Temporary directory | Stores region information for Loader HBase bulk load jobs. The data is automatically deleted when the job completes. | No | The Loader HBase bulk load job fails. |
| /tmp/hadoop-omm/yarn/system/rmstore | Fixed directory | Stores ResourceManager running information. | Yes | Status information is lost after ResourceManager restarts. |
| /tmp/archived | Fixed directory | Archives MR task logs on HDFS. | Yes | MR task logs are lost. |
| /tmp/hadoop-yarn/staging | Fixed directory | Stores run logs, summary information, and configuration attributes of ApplicationMaster jobs. | No | Tasks do not run properly. |
| /tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores temporary files from /tmp/hadoop-yarn/staging after all tasks are executed. | No | MR task logs are lost. |
| /tmp/hadoop-yarn/staging/history/done | Fixed directory | Receives the done_intermediate log files, which a periodic scan thread moves into this directory. | No | MR task logs are lost. |
| /tmp/mr-history | Fixed directory | Stores pre-loaded historical record files. | No | Historical MR task log data is lost. |
| /tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated during Hive running. | No | The current task fails to run. |
| /user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the Spark JDBCServer application. | No | The executor fails to start. |
| /user/spark2x/jars | Fixed directory | Stores dependency packages required by the Spark2x executor. | No | The executor fails to start. |
| /user/loader (subdirectories: /user/loader/etl_dirty_data_dir, /user/loader/etl_hbase_putlist_tmp, /user/loader/etl_hbase_tmp) | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails, or dirty data is lost. |
| /user/oozie | Fixed directory | Stores the dependent libraries required for Oozie to run. The libraries must be uploaded manually. | No | Oozie scheduling fails. |
| /user/mapred/hadoop-mapreduce-xxx.tar.gz | Fixed file | Stores JAR packages used by the MR distributed cache. | No | The MR distributed cache function is unavailable. |
| /user/hive | Fixed directory | Default directory for Hive-related data, including the dependent Spark lib package and the default table data storage path. | No | User data is lost. |
| /user/omm-bulkload | Temporary directory | Temporary directory of the HBase batch import tool. | No | HBase batch import tasks fail. |
| /user/hbase | Temporary directory | Temporary directory of the HBase batch import tool. | No | HBase batch import tasks fail. |
| /spark2xJobHistory2x, /sparkJobHistory | Fixed directory | Stores Spark event log data. | No | The History Server service becomes unavailable, and task execution fails. |
| /flume | Fixed directory | Stores data collected by Flume into HDFS. | No | Flume does not run properly. |
| /mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. |
| /mr-history/done | Fixed directory | Stores logs managed by the MR JobHistory Server. | Yes | Log information is lost. |
| /tenant | Created when a tenant is added | Tenant directory in HDFS. By default, the system automatically creates a folder named after the tenant in the /tenant directory. For example, the default HDFS storage directory for tenant ta1 is /tenant/ta1. The /tenant directory is automatically created in the HDFS root directory when the first tenant is created. The storage path can be customized. | No | The tenant account is unavailable. |
| /apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | WebHCat tasks fail to run. |
| /hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. |
| /hbaseFileStream | Fixed directory | Stores HFS files. | No | HFS files are lost and cannot be recovered. |
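Before deleting any of the directories marked "Yes" in Table 1, you can check how much space they occupy to gauge what a cleanup will reclaim. The paths below are examples taken from the table; adjust them to your cluster:

    hdfs dfs -du -h /tmp/logs /tmp/archived
    hdfs dfs -count /mr-history/tmp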

Checking File System Directories

You can log in to the HDFS client and check the directories as follows:

  1. Ensure that the client has been installed, for example, in the /opt/client directory.

    For details about how to download and install the cluster client, see Installing an MRS Cluster Client.

  2. Log in to the node where the client is installed as the client installation user.
  3. Go to the client installation directory.

    cd /opt/client

  4. Configure environment variables.

    source bigdata_env

  5. If Kerberos authentication is enabled for the cluster (security mode), run the following command to authenticate the user. Skip this step for clusters with Kerberos authentication disabled.

    kinit Component service user

  6. Check the HDFS system directory.

    hdfs dfs -ls Folder name

    For example, run the following command to view file information in the / directory of the HDFS system:

    hdfs dfs -ls /

    The command output is as follows:

    ...
    drwxrwxrwx   - mapred     hadoop              0 2025-03-10 21:47 /mr-history
    drwxrwxrwx   - hdfs       hadoop              0 2025-03-10 21:47 /mrs
    drwx--x--x   - admin      supergroup          0 2025-03-10 21:47 /tenant
    drwxrwxrwx   - hdfs       hadoop              0 2025-03-10 21:50 /tmp
    drwxrwxrwx   - hdfs       hadoop              0 2025-03-10 21:51 /user
    ...
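
    To drill down into a specific directory from Table 1, pass its path to the same command. The -R option lists contents recursively; use it with care on large directories:

    hdfs dfs -ls /mr-history
    hdfs dfs -ls -R /mr-history/done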
