Overview of HDFS File System Directories
This section describes the HDFS directory structure, as shown in the following tables.
| Path | Type | Function | Whether the Directory Can Be Deleted | Deletion Consequence |
|---|---|---|---|---|
| /tmp/spark/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions in Spark JDBCServer. | No | Failed to run the task. |
| /tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions that are executed using the Spark CLI. | No | Failed to run the task. |
| /tmp/carbon/ | Fixed directory | Stores abnormal CarbonData data found during data import. | Yes | Error data is lost. |
| /tmp/Loader-${Job name}_${MR job ID} | Temporary directory | Stores the region information about Loader HBase bulkload jobs. The data is automatically deleted after the job is completed. | No | Failed to run the Loader HBase Bulkload job. |
| /tmp/logs | Fixed directory | Stores the collected MR task logs. | Yes | MR task logs are lost. |
| /tmp/archived | Fixed directory | Archives the MR task logs on HDFS. | Yes | MR task logs are lost. |
| /tmp/hadoop-yarn/staging | Fixed directory | Stores the run logs, summary information, and configuration attributes of ApplicationMaster running jobs. | No | Services are running improperly. |
| /tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores temporary files in the /tmp/hadoop-yarn/staging directory after all tasks are executed. | No | MR task logs are lost. |
| /tmp/hadoop-yarn/staging/history/done | Fixed directory | Stores the log files that a periodic scanning thread moves here from the done_intermediate directory. | No | MR task logs are lost. |
| /tmp/mr-history | Fixed directory | Stores the historical record files that are pre-loaded. | No | Historical MR task log data is lost. |
| /tmp/hive | Fixed directory | Stores Hive temporary files. | No | Failed to run the Hive task. |
| /tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated during Hive running. | No | Failed to run the current task. |
| /user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the SparkJDBCServer application. | No | Failed to start the executor. |
| /user/spark/jars | Fixed directory | Stores running dependency packages of the Spark executor. | No | Failed to start the executor. |
| /user/loader | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails to execute, or dirty data is lost. |
| /user/loader/etl_dirty_data_dir | ||||
| /user/loader/etl_hbase_putlist_tmp | ||||
| /user/loader/etl_hbase_tmp | ||||
| /user/mapred | Fixed directory | Stores Hadoop-related files. | No | Failed to start Yarn. |
| /user/hive | Fixed directory | Stores Hive-related data by default, including the Spark lib package that Hive depends on and the default table data storage path. | No | User data is lost. |
| /user/omm-bulkload | Temporary directory | Stores HBase batch import tools temporarily. | No | HBase batch import tasks fail. |
| /user/hbase | Temporary directory | Stores HBase batch import tools temporarily. | No | HBase batch import tasks fail. |
| /sparkJobHistory | Fixed directory | Stores Spark event log data. | No | The History Server service is unavailable, and the task fails to be executed. |
| /flume | Fixed directory | Stores data collected by Flume from HDFS. | No | Flume runs improperly. |
| /mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. |
| /mr-history/done | Fixed directory | Stores logs managed by MR JobHistory Server. | Yes | Log information is lost. |
| /tenant | Created when a tenant is added. | Directory of a tenant in HDFS. By default, the system automatically creates a folder in the /tenant directory based on the tenant name. For example, the default HDFS storage directory for ta1 is /tenant/ta1. When a tenant is created for the first time, the system creates the /tenant directory in the HDFS root directory. You can customize the storage path. | No | The tenant account is unavailable. |
| /apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | Failed to run the WebHCat tasks. |
| /hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. |
| /hbaseFileStream | Fixed directory | Stores HFS files. | No | The HFS file is lost and cannot be restored. |
| /ats/active | Fixed directory | HDFS path used to store the timeline data of running applications. | No | Tez tasks fail to run after the directory is deleted. |
| /ats/done | Fixed directory | HDFS path used to store the timeline data of completed applications. | No | The directory is automatically re-created after deletion. |
| /flink | Fixed directory | Stores the checkpoint task data. | No | Tasks fail to run after the directory is deleted. |
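
The "Whether the Directory Can Be Deleted" column can be turned into a simple guard that is consulted before a cleanup script removes anything. The following is a minimal sketch, not a product API: the `DELETABLE` mapping holds only a subset of the rows above, the function name `can_delete` is hypothetical, and the rule that a subpath inherits the verdict of its nearest listed ancestor is an assumption made for illustration.

```python
from typing import Optional

# Subset of the table above: path -> True if the table marks it as deletable.
DELETABLE = {
    "/tmp/carbon": True,
    "/tmp/logs": True,
    "/tmp/archived": True,
    "/tmp/hadoop-yarn/staging": False,
    "/tmp/hive": False,
    "/user/hive": False,
    "/mr-history/tmp": True,
    "/mr-history/done": True,
    "/hbase": False,
    "/flink": False,
}


def can_delete(path: str) -> Optional[bool]:
    """Return the verdict of the nearest listed ancestor, or None if none is listed.

    Assumption: a subpath inherits the verdict of the closest directory
    that appears in DELETABLE. Matching is per path component, so
    /tmp/hive-scratch is not treated as a child of /tmp/hive.
    """
    current = "/" + path.strip("/")
    while current != "/":
        if current in DELETABLE:
            return DELETABLE[current]
        # Step up one level; an empty result means the root was reached.
        current = current.rsplit("/", 1)[0] or "/"
    return None
```

A caller would treat both `False` and `None` as "do not delete": `None` only means the path is not covered by this subset, not that deletion is safe.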

| Path | Type | Function | Whether the Directory Can Be Deleted | Deletion Consequence |
|---|---|---|---|---|
| /tmp/spark2x/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions in Spark2x JDBCServer. | No | Failed to run the task. |
| /tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions that are executed using the Spark2x CLI. | No | Failed to run the task. |
| /tmp/logs/ | Fixed directory | Stores container log files. | Yes | Container log files cannot be viewed. |
| /tmp/carbon/ | Fixed directory | Stores abnormal CarbonData data found during data import. | Yes | Error data is lost. |
| /tmp/Loader-${Job name}_${MR job ID} | Temporary directory | Stores the region information about Loader HBase bulkload jobs. The data is automatically deleted after the job is completed. | No | Failed to run the Loader HBase Bulkload job. |
| /tmp/hadoop-omm/yarn/system/rmstore | Fixed directory | Stores the ResourceManager running information. | Yes | Status information is lost after ResourceManager is restarted. |
| /tmp/archived | Fixed directory | Archives the MR task logs on HDFS. | Yes | MR task logs are lost. |
| /tmp/hadoop-yarn/staging | Fixed directory | Stores the run logs, summary information, and configuration attributes of ApplicationMaster running jobs. | No | Services are running improperly. |
| /tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores temporary files in the /tmp/hadoop-yarn/staging directory after all tasks are executed. | No | MR task logs are lost. |
| /tmp/hadoop-yarn/staging/history/done | Fixed directory | Stores the log files that a periodic scanning thread moves here from the done_intermediate directory. | No | MR task logs are lost. |
| /tmp/mr-history | Fixed directory | Stores the historical record files that are pre-loaded. | No | Historical MR task log data is lost. |
| /tmp/smallfs | Fixed directory | Stores FGCService files on HDFS. | No | SmallFS functions are abnormal. |
| /tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated during Hive running. | No | Failed to run the current task. |
| /user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the SparkJDBCServer application. | No | Failed to start the executor. |
| /user/spark2x/jars | Fixed directory | Stores running dependency packages of the Spark2x executor. | No | Failed to start the executor. |
| /user/loader | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails to execute, or dirty data is lost. |
| /user/loader/etl_dirty_data_dir | ||||
| /user/loader/etl_hbase_putlist_tmp | ||||
| /user/loader/etl_hbase_tmp | ||||
| /user/oozie | Fixed directory | Stores the dependency libraries required to run Oozie, which must be uploaded manually. | No | Failed to schedule Oozie. |
| /user/mapred/hadoop-mapreduce-3.1.1.tar.gz | Fixed file | Stores JAR files used by the distributed MR cache. | No | The MR distributed cache function is unavailable. |
| /user/hive | Fixed directory | Stores Hive-related data by default, including the Spark lib package that Hive depends on and the default table data storage path. | No | User data is lost. |
| /user/omm-bulkload | Temporary directory | Stores HBase batch import tools temporarily. | No | HBase batch import tasks fail. |
| /user/hbase | Temporary directory | Stores HBase batch import tools temporarily. | No | HBase batch import tasks fail. |
| /spark2xJobHistory2x | Fixed directory | Stores Spark2x event log data. | No | The History Server service is unavailable, and the task fails to be executed. |
| /flume | Fixed directory | Stores data collected by Flume from HDFS. | No | Flume runs improperly. |
| /mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. |
| /mr-history/done | Fixed directory | Stores logs managed by MR JobHistory Server. | Yes | Log information is lost. |
| /tenant | Created when a tenant is added. | Directory of a tenant in HDFS. By default, the system automatically creates a folder in the /tenant directory based on the tenant name. For example, the default HDFS storage directory for ta1 is /tenant/ta1. When a tenant is created for the first time, the system creates the /tenant directory in the HDFS root directory. You can customize the storage path. | No | The tenant account is unavailable. |
| /apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | Failed to run the WebHCat tasks. |
| /hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. |
| /hbaseFileStream | Fixed directory | Stores HFS files. | No | The HFS file is lost and cannot be restored. |
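
Because deleting a fixed directory marked "No" breaks the corresponding service, it can be useful to verify that these directories still exist. The sketch below only builds the command lines and does not run them. It assumes the standard `hdfs dfs -test -d` check (exit status 0 when the path is a directory); the list `REQUIRED_DIRS` is a hand-picked subset of the table above, and the function name `existence_checks` is hypothetical.

```python
import shlex
from typing import List

# Subset of the directories that the table above marks as not deletable.
REQUIRED_DIRS = [
    "/tmp/hadoop-yarn/staging",
    "/tmp/hive-scratch",
    "/user/spark2x/jars",
    "/user/oozie",
    "/hbase",
]


def existence_checks(paths: List[str]) -> List[List[str]]:
    """Build one `hdfs dfs -test -d` argument list per path."""
    return [["hdfs", "dfs", "-test", "-d", p] for p in paths]


# Print the commands so they can be reviewed before being run.
for argv in existence_checks(REQUIRED_DIRS):
    print(shlex.join(argv))
```

On a node with an HDFS client installed, each argument list could be passed to `subprocess.run(argv).returncode`, where a non-zero result indicates that the directory is missing.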