Overview of HDFS File System Directories
Hadoop Distributed File System (HDFS) provides reliable, distributed reads and writes of massive amounts of data. HDFS suits workloads that follow a "write once, read many times" pattern. Writes are sequential: data is written either when a file is created or when it is appended to the end of an existing file. HDFS ensures that only one caller can write to a file at a time, while multiple callers can read the file concurrently.
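The access rules described above (one writer at a time, writes only at the end of the file, any number of readers) can be illustrated with a small in-memory model. This is only a sketch: `ToyFile` and its methods are hypothetical names invented for illustration and are not part of any Hadoop API, and the model says nothing about how HDFS enforces these rules internally.

```python
import threading


class ToyFile:
    """In-memory stand-in for one HDFS file (illustrative only, not a Hadoop API)."""

    def __init__(self) -> None:
        self._data = bytearray()
        self._writer = None  # name of the caller that currently holds write access
        self._lock = threading.Lock()

    def open_for_append(self, caller: str) -> None:
        """Grant write access to one caller; reject a second concurrent writer."""
        with self._lock:
            if self._writer is not None:
                raise PermissionError(f"file is already being written by {self._writer}")
            self._writer = caller

    def append(self, caller: str, chunk: bytes) -> None:
        """Extend the end of the file; there is no method for in-place updates."""
        with self._lock:
            if self._writer != caller:
                raise PermissionError(f"{caller} does not hold write access")
            self._data.extend(chunk)

    def close(self, caller: str) -> None:
        """Release write access so that another caller can become the writer."""
        with self._lock:
            if self._writer == caller:
                self._writer = None

    def read(self) -> bytes:
        """Reading needs no write access, so any caller may read at any time."""
        with self._lock:
            return bytes(self._data)
```

In this model, a second call to `open_for_append` while another caller still holds write access raises `PermissionError`, whereas `read` succeeds for any caller, including while a writer is active.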
This section describes the directory structure in HDFS, as shown in the following table.
Path | Type | Function | Can Be Deleted | Deletion Consequence |
---|---|---|---|---|
/tmp/spark2x/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions in Spark2x JDBCServer. | No | The task fails to run. |
/tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions run in CLI mode using the Spark2x CLI. | No | The task fails to run. |
/tmp/logs/ | Fixed directory | Stores container log files. | Yes | Container log files cannot be viewed. |
/tmp/carbon/ | Fixed directory | Stores abnormal CarbonData data detected during data import. | Yes | The abnormal data is lost. |
/tmp/Loader-${Job name}_${MR job ID} | Temporary directory | Stores region information for Loader HBase bulkload jobs. The data is deleted automatically after the job completes. | No | The Loader HBase bulkload job fails to run. |
/tmp/hadoop-omm/yarn/system/rmstore | Fixed directory | Stores ResourceManager running information. | Yes | Status information is lost after ResourceManager is restarted. |
/tmp/archived | Fixed directory | Archives the MR task logs on HDFS. | Yes | MR task logs are lost. |
/tmp/hadoop-yarn/staging | Fixed directory | Stores the run logs, summary information, and configuration attributes of jobs run by ApplicationMaster. | No | Services run improperly. |
/tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores temporary files in the /tmp/hadoop-yarn/staging directory after all tasks are executed. | No | MR task logs are lost. |
/tmp/hadoop-yarn/staging/history/done | Fixed directory | Stores log files that a periodic scanning thread moves here from the done_intermediate directory. | No | MR task logs are lost. |
/tmp/mr-history | Fixed directory | Stores pre-loaded historical record files. | No | Historical MR task log data is lost. |
/tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated while Hive is running. | No | The current task fails to run. |
/user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the SparkJDBCServer application. | No | The executor fails to start. |
/user/spark2x/jars | Fixed directory | Stores runtime dependency packages of the Spark2x executor. | No | The executor fails to start. |
/user/loader | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails to execute, or dirty data is lost. |
/user/loader/etl_dirty_data_dir | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails to execute, or dirty data is lost. |
/user/loader/etl_hbase_putlist_tmp | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails to execute, or dirty data is lost. |
/user/loader/etl_hbase_tmp | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails to execute, or dirty data is lost. |
/user/oozie | Fixed directory | Stores the dependency libraries required to run Oozie, which must be uploaded manually. | No | Oozie scheduling fails. |
/user/mapred/hadoop-mapreduce-xxx.tar.gz | Fixed file | Stores JAR files used by the distributed MR cache. | No | The MR distributed cache function is unavailable. |
/user/hive | Fixed directory | Stores Hive-related data by default, including the Spark lib package that Hive depends on and the default table data storage path. | No | User data is lost. |
/user/omm-bulkload | Temporary directory | Stores HBase batch import tools temporarily. | No | HBase batch import tasks fail. |
/user/hbase | Temporary directory | Stores HBase batch import tools temporarily. | No | HBase batch import tasks fail. |
/spark2xJobHistory2x | Fixed directory | Stores Spark2x eventlog data. | No | The History Server service is unavailable, and the task fails to run. |
/flume | Fixed directory | Stores data collected by Flume from HDFS. | No | Flume runs improperly. |
/mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. |
/mr-history/done | Fixed directory | Stores logs managed by MR JobHistory Server. | Yes | Log information is lost. |
/tenant | Created when a tenant is added. | Directory of a tenant in HDFS. By default, the system automatically creates a folder in the /tenant directory based on the tenant name. For example, the default HDFS storage directory for ta1 is tenant/ta1. When a tenant is created for the first time, the system creates the /tenant directory in the HDFS root directory. You can customize the storage path. | No | The tenant account is unavailable. |
/apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | WebHCat tasks fail to run. |
/hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. |
/hbaseFileStream | Fixed directory | Stores HFS files. | No | HFS files are lost and cannot be restored. |
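
The table can also be consulted programmatically before a cleanup job removes anything. The following is a minimal sketch, not an official tool: it copies a small subset of the rows above into a dictionary, and the names `DirRule`, `RULES`, `lookup`, and `check_delete` are hypothetical. It also assumes that a path below a listed directory inherits that directory's entry, which the table itself does not state.

```python
from typing import NamedTuple, Optional


class DirRule(NamedTuple):
    deletable: bool
    consequence: str


# A small subset of the rows in the table above, keyed by HDFS path.
RULES = {
    "/tmp/logs": DirRule(True, "Container log files cannot be viewed."),
    "/tmp/archived": DirRule(True, "MR task logs are lost."),
    "/mr-history/done": DirRule(True, "Log information is lost."),
    "/user/hive": DirRule(False, "User data is lost."),
    "/flume": DirRule(False, "Flume runs improperly."),
    "/hbase": DirRule(False, "HBase user data is lost."),
}


def normalize(path: str) -> str:
    """Collapse repeated or trailing slashes so '/tmp/logs/' matches '/tmp/logs'."""
    parts = [p for p in path.split("/") if p]
    return "/" + "/".join(parts)


def lookup(path: str) -> Optional[DirRule]:
    """Return the rule of the path itself or of its closest listed ancestor.

    Assumption: subpaths inherit the rule of the listed directory above them.
    """
    current = normalize(path)
    while current != "/":
        if current in RULES:
            return RULES[current]
        # Step up one directory level; an empty result means the root was reached.
        current = current.rsplit("/", 1)[0] or "/"
    return None


def check_delete(path: str) -> str:
    """Summarize whether the table allows deleting the path and what happens if it is deleted."""
    rule = lookup(path)
    if rule is None:
        return f"{path}: not listed, no rule known"
    verdict = "may be deleted" if rule.deletable else "must not be deleted"
    return f"{path}: {verdict} ({rule.consequence})"
```

For example, `check_delete("/hbase/data")` returns `/hbase/data: must not be deleted (HBase user data is lost.)`, while a path such as `/hbasefoo` that merely shares a name prefix with a listed directory is reported as not listed.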