
Overview of HDFS File System Directories

Updated on 2024-10-08 GMT+08:00

Hadoop Distributed File System (HDFS) provides reliable, distributed read/write access to massive amounts of data. HDFS is suited to "write once, read many" scenarios. Write operations are sequential: data is either written when a file is created or appended to the end of an existing file. HDFS ensures that only one caller can write to a file at a time, while multiple callers can read the same file simultaneously.
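This single-writer, append-only model can be observed directly through the Hadoop FileSystem API. The following minimal sketch is for illustration only: the client configuration and the path /tmp/example.txt are assumptions, not fixed MRS settings, and appending requires that the cluster permits appends. A second client trying to create or append to the same file while it is open for writing would fail with a lease-related error, whereas any number of clients may read it concurrently.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsWriteOnceExample {
    public static void main(String[] args) throws Exception {
        // Loads core-site.xml/hdfs-site.xml from the classpath.
        // The target path below is a hypothetical example path.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path file = new Path("/tmp/example.txt");

        // Write once: the file content is written when the file is created.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.writeBytes("first line\n");
        }

        // Subsequent writes can only append to the end of the existing file.
        try (FSDataOutputStream out = fs.append(file)) {
            out.writeBytes("appended line\n");
        }

        // Reads are not exclusive: many clients may open the file at once.
        try (FSDataInputStream in = fs.open(file)) {
            IOUtils.copyBytes(in, System.out, 4096, false);
        }

        fs.close();
    }
}
```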

This section describes the HDFS directory structure, as shown in the following tables.

Table 1 HDFS directory structure (applicable to versions earlier than MRS 3.x)

| Path | Type | Function | Can Be Deleted | Deletion Consequence |
|---|---|---|---|---|
| /tmp/spark/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions in Spark JDBCServer. | No | Tasks fail to run. |
| /tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions started from the Spark CLI. | No | Tasks fail to run. |
| /tmp/carbon/ | Fixed directory | Stores abnormal CarbonData data generated during data import. | Yes | The abnormal data is lost. |
| /tmp/Loader-${Job name}_${MR job ID} | Temporary directory | Stores the region information of Loader HBase bulkload jobs. The data is automatically deleted when the job is complete. | No | The Loader HBase bulkload job fails to run. |
| /tmp/logs | Fixed directory | Stores collected MR task logs. | Yes | MR task logs are lost. |
| /tmp/archived | Fixed directory | Archives MR task logs on HDFS. | Yes | MR task logs are lost. |
| /tmp/hadoop-yarn/staging | Fixed directory | Stores run logs, summary information, and configuration attributes of running ApplicationMaster jobs. | No | Services run improperly. |
| /tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores temporary files moved from /tmp/hadoop-yarn/staging after all tasks are executed. | No | MR task logs are lost. |
| /tmp/hadoop-yarn/staging/history/done | Fixed directory | A periodic scanning thread moves done_intermediate log files into this done directory. | No | MR task logs are lost. |
| /tmp/mr-history | Fixed directory | Stores pre-loaded historical record files. | No | Historical MR task log data is lost. |
| /tmp/hive | Fixed directory | Stores Hive temporary files. | No | Hive tasks fail to run. |
| /tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated during Hive running. | No | The current task fails to run. |
| /user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the Spark JDBCServer application. | No | The executor fails to start. |
| /user/spark/jars | Fixed directory | Stores runtime dependency packages of the Spark executor. | No | The executor fails to start. |
| /user/loader, including /user/loader/etl_dirty_data_dir, /user/loader/etl_hbase_putlist_tmp, and /user/loader/etl_hbase_tmp | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails to run, or dirty data is lost. |
| /user/mapred | Fixed directory | Stores Hadoop-related files. | No | Yarn fails to start. |
| /user/hive | Fixed directory | Stores Hive-related data by default, including the dependent Spark lib package and the default table data storage path. | No | User data is lost. |
| /user/omm-bulkload | Temporary directory | Temporarily stores HBase batch import tools. | No | HBase batch import tasks fail. |
| /user/hbase | Temporary directory | Temporarily stores HBase batch import tools. | No | HBase batch import tasks fail. |
| /sparkJobHistory | Fixed directory | Stores Spark event log data. | No | The History Server service becomes unavailable, and tasks fail to run. |
| /flume | Fixed directory | Stores data collected by Flume into HDFS. | No | Flume runs improperly. |
| /mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. |
| /mr-history/done | Fixed directory | Stores logs managed by the MR JobHistory Server. | Yes | Log information is lost. |
| /tenant | Created when a tenant is added | Directory of a tenant in HDFS. By default, the system creates a folder in the /tenant directory based on the tenant name; for example, the default HDFS storage directory for tenant ta1 is /tenant/ta1. When a tenant is created for the first time, the system creates the /tenant directory in the HDFS root directory. The storage path can be customized. | No | The tenant account becomes unavailable. |
| /apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | WebHCat tasks fail to run. |
| /hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. |
| /hbaseFileStream | Fixed directory | Stores HFS files. | No | HFS files are lost and cannot be restored. |
| /ats/active | Fixed directory | HDFS path that stores the timeline data of running applications. | No | Tez tasks fail to run after the directory is deleted. |
| /ats/done | Fixed directory | HDFS path that stores the timeline data of completed applications. | No | The directory is automatically re-created after deletion. |
| /flink | Fixed directory | Stores checkpoint task data. | No | Tasks fail to run after the deletion. |
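Before removing any of the directories marked as deletable above, it can help to check what they currently hold. The following sketch is only an illustration built on the standard FileSystem API; the default path /tmp/logs is taken from Table 1, and the reporting logic is not part of MRS itself.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.ContentSummary;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsDirReport {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // /tmp/logs is one of the deletable directories from Table 1;
        // any other HDFS path can be passed as the first argument instead.
        Path dir = new Path(args.length > 0 ? args[0] : "/tmp/logs");

        if (!fs.exists(dir)) {
            System.out.println(dir + " does not exist");
            return;
        }

        // List the immediate children of the directory.
        for (FileStatus status : fs.listStatus(dir)) {
            System.out.printf("%s\t%d bytes%n", status.getPath(), status.getLen());
        }

        // Summarize the total space consumed under the directory.
        ContentSummary summary = fs.getContentSummary(dir);
        System.out.printf("total: %d files, %d bytes%n",
                summary.getFileCount(), summary.getLength());

        fs.close();
    }
}
```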

Table 2 HDFS directory structure (applicable to MRS 3.x or later)

| Path | Type | Function | Can Be Deleted | Deletion Consequence |
|---|---|---|---|---|
| /tmp/spark2x/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions in Spark2x JDBCServer. | No | Tasks fail to run. |
| /tmp/sparkhive-scratch | Fixed directory | Stores temporary files of metastore sessions started from the Spark2x CLI. | No | Tasks fail to run. |
| /tmp/logs/ | Fixed directory | Stores container log files. | Yes | Container log files cannot be viewed. |
| /tmp/carbon/ | Fixed directory | Stores abnormal CarbonData data generated during data import. | Yes | The abnormal data is lost. |
| /tmp/Loader-${Job name}_${MR job ID} | Temporary directory | Stores the region information of Loader HBase bulkload jobs. The data is automatically deleted when the job is complete. | No | The Loader HBase bulkload job fails to run. |
| /tmp/hadoop-omm/yarn/system/rmstore | Fixed directory | Stores ResourceManager running information. | Yes | Status information is lost after ResourceManager restarts. |
| /tmp/archived | Fixed directory | Archives MR task logs on HDFS. | Yes | MR task logs are lost. |
| /tmp/hadoop-yarn/staging | Fixed directory | Stores run logs, summary information, and configuration attributes of running ApplicationMaster jobs. | No | Services run improperly. |
| /tmp/hadoop-yarn/staging/history/done_intermediate | Fixed directory | Stores temporary files moved from /tmp/hadoop-yarn/staging after all tasks are executed. | No | MR task logs are lost. |
| /tmp/hadoop-yarn/staging/history/done | Fixed directory | A periodic scanning thread moves done_intermediate log files into this done directory. | No | MR task logs are lost. |
| /tmp/mr-history | Fixed directory | Stores pre-loaded historical record files. | No | Historical MR task log data is lost. |
| /tmp/hive-scratch | Fixed directory | Stores temporary data (such as session information) generated during Hive running. | No | The current task fails to run. |
| /user/{user}/.sparkStaging | Fixed directory | Stores temporary files of the Spark JDBCServer application. | No | The executor fails to start. |
| /user/spark2x/jars | Fixed directory | Stores runtime dependency packages of the Spark2x executor. | No | The executor fails to start. |
| /user/loader, including /user/loader/etl_dirty_data_dir, /user/loader/etl_hbase_putlist_tmp, and /user/loader/etl_hbase_tmp | Fixed directory | Stores dirty data of Loader jobs and data of HBase jobs. | No | The HBase job fails to run, or dirty data is lost. |
| /user/oozie | Fixed directory | Stores dependency libraries required for Oozie running, which must be uploaded manually. | No | Oozie scheduling fails. |
| /user/mapred/hadoop-mapreduce-3.1.1.tar.gz | Fixed file | Stores JAR files used by the MR distributed cache. | No | The MR distributed cache function is unavailable. |
| /user/hive | Fixed directory | Stores Hive-related data by default, including the dependent Spark lib package and the default table data storage path. | No | User data is lost. |
| /user/omm-bulkload | Temporary directory | Temporarily stores HBase batch import tools. | No | HBase batch import tasks fail. |
| /user/hbase | Temporary directory | Temporarily stores HBase batch import tools. | No | HBase batch import tasks fail. |
| /spark2xJobHistory2x | Fixed directory | Stores Spark2x event log data. | No | The History Server service becomes unavailable, and tasks fail to run. |
| /flume | Fixed directory | Stores data collected by Flume into HDFS. | No | Flume runs improperly. |
| /mr-history/tmp | Fixed directory | Stores logs generated by MapReduce jobs. | Yes | Log information is lost. |
| /mr-history/done | Fixed directory | Stores logs managed by the MR JobHistory Server. | Yes | Log information is lost. |
| /tenant | Created when a tenant is added | Directory of a tenant in HDFS. By default, the system creates a folder in the /tenant directory based on the tenant name; for example, the default HDFS storage directory for tenant ta1 is /tenant/ta1. When a tenant is created for the first time, the system creates the /tenant directory in the HDFS root directory. The storage path can be customized. | No | The tenant account becomes unavailable. |
| /apps{1~5}/ | Fixed directory | Stores the Hive package used by WebHCat. | No | WebHCat tasks fail to run. |
| /hbase | Fixed directory | Stores HBase data. | No | HBase user data is lost. |
| /hbaseFileStream | Fixed directory | Stores HFS files. | No | HFS files are lost and cannot be restored. |
