
Configuring HDFS Cold and Hot Data Migration

Updated on 2022-12-14 GMT+08:00

Scenario

The hot and cold data migration tool migrates HDFS files according to a configured policy. A policy is a set of conditional or unconditional rules. If a file matches the rule set, the tool performs the configured group of operations on that file.

The hot and cold data migration tool supports the following rules and operations:

  • Migration rules:
    • Data is migrated based on the latest access time of the file.
    • Data is migrated based on the file modification time.
    • Data is migrated without conditions.

    Table 1 Rule condition tags

    Condition Tag            Description
    <age operator="lt">      Defines a condition on the file age (modification time).
    <atime operator="gt">    Defines a condition on the file access time.

    NOTE:
    For a manual migration rule, no condition is required.

  • Operations:
    • Set the storage policy to a given data tier.
    • Migrate files to another folder.
    • Configure the number of replicas for a file.
    • Delete a file.
    • Set a node label.

    Table 2 Behavior types

    MARK
      Description: Determines the data access frequency and sets a data storage policy.
      Required parameters:
        <param>
          <name>targettier</name>
          <value>STORAGE_POLICY</value>
        </param>

    MOVE
      Description: Sets the data storage policy or NodeLabel and invokes the HDFS Mover tool.
      Required parameters (you can set either or both):
        <param>
          <name>targettier</name>
          <value>STORAGE_POLICY</value>
        </param>
        <param>
          <name>targetnodelabels</name>
          <value>SOME_EXPRESSION</value>
        </param>

    SET_REPL
      Description: Configures the number of replicas for a file.
      Required parameters:
        <param>
          <name>replcount</name>
          <value>INTEGER</value>
        </param>

    MOVE_TO_FOLDER
      Description: Moves the file to the target folder. If overwrite is set to true, the target path will be overwritten.
      Required parameters (overwrite is optional; if it is not set, the default value false is used):
        <param>
          <name>target</name>
          <value>PATH</value>
        </param>
        <param>
          <name>overwrite</name>
          <value>true/false</value>
        </param>

    DELETE
      Description: Deletes a file.
      Required parameters: N/A

    A combined example of a condition tag and a behavior type appears after this list.
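
Putting a condition tag and a behavior type together, a minimal rule sketch could look as follows; the 2w threshold and the WARM tier are illustrative choices, not values prescribed by this guide:

    <rule>
      <!-- Matches files whose access time is older than two weeks (gt is the default operator) -->
      <atime operator="gt">2w</atime>
      <action>
        <type>MARK</type>
        <params>
          <param>
            <name>targettier</name>
            <value>WARM</value>
          </param>
        </params>
      </action>
    </rule>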

Configuration Description

The migration tool must be invoked periodically. Before running it, configure the following parameters in the hdfs-site.xml file on the client:

Table 3 Parameter description

dfs.auto-data-movement.policy.class
  Description: Specifies the data migration policy class.
    NOTE: Currently, only DefaultDataMovementPolicy is supported.
  Default value: com.xxx.hadoop.hdfs.datamovement.policy.DefaultDataMovementPolicy

dfs.auto.data.mover.id
  Description: Specifies the output file name of the hot and cold data migration policy.
  Default value: current system time in milliseconds

dfs.auto.data.mover.output.dir
  Description: Specifies the HDFS directory for cold and hot data migration, in which the migration tool writes its behavior status files.
  Default value: /system/datamovement
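
For reference, a minimal hdfs-site.xml sketch that sets these parameters might look as follows; the values shown simply restate the defaults from Table 3:

    <configuration>
      <!-- Data migration policy class; currently only DefaultDataMovementPolicy is supported -->
      <property>
        <name>dfs.auto-data-movement.policy.class</name>
        <value>com.xxx.hadoop.hdfs.datamovement.policy.DefaultDataMovementPolicy</value>
      </property>
      <!-- HDFS directory where the tool writes its behavior status files -->
      <property>
        <name>dfs.auto.data.mover.output.dir</name>
        <value>/system/datamovement</value>
      </property>
    </configuration>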

DefaultDataMovementPolicy uses the configuration file default-datamovement-policy.xml, in which users define all rules (based on age or access time) and the operations to perform. This file must be placed in the classpath of the client.

The following is an example of the default-datamovement-policy.xml file:

<policies>
  <policy>
    <fileset>
      <file>
        <name>/opt/data/1.txt</name>
      </file>
      <file>
        <name>/opt/data/*/subpath/</name>
        <excludes>
          <name>/opt/data/some/subpath/sub1</name>
        </excludes>
      </file>
    </fileset>
    <rules>
      <rule>
        <age>2w</age>
        <action>
          <type>MOVE</type>
          <params>
            <param>
              <name>targettier</name>
              <value>HOT</value>
            </param>
          </params>
        </action>
      </rule>
    </rules>
  </policy>
</policies>
NOTE:

Additional attributes can be added to the tags used in policies, rules, and actions. For example, the name attribute can be used to map between a user UI (for example, the Hue UI) and the tool's input XML.

Example: <policy name="Manage_File1">

The tags are described as follows:

Table 4 Tag description

<policy>
  Description: Defines a single policy. The following attributes are supported:
    • idempotent: specifies whether to continue checking the next rule after the current rule is met, when the policy contains multiple rules.
      Example: <policy name="policy2" idempotent="true">
      The default value is true, indicating that the rule and action are idempotent and evaluation continues with the next rule. If the value is false, evaluation stops at the current rule.
    • hours_allowed: specifies whether to execute policy evaluation based on the system time. The value is a comma-separated list of numbers or ranges from 0 to 23, indicating the hours of the day.
      Example: <policy name="policy1" hours_allowed="2-6,13-14">
      If the current system time falls within the configured hours, evaluation continues; otherwise, it is skipped.
      NOTE: In the input XML, only one policy is supported per file. Therefore, all rules in the file must be covered by one policy tag.
  Reusable: Yes

<fileset>
  Description: Defines the group of files or folders for each policy.
  Reusable: No (in the policy tag)

<file>
  Description: Contains one or more <name> tags that define the files and/or folders the policy applies to. The file or folder names support POSIX globs.
  Reusable: Yes (in the fileset tag)

<excludes>
  Description: Defined in the <file> tag; can contain multiple <name> tags. The files or folders named in these <name> tags are excluded from the range configured in the <file> tag. The names support POSIX globs.
  Reusable: No (in the fileset tag)

<rules>
  Description: Contains the rules defined for a policy.
  Reusable: No (in the policy tag)

<rule>
  Description: Defines a single rule.
  Reusable: Yes (in the rules tag)

<age> or <atime>
  Description: Defines the age (modification time) or access time condition for the files defined in <fileset>. The value can be in the [num]y[num]m[num]w[num]d[num]h format, where num is a number and the letters mean:
    • y: year (365 days in a year)
    • m: month (30 days in a month)
    • w: week (7 days in a week)
    • d: day
    • h: hour
  The units can be used independently or combined. For example, 1y2d indicates one year and two days, that is, 367 days. If a number has no unit letter, the default unit is day.
  NOTE: You can set the operator attribute to gt (greater than) or lt (less than) in the <age> and <atime> tags. The default operator is gt.
    Example: <age operator="lt">
  Reusable: No (in the rule tag)

<action>
  Description: Defines the action to execute when the rule is matched.
  Reusable: No (in the rule tag)

<type>
  Description: Defines the action type. The supported action types are MARK, MOVE, SET_REPL, MOVE_TO_FOLDER, and DELETE.
  Reusable: No (in the action tag)

<params>
  Description: Defines the parameters for an action.
  Reusable: No (in the action tag)

<param>
  Description: Defines a name-value parameter using the <name> and <value> tags. If multiple parameters have the same name, the first parameter value is used.
    For MARK and MOVE, the targettier parameter specifies the data storage policy to set when the age rule is met; MOVE additionally supports the targetnodelabels parameter (see Table 2).
    For both MARK and MOVE, the supported targettier values are ALL_SSD, ONE_SSD, HOT, WARM, and COLD.
  Reusable: Yes (in the params tag)
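
As a sketch of the <policy> attributes and the lt operator described above (the policy name, path, and tier are hypothetical), the following policy is evaluated only between 02:00 and 06:00 system time and stops at the first matched rule:

    <policy name="archive_policy" idempotent="false" hours_allowed="2-6">
      <fileset>
        <file>
          <!-- Hypothetical folder covered by this policy -->
          <name>/opt/data/archive/</name>
        </file>
      </fileset>
      <rules>
        <rule>
          <!-- Matches files younger than one year and two days (367 days) -->
          <age operator="lt">1y2d</age>
          <action>
            <type>MARK</type>
            <params>
              <param>
                <name>targettier</name>
                <value>WARM</value>
              </param>
            </params>
          </action>
        </rule>
      </rules>
    </policy>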

For the files or folders under the <file> tag, the FileSystem#globStatus API is used for matching; for other files or folders, the GlobPattern class (used by GlobFilter) is used. For details, see the description of the supported APIs. For example, with globStatus, /opt/hadoop/* matches everything in the /opt/hadoop folder, and /opt/*/hadoop matches all hadoop folders in the subdirectories of the /opt directory.

globStatus matches the glob pattern against each path component separately, whereas the others match the glob pattern against the full path directly.

For details about globStatus, see https://hadoop.apache.org/docs/r3.1.1/api/org/apache/hadoop/fs/FileSystem.html#globStatus(org.apache.hadoop.fs.Path).
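
For example, a fileset using one of the glob patterns above together with an exclusion might look like the following sketch (the excluded path is hypothetical):

    <fileset>
      <file>
        <!-- Matches everything in the /opt/hadoop folder -->
        <name>/opt/hadoop/*</name>
        <excludes>
          <!-- Hypothetical subfolder to leave untouched -->
          <name>/opt/hadoop/tmp</name>
        </excludes>
      </file>
    </fileset>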

Behavior Operation Example

  • MARK
    <action>
      <type>MARK</type>
      <params>
        <param>
          <name>targettier</name>
          <value>HOT</value>
        </param>
      </params>
    </action>
  • MOVE
    <action>
      <type>MOVE</type>
      <params>
        <param>
          <name>targettier</name>
          <value>HOT</value>
        </param>
        <param>
          <name>targetnodelabels</name>
          <value>SOME_EXPRESSION</value>
        </param>
      </params>
    </action>
  • SET_REPL
    <action>
      <type>SET_REPL</type>
      <params>
        <param>
          <name>replcount</name>
          <value>5</value>
        </param>
      </params>
    </action>
  • MOVE_TO_FOLDER
    <action>
      <type>MOVE_TO_FOLDER</type>
      <params>
        <param>
          <name>target</name>
          <value>path</value>
        </param>
        <param>
          <name>overwrite</name>
          <value>true</value>
        </param>
      </params>
    </action>
    NOTE:

The MOVE_TO_FOLDER operation only changes the file path to the target folder; it does not change the block locations. If you want to move blocks as well, configure a separate MOVE policy.

  • DELETE
    <action>
      <type>DELETE</type>
    </action>
NOTE:
  • When writing an XML file, pay attention to the configuration and sequence of behavior operations. The hot and cold data migration tool executes the rules in the sequence specified in the input XML file.
  • If you want to run only one rule based on atime/age, sort the rules in descending order of time and set the idempotent attribute to false, as shown in the sketch after this list.
  • If the DELETE operation is configured for a file set, no other rules can be configured to run after it.
  • The -fs option can be used to specify the default file system address of the client.
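
As a sketch of that descending-order pattern (the policy name, path, thresholds, and tiers are illustrative), the older-files rule comes first so that, with idempotent set to false, evaluation stops at the first match:

    <policy name="tiering_policy" idempotent="false">
      <fileset>
        <file>
          <!-- Hypothetical folder covered by this policy -->
          <name>/opt/data/logs/</name>
        </file>
      </fileset>
      <rules>
        <!-- Files older than one year are marked COLD; evaluation stops here for them -->
        <rule>
          <age>1y</age>
          <action>
            <type>MARK</type>
            <params>
              <param>
                <name>targettier</name>
                <value>COLD</value>
              </param>
            </params>
          </action>
        </rule>
        <!-- Remaining files older than four weeks are marked WARM -->
        <rule>
          <age>4w</age>
          <action>
            <type>MARK</type>
            <params>
              <param>
                <name>targettier</name>
                <value>WARM</value>
              </param>
            </params>
          </action>
        </rule>
      </rules>
    </policy>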

Audit Logs

The cold and hot data migration tool records the following information in its audit logs:

  • Tool startup status
  • Behavior type, parameter details, and status
  • Tool completion status

To enable audit logging, add the following properties to the <HADOOP_CONF_DIR>/log4j.properties file:

autodatatool.logger=INFO, ADMTRFA
autodatatool.log.file=HDFSAutoDataMovementTool.audit
log4j.logger.com.xxx.hadoop.hdfs.datamovement.HDFSAutoDataMovementTool.audit=${autodatatool.logger}
log4j.additivity.com.xxx.hadoop.hdfs.datamovement.HDFSAutoDataMovementTool.audit=false
log4j.appender.ADMTRFA=org.apache.log4j.RollingFileAppender
log4j.appender.ADMTRFA.File=${hadoop.log.dir}/${autodatatool.log.file}
log4j.appender.ADMTRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.ADMTRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n
log4j.appender.ADMTRFA.MaxBackupIndex=10
log4j.appender.ADMTRFA.MaxFileSize=64MB
NOTE:

For details, see the <HADOOP_CONF_DIR>/log4j_autodata_movment_template.properties file.
