Updated on 2024-07-19 GMT+08:00

Configuring HDFS Mover

Scenario

Mover is a new data migration tool whose working mode is similar to that of the HDFS Balancer. Mover can redistribute data in the cluster based on the configured data storage policy.

Use Mover to periodically check whether specified files or directories in the HDFS file system meet the preset storage policy. If they do not, Mover migrates the data so that it complies with the policy.

This section applies to MRS 3.x or later clusters.

Configuration Description

Go to the All Configurations page of HDFS by referring to Modifying Cluster Service Configuration Parameters, and enter a parameter name in the search box.

Table 1 Parameter description

| Parameter | Description | Default Value |
|---|---|---|
| dfs.mover.auto.enable | Specifies whether to enable the data replica migration function. This function supports multiple modes. The default value is false, indicating that the function is disabled. | false |
| dfs.mover.auto.cron.expression | Specifies the CRON expression for automatic HDFS data migration, which controls when data migration starts. This parameter is valid only when dfs.mover.auto.enable is set to true. The default value is 0 * * * *, indicating that the task runs on the hour, every hour. For details about CRON expressions, see Table 2. | 0 * * * * |
| dfs.mover.auto.hdfsfiles_or_dirs | Specifies the list of HDFS files and directories for which automatic replica migration is performed in the cluster. Separate multiple values with spaces. This parameter is valid only when dfs.mover.auto.enable is set to true. | - |

Table 2 CRON expressions

| Column | Description |
|---|---|
| 1 | Minute. The value ranges from 0 to 59. |
| 2 | Hour. The value ranges from 0 to 23. |
| 3 | Day of the month. The value ranges from 1 to 31. |
| 4 | Month. The value ranges from 1 to 12. |
| 5 | Day of the week. The value ranges from 0 to 6. 0 indicates Sunday. |
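
The column ranges in Table 2 can be checked before a value is saved to dfs.mover.auto.cron.expression. The following is a minimal sketch, not part of the product: the function names check_cron and in_range are hypothetical, and the check handles only "*" and single integers. Whether the service accepts other CRON syntax (lists, ranges, or steps) is not stated here, so such values are rejected by this sketch. The expressions other than the default 0 * * * * are illustrative.

```shell
#!/bin/sh
# Sketch: validate a five-column CRON expression against the ranges in Table 2.
# Only "*" and single integers are recognized (an assumption of this sketch).

in_range() {  # in_range <field> <min> <max>
  [ "$1" = "*" ] && return 0
  case "$1" in
    ''|*[!0-9]*) return 1 ;;   # reject anything that is not a plain integer
  esac
  [ "$1" -ge "$2" ] && [ "$1" -le "$3" ]
}

check_cron() {  # check_cron "<minute> <hour> <day> <month> <weekday>"
  set -f        # disable globbing so that "*" stays literal when splitting
  set -- $1     # split the expression into positional parameters
  set +f
  [ "$#" -eq 5 ] || return 1
  in_range "$1" 0 59 &&   # column 1: minute
  in_range "$2" 0 23 &&   # column 2: hour
  in_range "$3" 1 31 &&   # column 3: day of the month
  in_range "$4" 1 12 &&   # column 4: month
  in_range "$5" 0 6       # column 5: day of the week, 0 is Sunday
}

# "0 * * * *"  : on the hour, every hour (the default value)
# "30 2 * * *" : at 02:30 every day
# "0 24 * * *" : invalid, because the hour column stops at 23
for expr in "0 * * * *" "30 2 * * *" "0 24 * * *"; do
  if check_cron "$expr"; then
    echo "ok:      $expr"
  else
    echo "invalid: $expr"
  fi
done
```

The check only confirms that an expression has five columns within the documented ranges. It does not confirm that the service will schedule the task as intended.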

Use Restrictions

Run the following command on the HDFS client to enable the Mover function:

hdfs mover -p <Full path of an HDFS file or directory>

The user who runs this command on the client must have the supergroup permission. You can use the system user hdfs of the HDFS service, or create a user with the supergroup permission in the cluster and run the command as that user.
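
As an illustration of the command format above, the following sketch composes an hdfs mover -p command line for one or more paths and prints it for review instead of running it. The function name mover_command and the paths /user/example/cold_data and /user/example/archive are hypothetical placeholders. Passing several space-separated paths after -p follows the Apache Hadoop Mover usage and is assumed to apply here as well. Paths containing whitespace are not handled.

```shell
#!/bin/sh
# Sketch: build an "hdfs mover -p" command line for the given HDFS paths.
# The command is printed, not executed, so it can be checked first.

mover_command() {  # mover_command <hdfs-path> [<hdfs-path> ...]
  if [ "$#" -eq 0 ]; then
    echo "usage: mover_command <hdfs-path> [<hdfs-path> ...]" >&2
    return 1
  fi
  printf 'hdfs mover -p'
  for p in "$@"; do
    printf ' %s' "$p"
  done
  printf '\n'
}

# Hypothetical paths; prints:
# hdfs mover -p /user/example/cold_data /user/example/archive
mover_command /user/example/cold_data /user/example/archive
```

Run the printed command on the HDFS client as a user with the supergroup permission, for example the system user hdfs.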