
Migrating Data from Hadoop to MRS

Updated at: Mar 25, 2021 GMT+08:00

Scenarios

This section describes how to migrate data from a Hadoop cluster in an on-premises IDC equipment room or on a public cloud to MapReduce Service (MRS) on HUAWEI CLOUD. This approach suits data volumes of up to tens of terabytes. HUAWEI CLOUD Cloud Data Migration (CDM) is used as the example migration tool.

Figure 1 Hadoop data migration

Solution Advantages

  • Easy to use: A wizard-based interface lets you create migration tasks in minutes through simple configuration, without any programming.
  • High migration efficiency: CDM is built on a distributed computing framework that improves data migration and transmission performance, and its write performance is further optimized for specific data sources.
  • Real-time monitoring: During migration, jobs are monitored automatically in real time, with alarms and notifications.

Procedure

  1. Log in to the CDM management console.
  2. Create a CDM cluster. The CDM cluster must use the same security group, VPC, and subnet as the destination MRS cluster so that the two clusters can communicate.
  3. On the Cluster Management page, locate the row of the CDM cluster you created and click Job Management in the Operation column.
  4. On the Link Management tab page, click Create Link.
  5. Create two HDFS links, one to the source cluster and one to the destination cluster. For details, see Creating Links.

    Select the link type that matches each cluster: MRS HDFS for an MRS cluster, or Apache Hadoop for a self-built cluster.

    Figure 2 Hadoop connection

  6. On the Table/File Migration tab page, click Create Job.
  7. Select the source and destination links.

    • Job Name: Enter a custom job name of 1 to 256 characters, consisting of letters, digits, and underscores (_).
    • Source Link Name: Select the HDFS link of the source cluster. Data is exported from this link when the job is running.
    • Destination Link Name: Select the HDFS link of the destination cluster. Data is imported to this link when the job is running.

  8. Configure the source job parameters as described in From HDFS. You can set Path Filter and File Filter to specify the directories and files to migrate. For example, if Path Filter is set to test*, the files in the /user/test* directories are migrated. In this scenario, File Format is always Binary.

    Figure 3 Configuring job parameters

  9. Configure the destination job parameters as described in To HDFS.
  10. Click Next. The task configuration page is displayed.

    • If you need to periodically migrate new data to the destination cluster, configure a scheduled task on this page. Alternatively, you can configure a scheduled task later by referring to step 14.
    • If no new data needs to be migrated periodically, skip the configurations on this page and click Save.
      Figure 4 Task configuration

  11. Choose Job Management > Table/File Migration and click Run in the Operation column of the job to start migrating the HDFS file data. Wait until the job completes.
  12. Log in to the active management node of the destination cluster.
  13. Run the hdfs dfs -ls -h /user/ command to view the migrated files in the destination cluster.
  14. (Optional) To keep migrating new data from the source cluster to the destination cluster until all services have been migrated, configure a scheduled task for incremental data migration.

    1. On the Cluster Management page of the CDM console, choose Job Management > Table/File Migration.
    2. In the Operation column of the migration job, choose More > Schedule Execution.
    3. Enable scheduled execution, set the execution cycle based on service requirements, and set the end of the validity period to a time after all services have been migrated to the new cluster.
      Figure 5 Scheduling job execution
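You can check the job name rule from step 7 locally before you create the job. The following is a minimal Python sketch. It assumes that "letters" means ASCII letters only. The function name is_valid_job_name is hypothetical and is not part of any CDM API.

```python
import re

# Job-name rule from step 7: 1 to 256 characters; letters, digits, and underscores only.
# Assumption: "letters" means ASCII A-Z and a-z.
JOB_NAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,256}")


def is_valid_job_name(name: str) -> bool:
    """Return True if name satisfies the documented job-name rule (hypothetical helper)."""
    return JOB_NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["hdfs_to_mrs_01", "", "hdfs-to-mrs", "a" * 257]:
        print(f"{candidate[:20]!r}: {is_valid_job_name(candidate)}")
```

The sketch uses fullmatch rather than match so that a name with a trailing invalid character, such as a hyphen or newline, is rejected.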
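Path Filter and File Filter in step 8 take wildcard patterns. The sketch below shows how a pattern such as test* selects directories and files. It assumes shell-style, case-sensitive wildcard matching (Python fnmatch) applied to the parent directory name and the file name. It approximates CDM's behavior for planning purposes only, because the exact matching rules are defined in From HDFS. The paths and the select_files helper are hypothetical.

```python
from fnmatch import fnmatchcase
from posixpath import basename, dirname


def select_files(paths, path_filter="*", file_filter="*"):
    """Keep files whose parent directory name matches path_filter and whose
    file name matches file_filter (shell-style wildcards, case-sensitive).

    Assumption: this approximates, but is not, CDM's own filter logic.
    """
    selected = []
    for path in paths:
        parent_name = basename(dirname(path))
        file_name = basename(path)
        if fnmatchcase(parent_name, path_filter) and fnmatchcase(file_name, file_filter):
            selected.append(path)
    return selected


# Hypothetical files under /user/ in the source cluster.
source_files = [
    "/user/test01/part-00000.csv",
    "/user/test_logs/app.log",
    "/user/prod/part-00000.csv",
    "/user/mytest/part-00000.csv",
]

print(select_files(source_files, path_filter="test*"))
# ['/user/test01/part-00000.csv', '/user/test_logs/app.log']
print(select_files(source_files, path_filter="test*", file_filter="*.csv"))
# ['/user/test01/part-00000.csv']
```

Here mytest is not selected, because test* matches only names that begin with test.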
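Step 13 lists the migrated files with hdfs dfs -ls -h. Comparing totals can be a quicker sanity check. One option is to run hdfs dfs -count on the same path in both clusters and compare the output. The sketch below parses that output. It assumes the default column layout without -q, -h, or -v: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. The sample lines and function names are hypothetical, and the numbers are illustrative, not real measurements.

```python
from typing import List, NamedTuple


class CountSummary(NamedTuple):
    dir_count: int
    file_count: int
    content_size: int  # bytes, because -h is not used
    path: str


def parse_count_line(line: str) -> CountSummary:
    """Parse one line of `hdfs dfs -count <path>` output:
    DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME."""
    dir_count, file_count, content_size, path = line.split(None, 3)
    return CountSummary(int(dir_count), int(file_count), int(content_size), path.strip())


def mismatches(source_line: str, dest_line: str) -> List[str]:
    """Return the names of the counters that differ between source and destination."""
    src = parse_count_line(source_line)
    dst = parse_count_line(dest_line)
    fields = ("dir_count", "file_count", "content_size")
    return [f for f in fields if getattr(src, f) != getattr(dst, f)]


# Hypothetical output captured from each cluster for /user/test01.
source = "           4          120         5368709120 /user/test01"
destination = "           4          119         5368000000 /user/test01"

print(mismatches(source, destination))
# ['file_count', 'content_size']
```

An empty result means the directory, file, and byte counts agree. It does not prove that the file contents are identical.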
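Step 14 asks you to end the validity period after all services have moved to the new cluster. One way to read this requirement is that at least one scheduled run should fall at or after the cutover, so that data written to the source just before the switch is still copied. The following minimal sketch checks that condition. It uses a fixed cycle, hypothetical dates, and hypothetical helpers (scheduled_runs, covers_cutover). CDM computes its own schedule, and this code does not call CDM.

```python
from datetime import datetime, timedelta
from typing import List


def scheduled_runs(start: datetime, cycle: timedelta, validity_end: datetime) -> List[datetime]:
    """List run times of a fixed-cycle schedule from start up to validity_end (inclusive)."""
    if cycle <= timedelta(0):
        raise ValueError("cycle must be positive")
    runs = []
    run_time = start
    while run_time <= validity_end:
        runs.append(run_time)
        run_time += cycle
    return runs


def covers_cutover(runs: List[datetime], cutover: datetime) -> bool:
    """True if at least one run happens at or after the service cutover time."""
    return any(run_time >= cutover for run_time in runs)


# Hypothetical plan: daily incremental runs at 02:00; services move to MRS on 2021-04-10 12:00.
start = datetime(2021, 4, 1, 2, 0)
cutover = datetime(2021, 4, 10, 12, 0)

too_early = scheduled_runs(start, timedelta(days=1), datetime(2021, 4, 10, 0, 0))
long_enough = scheduled_runs(start, timedelta(days=1), datetime(2021, 4, 12, 0, 0))

print(covers_cutover(too_early, cutover))    # False: last run is 2021-04-09 02:00
print(covers_cutover(long_enough, cutover))  # True: last run is 2021-04-11 02:00
```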
