Updated on 2023-07-14 GMT+08:00

Scenario

Background

Company H intends to build an enterprise-class cloud management platform for its Internet of Vehicles (IoV) service. The platform will centrally manage and deploy hardware resources and common software resources, and move the company's IT applications to a cloud-based, service-oriented model. Cloud Data Migration (CDM) helps company H build the platform without code modification or data loss.

Constraints

This solution supports data migration only to MRS 1.x clusters. In MRS 2.x and later versions, the HBase repair command is no longer supported, so HBase tables cannot be rebuilt from the migrated files and the HBase data directory migration described here cannot be implemented.
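The version constraint above can be checked before a migration job is planned. The following is a minimal sketch, not part of CDM or MRS: the function name `is_supported_target` is hypothetical, and it assumes the target cluster version is available as a dotted string such as "1.9.2".

```python
def is_supported_target(mrs_version: str) -> bool:
    """Return True only when the target MRS cluster is a 1.x version.

    Assumption: mrs_version is a dotted version string such as "1.9.2".
    Anything that cannot be parsed is treated as unsupported.
    """
    # Take the major component, tolerating surrounding spaces and a leading "v".
    major = mrs_version.strip().lstrip("vV").split(".")[0]
    return major.isdigit() and int(major) == 1


if __name__ == "__main__":
    for version in ("1.9.2", "2.1.0", "3.1.0"):
        status = "supported" if is_supported_target(version) else "not supported"
        print(f"MRS {version}: {status} by this migration scheme")
```

Treating unparseable input as unsupported is a deliberately conservative choice, so that a malformed version string never lets a migration proceed by accident.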

Migration Scheme

Figure 1 Migration scheme

Company H stores 854 tables (400 TB) in the Cloudera Hadoop (CDH) HBase cluster and 149 tables (about 10 TB) in the standby HBase cluster. About 60 TB of data was added in the last month.

Use CDM to extract HBase HFiles from the CDH cluster and save them to MapReduce Service (MRS) HDFS, and then run the HBase repair command to rebuild the HBase tables. Based on this migration scheme, the following two migration modes are available:
  1. Use CDM to migrate the data of the last month and the data of the standby HBase cluster over the private line. The migration path is as follows:

    CDH → CDM (HUAWEI CLOUD) → MRS

    The advantage and disadvantage of migrating directly over the private line are as follows:

    • Advantage: Data does not need to be migrated multiple times, which shortens the overall migration duration.
    • Disadvantage: When a large amount of data is transmitted, the migration traffic occupies much of the private line bandwidth and crosses multiple switches, which affects the customer's services running at the same time.
  2. Use CDM to migrate historical data (generated more than one month ago) that is shipped through Data Express Service (DES). The migration path is as follows:

    CDH → DES → CDM (HUAWEI CLOUD) → OBS → CDM (HUAWEI CLOUD) → MRS

    DES is well suited to the scenario where a large amount of data is to be transmitted, no private line is set up between the private cloud and HUAWEI CLOUD, and the bandwidth from the private cloud network to the public network is limited.

    • Advantage: The transmission is highly reliable without depending on the private line and network quality.
    • Disadvantage: The migration takes a long time.
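To make the trade-off between the two modes concrete, the following sketch splits the volumes from the scenario and estimates how long the private-line share would take. It is an illustration under stated assumptions, not a sizing tool: it assumes the 60 TB added in the last month is part of the 400 TB total, uses decimal terabytes, and takes the link bandwidth and utilization as hypothetical inputs, because the source does not state the private line capacity.

```python
TB_IN_BYTES = 10**12  # assumption: decimal terabytes


def transfer_days(size_tb: float, bandwidth_gbps: float, utilization: float = 1.0) -> float:
    """Days needed to move size_tb terabytes over a link.

    bandwidth_gbps and utilization are hypothetical inputs; the source
    does not give the private line capacity.
    """
    bits = size_tb * TB_IN_BYTES * 8
    seconds = bits / (bandwidth_gbps * 1e9 * utilization)
    return seconds / 86400


# Volumes taken from the scenario.
primary_total_tb = 400  # 854 tables in the CDH HBase cluster
standby_tb = 10         # 149 tables in the standby cluster ("about 10 TB")
recent_tb = 60          # added in the last month; assumed to be part of the 400 TB

# Mode 1: last month's data plus the standby cluster, over the private line.
private_line_tb = recent_tb + standby_tb
# Mode 2: older historical data, shipped through DES.
des_tb = primary_total_tb - recent_tb

if __name__ == "__main__":
    print(f"Private line: {private_line_tb} TB, DES: {des_tb} TB")
    for gbps in (1.0, 10.0):  # hypothetical link speeds
        days = transfer_days(private_line_tb, gbps)
        print(f"  {private_line_tb} TB at {gbps:g} Gbit/s: {days:.1f} days")
```

Under these assumptions, 70 TB goes over the private line and 340 TB through DES. On a hypothetical 1 Gbit/s line fully dedicated to migration, the 70 TB alone would take roughly 6.5 days, which shows why the much larger historical volume is kept off the private line.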