Updated on 2024-04-29 GMT+08:00

Using CDM to Migrate Data of the Last Month

The standby HBase cluster stores about 10 TB of data, and about 60 TB of new data was added in the last month, for a total of about 70 TB. Company H's 20GE private line provides enough bandwidth for cdm.xlarge CDM clusters (10 Gbit/s maximum each). Balancing migration duration, cost, and performance, two cdm.xlarge clusters are used to migrate data concurrently. Table 1 describes the cluster specifications.

Table 1 CDM cluster specifications

| Instance Flavor | vCPUs/Memory | Maximum/Assured Bandwidth | Concurrent Extractors | Scenario |
|---|---|---|---|---|
| cdm.large | 8 vCPUs, 16 GB memory | 3/0.8 Gbit/s | 16 | A single table with 10 million or more rows |
| cdm.xlarge | 16 vCPUs, 32 GB memory | 10/4 Gbit/s | 32 | TB-level data migration requiring 10GE bandwidth |
| cdm.4xlarge | 64 vCPUs, 128 GB memory | 40/36 Gbit/s | 64 | - |

Running multiple CDM clusters concurrently improves migration efficiency. Note that the MRS HDFS multi-replica policy consumes network bandwidth, which reduces migration efficiency.
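The sizing above can be checked with a quick back-of-envelope calculation. The following Python sketch is an illustration, not a CDM tool: it assumes decimal units, a constant transfer rate, and no protocol overhead or HDFS replication traffic, so a real migration will take longer. It uses the 20 Gbit/s private line and the cdm.xlarge maximum/assured bandwidth from Table 1; the function names are hypothetical.

```python
"""Back-of-envelope sizing for the two-cluster CDM migration.

Assumptions (illustrative, not CDM behavior): decimal units, a constant
transfer rate, and no protocol overhead or HDFS replication traffic.
"""

import math

BYTES_PER_TB = 10**12
BITS_PER_GBIT = 10**9


def clusters_for_line(line_gbps: float, cluster_max_gbps: float) -> int:
    """How many clusters' maximum bandwidth fits on the private line (at least 1)."""
    return max(1, math.floor(line_gbps / cluster_max_gbps))


def migration_hours(data_tb: float, clusters: int, cluster_gbps: float, line_gbps: float) -> float:
    """Ideal transfer time: aggregate cluster bandwidth, capped by the line."""
    effective_gbps = min(clusters * cluster_gbps, line_gbps)
    seconds = data_tb * BYTES_PER_TB * 8 / (effective_gbps * BITS_PER_GBIT)
    return seconds / 3600


if __name__ == "__main__":
    data_tb = 10 + 60  # standby cluster data + last month's increment
    n = clusters_for_line(line_gbps=20, cluster_max_gbps=10)  # cdm.xlarge maximum
    at_max = migration_hours(data_tb, n, cluster_gbps=10, line_gbps=20)
    at_assured = migration_hours(data_tb, n, cluster_gbps=4, line_gbps=20)
    print(f"{n} x cdm.xlarge: ~{at_max:.1f} h at maximum, ~{at_assured:.1f} h at assured bandwidth")
```

Under these idealized assumptions, two cdm.xlarge clusters fill the 20 Gbit/s line, and 70 TB takes roughly 7.8 hours at maximum bandwidth or roughly 19.4 hours at assured bandwidth.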

Creating Links on HUAWEI CLOUD CDM

  1. Create two CDM clusters.

    If a DataArts Studio instance includes a CDM cluster (other than a trial-version cluster) that meets your requirements, you do not need to buy a DataArts Migration incremental package.

    If you need to create another CDM cluster, buy a DataArts Studio incremental package by referring to Buying a DataArts Studio Incremental Package.

    • Select the cdm.xlarge flavor.
    • The clusters must reside in the same VPC as the MRS cluster and the Direct Connect connection.
    • Configure other parameters as required or retain the default values.
  2. Perform the following operations to create a CDH HDFS link:
    1. In the cluster list, click Job Management in the Operation column of the CDM cluster.
    2. Click the Links tab and then Create Link. On the page that is displayed, select Apache HDFS.

    3. Click Next and configure the link parameters. Set URI in the format hdfs://<NameNode IP address>:<port>. If Kerberos authentication is not enabled in the CDH cluster, set Authentication Method to SIMPLE.

    4. Click Test. If a success message appears in the upper right corner, the link is working. Then click Save.
  3. Perform the following operations to create an MRS HDFS link:
    1. On the Links tab, click Create Link. On the page that is displayed, select MRS HDFS.

    2. Click Next and configure the link parameters. Set Authentication Method to SIMPLE and retain the default run mode.

    3. Click Test. If a success message appears in the upper right corner, the link is working. Then click Save.
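Before entering the CDH HDFS link parameters, it can help to sanity-check the URI. The Python sketch below builds and parses a string in the hdfs://<NameNode IP address>:<port> format and, matching that format, accepts only IP addresses. The helper names, the documentation-range address 192.0.2.10, and port 8020 (a common NameNode RPC port on CDH; verify it against the fs.defaultFS setting of your cluster) are assumptions for illustration. The code does not call any CDM API.

```python
"""Sanity-check the CDH HDFS link URI (hdfs://<NameNode IP address>:<port>).

Helper names and the example address are hypothetical; no CDM API is called.
"""

from __future__ import annotations

import ipaddress
from urllib.parse import urlsplit


def build_hdfs_uri(namenode_ip: str, port: int) -> str:
    """Build hdfs://<ip>:<port>, rejecting malformed addresses and ports."""
    addr = ipaddress.ip_address(namenode_ip)  # raises ValueError if not an IP
    if not 0 < port < 65536:
        raise ValueError(f"port out of range: {port}")
    host = f"[{addr}]" if addr.version == 6 else str(addr)  # IPv6 needs brackets
    return f"hdfs://{host}:{port}"


def check_hdfs_uri(uri: str) -> tuple[str, int]:
    """Parse a link URI and return (host, port); raise ValueError if malformed."""
    parts = urlsplit(uri)
    if parts.scheme != "hdfs":
        raise ValueError(f"expected the hdfs:// scheme, got {parts.scheme!r}")
    port = parts.port  # raises ValueError for a non-numeric or out-of-range port
    if not parts.hostname or port is None:
        raise ValueError("the URI needs both a NameNode address and a port")
    return parts.hostname, port


if __name__ == "__main__":
    # 192.0.2.10 is a documentation placeholder; 8020 is a common CDH NameNode RPC port.
    uri = build_hdfs_uri("192.0.2.10", 8020)
    print(uri)
    print(check_hdfs_uri(uri))
```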

Creating a Migration Job on HUAWEI CLOUD CDM

  1. On the job management page of the CDM cluster, choose Table/File Migration > Create Job. Create one migration job for each table directory.

    • Source Job Configuration
      • Source Link Name: Select the created CDH HDFS link.
      • Source Directory/File: Select the directory of the HBase table in the CDH cluster. For example, /hbase/data/default/table_20180815 migrates all files in the table_20180815 directory.
      • File Format: Select Binary so that files are copied without being parsed.
    • Destination Job Configuration
      • Destination Link Name: Select the created MRS HDFS link.
      • Write Directory: Select the MRS HBase directory, for example, /hbase/data/default/table_20180815/. The directory must include the table name (for example, table_20180815). If the directory does not exist, CDM creates it automatically.
      • File Format: Select Binary.
    • Retain the default values of other parameters.
  2. Click Next to configure the task. Concurrent Extractors defaults to 3. Increase it to improve migration efficiency (this example uses 8). Retain the default values of other parameters.

  3. Repeat the preceding operations to create migration jobs for the other table directories, using the same parameter settings. Distribute the jobs evenly between the two CDM clusters and run them concurrently.
  4. After a job completes, click Historical Record in the Operation column to view its detailed statistics.
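When there are many table directories, the job list from the steps above can be planned up front. The Python sketch below produces one job description per table and assigns jobs to the clusters round-robin, so the job counts per cluster differ by at most one. The dictionary keys only mirror the console fields described above; they are not the CDM job schema, and the link, cluster, and table names are hypothetical.

```python
"""Plan one CDM migration job per HBase table directory and split the jobs
evenly across the CDM clusters (round-robin by job count).

The dict keys mirror the console fields but are NOT the CDM job schema;
link, cluster, and table names are hypothetical.
"""

from __future__ import annotations

from itertools import cycle

HBASE_DATA_DIR = "/hbase/data/default"


def job_spec(table: str, extractors: int = 8) -> dict:
    """Describe one Binary-format HDFS-to-HDFS copy of a single table directory."""
    return {
        "name": f"migrate_{table}",
        "source_link": "cdh_hdfs_link",  # hypothetical CDH HDFS link name
        "source_directory": f"{HBASE_DATA_DIR}/{table}",
        "source_file_format": "Binary",
        "destination_link": "mrs_hdfs_link",  # hypothetical MRS HDFS link name
        "write_directory": f"{HBASE_DATA_DIR}/{table}/",  # must include the table name
        "destination_file_format": "Binary",
        "concurrent_extractors": extractors,
    }


def assign_round_robin(tables: list[str], clusters: list[str]) -> dict[str, list[dict]]:
    """Assign jobs to clusters in turn so job counts differ by at most one."""
    if not clusters:
        raise ValueError("at least one CDM cluster is required")
    plan: dict[str, list[dict]] = {cluster: [] for cluster in clusters}
    for table, cluster in zip(tables, cycle(clusters)):
        plan[cluster].append(job_spec(table))
    return plan


if __name__ == "__main__":
    tables = ["table_20180815", "table_20180816", "table_20180817"]  # hypothetical
    plan = assign_round_robin(tables, ["cdm-cluster-1", "cdm-cluster-2"])
    for cluster, jobs in plan.items():
        print(cluster, [job["name"] for job in jobs])
```

Round-robin balances job counts, which is what the procedure describes. If table sizes vary widely, equal job counts do not guarantee equal run times; assigning the largest remaining table to the least-loaded cluster is a common alternative.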