Updated on 2024-03-14 GMT+08:00

Migrating Data from Hive to MRS

Scenario

This section describes how to migrate data from Hive clusters in offline IDCs or on public clouds to Huawei Cloud MRS. It applies to data volumes of tens of TBs or less. Huawei Cloud CDM 2.9.1.200 is used as the example migration tool.

Hive data migration consists of two parts:

  • Hive metadata, which is stored in a database such as MySQL. By default, the metadata of an MRS Hive cluster is stored in MRS DBService (a Huawei GaussDB database). You can also use RDS (MySQL) as an external metadata database.
  • Hive service data, which is stored in HDFS or OBS.

You can use the scenario migration function of Huawei Cloud CDM to migrate Hive data with one click.

For details about the data sources supported by CDM, see Supported Data Sources. If the data source is Apache Hive, the recommended version is 1.2.x or 3.1.x. Version 2.x is not supported. Before starting the migration, confirm that your data source is supported.
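The version guidance above could be turned into a small pre-migration check. The following is a minimal sketch, not part of CDM: the function name `classify_hive_version` and its return labels are hypothetical, and versions the text does not mention are deliberately left undecided and referred back to Supported Data Sources.

```python
import re


def classify_hive_version(version: str) -> str:
    """Classify an Apache Hive version string against the guidance above.

    Hypothetical helper for illustration only; it is not a CDM API.
    """
    match = re.match(r"^(\d+)\.(\d+)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized Hive version: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    if major == 2:
        # The text states that version 2.x is not supported.
        return "unsupported"
    if (major, minor) in {(1, 2), (3, 1)}:
        # The text recommends 1.2.x and 3.1.x.
        return "recommended"
    # Anything else is not covered by the text, so do not guess.
    return "check Supported Data Sources"
```

For example, `classify_hive_version("3.1.2")` falls into the recommended group, while `classify_hive_version("2.3.9")` is flagged as unsupported.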

Figure 1 Hive data migration

Solution Advantages

Scenario-based migration speeds up the process by migrating snapshots and then restoring the table data.

Impact on the System

Migrating large volumes of data places heavy demands on the network, so other services may be affected while a migration task is running. You are advised to migrate data during off-peak hours.

Procedure

  1. Log in to the CDM management console.
  2. Create a CDM cluster. The security group, VPC, and subnet of the CDM cluster must be the same as those of the destination cluster to ensure that the CDM cluster can communicate with the MRS cluster.
  3. On the Cluster Management page, locate the row containing the desired cluster and click Job Management in the Operation column.
  4. On the Links tab page, click Create Link.
  5. Create links to the source and destination clusters by referring to Creating Links.

    Set the connector type of each link based on the actual cluster type: select MRS Hive for an MRS cluster, or Apache Hive for a self-built cluster.

    Figure 2 Hive link

  6. In the destination cluster, create the database that will store the migrated data.
  7. Choose Job Management and click the Table/File Migration tab. Then, click Create Job.
  8. In the job configuration dialog box that is displayed, configure the job name, select the links created in step 5 as the source link and destination link, select the database and table to be migrated, and click Next.

    Figure 3 Hive job configuration

  9. Configure the mapping between the source fields and destination fields and click Next.
  10. On the task configuration page that is displayed, leave the default settings unchanged and click Save.
  11. Choose Job Management and click Table/File Migration. Locate the row containing the job to run and click Run in the Operation column to start migrating Hive data.
  12. After the migration is complete, you can run the same query statement in the source and destination clusters to compare the query results.

    For example, query the number of records in the catalog_sales table in the destination cluster and source cluster to check whether the number of data records is the same.

    select count(*) from catalog_sales;
    Figure 4 Data records of the source cluster
    Figure 5 Data records of the destination cluster

  13. (Optional) If new data in the source cluster needs to be periodically migrated to the destination cluster, choose the migration method based on how the data changes. Configure a scheduled task to migrate incremental data until all services have been migrated to the destination cluster.

    • If no table is added or deleted, the structure of existing tables is unchanged, and only Hive table data is modified, you only need to migrate the files stored on HDFS or OBS. For details, see the description of the new data migration method in Migrating Data from Hadoop to MRS.
    • If a Hive table is added, choose Job Management and click the Table/File Migration tab. Click Edit in the Operation column of the Hive migration job and select the new data table for data migration.
    • If a Hive table is deleted or the structure of an existing table is modified, manually delete the table from the destination cluster or manually update its structure there, respectively.
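
The verification in step 12 and the case analysis in step 13 can be sketched as small helpers. This is an illustrative sketch under stated assumptions, not a CDM feature: the function names are hypothetical, the row counts and table schemas are assumed to have been collected beforehand (for example, by running the step 12 query on each cluster), and comparing row counts alone does not prove that the data is identical.

```python
def count_query(table: str) -> str:
    """Build the row-count statement from step 12 for one table."""
    return f"select count(*) from {table};"


def find_count_mismatches(source_counts: dict, dest_counts: dict) -> dict:
    """Return {table: (source_count, dest_count)} for tables whose counts differ.

    A table that is missing from dest_counts is reported with None.
    """
    mismatches = {}
    for table, src in source_counts.items():
        dst = dest_counts.get(table)
        if dst != src:
            mismatches[table] = (src, dst)
    return mismatches


def plan_incremental_actions(source_schemas: dict, dest_schemas: dict) -> dict:
    """Map each table to the step 13 case that applies to it.

    Both arguments map a table name to its list of column definitions
    (a hypothetical representation chosen for this sketch).
    """
    actions = {}
    for table, columns in source_schemas.items():
        if table not in dest_schemas:
            # New table on the source: edit the CDM job and select it.
            actions[table] = "add table to the Hive migration job"
        elif dest_schemas[table] != columns:
            # Structure changed: must be fixed by hand on the destination.
            actions[table] = "manually update table structure"
        else:
            # Same structure: only the underlying files need migrating.
            actions[table] = "migrate files on HDFS or OBS if data changed"
    for table in dest_schemas:
        if table not in source_schemas:
            # Table was deleted on the source.
            actions[table] = "manually delete table from destination"
    return actions
```

For example, `find_count_mismatches({"catalog_sales": 100}, {"catalog_sales": 99})` reports `catalog_sales` as mismatched, which indicates that the table should be checked before services are switched to the destination cluster.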