
Migrating Data from Hive to MRS

Updated at: Mar 25, 2021 GMT+08:00

Scenarios

This section describes how to migrate data from a Hive cluster in an on-premises IDC equipment room or on a public cloud to MapReduce Service (MRS) on HUAWEI CLOUD. It applies to data volumes of up to tens of TBs. HUAWEI CLOUD Cloud Data Migration (CDM) is used as the example migration tool.

Hive data migration consists of two parts:

  • Hive metadata, which is stored in a database such as MySQL. By default, the metadata of an MRS Hive cluster is stored in MRS DBService (a Huawei GaussDB database). You can also use RDS for MySQL as an external metadata database.
  • Hive service data, which is stored in HDFS or OBS.

You can use the Scenario Migration function of CDM on HUAWEI CLOUD to migrate both parts of the Hive data in one click.

Figure 1 Hive data migration

Solution Advantages

Scenario-based migration speeds up migration by first migrating snapshots and then restoring the table data.

Procedure

  1. Log in to the CDM management console.
  2. Create a CDM cluster. The security group, VPC, and subnet of the CDM cluster must be the same as those of the destination cluster to ensure that the CDM cluster can communicate with the MRS cluster.
  3. On the Cluster Management page, locate the row containing the CDM cluster and click Job Management in the Operation column.
  4. On the Link Management tab page, click Create Link.
  5. Add links to the source and destination clusters by referring to Creating Links. Select Hadoop release version as the connector type.

    Set the link type based on the actual cluster. For an MRS cluster, set the Hadoop type to MRS. For a self-built cluster, set the Hadoop type to Apache Hadoop.

    Figure 2 Hive link

  6. In the destination cluster, create the database that will store the migrated data.
  7. Choose Job Management > Scenario Migration, and click Create Job.
  8. On the job parameter configuration page that is displayed, set the job name, set the migration scenario to Hive migration, and click Next.
  9. For the source link and the destination link, select the corresponding links created in 5. Select the database to be migrated and click Next.

    Figure 3 Hive job configuration

  10. Select the tables to be migrated and click Next.
  11. On the task configuration page that is displayed, keep the default settings and click Save.
  12. Choose Job Management > Scenario Migration, locate the job, and click Run in the Operation column to start the Hive data migration.
  13. After the migration is complete, you can run the same query statement in the source and destination clusters to compare the query results.

    For example, query the number of records in the catalog_sales table in both the source and destination clusters and check that the counts match:

    select count(*) from catalog_sales;
    Figure 4 Data records of the source cluster
    Figure 5 Data records of the destination cluster

  14. (Optional) If new data in the source cluster must be migrated to the destination cluster periodically, choose the migration method according to how the data changes, as described below. Configure a scheduled task to migrate incremental data until all services have been migrated to the destination cluster.

    • If only Hive table data is modified (no table is added or deleted and no table structure is modified), you only need to migrate the files stored in HDFS or OBS. For details, see the description of the method for migrating new data in Migrating Data from Hadoop to MRS.
    • If a Hive table is added, choose Job Management > Scenario Migration, click Edit in the Operation column of the Hive migration job, and select the new table for migration.
    • If a Hive table is deleted or the structure of an existing table is modified, manually delete the table from the destination cluster or manually update its structure there.
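When several databases are migrated, step 6 can be prepared with a short script. The following Python sketch only builds HiveQL CREATE DATABASE IF NOT EXISTS statements, one per source database, for you to run in the destination cluster with your usual Hive client. The function name create_database_statements, the name check, and the example database name tpcds are hypothetical and are not part of CDM or MRS.

```python
import re
from typing import Iterable, List

# Hypothetical, conservative name check: letters, digits, and underscores only,
# not starting with a digit. Adjust it if your source databases use other names.
_DATABASE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def create_database_statements(databases: Iterable[str]) -> List[str]:
    """Build one HiveQL CREATE DATABASE statement per source database name."""
    statements = []
    for name in databases:
        if not _DATABASE_NAME.fullmatch(name):
            raise ValueError(f"unexpected database name: {name!r}")
        # IF NOT EXISTS keeps the statement safe to run more than once.
        statements.append(f"CREATE DATABASE IF NOT EXISTS {name};")
    return statements


if __name__ == "__main__":
    for statement in create_database_statements(["tpcds"]):
        print(statement)  # CREATE DATABASE IF NOT EXISTS tpcds;
```

Rejecting unexpected names, rather than quoting them, keeps the sketch simple; the database names must match the ones selected later in the migration job.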
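When many tables are migrated, the record-count check in step 13 can be repeated for each table. The Python sketch below assumes you supply two functions, count_source and count_destination, that run a query in the source and destination clusters respectively and return the single count it yields (for example, through beeline or another Hive client of your choice). These functions and compare_row_counts itself are hypothetical helpers, not a CDM feature.

```python
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical query runner: takes a HiveQL statement, returns the integer it yields.
QueryRunner = Callable[[str], int]


def compare_row_counts(
    tables: Iterable[str],
    count_source: QueryRunner,
    count_destination: QueryRunner,
) -> Dict[str, Tuple[int, int]]:
    """Return {table: (source_count, destination_count)} for tables whose counts differ."""
    mismatches: Dict[str, Tuple[int, int]] = {}
    for table in tables:
        # The same query as in step 13, run against both clusters.
        query = f"select count(*) from {table}"
        source_count = count_source(query)
        destination_count = count_destination(query)
        if source_count != destination_count:
            mismatches[table] = (source_count, destination_count)
    return mismatches
```

An empty result means every listed table has the same number of records in both clusters. Matching counts do not prove that the table contents are identical, so you may still compare other query results as well.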
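The three cases in step 14 can be expressed as a small decision function. This Python sketch assumes you have already collected, for every table in the migrated database, its ordered list of column definitions from both clusters (for example, from DESCRIBE output). The name plan_incremental_actions, the schema format, and the action labels are hypothetical. The sketch also simplifies step 14 by classifying each table independently, so a table whose structure is unchanged is treated as a data-only change.

```python
from typing import Dict, List

# Hypothetical schema format: table name -> ordered list of column definitions.
Schema = Dict[str, List[str]]


def plan_incremental_actions(source: Schema, destination: Schema) -> Dict[str, List[str]]:
    """Group tables by the action that step 14 describes for them."""
    source_tables = set(source)
    destination_tables = set(destination)
    common = source_tables & destination_tables
    return {
        # New tables: edit the Hive migration job and select them.
        "add_to_migration_job": sorted(source_tables - destination_tables),
        # Tables deleted in the source: delete them manually in the destination.
        "delete_in_destination": sorted(destination_tables - source_tables),
        # Changed structure: update the table structure manually in the destination.
        "update_structure_in_destination": sorted(
            t for t in common if source[t] != destination[t]
        ),
        # Unchanged structure: data changes only require migrating files in HDFS or OBS.
        "migrate_files_only": sorted(t for t in common if source[t] == destination[t]),
    }
```

A scheduled task could run this comparison before each incremental round and report the tables that need manual work.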
