Updated on 2024-04-29 GMT+08:00

Comparing Data Before and After Data Migration Using DataArts Quality

Data comparison checks data consistency before and after migration or processing.

This section describes how to use the DataArts Quality module of DataArts Studio to check consistency of the data before and after it is migrated from DWS to an MRS Hive partitioned table.

Prerequisites

  • You have created a DWS cluster, which can communicate with the DataArts Studio instance. You have the permission to access the KMS key.
  • You have created an MRS cluster, which can communicate with the DataArts Studio instance.
  • You have created a CDM cluster. For details, see Buying a DataArts Studio Incremental Package.

Creating a Data Migration Link

  1. Log in to the DataArts Studio console, locate a workspace, and click DataArts Migration.
  2. On the Cluster Management page, locate the prepared CDM cluster and click Job Management in the Operation column.

    Figure 1 Job Management page

  3. Click the Links tab and then Create Link to create a DWS link. For details about the parameters, see Link to DWS.

    Figure 2 Creating a DWS link

  4. Create an MRS Hive link. For details about the parameters, see Link to Hive.

    Figure 3 Creating an MRS Hive link

Creating and Executing a Data Migration Job

  1. Log in to the DataArts Studio console, locate a workspace, and click DataArts Migration.
  2. On the Cluster Management page, locate the prepared CDM cluster and click Job Management in the Operation column.
  3. Click the Table/File Migration tab and then Create Job to create a data migration job.
  4. Select the DWS link for Source Link Name and the MRS Hive link for Destination Link Name, and set the required parameters. For details about the parameters, see From DWS and To Hive.

    Figure 4 Job configuration

  5. Configure the field mapping and the task parameters, and then click Save and Execute to run the CDM job.
  6. On the Table/File Migration job list, view the job status.

    Figure 5 Viewing the job status

Creating a Data Connection

  1. Log in to the DataArts Studio console, locate a workspace, and click Management Center.
  2. Click Create Data Connection to create a DWS data connection. For details about the parameters, see Creating a DWS Connection.

    Figure 6 Creating a DWS data connection

  3. Create an MRS Hive data connection. For details about the parameters, see Creating an MRS Hive Connection.

    Figure 7 Creating an MRS Hive data connection

Creating a Comparison Job

  1. Log in to the DataArts Studio console, locate a workspace, and click DataArts Quality.
  2. In the left navigation pane, choose Quality Monitoring > Comparison Jobs.
  3. Click Create. On the Create Comparison Job page, set basic information.

    Figure 8 Setting basic information for the comparison job

  4. Click Next to go to the Define Rule page. Click the icon for configuring the rule, configure the comparison rule, select the data tables before and after the migration, and configure the alarm rule.

    Figure 9 Configuring the comparison rule

    • Configure the source and destination information separately.
    • When configuring Alarm Condition, ${1_1} indicates the number of rows in the source table, and ${2_1} indicates the number of rows in the destination table. In the preceding figure, the alarm condition ${1_1}!=${2_1} indicates that an alarm is generated when the number of rows in the source table is inconsistent with that in the destination table.

  5. Click Next and set subscription information.

    Figure 10 Setting subscription information

    If you enable notification, Alarm triggered indicates that a notification is sent to the SMN topic when an alarm is generated for the job, and Run successfully indicates that a notification is sent to the SMN topic when no alarm is generated for the job.

  6. Click Next and set scheduling parameters.

    Figure 11 Setting scheduling parameters

    Once indicates that the job needs to be manually executed, and On schedule indicates that the job is executed automatically based on your configuration. The configuration in the preceding figure indicates that the job is automatically executed every 15 minutes.

  7. Click Submit to create the comparison job.
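The alarm condition from step 4 and the notification types from step 5 can be read together as the logic sketched below. This is a minimal Python illustration only, not the way DataArts Quality evaluates expressions internally. The function names `evaluate_condition` and `notification_type` are hypothetical, the set of supported operators is an assumption, and the parser handles only a single comparison between two `${...}` variables.

```python
import operator
import re

# Comparison operators supported by this sketch (an assumption; the product
# may support a different set).
_OPERATORS = {
    "!=": operator.ne,
    "==": operator.eq,
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}

# Matches conditions of the form ${left} <op> ${right}, for example ${1_1}!=${2_1}.
_PATTERN = re.compile(r"\s*\$\{(\w+)\}\s*(!=|==|>=|<=|>|<)\s*\$\{(\w+)\}\s*")


def evaluate_condition(condition, values):
    """Return True if the alarm condition holds, that is, an alarm is generated."""
    match = _PATTERN.fullmatch(condition)
    if match is None:
        raise ValueError(f"unsupported alarm condition: {condition!r}")
    left, op, right = match.groups()
    return _OPERATORS[op](values[left], values[right])


def notification_type(alarm_generated):
    """Map the job outcome to the notification type named in step 5."""
    return "Alarm triggered" if alarm_generated else "Run successfully"


# Hypothetical row counts: 1_1 is the source table, 2_1 is the destination table.
row_counts = {"1_1": 1000, "2_1": 998}
alarm = evaluate_condition("${1_1}!=${2_1}", row_counts)
print(alarm, notification_type(alarm))
```

With the sample values above, the row counts differ, so the condition holds and the outcome maps to Alarm triggered. With equal row counts, the condition does not hold and the outcome maps to Run successfully.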

Executing the Comparison Job and Viewing the Result Analysis

  1. In the left navigation pane, choose Quality Monitoring > Comparison Jobs.
  2. Locate the created comparison job and click Run in the Operation column.

    Figure 12 Running the comparison job

  3. In the left navigation pane, choose Quality Monitoring > O&M.

    Figure 13 O&M page

  4. After the job is executed, click Details in the Operation column. If the source and destination tables have the same number of rows, the migration is successful.

    Figure 14 Viewing the running result

    • In the running result, the left pane displays the execution result of the rule for source table rows, and the right pane displays the execution result of the rule for destination table rows.
    • The error rate indicates the difference between the number of rows of the source and destination tables. If the error rate is 0, the source and destination tables have the same number of rows.
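The row-count check described above can be sketched in a few lines of Python. This is an illustration under stated assumptions: the function name `compare_row_counts` and the sample counts are hypothetical, and because this section does not give the exact formula that DataArts Quality uses for the error rate, the sketch reports only the absolute difference in row counts.

```python
def compare_row_counts(source_rows, destination_rows):
    """Summarize a row-count comparison between a source and a destination table."""
    if source_rows < 0 or destination_rows < 0:
        raise ValueError("row counts must be non-negative")
    difference = abs(source_rows - destination_rows)
    return {
        "source_rows": source_rows,
        "destination_rows": destination_rows,
        "difference": difference,
        # A difference of 0 corresponds to the case where the tables
        # have the same number of rows.
        "consistent": difference == 0,
    }


# Hypothetical counts read from the left and right panes of the running result.
print(compare_row_counts(1000, 1000))
print(compare_row_counts(1000, 998))
```

Like the comparison rule configured in this section, the sketch compares row counts only and does not inspect the values in the rows.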