Incremental Migration on CDM Supported by DLF

CDM supports incremental migration. For details, see Incremental File Migration, Incremental Migration of Relational Databases, HBase/CloudTable Incremental Migration, and Incremental Synchronization Using the Macro Variables of Date and Time.

Data Lake Factory (DLF) is a one-stop big data collaboration development platform. With DLF's online script editing, CDM migration jobs can be scheduled to implement incremental migration.

Viewing a Job JSON File

  1. Log in to the CDM management console and create a table/file migration job for migrating data from DWS to OBS.
  2. On the Table/File Migration tab page of the Job Management page, find the job you have created, and choose More > View Job JSON in the Operation column to view the job JSON file. See Figure 1.

    You can also use other job JSON files.

    Figure 1 Viewing a job JSON file
  3. A job JSON file is the request message body template for creating a CDM job. In the URL, [Endpoint], {project_id}, and {cluster_id} must be replaced with the actual information.
    • [Endpoint]: Obtain the value from Regions and Endpoints, for example, cdm.cn-north-1.myhuaweicloud.com.
    • {project_id}: Project ID. Obtain the value from the project list on the My Credentials page.
    • {cluster_id}: Cluster ID. Click the cluster name on the Cluster Management page to view the cluster ID.
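Once the three values are known, the request URL follows a fixed pattern. The following minimal Python sketch assembles it; the endpoint, project ID, and cluster ID shown are placeholders, not real values.

```python
# Sketch: assemble the CDM job-creation URL from the three placeholders.
# ENDPOINT, PROJECT_ID, and CLUSTER_ID below are illustrative values;
# substitute the ones obtained from your own console pages.
ENDPOINT = "cdm.cn-north-1.myhuaweicloud.com"   # from Regions and Endpoints
PROJECT_ID = "your_project_id"                  # from My Credentials
CLUSTER_ID = "your_cluster_id"                  # from Cluster Management

def job_url(endpoint: str, project_id: str, cluster_id: str) -> str:
    """Build the URL of the CDM job-creation API."""
    return f"https://{endpoint}/cdm/v1.0/{project_id}/clusters/{cluster_id}/cdm/job"

print(job_url(ENDPOINT, PROJECT_ID, CLUSTER_ID))
```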

Modifying a Job JSON File

You can modify the JSON body as required. The following example uses one day as the cycle: the WHERE clause serves as the condition for extracting incremental data (a time field is typically used for incremental migration), so the data generated on the previous day is migrated each day.
  1. Modify the WHERE clause to migrate incremental data in a certain period.
         {
            "name": "fromJobConfig.whereClause",
            "value": "_timestamp >= '${startTime}' and _timestamp < '${currentTime}'"
         }
    • If the migration source is DWS or a MySQL database, the time can be determined as follows:
      _timestamp >= '2018-10-10 00:00:00' and _timestamp < '2018-10-11 00:00:00'
      Or
      _timestamp between '2018-10-10 00:00:00' and '2018-10-11 00:00:00'
    • If the migration source is an Oracle database, the WHERE clause is as follows:
      _timestamp >= to_date('2018-10-10 00:00:00', 'yyyy-mm-dd hh24:mi:ss') and _timestamp < to_date('2018-10-11 00:00:00', 'yyyy-mm-dd hh24:mi:ss')
  2. Incremental data in each period is imported to different directories.
         {
            "name": "toJobConfig.outputDirectory",
            "value": "dws2obs/${currentTime}"
         }
  3. Dynamically generate the job name; otherwise, job creation fails because the job name is a duplicate.
        "to-connector-name": "obs-connector",
        "from-link-name": "dws_link",
        "name": "dws2obs-${currentTime}"
For details about how to modify more parameters, see the Cloud Data Migration API Reference. The modified JSON example is as follows:
{
  "jobs": [
    {
      "job_type": "NORMAL_JOB",
      "to-config-values": {
        "configs": [
          {
            "inputs": [
              {
                "name": "toJobConfig.bucketName",
                "value": "cdm-test"
              },
              {
                "name": "toJobConfig.outputDirectory",
                "value": "dws2obs/${currentTime}"
              },
              {
                "name": "toJobConfig.outputFormat",
                "value": "CSV_FILE"
              },
              {
                "name": "toJobConfig.fieldSeparator",
                "value": ","
              },
              {
                "name": "toJobConfig.writeToTempFile",
                "value": "false"
              },
              {
                "name": "toJobConfig.validateMD5",
                "value": "false"
              },
              {
                "name": "toJobConfig.encodeType",
                "value": "UTF-8"
              },
              {
                "name": "toJobConfig.duplicateFileOpType",
                "value": "REPLACE"
              },
              {
                "name": "toJobConfig.kmsEncryption",
                "value": "false"
              }
            ],
            "name": "toJobConfig"
          }
        ]
      },
      "from-config-values": {
        "configs": [
          {
            "inputs": [
              {
                "name": "fromJobConfig.schemaName",
                "value": "dws_database"
              },
              {
                "name": "fromJobConfig.tableName",
                "value": "dws_from"
              },
              {
                "name": "fromJobConfig.whereClause",
                "value": "_timestamp >= '${startTime}' and _timestamp < '${currentTime}'"
              },
              {
                "name": "fromJobConfig.columnList",
                "value": "_tiny&_small&_int&_integer&_bigint&_float&_double&_date&_timestamp&_char&_varchar&_text"
              }
            ],
            "name": "fromJobConfig"
          }
        ]
      },
      "from-connector-name": "generic-jdbc-connector",
      "to-link-name": "obs_link",
      "driver-config-values": {
        "configs": [
          {
            "inputs": [
              {
                "name": "throttlingConfig.numExtractors",
                "value": "1"
              },
              {
                "name": "throttlingConfig.submitToCluster",
                "value": "false"
              },
              {
                "name": "throttlingConfig.numLoaders",
                "value": "1"
              },
              {
                "name": "throttlingConfig.recordDirtyData",
                "value": "false"
              },
              {
                "name": "throttlingConfig.writeToLink",
                "value": "obs_link"
              }
            ],
            "name": "throttlingConfig"
          },
          {
            "inputs": [],
            "name": "jarConfig"
          },
          {
            "inputs": [],
            "name": "schedulerConfig"
          },
          {
            "inputs": [],
            "name": "transformConfig"
          },
          {
            "inputs": [],
            "name": "smnConfig"
          },
          {
            "inputs": [],
            "name": "retryJobConfig"
          }
        ]
      },
      "to-connector-name": "obs-connector",
      "from-link-name": "dws_link",
      "name": "dws2obs-${currentTime}"
    }
  ]
}
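The `${startTime}` and `${currentTime}` macros above are resolved by DLF at run time. As a rough illustration (assuming a daily cycle and the timestamp format used in the sample WHERE clause), the previous-day window can be derived from the plan time like this; `daily_window` is a hypothetical helper, not a DLF function.

```python
from datetime import datetime, timedelta

def daily_window(plan_time: datetime):
    """Return (startTime, currentTime) covering the full previous day,
    formatted like the sample WHERE clause."""
    current = plan_time.replace(hour=0, minute=0, second=0, microsecond=0)
    start = current - timedelta(days=1)
    fmt = "%Y-%m-%d %H:%M:%S"
    return start.strftime(fmt), current.strftime(fmt)

start, current = daily_window(datetime(2018, 10, 11, 3, 0, 0))
where = f"_timestamp >= '{start}' and _timestamp < '{current}'"
# where == "_timestamp >= '2018-10-10 00:00:00' and _timestamp < '2018-10-11 00:00:00'"
```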

Creating and Running a CDM Job on DLF

  1. For details about how to create a DLF job, see Creating a Job in the Data Lake Factory User Guide.
  2. After the DLF job is created, double-click the job name to go to the job development page. DLF uses the RestAPI node to call a RESTful API to create a CDM migration job.
  3. Configure the properties of the RestAPI node.
    1. Node Name: Customize a name, for example, CreatingJob. Note that the CDM migration job is only used as a node in the DLF job.
    2. URL Address: Enter the URL obtained in Viewing a Job JSON File, for example, https://cdm.cn-north-1.myhuaweicloud.com/cdm/v1.0/1551c7f6c808414d8e9f3c514a170f2e/clusters/6ec9a0a4-76be-4262-8697-e7af1fac7920/cdm/job.
    3. HTTP Method: Enter POST.
    4. Add the following request headers:
      • Content-Type = application/json
      • X-Language = en-us
    5. Request Body: Enter the modified JSON of the CDM job in Modifying a Job JSON File.
    Figure 2 Properties of the node for creating the CDM job
  4. After configuring the RestAPI node for creating a CDM job, you need to add the RestAPI node for running the CDM job. For details, see Starting a Job in the Cloud Data Migration API Reference.
    • Node Name: Customize a name, for example, StartingJob.
    • URL Address: Keep the values of project_id and cluster_id consistent with those in the RestAPI node for creating the CDM job. Set the job name to dws2obs-${currentTime}.

      For example, https://cdm.cn-north-1.myhuaweicloud.com/cdm/v1.0/1551c7f6c808414d8e9f3c514a170f2e/clusters/6ec9a0a4-76be-4262-8697-e7af1fac7920/cdm/job/dws2obs-${currentTime}/start.

    • HTTP Method: Enter PUT.
    • Add the following request headers:
      • Content-Type = application/json
      • X-Language = en-us
    Figure 3 Properties of the node for running the CDM job
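The two RestAPI nodes configured above correspond to plain HTTP requests. The following sketch builds those requests as data structures without sending them, assuming the sample project and cluster IDs from earlier in this section; it is an illustration of the request shape, not an official client.

```python
import json

# Illustrative base URL; substitute your own project and cluster IDs.
BASE = "https://cdm.cn-north-1.myhuaweicloud.com/cdm/v1.0/{project}/clusters/{cluster}"
HEADERS = {"Content-Type": "application/json", "X-Language": "en-us"}

def create_job_request(project, cluster, job_json):
    """POST request issued by the CreatingJob node; job_json is the
    modified job JSON from the previous section."""
    return {
        "method": "POST",
        "url": BASE.format(project=project, cluster=cluster) + "/cdm/job",
        "headers": HEADERS,
        "body": json.dumps(job_json),
    }

def start_job_request(project, cluster, job_name):
    """PUT request issued by the StartingJob node; job_name carries the
    ${currentTime} suffix, for example dws2obs-20181011-0000."""
    return {
        "method": "PUT",
        "url": BASE.format(project=project, cluster=cluster) + f"/cdm/job/{job_name}/start",
        "headers": HEADERS,
        "body": "",
    }
```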

Waiting Until the Job Execution Is Completed

Because the CDM job runs asynchronously, a 200 response to the REST request that starts the job does not mean the data has been migrated successfully. If a computing job depends on the CDM migration job, a RestAPI node is required to periodically check whether the migration has succeeded; computing starts only after the migration succeeds.

  1. For details about the API for querying the migration job status, see Querying Job Status in the Cloud Data Migration API Reference.
  2. After configuring the RestAPI node for running the CDM job, add the node for waiting for the CDM job completion. The node properties are as follows:
    • Node Name: Customize a name, for example, WaitingJobCompletion.
    • URL Address: For example, https://cdm.cn-north-1.myhuaweicloud.com/cdm/v1.0/1551c7f6c808414d8e9f3c514a170f2e/clusters/6ec9a0a4-76be-4262-8697-e7af1fac7920/cdm/job/dws2obs-${currentTime}/status.
    • HTTP Method: Enter GET.
    • Add the following request headers:
      • Content-Type = application/json
      • X-Language = en-us
    • Check Return Value: Select YES.
    • Property Path: Enter submissions[0].status.
    • Request Success Flag: Select SUCCEEDED.
    • Retain the default values of other parameters.
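The behavior of this waiting node can be sketched as a polling loop. Here `fetch_status` is a stand-in for the GET call to the status URL, and the terminal states checked are assumptions based on the SUCCEEDED flag above.

```python
import time

def wait_for_completion(fetch_status, interval_s=30, max_polls=120):
    """Poll until the job status (property path submissions[0].status)
    reaches a terminal state; return the final status string.
    fetch_status stands in for the GET call to .../cdm/job/<name>/status."""
    for _ in range(max_polls):
        resp = fetch_status()
        status = resp["submissions"][0]["status"]
        if status in ("SUCCEEDED", "FAILED", "STOPPED"):
            return status
        time.sleep(interval_s)
    raise TimeoutError("CDM job did not finish in time")

# Simulated run: the job reports RUNNING once, then SUCCEEDED.
statuses = iter([
    {"submissions": [{"status": "RUNNING"}]},
    {"submissions": [{"status": "SUCCEEDED"}]},
])
result = wait_for_completion(lambda: next(statuses), interval_s=0)
```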

(Optional) Deleting a Job

If computing is required after the migration, add the appropriate computing nodes to the DLF job to complete the data computing.

You can delete CDM jobs as required. Because DLF periodically creates CDM jobs to implement incremental migration, a large number of jobs accumulate in the CDM cluster. After the migration succeeds, you can therefore delete the jobs that have already been executed successfully.

If you need to delete a CDM job, add the RestAPI node for deleting the CDM job. Then, DLF calls the API in Deleting a Job in the Cloud Data Migration API Reference.

The node properties are as follows:
  • Node Name: Customize a name, for example, DeletingJob.
  • URL Address: For example, https://cdm.cn-north-1.myhuaweicloud.com/cdm/v1.0/1551c7f6c808414d8e9f3c514a170f2e/clusters/6ec9a0a4-76be-4262-8697-e7af1fac7920/cdm/job/dws2obs-${currentTime}.
  • HTTP Method: Enter DELETE.
  • Add the following request headers:
    • Content-Type = application/json
    • X-Language = en-us
  • Retain the default values of other parameters.
Figure 4 Properties of the node for deleting the CDM job
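Like the create and start nodes, the delete node maps to a single HTTP request. A minimal sketch of its shape, with an illustrative base URL:

```python
# Illustrative base URL; substitute your own project and cluster IDs.
BASE = "https://cdm.cn-north-1.myhuaweicloud.com/cdm/v1.0/{project}/clusters/{cluster}"

def delete_job_request(project, cluster, job_name):
    """DELETE request issued by the DeletingJob node; the job name carries
    the same ${currentTime} suffix used when the job was created."""
    return {
        "method": "DELETE",
        "url": BASE.format(project=project, cluster=cluster) + f"/cdm/job/{job_name}",
        "headers": {"Content-Type": "application/json", "X-Language": "en-us"},
    }
```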

Configuring DLF Job Parameters

  1. Configure the DLF job parameters. See Figure 5.
    • startTime = $getTaskPlanTime(plantime,@@yyyyMMddHHmmss@@,-24*60*60)
    • currentTime = $getTaskPlanTime(plantime,@@yyyyMMdd-HHmm@@,0)
    Figure 5 Configuring DLF job parameters
  2. After saving the DLF job, choose Scheduling Configuration > Schedule periodically and set the value to one day.

    In this way, DLF works with CDM to migrate data generated on the previous day.
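As a rough illustration of what the two parameter expressions evaluate to, the following Python analogue applies the same offset (-24*60*60 seconds) and format strings (yyyyMMddHHmmss and yyyyMMdd-HHmm) to a plan time. The exact semantics of $getTaskPlanTime are DLF's; this helper is only an approximation for understanding the values.

```python
from datetime import datetime, timedelta

def get_task_plan_time(plan_time: datetime, fmt: str, offset_seconds: int) -> str:
    """Rough analogue of $getTaskPlanTime: shift the plan time by the
    offset (in seconds) and format the result."""
    return (plan_time + timedelta(seconds=offset_seconds)).strftime(fmt)

plan = datetime(2018, 10, 11, 0, 0, 0)
start_time = get_task_plan_time(plan, "%Y%m%d%H%M%S", -24 * 60 * 60)  # yyyyMMddHHmmss
current_time = get_task_plan_time(plan, "%Y%m%d-%H%M", 0)             # yyyyMMdd-HHmm
# start_time == "20181010000000", current_time == "20181011-0000"
```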