Help Center/ DataArts Studio/ User Guide/ DataArts Factory/ Usage Guidance/ Obtaining the Maximum Value and Transferring It to a CDM Job Using a Query SQL Statement
Updated on 2024-04-03 GMT+08:00

Obtaining the Maximum Value and Transferring It to a CDM Job Using a Query SQL Statement

Scenario

You can run a query SQL statement to transfer the obtained maximum time value to a CDM job. In the advanced attributes of the CDM job, the where clause is used to determine the maximum time range to obtain the data to be migrated and complete the incremental data migration.

Constraints

  1. You have completed operations in Creating a Data Connection.
  2. You have completed operations in Creating a Database.

Examples

Creating an SQL Script

  1. In the left navigation pane of DataArts Factory, choose Data Development > Develop Script.
  2. Create an SQL script. This section uses the MRS Spark SQL script as an example.
  3. Select a created data connection and database.
  4. Compile the SQL script to obtain the maximum time data from table1.
    select max(time) from table1
  5. Save and submit the version. The maxtime script is created.

Creating a Pipeline Subjob

  1. In the left navigation pane of DataArts Factory, choose Development > Develop Job.
  2. Select a CDM Job node and configure the node properties.
    Figure 1 Configuring CDM Job node properties

    Select a CDM cluster and associate the node with an existing CDM job.

    Configure the job parameters and add job parameter maxtime.

    Figure 2 Configuring job parameters

  3. Save and submit the version. The subjob sub is created.

Creating a Pipeline Job

  1. In the left navigation pane of DataArts Factory, choose Development > Develop Job.
  2. Select an MRS Spark SQL node and a For Each node to execute the CDM subjob cyclically.
  3. Configure properties of the MRS Spark SQL node and associate the node with the created maxtime script.
    Figure 3 Configuring properties for the MRS Spark SQL node

  4. Configure properties of the For Each node and associate the node with the created CDM subjob.
    Figure 4 Configuring properties for the For Each node

    After associating the node with the created subjob sub, write a parameter expression.

    #{Loop.current[0]}

    Configure the data set, with an EL expression supported.

    #{Job.getNodeOutput("maxtime")}
  5. Save and submit the version. The job is created.

Obtaining the Maximum Time Value from the CDM Job Using a Where Clause and Transferring the Value to the Destination Job

  1. Open the created subjob.
  2. Click next to the job name to go to the job configuration page.
    Figure 5 Editing the CDM job

  3. In the advanced attributes of the source job configuration, configure a where clause to obtain the data to be migrated. When the job is executed, the migration data obtained from the source will be replicated, exported, and imported to the destination.
    Figure 6 Configuring a where clause

    The where clause is as follows:

    dt > '${maxtime}'