Updated on 2023-06-21 GMT+08:00

Using PatchData

Scenario

In the migration of a project, if you want to supplement historical business data in a previous period and view details of the historical data, PatchData can meet your requirements.

A job executes a scheduling task to generate a series of instances in a certain period of time. This series of instances are called PatchData. PatchData can be used to fix the job instances that have data errors in the historical records or to build job records for debugging programs.

  • In addition to SQL scripts, PatchData supports other nodes.
  • If the content of a SQL script changes, the PatchData job runs the latest script.
  • When you use PatchData, if the variable in the SQL statement is DATE, enter ${DATE} in the script. The script parameter DATE is then automatically added to the job parameters, and its value can be an EL expression. If the variable is a time variable, enter the expression of the DateUtil embedded object. The platform automatically converts the expression into a historical date. For details about how to use EL expressions, see EL Expressions.
  • PatchData jobs support script parameters and global environment variables as well as job parameters.

Constraints

  • PatchData is available only when periodic scheduling is configured for the data development job.

Example

Scenario

Among the product data tables of a company, there is a source data table A that records the product sales amount. To import the historical product sales amount to the destination table B, you can create a PatchData job.

Table 1 lists the source and destination tables.
Table 1 Source and destination tables

Source Table

Destination Table

A

B

Procedure

  1. Prepare the source and destination tables. To facilitate subsequent job execution and verification, you need to create a source DWS table and a destination DWS table and insert data into the tables.
    1. Create a DWS table. You can create a DWS SQL script on the DataArts Factory console of DataArts Studio and run the following SQL statements:
      /* Create tables. */
      CREATE TABLE A (PRODUCT_ID INT, SALES INT, DATE DATE);
      CREATE TABLE B (PRODUCT_ID INT, SALES INT, DATE DATE);
    2. Insert sample data into the source data table. You can create a DWS SQL script on the DataArts Factory console of DataArts Studio and run the following SQL statements:
      /* Insert sample historical data into the source table. */
      INSERT INTO A VALUES ('1','60', '2022-03-01');
      INSERT INTO A VALUES ('2','80', '2022-03-01');
      INSERT INTO A VALUES ('1','50', '2022-02-28');
      INSERT INTO A VALUES ('2','55', '2022-02-28');
      INSERT INTO A VALUES ('1','60', '2022-02-27');
      INSERT INTO A VALUES ('2','45', '2022-02-27');
  2. Develop a PatchData script. Ensure that the script expression contains a time variable. (For example, if the variable in the SQL statement is DATE, enter ${DATE} in the script.) You can set the expression for script parameter DATE in job parameter settings in 3.
    On the Develop Script page, enter following statement in the editor:
    INSERT INTO B (SELECT * FROM A WHERE DATE = ${DATE})
    Figure 1 Developing a script

    After compiling the script, save it and submit the latest version.

  3. Develop a PatchData batch processing job. When developing the job, you need to configure the node attributes and scheduling period.

    In the left navigation pane of the DataArts Factory console, choose Data Development > Develop Job.

    Figure 2 Node parameters

    • If the job-associated SQL script uses a parameter, the parameter name (such as DATE) is displayed. Set the parameter value in the text box next to the parameter name. The parameter value can be an EL expression. For details about EL expressions, see Expression Overview.

      If the parameter is time, view the example expression of the DateUtil embedded object. The platform automatically replaces the parameter with the historical date of the patch data (determined by the service date of the patch data).

      You can also directly enter a SQL expression.

    • If the parameters of the associated SQL script change, you can click to synchronize the change or click to edit the parameters.
    • The following is an example of script parameters:
      Example: #{DateUtil.format(DateUtil.addDays(Job.planTime,-1),'yyyy-MM-dd')}
      • Job.planTime indicates the planned job time, and yyyy-MM-dd indicates the time format.
      • If the planned job time is March 2, the previous day is March 1. The planned job time will be replaced by the configured patch data service date.
      • The Job.planTime is converted into a time in the yyyy-MM-dd format using an expression.

    Configure the scheduling period of the PatchData job. Click Scheduling Setup and set Scheduling Frequency to Every day.

    Figure 3 Configuring the scheduling period

    • If Scheduling Frequency is set to Every day, the job is scheduled every day, and a PatchData instance is generated. You can view the statuses of PatchData instances on the Monitor Instance page. On the Monitor Instance page, view the instance information about the job and perform more operations on instances as required.
    • The job scheduling time takes effect from March 9, 2023, and the job is scheduled at 02:00 every day.
    • Run the following SQL statement to check whether destination table B contains data of source table A:

      SELECT * FROM B

    After configuring the parameters, save and submit the latest version of the job and test the job.

    Click Execute to run the job.

  4. Create a PatchData task.
    After creating a periodic job, you need to configure PatchData for the job.
    1. In the left navigation pane of DataArts Factory, choose Monitoring > Monitor Job.
    2. Click the Batch Job Monitoring tab. In the Operation column of the job, choose More > Configure PatchData. The Configure PatchData page is displayed.
      If you want to supplement historical data from February 27, 2023 to March 1, 2023, set Date to Feb 28, 2023 00:00:00 – Mar 02, 2023 23:59:59. The system automatically transfers the configured date to the planned job time. In the expression of the script time variable DATE, the defined time is the planned job time minus one day. That is, the time of the day before the planned job time is the time range (Feb 27, 2023 to Mar 1, 2023) for PatchData.
      Figure 4 Configuring PatchData

      Table 2 Description

      Parameter

      Description

      PatchData Name

      Name of the automatically generated PatchData task. The value can be modified.

      Job Name

      Name of the job that requires PatchData, which is automatically displayed

      Date

      Period of time when PatchData is required. This date is transferred to the planned job time. When the job is executed, the planned job time is replaced by the time in the PatchData.

      NOTE:

      PatchData can be configured for a job multiple times. However, avoid configuring PatchData multiple times on the same date to prevent data duplication or disorder.

      If you select Patch data in reverse order of date, the patch data of each day is in positive sequence.

      NOTE:
      • This function is applicable when the data of each day is not coupled with each other.
      • The PatchData job will ignore the dependencies between the job instances created before this date.

      Parallel Periods

      Number of instances to be executed at the same time. A maximum of five instances can be executed at the same time.

      NOTE:

      Set this parameter based on the site requirements. For example, if a CDM job instance is used, data cannot be supplemented at the same time. The value of this parameter can only be set to 1.

      Upstream or Downstream Job

      This parameter is optional. Select the downstream jobs (jobs that depend on the current job) that require PatchData. You can select multiple jobs.

    3. Click OK. The system starts to run the PatchData task based on the configured scheduling period.
    4. On the Monitor PatchData page, you can view the PatchData task status, date, number of parallel periods, PatchData job name, and stopped tasks. You can also view details about the PatchData task.
      Figure 5 Querying PatchData details

    5. Run the following SQL statement to check whether destination table B contains historical data of source table A:
      SELECT * FROM B