Using PatchData

Updated on 2024-08-30 GMT+08:00

Scenario

During a project migration, you may need to supplement historical business data from a previous period and view details of that data. PatchData meets these requirements.

A scheduled job generates a series of instances over a period of time; this series of instances is called PatchData. PatchData can be used to fix job instances that produced data errors in historical records, or to generate job records for debugging programs.

NOTE:
  • In addition to SQL script nodes, PatchData supports nodes of other types.
  • If the content of a SQL script changes, the PatchData job runs the latest script.
  • When you use PatchData, if the variable in the SQL statement is DATE, enter ${DATE} in the script. The script parameter DATE is then automatically added to the job parameters, and its value can be an EL expression. If the variable is a time variable, enter the expression of the DateUtil embedded object. The platform automatically converts the expression into a historical date. For details about how to use EL expressions, see EL Expressions.
  • PatchData jobs support script parameters and global environment variables as well as job parameters.
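The ${DATE} substitution described in the note can be sketched in Python. This is an illustrative sketch only: the actual replacement is performed by the DataArts Factory platform, and the script text and date offset below are assumptions based on the example in this document.

```python
from string import Template
from datetime import date, timedelta

# Hypothetical script text using the ${DATE} variable, as in the note above.
script = "INSERT INTO B (SELECT * FROM A WHERE DATE = '${DATE}')"

# The platform evaluates the EL expression to a historical date for each
# PatchData instance; here we mimic "planned job time minus one day".
plan_time = date(2022, 3, 2)
resolved = (plan_time - timedelta(days=1)).isoformat()

print(Template(script).substitute(DATE=resolved))
# INSERT INTO B (SELECT * FROM A WHERE DATE = '2022-03-01')
```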

Constraints

PatchData is available only when periodic scheduling is configured for the data development job.

Example

Scenario

A company's product data includes a source table A that records product sales amounts. To import the historical sales amounts into destination table B, you can create a PatchData job.

Table 1 lists the source and destination tables.
Table 1 Source and destination tables

  Source Table: A
  Destination Table: B

Procedure

  1. Prepare the source and destination tables. To facilitate subsequent job execution and verification, you need to create a source DWS table and a destination DWS table and insert data into the tables.
    1. Create the DWS tables. You can create a DWS SQL script on the DataArts Factory console of DataArts Studio and run the following SQL statements:
      /* Create tables. */
      CREATE TABLE A (PRODUCT_ID INT, SALES INT, DATE DATE);
      CREATE TABLE B (PRODUCT_ID INT, SALES INT, DATE DATE);
    2. Insert sample data into the source data table. You can create a DWS SQL script on the DataArts Factory console of DataArts Studio and run the following SQL statements:
      /* Insert sample historical data into the source table. */
      INSERT INTO A VALUES ('1','60', '2022-03-01');
      INSERT INTO A VALUES ('2','80', '2022-03-01');
      INSERT INTO A VALUES ('1','50', '2022-02-28');
      INSERT INTO A VALUES ('2','55', '2022-02-28');
      INSERT INTO A VALUES ('1','60', '2022-02-27');
      INSERT INTO A VALUES ('2','45', '2022-02-27');
  2. Develop a PatchData script. Ensure that the script expression contains a time variable. (For example, if the variable in the SQL statement is DATE, enter ${DATE} in the script.) You set the expression for the script parameter DATE in the job parameter settings in step 3.
    On the Develop Script page, enter the following statement in the editor:
    INSERT INTO B (SELECT * FROM A WHERE DATE = ${DATE})
    Figure 1 Developing a script

    After writing the script, save it and submit it as the latest version.

  3. Develop a PatchData batch processing job. When developing the job, you need to configure the node attributes and scheduling period.

    In the left navigation pane of the DataArts Factory console, choose Data Development > Develop Job.

    Figure 2 Node parameters

    NOTE:
    • If the job-associated SQL script uses a parameter, the parameter name (such as DATE) is displayed. Set the parameter value in the text box next to the parameter name. The parameter value can be an EL expression. For details about EL expressions, see Expression Overview.

      If the parameter is a time parameter, see the example expression of the DateUtil embedded object. The platform automatically replaces the parameter with the historical date of the PatchData instance (determined by the configured PatchData service date).

      You can also directly enter a SQL expression.

    • If the parameters of the associated SQL script change, you can click the synchronization button to synchronize the change or click the edit button to edit the parameters.
    • The following is an example of a script parameter expression:
      Example: #{DateUtil.format(DateUtil.addDays(Job.planTime,-1),'yyyy-MM-dd')}
      • Job.planTime indicates the planned job time, and yyyy-MM-dd indicates the time format.
      • The expression subtracts one day from Job.planTime and converts the result into a date in yyyy-MM-dd format. For example, if the planned job time is March 2, the result is March 1.
      • During PatchData, the planned job time is replaced by the configured PatchData service date.
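Assuming the expression behaves like standard date arithmetic, its effect can be sketched in Python (the function and parameter names below are illustrative, not part of the DataArts API):

```python
from datetime import datetime, timedelta

def resolve_date_parameter(plan_time: datetime, offset_days: int = -1) -> str:
    """Mimic DateUtil.format(DateUtil.addDays(Job.planTime, offset_days), 'yyyy-MM-dd')."""
    return (plan_time + timedelta(days=offset_days)).strftime("%Y-%m-%d")

# If the planned job time is March 2, the resolved date is the previous day.
print(resolve_date_parameter(datetime(2022, 3, 2)))  # 2022-03-01
```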

    Configure the scheduling period of the PatchData job. Click Scheduling Setup and set Scheduling Frequency to Every day.

    Figure 3 Configuring the scheduling period

    NOTE:
    • If Scheduling Frequency is set to Every day, the job is scheduled once a day and a PatchData instance is generated for each scheduled day. You can view the status of PatchData instances on the Monitor Instance page and perform further operations on the instances as required.
    • In this example, job scheduling takes effect from March 9, 2023, and the job runs at 02:00 every day.
    • Run the following SQL statement to check whether destination table B contains data of source table A:

      SELECT * FROM B

    After configuring the parameters, save the job, submit it as the latest version, and test it.

    Click Execute to run the job.

  4. Create a PatchData task.
    After creating a periodic job, you need to configure PatchData for the job.
    1. In the left navigation pane of DataArts Factory, choose Monitoring > Job Monitoring.
    2. Click the Batch Job Monitoring tab. In the Operation column of the job, choose More > Configure PatchData. The Configure PatchData page is displayed.
      If you want to supplement historical data from February 27, 2023 to March 1, 2023, set Date to Feb 28, 2023 00:00:00 – Mar 02, 2023 23:59:59. The system transfers the configured dates to the planned job times. Because the script's time variable DATE is defined as the planned job time minus one day, the actual data range covered by PatchData is Feb 27, 2023 to Mar 1, 2023.
      Figure 4 Configuring PatchData
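The date arithmetic in this example can be sketched in Python (an illustrative sketch; `patchdata_dates` and its behavior are assumptions based on the example above, not a DataArts API):

```python
from datetime import date, timedelta

def patchdata_dates(start: date, end: date, offset_days: int = -1):
    """For each planned day in [start, end], apply the script's DATE offset
    (planned job time minus one day, per the example expression)."""
    days = (end - start).days + 1
    return [(start + timedelta(days=i + offset_days)).isoformat() for i in range(days)]

# PatchData range Feb 28 - Mar 2, 2023 yields service dates Feb 27 - Mar 1.
print(patchdata_dates(date(2023, 2, 28), date(2023, 3, 2)))
# ['2023-02-27', '2023-02-28', '2023-03-01']
```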

      Table 2 Description

      PatchData Name
        Name of the automatically generated PatchData task. The value can be modified.

      Job Name
        Name of the job that requires PatchData. This field is automatically populated.

      Date
        Period for which PatchData is required. The configured dates are transferred to the planned job time. When the job is executed, the planned job time is replaced by each date in the PatchData range.
        NOTE:
        PatchData can be configured for a job multiple times. However, avoid configuring PatchData for the same date more than once, to prevent data duplication or disorder.
        If you select Patch data in reverse order of date, the days are patched in reverse chronological order, but the data within each day is still patched in positive (chronological) sequence.
        NOTE:
        • This option is applicable only when the data of different days is not coupled.
        • The PatchData job ignores the dependencies between job instances created before this date.

      Parallel Periods
        Number of instances to be executed at the same time. A maximum of five instances can run concurrently.
        NOTE:
        Set this parameter based on site requirements. For example, a CDM job instance cannot supplement data concurrently; in that case, this parameter can only be set to 1.

      Upstream or Downstream Job
        Optional. Select the downstream jobs (jobs that depend on the current job) that also require PatchData. You can select multiple jobs.

    3. Click OK. The system starts to run the PatchData task based on the configured scheduling period.
    4. On the Monitor PatchData page, you can view the PatchData task status, date, number of parallel periods, and PatchData job name. You can also stop tasks and view logs of the PatchData task.
      Figure 5 Querying PatchData details

    5. Run the following SQL statement to check whether destination table B contains historical data of source table A:
      SELECT * FROM B
