Overview of Offline Jobs

Background

Cloud Data Migration (CDM) is Huawei Cloud's previous-generation data integration service. It provides stable data migration and synchronization. However, as data development scenarios become increasingly complex and the scheduling scale continues to grow, CDM jobs have the following pain points due to coupled management and execution:

CDM is isolated from DataArts Factory and DataArts Quality. Processes need to be manually assembled.
A single cluster becomes a bottleneck when processing a large number of concurrent requests, so jobs may pile up in queues. Cross-cluster execution is not supported.
Read and write links cannot be flexibly combined.
New functions, such as job scheduling agencies and data encryption and decryption, are unavailable in the old architecture.

The new DataArts Migration (offline jobs) addresses these issues by managing jobs in the data development engine. CDM clusters function only as computing resource pools that can be scaled out.

Core Advantages of Offline Jobs

**Table 1** Comparison between CDM jobs and DataArts Migration offline jobs
Dimension	CDM Job	DataArts Migration Offline Job
Process orchestration	Independent tasks which need to be manually connected	Drag-and-drop operator that runs on the same canvas as data development operators
Function evolution	Functions will no longer be updated.	New functions are available to offline data migration jobs first.
Scheduling mode	Single-cluster queuing	Hybrid scheduling across CDM nodes
Read/Write policy	Fixed pairs of read and write links	Decoupled read and write, with flexible source-destination pairs

This function is in OBT (or restricted use). To use this function, submit a service ticket.

How It Works

Management plane:
Job metadata, scheduling dependencies, parameter variables, and scheduling identities are hosted in DataArts Studio.
Execution plane:
1. The directed acyclic graph (DAG) of a job is parsed to generate executable CDM subtasks.
2. The CDM subtasks are randomly distributed to CDM clusters for execution.
3. Resources are released immediately after subtasks are complete, and logs and task monitoring metrics are sent back to the O&M center.
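The execution-plane steps can be pictured with a minimal Python sketch. It is an illustration only, not the service's implementation: the function `plan_subtasks`, the node names, and the cluster names are hypothetical, and the DAG is assumed to be given as a mapping from each node to its upstream nodes.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_subtasks(dag, clusters, seed=None):
    """Order DAG nodes by dependency and pick a random cluster for each.

    dag: maps each node name to the set of nodes it depends on (hypothetical shape).
    clusters: list of available CDM cluster names (hypothetical names).
    Returns a list of (subtask, cluster) pairs in a valid execution order.
    """
    rng = random.Random(seed)
    # Step 1: parse the DAG into an executable order of subtasks.
    order = list(TopologicalSorter(dag).static_order())
    # Step 2: distribute each subtask to a randomly chosen cluster.
    return [(node, rng.choice(clusters)) for node in order]


# Hypothetical two-node job: the load step depends on the extract step.
dag = {"load_orders": {"extract_orders"}, "extract_orders": set()}
clusters = ["cdm-cluster-a", "cdm-cluster-b"]
plan = plan_subtasks(dag, clusters, seed=0)
```

Step 3 (releasing resources and reporting logs and metrics) is omitted here because the source does not describe how it is done.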

Constraints

To use DataArts Migration (offline jobs), ensure that the CDM-Instance cluster version is 24.4.x (2.10.0.400) or later.

Compatibility: To ensure scheduling stability, data transmission performance, and compatibility with new features for offline jobs, upgrade the associated CDM cluster to 24.4.x or later.

Risks: If the cluster version is earlier than 2.10.0.400, APIs may not match or functions may be limited. You are advised to check and upgrade the cluster version before enabling data migration capabilities.
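The version constraint can be checked with a simple comparison. The sketch below assumes the cluster version is available as a dotted numeric string such as `2.10.0.400`; how that string is obtained is outside the scope of this example, and the helper names are hypothetical.

```python
# Minimum internal version stated in the constraint above.
MIN_VERSION = (2, 10, 0, 400)


def parse_version(text):
    """Turn a dotted numeric string like '2.10.0.400' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def supports_offline_jobs(cluster_version):
    """Return True if the version is 2.10.0.400 or later (tuple comparison)."""
    return parse_version(cluster_version) >= MIN_VERSION
```

Comparing tuples of integers avoids the pitfall of string comparison, where `"2.9.0.300"` would wrongly sort after `"2.10.0.400"`.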

Functions

DataArts Migration (offline jobs) can synchronize data between various types of on-premises data sources in a wide range of scenarios. You can synchronize all or incremental data as needed.

Figure 1 How an offline processing migration job works

Synchronization Scenarios

DataArts Migration (offline jobs) supports synchronization scenarios of multiple topology types. You can plan synchronization based on your requirements.

Single table synchronization
A table in an instance can be synchronized to another instance.

Figure 2 Single table synchronization

Entire database synchronization
Multiple tables of multiple databases in an instance can be synchronized to multiple databases in another instance.

Figure 3 Entire database synchronization

Database and table shard synchronization
Multiple table shards of multiple databases in multiple instances can be synchronized to a database table in another instance.

Figure 4 Database and table shard synchronization
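As a rough illustration of the shard topology, the following Python sketch merges rows from several shards into one destination list. The shard keys, row contents, and the function `merge_shards` are hypothetical and stand in for real tables; the sketch shows only the many-to-one mapping, not how the service moves data.

```python
def merge_shards(shards):
    """Combine the rows of every shard into a single destination table.

    shards: maps (instance, database, table) to a list of row dicts.
    Returns one list containing all rows, in shard order.
    """
    merged = []
    for rows in shards.values():
        merged.extend(rows)
    return merged


# Hypothetical shards spread across two instances and three databases.
shards = {
    ("instance-1", "db_0", "orders_0"): [{"id": 1}],
    ("instance-1", "db_1", "orders_1"): [{"id": 2}],
    ("instance-2", "db_0", "orders_2"): [{"id": 3}],
}
destination_table = merge_shards(shards)
```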