Updated on 2026-01-28 GMT+08:00

Overview of Offline Jobs

Background

Cloud Data Migration (CDM) is Huawei Cloud's previous-generation data integration service. It provides stable data migration and synchronization. However, as data development scenarios become more complex and the scheduling scale grows, CDM jobs have the following pain points because job management and job execution are coupled:

  • CDM is isolated from DataArts Factory and DataArts Quality, so processes must be assembled manually.
  • A single cluster becomes a bottleneck when it processes a large number of concurrent requests, so jobs may pile up in the queue. Cross-cluster execution is not supported.
  • Read and write links cannot be flexibly combined.
  • New functions, such as job scheduling agencies and data encryption and decryption, are unavailable in the old architecture.

The new DataArts Migration (offline jobs) addresses these issues by managing jobs in the data development engine. CDM clusters function only as computing resource pools that can be scaled out.

Core Advantages of Offline Jobs

Table 1 Comparison between CDM jobs and DataArts Migration offline jobs

| Dimension | CDM Job | DataArts Migration Offline Job |
| Process orchestration | Independent tasks which need to be manually connected | Drag-and-drop operator that runs on the same canvas as data development operators |
| Function evolution | Functions will no longer be updated. | New functions are available to offline data migration jobs first. |
| Scheduling mode | Single-cluster queuing | Hybrid scheduling across CDM nodes |
| Read/Write policy | Fixed pairs of read and write links | Decoupled read and write, with flexible source-destination pairs |

This function is in OBT (or restricted use). To use this function, submit a service ticket.

How It Works

  • Management plane:

    Job metadata, scheduling dependencies, parameter variables, and scheduling identities are hosted in DataArts Studio.

  • Execution plane:
    1. The directed acyclic graph (DAG) of a job is parsed to generate executable CDM subtasks.
    2. The CDM subtasks are randomly distributed to CDM clusters for execution.
    3. Resources are released immediately after subtasks are complete, and logs and task monitoring metrics are sent back to the O&M center.
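The three execution-plane steps above can be pictured with a small simulation. This is only an illustrative sketch, not the service's actual implementation or API: the DAG contents, the cluster names, and the functions `parse_dag` and `run_job` are all hypothetical, and the "work" done by each subtask is a placeholder.

```python
import random
from graphlib import TopologicalSorter

# Hypothetical job DAG: each key is a subtask, each value is the set of
# subtasks it depends on. Real job definitions live in DataArts Studio.
JOB_DAG = {
    "read_orders": set(),
    "read_customers": set(),
    "write_joined": {"read_orders", "read_customers"},
}


def parse_dag(dag):
    """Step 1: turn the DAG into a dependency-respecting list of subtasks."""
    return list(TopologicalSorter(dag).static_order())


def run_job(dag, clusters, rng):
    """Steps 2 and 3: pick a cluster at random per subtask, then release it."""
    in_use = {cluster: 0 for cluster in clusters}  # slots held per cluster
    report = []  # stands in for logs and metrics sent to the O&M center
    for subtask in parse_dag(dag):
        cluster = rng.choice(clusters)  # random distribution (assumed uniform)
        in_use[cluster] += 1
        try:
            status = "SUCCEEDED"  # placeholder for the real migration work
        finally:
            in_use[cluster] -= 1  # release resources as soon as the subtask ends
        report.append({"subtask": subtask, "cluster": cluster, "status": status})
    return report, in_use


report, in_use = run_job(JOB_DAG, ["cdm-cluster-a", "cdm-cluster-b"], random.Random(0))
for entry in report:
    print(entry)
```

In this sketch the management concerns (the DAG definition) stay outside `run_job`, and the clusters are treated purely as interchangeable resources, which mirrors the management-plane and execution-plane split described above.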

Functions

DataArts Migration (offline jobs) can synchronize data between various types of on-premises data sources in a wide range of scenarios. You can synchronize all or incremental data as needed.

Figure 1 How an offline processing migration job works

Synchronization Scenarios

DataArts Migration (offline jobs) supports synchronization scenarios of multiple topology types. You can plan synchronization based on your requirements.

  • Single table synchronization

    A table in an instance can be synchronized to another instance.

    Figure 2 Single table synchronization

  • Entire database synchronization

    Multiple tables of multiple databases in an instance can be synchronized to multiple databases in another instance.

    Figure 3 Entire database synchronization

  • Database and table shard synchronization

    Multiple table shards of multiple databases in multiple instances can be synchronized to a database table in another instance.

    Figure 4 Database and table shard synchronization
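One way to reason about the three topologies is to view a job as a list of source-to-destination table pairs and to look at how many distinct sources and destinations it contains. The sketch below is an assumption made for illustration only: `TableRef`, `classify`, and the instance, database, and table names are hypothetical and are not part of the DataArts Migration interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableRef:
    """Hypothetical identifier for one table in one database of one instance."""
    instance: str
    database: str
    table: str


def classify(pairs):
    """Name the topology of a list of (source, destination) TableRef pairs."""
    if not pairs:
        raise ValueError("at least one source-destination pair is required")
    sources = {src for src, _ in pairs}
    targets = {dst for _, dst in pairs}
    if len(sources) == 1 and len(targets) == 1:
        return "single table"
    if len(sources) > 1 and len(targets) == 1:
        # Many shards are merged into one destination table.
        return "database and table shard"
    if len(sources) == len(targets) == len(pairs):
        # Every source table has its own destination table.
        return "entire database"
    raise ValueError("unrecognized topology")


single = [(TableRef("src", "db1", "orders"), TableRef("dst", "db1", "orders"))]
whole_db = [
    (TableRef("src", "db1", "orders"), TableRef("dst", "db1", "orders")),
    (TableRef("src", "db2", "users"), TableRef("dst", "db2", "users")),
]
shards = [
    (TableRef("src1", "db1", "orders_0"), TableRef("dst", "db", "orders")),
    (TableRef("src2", "db2", "orders_1"), TableRef("dst", "db", "orders")),
]

print(classify(single))
print(classify(whole_db))
print(classify(shards))
```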

Video Tutorial

The UI may vary depending on the version. This tutorial is for reference only.