Overview of Offline Jobs
Background
Cloud Data Migration (CDM) is Huawei Cloud's previous-generation data integration service. It provides stable data migration and synchronization. However, as data development scenarios become increasingly complex and the scheduling scale continues to grow, CDM jobs have the following pain points due to coupled management and execution:
- CDM is isolated from DataArts Factory and DataArts Quality. Processes need to be manually assembled.
- A single cluster becomes a bottleneck when it processes a large number of concurrent requests, so jobs may pile up in queues. Cross-cluster execution is not supported.
- Read and write links cannot be flexibly combined.
- New functions are unavailable in the old architecture, such as job scheduling agencies and data encryption and decryption.
The new DataArts Migration (offline jobs) addresses these issues by managing jobs in the data development engine. CDM clusters function only as computing resource pools that can be scaled out.
Core Advantages of Offline Jobs
| Dimension | CDM Job | DataArts Migration Offline Job |
|---|---|---|
| Process orchestration | Independent tasks which need to be manually connected | Drag-and-drop operator that runs on the same canvas as data development operators |
| Function evolution | Functions will no longer be updated. | New functions are available to offline data migration jobs first. |
| Scheduling mode | Single-cluster queuing | Hybrid scheduling across CDM nodes |
| Read/Write policy | Fixed pairs of read and write links | Decoupled read and write, with flexible source-destination pairs |
This function is in OBT (or restricted use). To use this function, submit a service ticket.
How It Works
- Management plane:
Job metadata, scheduling dependencies, parameter variables, and scheduling identities are hosted in DataArts Studio.
- Execution plane:
  - The directed acyclic graph (DAG) of a job is parsed to generate executable CDM subtasks.
  - The CDM subtasks are randomly distributed to CDM clusters for execution.
  - Resources are released immediately after subtasks are complete, and logs and task monitoring metrics are sent back to the O&M center.
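The execution-plane flow above can be illustrated with a minimal Python sketch. It is not the service's implementation: the function `plan_subtasks`, the job graph, and the cluster names are all hypothetical, and the sketch covers only DAG parsing and random distribution, not resource release or log reporting.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def plan_subtasks(dag, clusters, seed=None):
    """Return (subtask, cluster) pairs in an order that respects dependencies.

    dag maps each node to the set of nodes it depends on.
    Each subtask is assigned to a randomly chosen cluster.
    """
    rng = random.Random(seed)
    order = list(TopologicalSorter(dag).static_order())
    return [(task, rng.choice(clusters)) for task in order]


# Hypothetical job graph: two reads feed a join, which feeds a write.
dag = {
    "read_orders": set(),
    "read_users": set(),
    "join": {"read_orders", "read_users"},
    "write_result": {"join"},
}
# Hypothetical cluster names standing in for a pool of CDM clusters.
clusters = ["cdm-cluster-a", "cdm-cluster-b"]

plan = plan_subtasks(dag, clusters, seed=0)
for task, cluster in plan:
    print(f"{task} -> {cluster}")
```

Because assignment is random rather than tied to one cluster, adding clusters to the pool spreads subtasks more widely without changing the job definition, which is the point of separating the management plane from the execution plane.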
Functions
DataArts Migration (offline jobs) can synchronize data between various types of on-premises data sources in a wide range of scenarios. You can synchronize full or incremental data as needed.
Synchronization Scenarios
DataArts Migration (offline jobs) supports synchronization scenarios of multiple topology types. You can plan synchronization based on your requirements.
- Single table synchronization
A table in an instance can be synchronized to another instance.
Figure 2 Single table synchronization
- Entire database synchronization
Multiple tables of multiple databases in an instance can be synchronized to multiple databases in another instance.
Figure 3 Entire database synchronization
- Database and table shard synchronization
Multiple table shards of multiple databases in multiple instances can be synchronized to a database table in another instance.
Figure 4 Database and table shard synchronization
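The shard scenario is a many-to-one (fan-in) mapping, which the following Python sketch illustrates. The function `plan_shard_sync` and every instance, database, and table name are hypothetical and exist only for illustration; they are not part of the product's API.

```python
def plan_shard_sync(shards, destination):
    """Map every source shard to the same destination table (fan-in)."""
    return {shard: destination for shard in shards}


# Hypothetical shards, each identified as (instance, database, table).
shards = [
    ("instance1", "db_0", "orders_0"),
    ("instance1", "db_1", "orders_1"),
    ("instance2", "db_2", "orders_2"),
]
# Hypothetical single destination table in another instance.
destination = ("instance3", "db_all", "orders")

mapping = plan_shard_sync(shards, destination)
for source, target in mapping.items():
    print(".".join(source), "->", ".".join(target))
```

Single table synchronization is the same mapping with one source entry, while entire database synchronization would map each source table to its own destination table instead of a shared one.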