- What's New
- Function Overview
- Service Overview
- Getting Started
-
User Guide
- Permissions Management
- Managing Clusters
-
Managing Links
- Supported Data Sources
- Creating Links
- Managing Drivers
- Managing Agents
- Managing Cluster Configurations
- Link to a Common Relational Database
- Link to an RDS for MySQL/MySQL Database
- Link to an Oracle Database
- Link to a Database Shard
- Link to DLI
- Link to Hive
- Link to HBase
- Link to HDFS
- Link to OBS
- Link to an FTP or SFTP Server
- Link to Redis/DCS
- Link to DDS
- Link to CloudTable
- Link to MongoDB
- Link to Cassandra
- Link to Kafka
- Link to DMS Kafka
- Link to Elasticsearch/CSS
- Managing Jobs
- Auditing
-
Tutorials
- Creating an MRS Hive Link
- Creating a MySQL Link
- Migrating Data from MySQL to MRS Hive
- Migrating Data from MySQL to OBS
- Migrating Data from MySQL to DWS
- Migrating an Entire MySQL Database to RDS
- Migrating Data from Oracle to CSS
- Migrating Data from Oracle to DWS
- Migrating Data from OBS to CSS
- Migrating Data from OBS to DLI
- Migrating Data from MRS HDFS to OBS
- Migrating the Entire Elasticsearch Database to CSS
- More Cases and Practices
-
Advanced Data Migration Guidance
- Incremental Migration
- Using Macro Variables of Date and Time
- Migration in Transaction Mode
- Encryption and Decryption During File Migration
- MD5 Verification
- Field Conversion
- Migrating Files with Specified Names
- Regular Expressions for Separating Semi-structured Text
- Recording the Time When Data Is Written to the Database
- File Formats
-
Best Practices
-
Advanced Data Migration Guidance
- Incremental Migration
- Using Macro Variables of Date and Time
- Migration in Transaction Mode
- Encryption and Decryption During File Migration
- MD5 Verification
- Field Conversion
- Migrating Files with Specified Names
- Regular Expressions for Separating Semi-structured Text
- Recording the Time When Data Is Written to the Database
- File Formats
- Scheduling a CDM Job by Transferring Parameters Using DataArts Factory
- Incremental Migration on CDM Supported by DLF
- Creating Table Migration Jobs in Batches Using CDM Nodes
- Case: Trade Data Statistics and Analysis
-
Advanced Data Migration Guidance
- Performance White Paper
- Security White Paper
-
API Reference
- Before You Start
- API Overview
- Calling APIs
- Application Example
- API
-
Public Data Structures
-
Link Parameter Description
- Link to a Relational Database
- Link to OBS
- Link to OSS on Alibaba Cloud
- Link to KODO/COS
- Link to HDFS
- Link to HBase
- Link to CloudTable
- Link to Hive
- Link to an FTP or SFTP Server
- Link to MongoDB
- Link to Redis/DCS (to Be Brought Offline)
- Link to NAS/SFS (to Be Brought Offline)
- Link to Kafka
- Link to Elasticsearch/Cloud Search Service
- Link to DLI
- Link to CloudTable OpenTSDB
- Link to Amazon S3
- Link to DMS Kafka
- Source Job Parameters
- Destination Job Parameters
- Job Parameter Description
-
Link Parameter Description
- Permissions Policies and Supported Actions
- Appendix
-
FAQs
-
General
- What Are the Differences Between CDM and Other Data Migration Services?
- What Are the Advantages of CDM?
- What Are the Security Protection Mechanisms of CDM?
- How Do I Reduce the Cost of Using CDM?
- Why Am I Billed Pay per Use When I Have Purchased a Yearly/Monthly CDM Incremental Package?
- How Do I Check the Remaining Validity Period of a Package?
- Will My Data Be Retained If My Package Expires or My Pay-per-Use Resources Are in Arrears?
- Can CDM Be Shared by Different Tenants?
- Can I Upgrade a CDM Cluster?
- How Is the Migration Performance of CDM?
- What Is the Number of Concurrent Jobs for Different CDM Cluster Versions?
-
Functions
- Does CDM Support Incremental Data Migration?
- Does CDM Support Field Conversion?
- What Component Versions Are Recommended for Migrating Hadoop Data Sources?
- What Data Formats Are Supported When the Data Source Is Hive?
- Can I Synchronize Jobs to Other Clusters?
- Can I Create Jobs in Batches?
- Can I Schedule Jobs in Batches?
- How Do I Back Up CDM Jobs?
- How Do I Configure the Connection If Only Some Nodes in the HANA Cluster Can Communicate with the CDM Cluster?
- How Do I Use Java to Invoke CDM RESTful APIs to Create Data Migration Jobs?
- How Do I Connect the On-Premises Intranet or Third-Party Private Network to CDM?
- Does CDM Support Parameters or Variables?
- How Do I Set the Number of Concurrent Extractors for a CDM Migration Job?
- Does CDM Support Real-Time Migration of Dynamic Data?
- Can I Stop CDM Clusters?
- How Do I Obtain the Current Time Using an Expression?
-
Troubleshooting
- What Should I Do If the Log Prompts that the Date Format Fails to Be Parsed?
- What Can I Do If the Map Field Tab Page Cannot Display All Columns?
- How Do I Select Distribution Columns When Using CDM to Migrate Data to DWS?
- What Do I Do If the Error Message "value too long for type character varying" Is Displayed When I Migrate Data to DWS?
- What Can I Do If Error Message "Unable to execute the SQL statement" Is Displayed When I Import Data from OBS to SQL Server?
- What Should I Do If the Cluster List Is Empty, I Have No Access Permission, or My Operation Is Denied?
- Why Is Error ORA-01555 Reported During Migration from Oracle to DWS?
- What Should I Do If the MongoDB Connection Migration Fails?
- What Should I Do If a Hive Migration Job Is Suspended for a Long Period of Time?
- What Should I Do If an Error Is Reported Because the Field Type Mapping Does Not Match During Data Migration Using CDM?
- What Should I Do If a JDBC Connection Timeout Error Is Reported During MySQL Migration?
- What Should I Do If a CDM Migration Job Fails After a Link from Hive to DWS Is Created?
- How Do I Use CDM to Export MySQL Data to an SQL File and Upload the File to an OBS Bucket?
- What Should I Do If CDM Fails to Migrate Data from OBS to DLI?
- What Should I Do If Error Message "Configuration Item [linkConfig.createBackendLinks] Does Not Exist" Is Displayed During Data Link Creation or Error Message "Configuration Item [throttlingConfig.concurrentSubJobs] Does Not Exist" Is Displayed During Job Creation?
- What Should I Do If Message "CORE_0031:Connect time out. (Cdm.0523)" Is Displayed During the Creation of an MRS Hive Link?
- What Should I Do If Message "CDM Does Not Support Auto Creation of an Empty Table with No Column" Is Displayed When I Enable Auto Table Creation?
- What Should I Do If I Cannot Obtain the Schema Name When Creating an Oracle Relational Database Migration Job?
-
General
Performance Tuning
Overview
In addition to increasing the source read speed, improving the destination write performance, and increasing the bandwidth, you can accelerate migration using the following methods:
- Use a CDM cluster of higher specifications
The NIC bandwidth and maximum number of concurrent extractors vary depending on the CDM cluster specifications. If you want to migrate data faster, or the metrics of your CDM cluster (such as the CPU usage, disk usage, and memory usage) are often high, you may need a CDM cluster with higher specifications for data migration.
- Use multiple CDM clusters
In some scenarios, you are advised to use multiple CDM clusters to share workloads to improve migration efficiency and stability. The following are some examples:
- Multiple CDM clusters are required for different purposes or by multiple business departments. For example, you may need one CDM cluster for running data migration jobs and another one as an agent for DataArts Studio Management Center.
- You want to migrate a large number of tables. In this case, you can use multiple CDM clusters to run jobs simultaneously to improve migration efficiency.
- The CPU usage, disk usage, and memory usage of the in-use CDM cluster are often high. In this case, you are advised to use multiple CDM clusters to shared workloads.
- Avoid running too many CDM jobs simultaneously
If the number of CDM jobs that run concurrently exceeds the maximum concurrent extractors for the CDM cluster, some jobs will be queued, and the migration will be prolonged.
Avoid running too many jobs simultaneously, which may cause slow migration due to insufficient resources.
- Change concurrent extractors
If the number of tasks is small, adjusting the number of concurrent extractors is the best way to improve performance. You can set the number of concurrent extractors for a job and the maximum number of concurrent extractors for a cluster.
CDM migrates data through data migration jobs. It works in the following way:- When data migration jobs are submitted, CDM splits each job into multiple tasks based on the Concurrent Extractors parameter in the job configuration.
NOTE:
Jobs for different data sources may be split based on different dimensions. Some jobs may not be split based on the Concurrent Extractors parameter.
- CDM submits the tasks to the running pool in sequence. The maximum number of tasks (defined by Maximum Concurrent Extractors) run concurrently. Excess tasks are queued.
By setting appropriate values for parameters Concurrent Extractors and Maximum Concurrent Extractors, you can accelerate migration. For details about how to change Concurrent Extractors, see Changing Concurrent Extractors.
- When data migration jobs are submitted, CDM splits each job into multiple tasks based on the Concurrent Extractors parameter in the job configuration.
Changing Concurrent Extractors
- The maximum number of concurrent extractors for a cluster varies depending on the CDM cluster flavor. You are advised to set the maximum number of concurrent extractors to twice the number of vCPUs of the CDM cluster.
Table 1 Maximum number of concurrent extractors for a CDM cluster Flavor
vCPUs/Memory
Maximum Concurrent Extractors
cdm.large
8 vCPUs, 16 GB
16
cdm.xlarge
16 vCPUs, 32 GB
32
cdm.4xlarge
32 vCPUs, 64 GB
64
Figure 1 Setting Maximum Concurrent Extractors for a CDM cluster - Configure the number of concurrent extractors based on the following rules:
- When data is to be migrated to files, CDM does not support multiple concurrent tasks. In this case, set a single process to extract data.
- If each row of the table contains less than or equal to 1 MB data, data can be extracted concurrently. If each row contains more than 1 MB data, it is recommended that data be extracted in a single thread.
- Set Concurrent Extractors for a job based on Maximum Concurrent Extractors for the cluster. It is recommended that Concurrent Extractors is less than Maximum Concurrent Extractors.
Figure 2 Setting Concurrent Extractors for a job
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.