Updated on 2024-04-03 GMT+08:00

How Migration Jobs Work

Data Migration Model

Figure 1 shows the simplified migration model used by CDM.

Figure 1 Migration model used by CDM

CDM migrates data through data migration jobs. It works in the following way:
  1. When data migration jobs are submitted, CDM splits each job into multiple tasks based on the Concurrent Extractors parameter in the job configuration.

    Jobs for different data sources may be split based on different dimensions. Some jobs may not be split based on the Concurrent Extractors parameter.

  2. CDM submits the tasks to the running pool in sequence. The number of tasks that can run concurrently is limited by the Maximum Concurrent Extractors parameter of the cluster. Excess tasks are queued.
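
The two steps above can be sketched as a small simulation. This is an illustrative model only, not CDM's implementation: the names `split_job` and `run_pool` are hypothetical, the split is assumed to yield exactly one task per concurrent extractor, and every task is assumed to take the same time so that execution can be grouped into rounds.

```python
from collections import deque


def split_job(job_name, concurrent_extractors):
    """Step 1 (assumed): one task per concurrent extractor of the job."""
    return [f"{job_name}-task{i}" for i in range(concurrent_extractors)]


def run_pool(tasks, max_concurrent_extractors):
    """Step 2: admit tasks in submission order; excess tasks wait in a queue.

    Returns a list of rounds. Each round holds the tasks that run together,
    under the simplifying assumption that all tasks take equally long.
    """
    queue = deque(tasks)
    rounds = []
    while queue:
        batch_size = min(max_concurrent_extractors, len(queue))
        rounds.append([queue.popleft() for _ in range(batch_size)])
    return rounds


# Two hypothetical jobs with 3 and 4 concurrent extractors on a cluster
# whose Maximum Concurrent Extractors is 4.
tasks = split_job("jobA", 3) + split_job("jobB", 4)
rounds = run_pool(tasks, max_concurrent_extractors=4)
for number, batch in enumerate(rounds, start=1):
    print(f"round {number}: {batch}")
```

In this sketch, seven tasks compete for four slots, so three tasks of the second job wait for the next round. This queuing is the effect that prolongs a migration when the cluster limit is too low for the submitted jobs.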

Factors Affecting Migration Performance

According to the migration model, the migration speed is affected by factors such as the source read speed, network bandwidth, destination write performance, and CDM cluster and job configuration.

Table 1 Factors affecting migration performance

Factor

Description

Service-related factors

Concurrent extractors of a job

The number of concurrent extractors can be set for a CDM job during job creation.

Setting a proper value for this parameter can effectively improve the migration speed. If the value is too small, migration will be too slow. If the value is too large, the migration job is overloaded and may fail.

  • When data is migrated to files, CDM does not support multiple concurrent tasks. In this case, configure the job to extract data in a single process.
  • If each row of the table contains 1 MB of data or less, data can be extracted concurrently. If a row contains more than 1 MB of data, single-threaded extraction is recommended.
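
The two rules above can be expressed as a small helper. This is a sketch under stated assumptions: the function `suggest_extractors` and its parameters are hypothetical and not part of any CDM API, and 1 MB is taken to mean 1024 × 1024 bytes.

```python
MB = 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def suggest_extractors(requested, destination_is_file, max_row_bytes):
    """Return a concurrent-extractor count that follows the two rules above.

    requested           -- the value the user would like to configure
    destination_is_file -- True when data is migrated to files
    max_row_bytes       -- size of the largest row in the source table
    """
    if destination_is_file:
        return 1  # multiple concurrent tasks are not supported for files
    if max_row_bytes > MB:
        return 1  # rows larger than 1 MB: single-threaded extraction
    return max(1, requested)


print(suggest_extractors(8, destination_is_file=True, max_row_bytes=512))
print(suggest_extractors(8, destination_is_file=False, max_row_bytes=2 * MB))
print(suggest_extractors(8, destination_is_file=False, max_row_bytes=MB))
```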

Maximum concurrent extractors of a cluster

Setting a proper value for this parameter can effectively improve the migration speed. If the value is too small, migration will be too slow. If the value is too large, the source is overloaded and the system may be unstable.

The maximum number of concurrent extractors varies depending on the CDM cluster flavor. The upper limit is twice the number of vCPUs. The limits for some flavors are as follows:

  • cdm.large: 16
  • cdm.xlarge: 32
  • cdm.4xlarge: 128
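
The "twice the number of vCPUs" rule can be checked against the listed limits with a few lines of arithmetic. The vCPU counts computed below are only derived from that rule and the three limits above, not taken from a flavor specification, and the function names are hypothetical.

```python
# Limits listed above for some flavors.
FLAVOR_LIMITS = {"cdm.large": 16, "cdm.xlarge": 32, "cdm.4xlarge": 128}


def max_concurrent_extractors(vcpus):
    """Upper limit stated above: twice the number of vCPUs."""
    return 2 * vcpus


def implied_vcpus(limit):
    """vCPU count implied by a limit (derived, not an official figure)."""
    return limit // 2


for flavor, limit in FLAVOR_LIMITS.items():
    vcpus = implied_vcpus(limit)
    # The round trip should reproduce the listed limit.
    assert max_concurrent_extractors(vcpus) == limit
    print(f"{flavor}: limit {limit}, implied vCPUs {vcpus}")
```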

Service model

If the concurrently running CDM jobs require more concurrent extractors in total than the maximum concurrent extractors of the CDM cluster, some tasks are queued and the migration takes longer.

Avoid running too many jobs simultaneously, which may cause slow migration due to insufficient resources.

Data model

The migration speed is also affected by the data structure. The following are some examples:

  • The wider a table is and the more string-type fields it has, the slower the migration is.
  • A large file is migrated more quickly than multiple small files whose total size is the same as the large file.
  • The more content a message has and the more bandwidth it uses, the fewer transactions per second (TPS) can be processed.

Source read speed

The read speed depends on the performance of the data source on the source side.

For details about how to increase the read speed, see the documentation of that data source.

Network bandwidth

The CDM cluster can communicate with the data source through an intranet, the Internet (public network), a VPN, NAT, or Direct Connect.

  • If they communicate through an intranet, the network bandwidth varies depending on the CDM instance flavor.
    • For cdm.large instances, the baseline and maximum bandwidths of the CDM cluster NIC are 0.8 and 3 Gbit/s, respectively.
    • For cdm.xlarge instances, the baseline and maximum bandwidths of the CDM cluster NIC are 4 and 10 Gbit/s, respectively.
    • For cdm.4xlarge instances, the baseline and maximum bandwidths of the CDM cluster NIC are 36 and 40 Gbit/s, respectively.
  • If they communicate through the Internet, the network bandwidth is subject to the Internet bandwidth. The bandwidth for the CDM cluster depends on the EIP bound to the CDM cluster, and the bandwidth for the data source depends on the Internet bandwidth.
  • If they communicate through a VPN, NAT, or Direct Connect, the network bandwidth is subject to the VPN, NAT, or Direct Connect bandwidth.
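
The bandwidth figures above give a rough lower bound on transfer time. The sketch below is a back-of-the-envelope estimate only: the helper names are hypothetical, 1 GB is assumed to be 10^9 bytes, and taking the slowest link as the bottleneck is an assumption. Real jobs are also limited by source reads, destination writes, and job configuration.

```python
# (baseline, maximum) NIC bandwidth in Gbit/s, as listed above.
NIC_GBIT_S = {
    "cdm.large": (0.8, 3.0),
    "cdm.xlarge": (4.0, 10.0),
    "cdm.4xlarge": (36.0, 40.0),
}


def bottleneck_gbit_s(*link_bandwidths):
    """Assumption: the slowest link on the path bounds the throughput."""
    return min(link_bandwidths)


def min_transfer_seconds(data_gb, bandwidth_gbit_s):
    """Lower bound on transfer time; 1 GB = 10^9 bytes = 8 Gbit (assumed)."""
    return data_gb * 8 / bandwidth_gbit_s


baseline, maximum = NIC_GBIT_S["cdm.xlarge"]
# 1000 GB over an intranet at the cdm.xlarge baseline and maximum rates.
print(min_transfer_seconds(1000, baseline))
print(min_transfer_seconds(1000, maximum))
# Internet case with a hypothetical 1 Gbit/s EIP and a 0.5 Gbit/s source uplink.
print(min_transfer_seconds(1000, bottleneck_gbit_s(1.0, 0.5)))
```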

Destination write performance

The write performance depends on the performance of the data source on the destination side.

For details about how to improve the write performance, see the documentation of that data source.