Updated on 2024-04-03 GMT+08:00

Reference: Job Splitting Dimensions

CDM splits jobs along different dimensions depending on the data source. Table 1 lists the splitting dimension for each data source.

Table 1 Job splitting dimensions for different data sources

Data Source Category | Data Source | Job Splitting Rule

Data warehouse

GaussDB(DWS)

  • Jobs can be split based on table fields.
  • Jobs cannot be split based on table partitions.

Data Lake Insight (DLI)

  • Jobs can be split based on the partitioning information of partitioned tables.
  • Jobs that read non-partitioned tables cannot be split.

Hadoop

MRS HDFS

Jobs can be split based on files.

MRS HBase

Jobs can be split based on HBase regions.

MRS Hive

  • When the read mode is HDFS, jobs can be split based on Hive files.
  • When the read mode is JDBC, jobs cannot be split.

FusionInsight HDFS

Jobs can be split based on files.

FusionInsight HBase

Jobs can be split based on HBase regions.

FusionInsight Hive

  • When the read mode is HDFS, jobs can be split based on Hive files.
  • When the read mode is JDBC, jobs cannot be split.

Apache HDFS

Jobs can be split based on files.

Apache HBase

Jobs can be split based on HBase regions.

Apache Hive

  • When the read mode is HDFS, jobs can be split based on Hive files.
  • When the read mode is JDBC, jobs cannot be split.

Object storage

Object Storage Service (OBS)

Jobs can be split based on files.

File system

FTP

Jobs can be split based on files.

SFTP

Jobs can be split based on files.

HTTP

Jobs can be split based on files.

Relational database

RDS for MySQL

  • Jobs can be split based on table fields.
  • Jobs can be split based on table partitions only when Extract by Partition is configured.

RDS for PostgreSQL

  • Jobs can be split based on table fields.
  • Jobs can be split based on table partitions only when Extract by Partition is configured.

RDS for SQL Server

  • Jobs can be split based on table fields.
  • Jobs can be split based on table partitions only when Extract by Partition is configured.

MySQL

  • Jobs can be split based on table fields.
  • Jobs can be split based on table partitions only when Extract by Partition is configured.

PostgreSQL

  • Jobs can be split based on table fields.
  • Jobs can be split based on table partitions only when Extract by Partition is configured.

Microsoft SQL Server

  • Jobs can be split based on table fields.
  • Jobs cannot be split based on table partitions.

Oracle

  • Jobs can be split based on table fields.
  • Jobs can be split based on table partitions only when Extract by Partition is configured.

SAP HANA

  • Jobs can be split based on table fields.
  • Jobs cannot be split based on table partitions.

Database shard

Each backend connection corresponds to a subjob, and each subjob can be split based on primary keys.

NoSQL

Distributed Cache Service (DCS)

Jobs cannot be split.

Redis

Jobs cannot be split.

Document Database Service (DDS)

Jobs cannot be split.

MongoDB

Jobs cannot be split.

Cassandra

Jobs can be split based on Cassandra token ranges.

Message system

Data Ingestion Service (DIS)

Jobs can be split based on topics.

Apache Kafka

Jobs can be split based on topics.

DMS Kafka

Jobs can be split based on topics.

MRS Kafka

Jobs can be split based on topics.

Search

Elasticsearch

Jobs cannot be split.

Cloud Search Service (CSS)

Jobs cannot be split.
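
Two of the rules in Table 1, splitting based on table fields and splitting based on files, can be illustrated with a minimal Python sketch. This is not CDM's implementation, which this page does not describe. The function names, the assumption of an integer field with known minimum and maximum values, and the round-robin file assignment are all hypothetical choices made only for illustration.

```python
from typing import List, Tuple


def split_by_field_range(
    min_value: int, max_value: int, num_splits: int
) -> List[Tuple[int, int]]:
    """Divide the inclusive range [min_value, max_value] of an integer field
    into contiguous, non-overlapping, inclusive sub-ranges.

    Hypothetical helper: assumes the split field is an integer and that its
    minimum and maximum values are already known.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")
    if min_value > max_value:
        raise ValueError("min_value must not exceed max_value")

    total = max_value - min_value + 1
    # Never create more splits than there are distinct values.
    num_splits = min(num_splits, total)
    base, extra = divmod(total, num_splits)

    ranges = []
    start = min_value
    for i in range(num_splits):
        # The first `extra` splits take one additional value each.
        size = base + (1 if i < extra else 0)
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def split_by_files(paths: List[str], num_splits: int) -> List[List[str]]:
    """Assign files to at most num_splits groups in round-robin order.

    Hypothetical helper: ignores file sizes, so groups may be unbalanced
    when sizes differ widely.
    """
    if num_splits < 1:
        raise ValueError("num_splits must be at least 1")

    groups: List[List[str]] = [[] for _ in range(min(num_splits, len(paths)))]
    for index, path in enumerate(paths):
        groups[index % len(groups)].append(path)
    return groups
```

Under these assumptions, each range returned by `split_by_field_range` could be turned into a filter on the split field (for example, a `BETWEEN` predicate) and read independently, and each group returned by `split_by_files` could be read independently in the same way.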