OpenSource ClickHouse

OpenSource ClickHouse is an open-source, high-performance, and distributed columnar database management system designed for real-time analysis of large-scale data.

DataArts Migration can migrate OpenSource ClickHouse data efficiently.

Preparation and Constraints

Network requirements
The OpenSource ClickHouse data source must be able to communicate with CDM so that data can be transmitted smoothly. For details, see Enabling Network Connectivity.
Required permissions
- Read permission: Database users that DataArts Migration uses to read data from OpenSource ClickHouse must be granted the read-only permission, or at least the ClickHouse SELECT permission.
- Write permission: Database users that DataArts Migration uses to write data to OpenSource ClickHouse must be granted the write permission, or at least the INSERT, CREATE TABLE, and DELETE permissions.
Enabling ports
ClickHouse JDBC port (8123): Port 8123 must be enabled so that DataArts Migration can connect to ClickHouse through JDBC.
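
The permission and port requirements above can be sketched as a small helper that produces the statements an administrator might run and the JDBC address a connection might use. This is a minimal sketch, not the product's own procedure: the user name `cdm_reader`, the database `sales`, and the host `ch.example.internal` are hypothetical, the DELETE permission is written as `ALTER DELETE` (the privilege name ClickHouse uses for row deletion), and the `jdbc:clickhouse://host:port/database` form is assumed from common ClickHouse JDBC driver usage.

```python
from typing import List


def build_grants(user: str, database: str, mode: str) -> List[str]:
    """Return GRANT statements for a migration user.

    mode="read"  -> SELECT only
    mode="write" -> INSERT, CREATE TABLE, and ALTER DELETE
    """
    if mode == "read":
        privileges = "SELECT"
    elif mode == "write":
        # ALTER DELETE is assumed to correspond to the DELETE permission above.
        privileges = "INSERT, CREATE TABLE, ALTER DELETE"
    else:
        raise ValueError("mode must be 'read' or 'write'")
    return [f"GRANT {privileges} ON {database}.* TO {user}"]


def jdbc_url(host: str, database: str, port: int = 8123) -> str:
    """Build a JDBC address; 8123 is the port named in this section."""
    return f"jdbc:clickhouse://{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for statement in build_grants("cdm_reader", "sales", "read"):
        print(statement)
    print(jdbc_url("ch.example.internal", "sales"))
```

Granting at the database level (`sales.*`) is one choice among several; a narrower grant on individual tables may be preferable where only a few tables are migrated.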

Supported Data Types

The field types supported by different ClickHouse versions vary. The following table lists the field types supported by the open-source ClickHouse version 21.3.4.25. DataArts Migration is compatible with the following field types and their common variants so that it can correctly read and write various types of data.

Category	ClickHouse Field Type	Read	Write
Numeric	Int8	√	√
	Int16	√	√
	Int32	√	√
	Int64	√	√
	Int128	√	√
	UInt8	√	√
	UInt16	√	√
	UInt32	√	√
	UInt64	√	√
	UInt128	√	√
	Float32	√	√
	Float64	√	√
	Decimal	√	√
Character	String	√	√
Character	FixedString	√	√
Time	Date	√	√
	DateTime	√	√
	DateTime64	√	√
Boolean	Boolean	√	√
Array	Array	√	√
Tuple	Tuple	x	x
IP	IPv4	√	√
IP	IPv6	√	√
Enumeration	Enum8	√	√
Enumeration	Enum16	√	√
Nested	Nested	x	x
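
As an illustration, the table above can be transcribed into a lookup that checks a column type before a job is configured. This is a sketch under stated assumptions: the function names are hypothetical, only the outermost type name is inspected (so `Array(Tuple(...))` is judged as `Array`), and wrapper types that the table does not list, such as `Nullable(...)`, are reported as unknown rather than guessed.

```python
from typing import Optional

# Transcribed from the table above (ClickHouse 21.3.4.25): type -> (read, write).
_SUPPORTED = [
    "Int8", "Int16", "Int32", "Int64", "Int128",
    "UInt8", "UInt16", "UInt32", "UInt64", "UInt128",
    "Float32", "Float64", "Decimal",
    "String", "FixedString",
    "Date", "DateTime", "DateTime64",
    "Boolean", "Array", "IPv4", "IPv6", "Enum8", "Enum16",
]
SUPPORT = {name: (True, True) for name in _SUPPORTED}
SUPPORT.update({"Tuple": (False, False), "Nested": (False, False)})


def base_type(column_type: str) -> str:
    """Drop type parameters: 'Decimal(10, 2)' -> 'Decimal'."""
    return column_type.split("(", 1)[0].strip()


def is_supported(column_type: str, direction: str = "read") -> Optional[bool]:
    """Return True/False per the table, or None if the type is not listed."""
    if direction not in ("read", "write"):
        raise ValueError("direction must be 'read' or 'write'")
    entry = SUPPORT.get(base_type(column_type))
    if entry is None:
        return None  # not covered by the table
    return entry[0] if direction == "read" else entry[1]


if __name__ == "__main__":
    for t in ["Decimal(10, 2)", "Tuple(String, Int32)", "Nullable(String)"]:
        print(t, is_supported(t))
```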

Supported Migration Scenarios

DataArts Migration supports the following modes for synchronizing on-premises data:

Single table synchronization
DataArts Migration supports table/file synchronization in data ingestion into a data lake or data migration to the cloud.
Database and table shard synchronization
DataArts Migration supports synchronization of data from multiple databases and tables in data ingestion into a data lake or data migration to the cloud.
Entire DB migration
DataArts Migration supports synchronization of data from an on-premises database in data ingestion into a data lake or data migration to the cloud.

Database and table shard synchronization and entire DB migration are not supported in all regions. The following table lists the supported OpenSource ClickHouse migration scenarios.

Supported Migration Scenario	Single Table Read	Single Table Write	Database/Table Shard Read	Database/Table Shard Write	Entire DB Read	Entire DB Write
Supported	√	√	x	√	x	x

Core Capabilities

Connection configuration

Configuration Item	Supported	Description
SSL encryption	√	SSL encryption ensures secure data transmission.
Connection configuration optimization	√	Connection configuration such as connectTimeout can be optimized to improve connection performance.

Read capabilities

Configuration Item	Supported	Description
Shard concurrency	√	Horizontal sharding based on primary keys or common fields, combined with multi-thread concurrent extraction, significantly improves throughput and efficiency.
Dirty data processing	√	Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data.
Custom fields	√	You can add computed columns, constant columns, or masking functions for tasks to meet personalized service requirements.
Incremental read	√	WHERE conditions can be used in queries to read incremental data.
Stream and batch reading	Batch reading	Data can be read from a large static dataset in batches and then centrally processed.
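
To make shard concurrency and incremental read more concrete, the sketch below splits a numeric key range into contiguous shards, combines each shard with an optional incremental WHERE condition, and runs the resulting queries on a thread pool. It does not describe how DataArts Migration is implemented: the function names, the table `events`, the key `id`, and the `fetch` callback (which would run a query against ClickHouse) are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple


def split_range(low: int, high: int, shards: int) -> List[Tuple[int, int]]:
    """Split the closed interval [low, high] into contiguous closed ranges."""
    if high < low:
        raise ValueError("high must be >= low")
    total = high - low + 1
    shards = max(1, min(shards, total))  # never more shards than key values
    size, extra = divmod(total, shards)
    ranges, start = [], low
    for i in range(shards):
        # The first `extra` shards take one additional key each.
        end = start + size - 1 + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


def shard_queries(table: str, key: str, low: int, high: int, shards: int,
                  where: Optional[str] = None) -> List[str]:
    """Build one SELECT per shard, optionally AND-ed with an incremental filter."""
    queries = []
    for start, end in split_range(low, high, shards):
        condition = f"{key} BETWEEN {start} AND {end}"
        if where:
            condition = f"({where}) AND {condition}"
        queries.append(f"SELECT * FROM {table} WHERE {condition}")
    return queries


def extract_concurrently(queries: List[str], fetch: Callable[[str], object],
                         threads: int = 4) -> list:
    """Run `fetch` on every query in parallel; results keep the query order."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return list(pool.map(fetch, queries))


if __name__ == "__main__":
    qs = shard_queries("events", "id", 1, 10, 3, where="event_date >= '2024-01-01'")
    # A stand-in fetch that only echoes the query instead of contacting a server.
    for line in extract_concurrently(qs, lambda q: q):
        print(line)
```

Range-based splitting assumes the key values are spread fairly evenly; a heavily skewed key would leave some threads with most of the work.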

Write capabilities

Configuration Item	Supported	Description
Pre- and post-import processing	√	Operations such as preSql can clean and process data before and after data import.
Concurrent write	√	Concurrent write improves efficiency.
Optimization of the number of written rows	√	You can set the number of rows written per request in the connection to control the amount of data transmitted. With large data volumes, this improves performance and prevents transmission delays and system overload.
Dirty data processing	x	Abnormal data cannot be written to the dirty data bucket, so dirty data processing cannot be used to prevent job failures caused by a small amount of abnormal data.
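
The effect of limiting the number of rows written per request can be illustrated with a small batching helper. This is a minimal sketch rather than the product's write path: `batches`, `write_in_batches`, and the `send` callback (which would issue one INSERT request per batch) are hypothetical names.

```python
from typing import Callable, Iterable, Iterator, List


def batches(rows: Iterable, batch_size: int) -> Iterator[List]:
    """Yield consecutive lists of at most `batch_size` rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List = []
    for row in rows:
        batch.append(row)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # final, possibly shorter, batch
        yield batch


def write_in_batches(rows: Iterable, batch_size: int,
                     send: Callable[[List], None]) -> int:
    """Call `send` once per batch and return the number of requests made."""
    requests = 0
    for batch in batches(rows, batch_size):
        send(batch)
        requests += 1
    return requests


if __name__ == "__main__":
    sent = []
    count = write_in_batches(range(5), 2, sent.append)
    print(count, sent)
```

A larger batch size means fewer requests but more data per request, so the setting is a trade-off between round trips and the load each request places on the target.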