MRS ClickHouse
Huawei Cloud MRS ClickHouse is a columnar database oriented to online analysis and processing. It is independent of the Hadoop big data system and features compression rate and fast query performance. In addition, ClickHouse supports SQL query and provides good query performance, especially the aggregation analysis and query performance based on large and wide tables. The query speed is one order of magnitude faster than that of other analytical databases.
DataArts Migration supports Huawei Cloud MRS ClickHouse and the stable transmission and processing of MRS ClickHouse data ingested into and exported from data lakes.
Preparation and Constraints
- Network requirements
The MRS ClickHouse data source can communicate with CDM. This ensures smooth data transmission. For details, see Enabling Network Connectivity.
- Required permissions
- Read permission: Grant the read-only permission of ClickHouse to the IAM user or user group of DataArts Migration through a system policy such as MRS ReadOnlyAccess. You can also create a custom policy to grant read permissions such as SELECT.
- Write permission: Grant the write permission of ClickHouse to the IAM user or user group of DataArts Migration through a system policy such as MRS CommonOperations and MRS FullAccess. You can also create a custom policy to grant write permissions such as INSERT INTO TABLE and CREATE TABLE.
- Enabling ports: When configuring the MRS ClickHouse data source, ensure that the following ports have been enabled in the security group or network so that DataArts Migration can access MRS.
Table 1 Service ports Service
Port
Port Number
Usage
ClickHouse
TCP
9000
ClickHouse TCP port, which is used by the client to connect to the ClickHouse server
HTTP
8123
ClickHouse HTTP port, which is used by the client to connect to the ClickHouse server
HTTPS
8443
ClickHouse HTTPS port, which is used by the client to connect to the ClickHouse server
MRS Manager
TCP
28443
Used to download the MRS cluster configuration
TCP
20009
Used for CAS authentication
TCP
20029
Used by Manager to communicate with and manage other components
Kerberos (KDC)
TCP&&UDP
21730
Used for Kerberos authentication
TCP&&UDP
21731
Used for Kerberos authentication
TCP&&UDP
21732
Used for Kerberos authentication
Zookeeper
TCP
2181
ZooKeeper service port used for communications between the client and ZooKeeper cluster
Supported Data Types
The field types supported by different ClickHouse versions vary. The following table lists the field types supported by the open-source ClickHouse version 21.3.4.25. DataArts Migration is compatible with the following field types and their common variants so that it can correctly read and write various types of data.
| Category | ClickHouse Field Type | Read | Write |
|---|---|---|---|
| Numeric | Int8 | √ | √ |
| Int16 | √ | √ | |
| Int32 | √ | √ | |
| Int64 | √ | √ | |
| Int128 | √ | √ | |
| UInt8 | √ | √ | |
| UInt16 | √ | √ | |
| UInt32 | √ | √ | |
| UInt64 | √ | √ | |
| UInt128 | √ | √ | |
| Float32 | √ | √ | |
| Float64 | √ | √ | |
| Decimal | √ | √ | |
| Character | String | √ | √ |
| FixedString | √ | √ | |
| Time | Date | √ | √ |
| DateTime | √ | √ | |
| DateTime64 | √ | √ | |
| Boolean | Boolean | √ | √ |
| Array | Array | √ | √ |
| Tuple | Tuple | x | x |
| IP | IPv4 | √ | √ |
| IPv6 | √ | √ | |
| Enumeration | Enum8 | √ | √ |
| Enum16 | √ | √ | |
| Nested | Nested | x | x |
Supported Migration Scenarios
DataArts Migration supports the following offline synchronization modes:
- Single table synchronization
DataArts Migration supports table/file synchronization in data ingestion into a data lake or data migration to the cloud.
- Database and table shard synchronization
DataArts Migration supports synchronization of data from multiple databases and tables in data ingestion into a data lake or data migration to the cloud.
- Entire DB migration
DataArts Migrations supports synchronization of data from an on-premises database in data ingestion into a data lake or data migration to the cloud.
Database and table shard synchronization and entire DB migration are not supported in all regions. The following table lists the supported MRS ClickHouse migration scenarios.
| Supported Migration Scenario | Single Table Read | Single Table Write | Database/Table Shard Read | Database/Table Shard Write | Entire DB Read | Entire DB Write |
|---|---|---|---|---|---|---|
| Supported | √ | √ | x | √ | x | x |
Core Capabilities
- Connection configuration
Configuration Item
Supported
Description
Kerberos authentication
√
Kerberos authentication is used to access MRS clusters.
Connection configuration optimization
√
Connection configuration such as connectTimeout can be optimized to improve connection performance.
- Read capabilities
Configuration Item
Supported
Description
Shard concurrency
√
Horizontal sharding based on primary keys or common fields and multi-thread concurrent extraction significantly improve the throughput and efficiency.
Dirty data processing
√
Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data.
Custom fields
√
You can add computed columns, constant columns, or masking functions for tasks to meet personalized service requirements.
Incremental read
√
Where conditions can be used to deliver query requests for incremental data reading.
Stream and batch reading
Batch reading
Data can be read from a large static dataset in batches and then centrally processed.
- Write capabilities
Configuration Item
Supported
Description
Pre- and post-import processing
√
Operations such as preSql can clean and process data before and after data import.
Concurrent write
√
Concurrent write improves efficiency.
Optimization of the number of written rows
√
You can set the number of rows written by each request in the connection to properly control the amount of data to be transmitted. This improves performance and prevents a transmission delay or the system from being overloaded when there is a large amount of data.
Dirty data processing
x
Abnormal data cannot be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data.
Creating a Data Source
Create a data source in Management Center. For details, see Configuring Data Connection Parameters.
Creating an Offline Data Migration Job
Create a ClickHouse migration job in DataArts Factory. For details, see Creating an Offline Processing Migration Job.
Best Practices
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot