Doris
Doris is a high-performance, highly scalable distributed analytical database. It supports real-time data writes and fast queries, and is suitable for multi-dimensional analysis and report generation over massive amounts of data.
DataArts Migration can efficiently migrate data from and to MRS Doris and CloudTable Doris on Huawei Cloud.
How It Works
Doris Reader reads data over native JDBC. Doris Writer writes data over JDBC or StreamLoad.
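To make the StreamLoad write path more concrete, the following minimal sketch builds (but does not send) an HTTP request shaped like a Doris Stream Load call. It is an illustration only, not the DataArts Migration implementation. The host name, database, table, credentials, and label are hypothetical placeholders, and the function name `build_stream_load_request` is invented for this example.

```python
import base64
import urllib.request


def build_stream_load_request(fe_host, database, table, user, password,
                              data, label, port=8030, use_https=False):
    """Build (without sending) an HTTP PUT request for a Doris Stream Load.

    All argument values are supplied by the caller; nothing here is
    specific to DataArts Migration.
    """
    scheme = "https" if use_https else "http"
    url = f"{scheme}://{fe_host}:{port}/api/{database}/{table}/_stream_load"
    # Stream Load authenticates with HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,            # identifies this load job
        "column_separator": ",",   # assumes comma-separated input rows
    }
    return urllib.request.Request(url, data=data, headers=headers, method="PUT")


# Hypothetical values for illustration only.
request = build_stream_load_request(
    "fe.example.internal", "demo_db", "orders",
    "demo_user", "demo_password", b"1,apple\n2,pear\n", "load-001",
)
print(request.get_method(), request.full_url)
```

Actually sending the request would need an HTTP client that follows the redirect Doris issues from the frontend to a backend node while keeping the credentials, which this sketch deliberately leaves out.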
Preparation and Constraints
- Network requirements
The Doris data source must be able to communicate with CDM so that data can be transmitted. For details, see Enabling Network Connectivity.
- Required permissions
- MRS Doris read and write permissions
- Read permission: Grant the read-only permission of MRS Doris to the IAM user or user group of DataArts Migration through a system policy such as MRS ReadOnlyAccess. You can also create a custom policy to grant read permissions such as SELECT.
- Write permission: Grant the write permission of MRS Doris to the IAM user or user group of DataArts Migration through a system policy such as MRS CommonOperations or MRS FullAccess. You can also create a custom policy to grant write permissions such as INSERT INTO TABLE and CREATE TABLE.
- CloudTable Doris read and write permissions
- Read permission: Grant the ReadOnlyAccess system policy of CloudTable to the IAM user or user group of DataArts Migration, or create a custom policy to grant read permissions such as SELECT.
- Write permission: Grant the CommonOperations or FullAccess system policy of CloudTable to the IAM user or user group of DataArts Migration, or create a custom policy to grant write permissions such as INSERT INTO TABLE and CREATE TABLE.
- Enabling ports
- JDBC port (9030): Ensure that the JDBC port 9030 of the Doris service has been enabled so that DataArts Migration can connect to the Doris database through JDBC and read and write data.
- StreamLoad port (8030): If data is written using StreamLoad, ensure that the StreamLoad port 8030 of the Doris service has been enabled so that DataArts Migration can write data to Doris efficiently.
- StreamLoad HTTPS port (8050): If data is written using StreamLoad and HTTPS encryption is enabled, ensure that the StreamLoad HTTPS port 8050 of the Doris service has been enabled so that DataArts Migration can write data to Doris securely.
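The port requirements above can be checked before a job is created. The sketch below is one possible pre-flight check, not a DataArts Migration feature: `required_ports` encodes the three rules listed above, and `port_is_open` attempts a plain TCP connection. Both function names and the host name in the usage lines are hypothetical.

```python
import socket


def required_ports(write_mode="JDBC", https=False):
    """Return the Doris ports needed for the given write mode.

    9030 (JDBC) is always needed; StreamLoad adds 8030, or 8050 when
    HTTPS encryption is enabled.
    """
    ports = [9030]
    if write_mode.upper() == "STREAM_LOAD":
        ports.append(8050 if https else 8030)
    return ports


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Hypothetical host name; replace with the address of your Doris frontend.
for port in required_ports("STREAM_LOAD", https=True):
    print(port, port_is_open("doris-fe.example.internal", port, timeout=1.0))
```

A successful TCP connection only shows that the port is reachable; it does not verify credentials or permissions.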
Driver Usage
- The MySQL driver is recommended.
- Version mapping between Doris and the driver:
- Doris versions earlier than 2.0: MySQL 5.x driver is required.
- Doris version 2.0 and later: MySQL 8.0.27 driver is required.
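The version mapping above can be expressed as a small helper, for example in a script that validates a connection configuration. This is a sketch under the assumption that the Doris version is available as a dotted string such as "2.1.3"; the function name is hypothetical.

```python
def recommended_mysql_driver(doris_version):
    """Map a Doris version string to the MySQL driver listed above."""
    major = int(doris_version.strip().split(".")[0])
    # Versions earlier than 2.0 need a 5.x driver; 2.0 and later need 8.0.27.
    return "MySQL 8.0.27" if major >= 2 else "MySQL 5.x"


print(recommended_mysql_driver("1.2.7"))
print(recommended_mysql_driver("2.1.3"))
```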
Supported Field Types
Different Doris versions support different data types. The following table lists the Doris field types that DataArts Migration can read and write. For details about all the field types supported by each Doris version, see the official Doris documentation.
| Category | Field Type | Read | Write |
|---|---|---|---|
| Numeric | SMALLINT | √ | √ |
| Numeric | INT | √ | √ |
| Numeric | BIGINT | √ | √ |
| Numeric | LARGEINT | √ | √ |
| Numeric | FLOAT | √ | √ |
| Numeric | DOUBLE | √ | √ |
| Numeric | DECIMAL | √ | √ |
| Numeric | DECIMALV3 | √ | √ |
| Time | DATE | √ | √ |
| Time | DATETIME | √ | √ |
| Time | DATEV2 | √ | √ |
| Time | DATETIMEV2 | √ | √ |
| Character | CHAR | √ | √ |
| Character | VARCHAR | √ | √ |
| Character | STRING | √ | √ |
| Character | TEXT | √ | √ |
| Other | POINT | x | x |
| Other | JSON | √ | √ |
| Other | ARRAY | x | x |
| Other | JSONB | x | x |
| Other | HLL | x | x |
| Other | BITMAP | x | x |
| Other | QUANTILE_STATE | x | x |
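Before configuring a job, it can help to check a table's column types against the table above. The following sketch does that for a schema given as a column-name-to-type mapping. The function names and the sample schema are hypothetical, and the set of supported types is copied from the table above rather than queried from any service.

```python
# Field types marked as readable and writable in the table above.
SUPPORTED_TYPES = {
    "SMALLINT", "INT", "BIGINT", "LARGEINT", "FLOAT", "DOUBLE",
    "DECIMAL", "DECIMALV3", "DATE", "DATETIME", "DATEV2", "DATETIMEV2",
    "CHAR", "VARCHAR", "STRING", "TEXT", "JSON",
}


def base_type(type_name):
    """Strip length or precision arguments, e.g. 'DECIMAL(10,2)' -> 'DECIMAL'."""
    return type_name.split("(")[0].strip().upper()


def unsupported_columns(schema):
    """Return the names of columns whose type is not in SUPPORTED_TYPES."""
    return [name for name, type_name in schema.items()
            if base_type(type_name) not in SUPPORTED_TYPES]


# Hypothetical schema for illustration.
schema = {
    "id": "BIGINT",
    "amount": "DECIMAL(10,2)",
    "tags": "ARRAY<INT>",
    "uv": "BITMAP",
}
print(unsupported_columns(schema))
```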
Supported Migration Scenarios
DataArts Migration supports the following offline synchronization modes:
- Single table synchronization
DataArts Migration can synchronize a single table or file when ingesting data into a data lake or migrating data to the cloud.
- Database and table shard synchronization
DataArts Migration can synchronize data from multiple databases and tables when ingesting data into a data lake or migrating data to the cloud.
- Entire DB migration
DataArts Migration can synchronize data from an on-premises database when ingesting data into a data lake or migrating data to the cloud.
Database and table shard synchronization and entire DB migration are not supported in all regions. The following table lists the supported Doris migration scenarios.
| Supported Migration Scenario | Single Table Read | Single Table Write | Database/Table Shard Read | Database/Table Shard Write | Entire DB Read | Entire DB Write |
|---|---|---|---|---|---|---|
| Supported | √ | √ | x | √ | x | x |
Core Capabilities
- Connection configuration
| Configuration Item | Supported | Description |
|---|---|---|
| Supported protocols | JDBC/StreamLoad | DataArts Migration can exchange data with Doris through JDBC or StreamLoad. JDBC is suitable for general database operations. StreamLoad provides more efficient data write and is suitable for quick import of a large amount of data. |
| HTTPS support | √ | DataArts Migration can exchange data with Doris over HTTPS, ensuring data security and integrity during transmission. |
| Connection configuration optimization | √ | Connection configuration such as connectTimeout can be optimized to improve connection performance. |
- Read capabilities
| Configuration Item | Supported | Description |
|---|---|---|
| Incremental read | √ | Incremental data can be read by using WHERE conditions or SQL statements. |
| Read mode | √ | The database table mode and SQL statements are supported. The database table mode reads data from a specified table. SQL statements allow flexible queries for more complex requirements. |
| Shard concurrency | √ | Data can be sharded horizontally by common fields or partitions and extracted concurrently by multiple threads, which improves throughput and efficiency. |
| Custom fields | √ | You can add computed columns, constant columns, or masking functions to tasks to meet service-specific requirements. |
| Dirty data processing | √ | Abnormal data can be written to the dirty data bucket to prevent job failures caused by a small amount of abnormal data. |
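The incremental read and shard concurrency capabilities above can be pictured as generating one range query per shard, each carrying the same incremental filter. The sketch below shows the idea only; it is not how DataArts Migration splits work internally. The function name, table name, split column, and filter are all hypothetical, and the SQL is assembled by string formatting purely for illustration with trusted inputs.

```python
def build_shard_queries(table, split_column, lower, upper, shards, where=None):
    """Split the inclusive range [lower, upper] into at most `shards` queries."""
    if shards < 1 or upper < lower:
        raise ValueError("shards must be >= 1 and upper must be >= lower")
    total = upper - lower + 1
    step = -(-total // shards)  # ceiling division: rows per shard
    queries = []
    start = lower
    while start <= upper:
        end = min(start + step - 1, upper)
        condition = f"{split_column} BETWEEN {start} AND {end}"
        if where:
            # The same incremental filter is applied to every shard.
            condition += f" AND ({where})"
        queries.append(f"SELECT * FROM {table} WHERE {condition}")
        start = end + 1
    return queries


# Hypothetical table, split column, and incremental condition.
for query in build_shard_queries(
        "orders", "id", 1, 10, 3,
        where="update_time >= '2024-01-01 00:00:00'"):
    print(query)
```

Each generated query could then be run by a separate worker thread, which is the sense in which sharding enables concurrent extraction.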
- Write capabilities
| Configuration Item | Supported | Description |
|---|---|---|
| Write mode | JDBC/STREAM_LOAD | DataArts Migration can write data to Doris through JDBC or StreamLoad. JDBC is suitable for general database operations. StreamLoad provides more efficient data write and is suitable for quick import of a large amount of data. |
| Pre- and post-import processing | √ | Operations such as preSql and truncate can clean and process data before and after data import. |
| Optimization of the number of written rows | √ | In JDBC mode, you can tune the Batch Size parameter in the connection configuration to optimize write performance. In STREAM_LOAD mode, StreamLoad configuration parameters can be used to optimize write performance. |
| Dirty data processing | x | Writing abnormal data to the dirty data bucket is not supported for Doris writes. |
| Concurrent write | √ | Concurrent write improves efficiency. |
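The Batch Size parameter mentioned above controls how many rows are sent per write in JDBC mode. The sketch below illustrates the general batching idea in isolation. It does not read or set any DataArts Migration parameter, and the function name is hypothetical.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of at most batch_size rows from any iterable of rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical rows; each batch would be handed to one batched insert.
rows = [(1, "apple"), (2, "pear"), (3, "plum"), (4, "fig"), (5, "kiwi")]
for batch in batched_rows(rows, 2):
    print(len(batch), batch)
```

Larger batches mean fewer round trips to the database but more memory per request, so the best value generally depends on row size and available resources.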
Creating a Data Source
Create a data source in Management Center. For details, see Configuring Data Connection Parameters.
Creating an Offline Data Migration Job
Create a Doris migration job in DataArts Factory. For details, see Creating an Offline Processing Migration Job.
Best Practices