Performance Tuning

Setting Synchronization Task Parameters

max_full_sync_task_threads_num: number of full synchronization threads. The default value is half the number of vCPUs on the FE nodes. A larger value speeds up full synchronization but consumes more vCPU and memory resources on both the OLTP and OLAP sides. Set this parameter based on the system load when running a full synchronization task, and decrease it if multiple full synchronization tasks run at the same time.

max_incremental_sync_task_threads_num: number of incremental synchronization threads. The default value is half the number of vCPUs on the FE nodes. A larger value means more threads are used for incremental synchronization, which consumes more resources but shortens synchronization latency. If an instance has more than five synchronization tasks, reduce the number of synchronization threads for each task.

expect_tablet_size: expected size of source data stored in each bucket, in GB. The default value is 3. If most tables in a database hold less than 3 GB of data and only a few tables hold more, decrease this value.

expect_tablet_num_for_one_partition: expected default number of buckets in each partition. The default value is 2. If this parameter is set to 0, the number of buckets is calculated from the data size. For a table with no data, the default value is used. For a table with data, the number of buckets is calculated as: Data size / expect_tablet_size. If a partition key is specified for table synchronization, evaluate the number of buckets required for the data in each partition. The total number of buckets for the table is then: Number of partitions x Number of buckets in each partition.
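The bucket arithmetic described above can be sketched as follows. This is a minimal illustration, not the service's implementation. The helper names estimate_buckets and total_buckets are hypothetical, and rounding the quotient up to at least one bucket is an assumption, since the text does not state how fractional results are handled.

```python
import math

def estimate_buckets(data_size_gb: float,
                     expect_tablet_size: float = 3.0,
                     default_buckets: int = 2) -> int:
    """Estimate the bucket count for one table or partition (hypothetical helper)."""
    if data_size_gb <= 0:
        # No data yet: the default bucket count is used.
        return default_buckets
    # Data size / expect_tablet_size; rounding up (at least one bucket) is an assumption.
    return max(1, math.ceil(data_size_gb / expect_tablet_size))

def total_buckets(num_partitions: int, buckets_per_partition: int) -> int:
    """Number of partitions x number of buckets in each partition."""
    return num_partitions * buckets_per_partition

print(estimate_buckets(0))    # 2: empty table, default used
print(estimate_buckets(20))   # 7: 20 GB / 3 GB, rounded up
print(total_buckets(12, 7))   # 84: 12 monthly partitions of 7 buckets each
```

The last line shows why partitioned tables need attention: the per-partition bucket count is multiplied by the number of partitions.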

Improving Query Performance

Tuning SQL Performance

Do not use SELECT *. Specify only the columns you need, and remove redundant columns and function calls.

Using Query Cache

Method: Connect to an HTAP instance through DAS and run SET GLOBAL enable_query_cache=true;

Purpose: Speed up frequently executed aggregate queries.

Using Sorting Keys

Principles:

- Choose columns that are frequently used as query conditions as sorting keys.
- Order the sorting keys by usage frequency and data cardinality, giving priority to high-cardinality columns.
- If several columns are each queried on their own with roughly equal frequency, define the sorting keys of the table in one order, and then create a separate materialized view that uses the other columns as its sorting keys.
- Use no more than five sorting keys. Keep sorting key columns under 36 bytes to avoid VARCHAR truncation. Columns of the FLOAT, DOUBLE, or BIT type cannot be used as sorting keys.
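The principles above can be expressed as a small selection routine. This is a hedged sketch, not a tool provided by the service. The Candidate fields, the query_frequency metric, and the choose_sort_keys helper are all hypothetical. Sorting by frequency first and breaking ties by higher cardinality is one reasonable reading of the ordering rule.

```python
from dataclasses import dataclass

UNSUPPORTED_TYPES = {"FLOAT", "DOUBLE", "BIT"}
MAX_SORT_KEYS = 5
MAX_KEY_BYTES = 36  # sorting key columns should stay under this length

@dataclass
class Candidate:
    name: str
    col_type: str          # column type, e.g. "BIGINT"
    length_bytes: int      # declared length in bytes
    query_frequency: int   # how often the column is used as a filter (hypothetical metric)
    cardinality: int       # approximate number of distinct values

def choose_sort_keys(candidates: list[Candidate]) -> list[str]:
    """Apply the sorting key principles to candidate columns (hypothetical helper)."""
    usable = [c for c in candidates
              if c.col_type.upper() not in UNSUPPORTED_TYPES
              and c.length_bytes < MAX_KEY_BYTES]
    # Most frequently queried first; higher cardinality wins ties.
    usable.sort(key=lambda c: (-c.query_frequency, -c.cardinality))
    return [c.name for c in usable[:MAX_SORT_KEYS]]

columns = [
    Candidate("O_CUSTKEY", "BIGINT", 8, 90, 150_000),
    Candidate("O_ORDERDATE", "DATE", 4, 90, 2_400),
    Candidate("O_WEIGHT", "DOUBLE", 8, 50, 900_000),       # hypothetical column, unsupported type
    Candidate("O_COMMENT", "VARCHAR", 316, 5, 1_000_000),  # too long for a sorting key
]
print(choose_sort_keys(columns))  # ['O_CUSTKEY', 'O_ORDERDATE']
```

The statistics in the example are made up for illustration. In practice they would come from your own query history and data profile.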

Method: When creating a synchronization task, use table synchronization to add sorting keys.

Purpose: Reduce the amount of data to be scanned.

Example:

Create an order table on the OLTP side.

```
CREATE TABLE `orders` (
`O_ORDERKEY` bigint NOT NULL,
`O_CUSTKEY` bigint NOT NULL,
`O_ORDERSTATUS` char(1) COLLATE utf8mb4_bin NOT NULL,
`O_TOTALPRICE` decimal(15,2) NOT NULL,
`O_ORDERDATE` date NOT NULL,
`O_ORDERPRIORITY` char(15) COLLATE utf8mb4_bin NOT NULL,
`O_CLERK` char(15) COLLATE utf8mb4_bin NOT NULL,
`O_SHIPPRIORITY` bigint NOT NULL,
`O_COMMENT` varchar(79) COLLATE utf8mb4_bin NOT NULL,
PRIMARY KEY (`O_ORDERKEY`)
);
```

If no sorting key is specified, the primary key is used as the sorting key when the data is synchronized to the OLAP table. The resulting table definition is as follows:

```
CREATE TABLE `orders` (
`O_ORDERKEY` bigint(20) NOT NULL COMMENT "",
`O_CUSTKEY` bigint(20) NOT NULL COMMENT "",
`O_ORDERSTATUS` varchar(4) NOT NULL COMMENT "",
`O_TOTALPRICE` decimal(15, 2) NOT NULL COMMENT "",
`O_ORDERDATE` date NOT NULL COMMENT "",
`O_ORDERPRIORITY` varchar(60) NOT NULL COMMENT "",
`O_CLERK` varchar(60) NOT NULL COMMENT "",
`O_SHIPPRIORITY` bigint(20) NOT NULL COMMENT "",
`O_COMMENT` varchar(316) NOT NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`O_ORDERKEY`)
DISTRIBUTED BY HASH(`O_ORDERKEY`)
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"compression" = "LZ4"
);
```

However, queries on the OLAP side filter on the O_CUSTKEY column to collect order statistics, for example:

```
select count(*) from orders where O_CUSTKEY = 123;
```

In this case, adjust the definition of the OLAP table to use O_CUSTKEY as the sorting key (a primary key table uses its primary key as the sorting key by default). When creating the data synchronization task, specify ORDER BY (O_CUSTKEY); in the table synchronization settings to set the sorting key of the table separately. The adjusted table structure is as follows:

```
CREATE TABLE `orders` (
`O_ORDERKEY` bigint(20) NOT NULL COMMENT "",
`O_CUSTKEY` bigint(20) NOT NULL COMMENT "",
`O_ORDERSTATUS` varchar(4) NOT NULL COMMENT "",
`O_TOTALPRICE` decimal(15, 2) NOT NULL COMMENT "",
`O_ORDERDATE` date NOT NULL COMMENT "",
`O_ORDERPRIORITY` varchar(60) NOT NULL COMMENT "",
`O_CLERK` varchar(60) NOT NULL COMMENT "",
`O_SHIPPRIORITY` bigint(20) NOT NULL COMMENT "",
`O_COMMENT` varchar(316) NOT NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`O_ORDERKEY`)
DISTRIBUTED BY HASH(`O_ORDERKEY`)
ORDER BY(`O_CUSTKEY`)
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"compression" = "LZ4"
);
```
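Sorting by O_CUSTKEY reduces the data scanned for a filter such as O_CUSTKEY = 123 because the matching rows become contiguous. The toy model below is a sketch under stated assumptions, not how the OLAP engine stores data. It assumes storage is read in fixed-size blocks (BLOCK_ROWS is a hypothetical value). It compares checking every block with using binary search over data sorted by the key.

```python
import bisect
import math
import random

BLOCK_ROWS = 1024  # hypothetical number of rows per storage block

def blocks_read_unsorted(num_rows: int) -> int:
    """Rows are not ordered by O_CUSTKEY, so every block must be checked."""
    return math.ceil(num_rows / BLOCK_ROWS)

def blocks_read_sorted(sorted_keys: list[int], target: int) -> int:
    """Rows are ordered by O_CUSTKEY: matches are contiguous and found by binary search."""
    lo = bisect.bisect_left(sorted_keys, target)
    hi = bisect.bisect_right(sorted_keys, target)
    if lo == hi:
        return 0  # no matching rows, no block to read
    # Count the blocks that overlap the row range [lo, hi).
    return (hi - 1) // BLOCK_ROWS - lo // BLOCK_ROWS + 1

random.seed(0)
custkeys = [random.randint(1, 10_000) for _ in range(200_000)]
print(blocks_read_unsorted(len(custkeys)))        # 196
print(blocks_read_sorted(sorted(custkeys), 123))  # 1 or 2 blocks
```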

Using Partitions

Principles:

- If a time column whose values do not change is frequently used for WHERE filtering, use it as the partition key. Monthly partitions are recommended.
- The number of partitions cannot exceed 1,024, and a single partition cannot hold more than 100 GB of data.
- When adding partitions, evaluate the number of buckets in each partition. Each bucket should store 1 GB to 10 GB of data.
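You can check a monthly partition plan against the limits above before creating the synchronization task. This is a minimal sketch. The helpers month_partitions and check_partition_plan are hypothetical, and the check assumes roughly the same data volume in every month.

```python
from datetime import date

MAX_PARTITIONS = 1024
MAX_PARTITION_GB = 100

def month_partitions(first: date, last: date) -> list[date]:
    """One partition per calendar month, keyed by the first day of the month."""
    parts = []
    year, month = first.year, first.month
    while (year, month) <= (last.year, last.month):
        parts.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return parts

def check_partition_plan(first: date, last: date, gb_per_month: float) -> list[str]:
    """Return the limits a monthly plan would break (an empty list means within limits)."""
    problems = []
    count = len(month_partitions(first, last))
    if count > MAX_PARTITIONS:
        problems.append(f"{count} partitions exceeds the limit of {MAX_PARTITIONS}")
    if gb_per_month > MAX_PARTITION_GB:
        problems.append(f"{gb_per_month} GB per partition exceeds {MAX_PARTITION_GB} GB")
    return problems

print(len(month_partitions(date(2023, 1, 15), date(2025, 6, 30))))    # 30
print(check_partition_plan(date(2023, 1, 15), date(2025, 6, 30), 40))   # []
print(check_partition_plan(date(2023, 1, 15), date(2025, 6, 30), 150))  # ['150 GB per partition exceeds 100 GB']
```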

Method: When creating a synchronization task, use table synchronization to add partitions.

Using Buckets

Adding buckets

Principles:
- Choose columns that are frequently used in queries and have high cardinality.
- Ensure that the data is not skewed across buckets.

Method: When creating a synchronization task, use table synchronization to add buckets.

Purpose: Reduce the amount of data to be scanned.
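One way to check a candidate bucketing column for skew is to hash a sample of its values and compare the largest bucket with the average. The sketch below makes two assumptions. SHA-256 stands in for the engine's own hash function, which differs. The helpers bucket_of and skew_ratio are hypothetical.

```python
import hashlib
from collections import Counter

def bucket_of(value, num_buckets: int) -> int:
    """Stand-in hash bucketing (SHA-256); the OLAP engine uses its own hash function."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets

def skew_ratio(values, num_buckets: int) -> float:
    """Largest bucket divided by the average bucket; 1.0 means perfectly even."""
    values = list(values)
    counts = Counter(bucket_of(v, num_buckets) for v in values)
    average = len(values) / num_buckets
    return max(counts.values()) / average

print(round(skew_ratio(range(100_000), 16), 2))           # close to 1.0: high cardinality
print(round(skew_ratio(["F", "O", "P"] * 1_000, 16), 2))  # 5.33 or more: only 3 distinct values
```

A low-cardinality column such as an order status can fill at most as many buckets as it has distinct values. This is why the principles favor high-cardinality bucketing columns.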
Number of buckets and data volume

Principles: The data volume of each bucket should be between 1 GB and 10 GB.

Method:
- When creating a synchronization task, set the following parameters for database synchronization:
  expect_tablet_size: data volume of a bucket, in GB. The default value is 3.
  expect_tablet_num_for_one_partition: number of buckets. Buckets are created based on the data volume of the table. The default value is 0.
- Specify the number of buckets and data volume through table synchronization settings.
```
distributed by (column1) buckets 3;
```
- When setting partition keys, also set the number of buckets. By default, each partition has the same number of buckets as the table, which may create too many shards (tablets).

Purpose: Reduce memory usage, and improve data merging efficiency and concurrency.
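The 1 GB to 10 GB rule translates into a range of acceptable bucket counts. The sketch below also shows what happens when partitions inherit the table-level bucket count. The 120 GB table, the 24 monthly partitions, and the bucket_range helper are hypothetical, and the sketch assumes data is spread evenly across partitions.

```python
import math

MIN_BUCKET_GB, MAX_BUCKET_GB = 1, 10

def bucket_range(data_gb: float) -> tuple[int, int]:
    """Smallest and largest bucket counts that keep each bucket within 1-10 GB."""
    low = max(1, math.ceil(data_gb / MAX_BUCKET_GB))
    high = max(low, math.floor(data_gb / MIN_BUCKET_GB))  # tiny tables still get 1 bucket
    return low, high

table_gb, months = 120, 24
print(bucket_range(table_gb))       # (12, 120) for the whole table
partition_gb = table_gb / months    # 5.0 GB per monthly partition
print(bucket_range(partition_gb))   # (1, 5) per partition
# Reusing the table-level count of 12 buckets in every partition:
print(round(partition_gb / 12, 2))  # 0.42 GB per bucket, below the 1 GB floor
print(months * 12)                  # 288 tablets in total
```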

Example:

The following order table exists on the OLTP side.

```
CREATE TABLE `orders` (
`O_ORDERKEY` bigint NOT NULL,
`O_CUSTKEY` bigint NOT NULL,
`O_ORDERSTATUS` char(1) COLLATE utf8mb4_bin NOT NULL,
`O_TOTALPRICE` decimal(15,2) NOT NULL,
`O_ORDERDATE` date NOT NULL,
`O_ORDERPRIORITY` char(15) COLLATE utf8mb4_bin NOT NULL,
`O_CLERK` char(15) COLLATE utf8mb4_bin NOT NULL,
`O_SHIPPRIORITY` bigint NOT NULL,
`O_COMMENT` varchar(79) COLLATE utf8mb4_bin NOT NULL,
PRIMARY KEY (`O_ORDERKEY`)
);
```

If no bucketing key is specified, the primary key is used for hash bucketing when the data is synchronized to the OLAP table. The resulting table definition is as follows:

```
CREATE TABLE `orders` (
`O_ORDERKEY` bigint(20) NOT NULL COMMENT "",
`O_CUSTKEY` bigint(20) NOT NULL COMMENT "",
`O_ORDERSTATUS` varchar(4) NOT NULL COMMENT "",
`O_TOTALPRICE` decimal(15, 2) NOT NULL COMMENT "",
`O_ORDERDATE` date NOT NULL COMMENT "",
`O_ORDERPRIORITY` varchar(60) NOT NULL COMMENT "",
`O_CLERK` varchar(60) NOT NULL COMMENT "",
`O_SHIPPRIORITY` bigint(20) NOT NULL COMMENT "",
`O_COMMENT` varchar(316) NOT NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`O_ORDERKEY`)
DISTRIBUTED BY HASH(`O_ORDERKEY`)
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"compression" = "LZ4"
);
```

Queries on the OLAP side filter on both the O_CUSTKEY and O_ORDERDATE columns to collect order statistics, and the data is partitioned by date to avoid significant data skew. For example:

```
select count(*) from orders where O_CUSTKEY = 123 and O_ORDERDATE='2025-01-01';
```

In this case, adjust the definition of the OLAP table. Use O_ORDERDATE as the partition key, use O_CUSTKEY as the bucketing key through the DISTRIBUTED BY (O_CUSTKEY) clause, and use O_CUSTKEY and O_ORDERDATE as the sorting keys of the table. The partition key and bucketing key must be part of the primary key, so also adjust the primary key through the KEY COLUMNS(O_ORDERKEY, O_ORDERDATE, O_CUSTKEY) clause. The adjusted table structure is as follows:

```
CREATE TABLE `orders` (
`O_ORDERKEY` bigint(20) NOT NULL COMMENT "",
`O_ORDERDATE` date NOT NULL COMMENT "",
`O_CUSTKEY` bigint(20) NOT NULL COMMENT "",
`O_ORDERSTATUS` varchar(4) NOT NULL COMMENT "",
`O_TOTALPRICE` decimal(15, 2) NOT NULL COMMENT "",
`O_ORDERPRIORITY` varchar(60) NOT NULL COMMENT "",
`O_CLERK` varchar(60) NOT NULL COMMENT "",
`O_SHIPPRIORITY` bigint(20) NOT NULL COMMENT "",
`O_COMMENT` varchar(316) NOT NULL COMMENT ""
) ENGINE=OLAP
PRIMARY KEY(`O_ORDERKEY`, `O_ORDERDATE`, `O_CUSTKEY`)
PARTITION BY date_trunc('month', O_ORDERDATE)
DISTRIBUTED BY HASH(`O_CUSTKEY`)
ORDER BY(`O_CUSTKEY`, `O_ORDERDATE`)
PROPERTIES (
"replication_num" = "1",
"in_memory" = "false",
"enable_persistent_index" = "false",
"replicated_storage" = "true",
"compression" = "LZ4"
);
```
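With this layout, a query that filters on both O_ORDERDATE and O_CUSTKEY only needs the tablet in the matching monthly partition and hash bucket. The sketch below models that pruning under stated assumptions. NUM_BUCKETS is a hypothetical bucket count, SHA-256 stands in for the engine's hash function, and the helper names are illustrative. Inside the selected tablet, the ORDER BY(O_CUSTKEY, O_ORDERDATE) sorting further narrows the rows read.

```python
import hashlib
from datetime import date

NUM_BUCKETS = 8  # hypothetical number of buckets per partition

def partition_of(order_date: date) -> date:
    """Mirrors date_trunc('month', O_ORDERDATE): the first day of the month."""
    return order_date.replace(day=1)

def bucket_of(custkey: int) -> int:
    """Stand-in for hash bucketing on O_CUSTKEY; the engine's hash function differs."""
    digest = hashlib.sha256(str(custkey).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

def tablets_to_scan(partitions: list[date], custkey: int, order_date: date) -> list[tuple[date, int]]:
    """(partition, bucket) pairs that can contain rows matching both filters."""
    target = partition_of(order_date)
    return [(p, bucket_of(custkey)) for p in partitions if p == target]

partitions = [date(2025, m, 1) for m in range(1, 13)]
print(len(partitions) * NUM_BUCKETS)                            # 96 tablets in total
print(len(tablets_to_scan(partitions, 123, date(2025, 1, 1))))  # 1 tablet to read
```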

Using Indexes

Bitmap indexes
Principles: Bitmap indexes suit columns that are often used for filtering but are not sorting keys. They work best for columns with a cardinality of roughly 10,000 to 100,000, or for combined queries on multiple low-cardinality columns.

Method:
- Add an index.
```
CREATE INDEX column1_index1 ON table1 (column1) USING BITMAP;
```
- View the progress.
```
SHOW ALTER TABLE COLUMN [FROM db_name];
```
- View the created index.
```
SHOW { INDEX[ES] | KEY[S] } FROM [db_name.]table_name [FROM db_name];
```
- Drop the index.
```
DROP INDEX index_name ON [db_name.]table_name;
```
Constraints: Columns of the FLOAT, DOUBLE, BOOLEAN, and DECIMAL types are not supported.
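Bitmap indexes help combined filters on low-cardinality columns because each value maps to a bitmap of row IDs, and an AND of conditions becomes a bitwise AND. The sketch below is a toy model only. It uses Python integers as bitmaps over a made-up six-row table, while the OLAP engine uses its own compressed bitmap format.

```python
from collections import defaultdict

def build_bitmap_index(column: list[str]) -> dict[str, int]:
    """Map each distinct value to a bitmap (a Python int) of the row IDs holding it."""
    index: dict[str, int] = defaultdict(int)
    for row_id, value in enumerate(column):
        index[value] |= 1 << row_id
    return dict(index)

def rows_in(bitmap: int) -> list[int]:
    """Decode a bitmap back into row IDs."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Two low-cardinality columns of a six-row table.
status = ["O", "F", "O", "P", "F", "O"]
priority = ["1-URGENT", "5-LOW", "1-URGENT", "1-URGENT", "5-LOW", "5-LOW"]
status_idx = build_bitmap_index(status)
priority_idx = build_bitmap_index(priority)

# WHERE O_ORDERSTATUS = 'O' AND O_ORDERPRIORITY = '1-URGENT' becomes a bitwise AND.
print(rows_in(status_idx["O"] & priority_idx["1-URGENT"]))  # [0, 2]
```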

Bloom filter indexes
Principles: Bloom filter indexes are suitable for columns with a cardinality of more than 100,000 and a very low duplication rate.

Method:
- Add an index.
```
ALTER TABLE table1 SET ("bloom_filter_columns" = "k1,k2,v1");
```
- View the progress.
```
SHOW ALTER TABLE COLUMN [FROM db_name];
```
Purpose: Reduce the amount of data to be scanned.
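A Bloom filter answers "definitely absent" or "possibly present" for a value, so data blocks whose filter says "absent" can be skipped. This is why the index reduces the data scanned for equality filters on high-cardinality columns. The sketch below is a minimal generic Bloom filter, not the engine's implementation. The bit-array size, the hash count, the SHA-256 based hashing, and the two-block layout are all assumptions chosen for illustration.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: num_hashes bit positions per value in a num_bits bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as the bit array

    def _positions(self, value):
        data = str(value).encode()
        for seed in range(self.num_hashes):
            # Derive independent-looking hashes by prefixing a seed.
            digest = hashlib.sha256(seed.to_bytes(4, "big") + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))

# One filter per (hypothetical) data block of 1,000 distinct keys.
blocks = {0: range(0, 1000), 1: range(1000, 2000)}
filters = {}
for block_id, keys in blocks.items():
    bf = BloomFilter(num_bits=10_000, num_hashes=7)
    for key in keys:
        bf.add(key)
    filters[block_id] = bf

# Only blocks whose filter answers "possibly present" are read.
to_read = [b for b, bf in filters.items() if bf.might_contain(1500)]
print(to_read)  # block 1, plus block 0 only on a rare false positive
```

Because a Bloom filter never produces false negatives, skipping a block is always safe. False positives only cost an unnecessary read. With mostly distinct values, each block's filter stays selective, which matches the high-cardinality guidance above.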