Enabling Request Distribution Among Row Store and Column Store Nodes for Complex OLAP Queries_HTAP Analysis (Standard Edition)_User Guide

Scenarios

OLTP workloads involve both read and write requests in most cases. Write requests are processed by the primary node, and read requests are processed by read replicas or the primary node. OLAP workloads involve only read requests in most cases. Read requests are processed by HTAP nodes.

If your system handles both OLTP and OLAP workloads, you can use column-store indexes to distribute OLAP requests to HTAP nodes and OLTP requests to read replicas through a proxy address for optimal efficiency.

Read replicas: process read requests based on row store. Row store nodes deliver higher performance when processing OLTP read requests.
HTAP nodes: process read requests based on column store. HTAP nodes outperform read replicas when processing OLAP read requests (such as complex SQL and analytic SQL queries).

Request Distribution Methods

Automatic request distribution and manual request distribution are supported.

**Table 1** Request distribution methods
Method	Description	Constraint
Automatic request distribution	After this function is enabled, a proxy instance distributes requests based on the actual execution cost of a SQL statement to maximize the query performance. Criteria: Requests that fall below or reach the execution cost threshold of a SQL statement are distributed to read replicas (or the primary node). If there are multiple read replicas, the system distributes requests among them based on their assigned weights. Requests that exceed the execution cost threshold of a SQL statement are distributed to HTAP nodes. If there are multiple HTAP nodes, the system distributes requests among them based on their assigned weights.	Currently, only single-node standard HTAP instances support request distribution among row store and column store nodes. Automatic request distribution among row store and column store nodes only supports the weighted routing policy. Automatic request distribution among row store and column store nodes only supports the read/write proxy mode. The read-only proxy mode is not supported. Automatic request distribution among row store and column store nodes only supports eventual consistency. Global consistency and session consistency are not supported. Automatic request distribution among row store and column store nodes does not support session-level connection pooling. SELECT FOR UPDATE is not supported. Custom variables are not supported. Prepared statements are not supported.
Manual request distribution	If automatic distribution does not work as expected, you can use the HINT syntax to forcibly execute a row store or column store execution plan.	-

Constraints

Databases whose names contain Chinese characters cannot be synchronized. The destination database and task names cannot contain Chinese characters, and the destination database name must contain at least three characters.

Prerequisites

Parameters have been configured for a TaurusDB instance according to the following table.

**Table 2** Parameter description
Parameter	Value	How to Modify	Description
binlog_expire_logs_seconds	86400	Modifying Parameters of a DB Instance	It is recommended that the binlog retention period be greater than one day. 86,400s = 60 (seconds) x 60 (minutes) x 24 (hours). This prevents incremental replication failures caused by a short binlog retention period.
rds_global_sql_log_bin	ON	How Do I Enable and View Binlog of My TaurusDB Instance?	To use this parameter, ensure that the kernel version of your TaurusDB instance is 2.0.45.230900 or later. For details about how to check the kernel version, see How Can I Check the Version of a TaurusDB Instance?

Databases and tables have been created for the TaurusDB instance.

Procedure

Step 1: Buy a Standard HTAP Instance

Log in to the management console.
Click in the upper left corner and select a region and project.
Click in the upper left corner of the page and choose Databases > TaurusDB.
On the Instances page, locate a TaurusDB instance and click its name to access the Basic Information page.
In the navigation pane, choose HTAP Analysis. On the displayed page, click Create HTAP Instance.

Set parameters for the standard HTAP instance.

Basic configuration

Figure 1 Basic configuration
Click to enlarge

**Table 3** Basic configuration
Parameter	Description
Billing Mode	Select Pay-per-use or Yearly/Monthly.

Resource selection

Figure 2 Resource selection
Click to enlarge

**Table 4** Resource selection
Parameter	Description
Instance Edition	Only Standard is available. This edition is developed based on open-source StarRocks.
HTAP Instance Type	Select Single or Cluster. Single: There is only one FE node and one BE node. It is used only for function experience and testing and does not ensure SLA. Cluster: There are at least three FE or BE nodes and at most 10 FE or BE nodes.
Storage Type	Select Extreme SSD or Ultra-high I/O. Extreme SSD: uses 25GE network and RDMA technologies to provide you with up to 1,000 MB/s throughput per disk and sub-millisecond latency. Ultra-high I/O: uses disk striping to balance I/O loads among multiple disks to improve read/write bandwidth. The maximum throughput is 1.7 GB/s.
AZ Type	Only Single-AZ is available.
AZ	Select an AZ as needed.
Time Zone	Select a time zone for your instance based on the region hosting your instance. The time zone is selected during instance creation and cannot be changed after the instance is created.
Security Group	The default value is the security group of the TaurusDB instance. You are advised to keep the security group consistent with that of the TaurusDB instance.

Instance options

Figure 3 Instance options
Click to enlarge

**Table 5** Instance options
Parameter	Description
Instance Specifications	Only general-enhanced is available.
Backend Node Specifications	Select the BE node specifications. The BE nodes are for data storage and SQL computing.
Backend Node Storage (GB)	Select the storage for BE nodes. The default storage is 50 GB and can be expanded to up to 32,000 GB.
Backend Nodes	A single-node instance has only one BE node. A cluster instance has 3 to 10 BE nodes. You can apply for a maximum of 10 nodes at once.
Frontend Node Specifications	Select the FE node specifications. The FE nodes manage metadata, manage client connections, and plan and schedule queries.
Frontend Node Storage (GB)	Select the storage for FE nodes. The default storage is 50 GB and can be expanded to up to 1,000 GB.
Frontend Nodes	A single-node instance has only one FE node. A cluster instance has 3 to 10 FE nodes. You can apply for a maximum of 10 nodes at once.

Instance configuration

Figure 4 Instance configuration
Click to enlarge

**Table 6** Instance configuration
Parameter	Description
DB Instance Name	The instance name must start with a letter and consist of 4 to 64 characters. Only letters (case-sensitive), digits, hyphens (-), and underscores (_) are allowed.
Administrator Password	The password must consist of 8 to 32 characters and contain at least three types of the following characters: uppercase letters, lowercase letters, digits, and special characters (~!@#%^*-_=+?,()&$\|.). Enter a strong password and periodically change it to improve security and defend against threats such as brute force cracking attempts.
Confirm Password	Enter the administrator password again.

Required duration and quantity

Figure 5 Required duration and quantity
Click to enlarge

**Table 7** Required duration and quantity
Parameter	Description
Required Duration	This parameter is only available for yearly/monthly instances. The system will automatically calculate the fee based on the selected required duration. The longer the required duration is, the larger discount you will enjoy.
Auto-renew	By default, this option is not selected. If you select this option, the auto-renewal cycle is determined by the selected required duration.
Quantity	You can only buy one instance at a time.

Click Next in the lower right corner.
- For pay-per-use instances, go to 8.
- For yearly/monthly instances, go to 9.
Confirm the information and click Submit.
To modify the instance information, click Previous.
Confirm the order.
- To modify the instance information, click Previous.
- If you do not need to modify your settings, click Pay Now. On the order page, complete the payment.
On the HTAP instance list page, view and manage the HTAP instance.

Step 2: Synchronize TaurusDB Instance Data to the Standard HTAP Instance

On the Instances page, locate a TaurusDB instance and click its name to access the Basic Information page.
In the navigation pane, choose HTAP Analysis.
Click the name of an HTAP instance to access the Basic Information page.
In the navigation pane, choose Data Synchronization. On the displayed page, click Create Synchronization Task.

Configure required parameters.

Figure 6 Creating a synchronization task
Click to enlarge

**Table 8** Parameter description
Parameter	Description
Synchronize Read Replica Data	Controls whether to synchronize full data from a read replica. During a full synchronization, ensure that the read replica is available, or the synchronization will fail and you will need to perform the synchronization again. Yes: Full data is synchronized from the selected read replica, preventing query load on the primary node during a full synchronization. If there is only one read replica, this node is selected by default. No: No synchronization is performed.
Instance-level Synchronization	Controls whether to synchronize multiple databases. Yes: A synchronization task can synchronize multiple or all databases. No: A synchronization task can synchronize only one database.
Database Synchronization Scope	This parameter is only available when Instance-level Synchronization is set to Yes. All databases: All databases are synchronized by default. You do not need to specify any database name. Some databases: You need to specify two or more database names.
Parameters Settings	Retain the default values, unless otherwise specified.
Synchronization Task Name	The name can contain 3 to 128 characters. Only letters, digits, underscores (_) are allowed.
Destination Database	The name can contain 3 to 128 characters. Only letters, digits, underscores (_) are allowed. This parameter is not displayed when Database Synchronization Scope is set to All databases. When Assign Requests to Row and Column Store Nodes is enabled, the source database name must be the same as the destination database name.
Database to be Synchronized	Select a database that the data will be synchronized to from the drop-down list. You can modify the database parameters of the HTAP instance as required. The drop-down list is hidden when Database Synchronization Scope is set to All databases.
Synchronization Scope	Select All Tables or Some Tables. If you select Some tables for instance-level synchronization, you need to enter the range of tables to be synchronized. You can use table name wildcards to simplify the operation.
Wildcard Supported by Table Name	This parameter is only displayed when Instance-level Synchronization is set to Yes and Synchronization Scope is set to Some tables. In an instance-level synchronization scenario, you can determine whether table names in the blacklist or whitelist support wildcards * and ?. The wildcard * matches zero or more characters, and the wildcard ? matches exactly one character.
Synchronization Scope: Whitelist	If Synchronization Scope is set to Some Tables, you need to configure tables for the blacklist or whitelist. You can set either a blacklist or a whitelist. If you select the whitelist, only the tables in the whitelist are synchronized. If you select the blacklist, the tables in the blacklist are not synchronized. The tables to be synchronized must contain primary keys or a non-empty unique key, or they cannot be synchronized to the HTAP instance. Extra disk space may be used during backend data combination and query. You are advised to reserve 50% of the disk space for the system. When setting the table blacklist or whitelist, you can enter multiple tables in the search box at once. The tables can be separated by commas (,), spaces, or line breaks (\n). After entering multiple tables, you need to click . These tables will be selected by default and displayed in the Selected Table area.
Configure Table Operations	Enable or disable it as required. Enabled: Select a synchronized table on the left and perform operations on its columns. The operations include order by, key columns, distributed by, partition by, data_model, buckets, replication_num, and enable_persistent_index. Separate multiple operations by semicolons (;). For details about the syntax, see Table 9. Disabled: No operations are required. NOTICE: By default, when data is synchronized from an OLTP table to an OLAP table, the primary key of the OLTP table is used as the primary key and sorting key, and for hash bucketing. Partitions are not synchronized. If the default settings do not meet the performance requirements of OLAP workloads, you can adjust the sorting key, bucketing key, and partitions of an OLAP table by referring to Performance Tuning. You can adjust specific tables using table synchronization settings.

**Table 9** Operation syntax
Operation Type	Syntax
order by	order by (column1, column2) or order by column1,column2
key columns	key columns (column1, column2) or key columns column1,column2
distributed by	distributed by (column1, column2) buckets 3 NOTE: buckets is optional. If it is not set, the default value is used.
partition by	There are expression partitioning and list partitioning. For details, see Partitioning Syntax Supported by Table Synchronization Settings.
data_model	Specifies the table type. The value can be primary key, duplicate key, or unique key. Syntax: data_model=primary key, data_model=duplicate key, or data_model=unique key
replication_num	replication_num=3 NOTE: The value cannot exceed the number of BE nodes, or the verification fails.
enable_persistent_index	Specifies whether to make the index persistent. Syntax: enable_persistent_index=true or enable_persistent_index=false
Combined scenario	data_model=duplicate key;key columns column1, column2;

After the settings are complete, click Create Synchronization Task.
Click Back to Synchronization List to return to the data synchronization page. A synchronization task to be synchronized is generated. The task status is Synchronization Stage: Waiting for synchronization.
Figure 7 Viewing the task status
To start the task, click Synchronize in the Operation column.
If the task status changes to Synchronization Stage: Incremental synchronization in progress, the data synchronization is complete.
- During the full synchronization, if some tables fail to be synchronized, an alarm will be generated and those tables will be skipped. The remaining tables will continue to be synchronized. After the full synchronization is complete, the incremental synchronization starts. You can repair tables that failed to be synchronized during the incremental synchronization.
- When you start a synchronization task, it begins with a full synchronization, during which tables are locked using the FLUSH TABLES WITH READ LOCK command. To minimize the impact on workloads, start synchronization tasks during off-peak hours.
- During service tests, if you want to suspend a synchronization task, click Stop. The suspension duration cannot exceed the binlog retention period set for the source primary TaurusDB instance. If the suspension duration exceeds the binlog retention period, the synchronization task cannot continue. You need to delete the task and create a new one. Do not suspend synchronization tasks in the production environment to prevent data inconsistency between OLTP and OLAP.

Step 3: Enable Assign Requests to Row and Column Store Nodes

Automatic request distribution and manual request distribution are supported.

Method 1: automatic request distribution

In the HTAP instance list, click the name of an HTAP instance to access the Basic Information page.
In the navigation pane, choose Database Proxy.
Figure 8 Database Proxy page
Click Create Proxy Instance.

In the displayed dialog box, configure related parameters.

**Table 10** Parameter description
Parameter	Description
Proxy Instance Name	The name must start with a letter and consist of 4 to 64 characters. Only letters (case-sensitive), digits, hyphens (-), and underscores (_) are allowed.
Proxy Mode	Only Read/Write is supported. In this mode, all write requests are only forwarded to the primary node, and all read requests are forwarded to the selected nodes based on the read weights. The default read weight of the primary node is 100.
Consistency Level	Only Eventual consistency is supported. Select this option if you want to offload read requests from the primary node to read replicas. When data is updated, you may not immediately obtain the most current data, but the data will become consistent after a certain period of time. Reducing read load on the primary node can improve instance performance. In read-only mode, only eventual consistency is available because write operations cannot be performed.
Routing Policy	Only Weighted is supported. Read requests are assigned to nodes based on the weights you specify.
Proxy Instance Specifications	You can select the proxy instance specifications as needed. Kunpeng general computing-plus: 2 vCPUs \| 4 GB, 4 vCPUs \| 8 GB, and 8 vCPUs \| 16 GB General-enhanced: 2 vCPUs \| 4 GB, 4 vCPUs \| 8 GB, and 8 vCPUs \| 16 GB
Subnet	You can specify a subnet when enabling read/write splitting. To use this function, submit a service ticket. If the subnet where the TaurusDB instance is located is a secondary CIDR block, cross-subnet read/write splitting is not supported. The proxy instance must be in the same subnet as the TaurusDB instance.
Proxy Instance Nodes	The default value is 2. Enter an integer from 2 to 16. Number of recommended proxy instance nodes = (Number of vCPUs of the primary node + Total number of vCPUs of all read replicas)/(4 x Number of vCPUs of the proxy instance), rounded up.
Associate New Nodes	After Associate New Nodes is enabled, new read replicas will be automatically associated with the proxy instance.
New Node Weight	If Routing Policy is Weighted, you need to set read weights of the new nodes. The default weight of a node is 100. Nodes with higher weights process more read requests.
Auto Assign Requests to Column Store or Row Store Nodes	After this function is enabled, you need to set the execution cost threshold of a SQL statement. The proxy instance uses this threshold to determine whether to distribute requests to column store or row store nodes. If the estimated execution cost of a requested SQL statement exceeds the threshold, the request is distributed to a column store node (HTAP node). If the estimated execution cost of a requested SQL statement falls below or reaches the threshold, the request is distributed to a row store node (read replica). This parameter is only available when Proxy Mode is set to Read/Write, Consistency Level to Eventual consistency, and Routing Policy to Weighted, and the subnet is the same as that of the TaurusDB instance.
Database Nodes	Select the primary node along with at least one read replica and one HTAP node to handle requests. Example: You have selected one primary node, one read replica, and two HTAP instances, and configured their read weights to 100, 100, 100, and 300, respectively. After Auto Assign Requests to Column Store or Row Store Nodes is enabled, write requests are distributed to the primary node. If the estimated execution cost of a requested SQL statement falls below or reaches the threshold (for example, 50000), read requests will be distributed to the primary node and the read replica in the ratio of 1:1, that is, the primary node and the read replica process 50% of read requests respectively. If the estimated execution cost of a requested SQL statement exceeds the threshold (for example, 50000), read requests will be distributed to the two HTAP nodes in the ratio of 1:3, that is, the two HTAP instances process 25% and 75% of read requests respectively.

Click OK.

Set the threshold for automatic request distribution.

After Auto Assign Requests to Column Store or Row Store Nodes is enabled, you need to set the maximum execution cost of a SQL statement. Then the proxy instance uses this threshold to determine whether to distribute requests to column store or row store nodes. If the estimated execution cost of a requested SQL statement exceeds the threshold, the request is distributed to a column store node (HTAP node). If the estimated execution cost of a requested SQL statement falls below or reaches the threshold, the request is distributed to a row store node (read replica).

The threshold is determined by the parameter in the following table. You can change the parameter value on the Parameters page of the proxy instance.

**Table 11** Parameter description
Parameter	Description
looseImciApThreshold	Specifies the maximum execution cost of a SQL statement. The default value is 50000. NOTE: After Auto Assign Requests to Column Store or Row Store Nodes is enabled, if the execution cost of a SQL statement is greater than the threshold, the request is distributed to an HTAP node.

You can run the following command on the primary node to query the accurate execution cost of the previous SQL statement, and determine whether to adjust the value of looseImciApThreshold.

explain format=json <SQL_command>

Example:

explain format=json select sbtest1.*,sbtest2.* from sbtest1 inner join sbtest2 where sbtest1.id < 100 and sbtest2.k > 11;

The query results show that the execution cost of the SELECT statement is 1.97.

To distribute the SQL query to the column store plan on a column store node, set looseImciApThreshold to 1.

Method 2: manual request distribution

If automatic distribution does not work as expected, you can use the HINT syntax to forcibly execute a row store or column store execution plan.

The HINT syntax takes effect only for a specified SQL statement. It does not affect other connections or other SQL statements in the same connection.
To run the HINT syntax on a MySQL client earlier than 5.7.7, add the --comments option when connecting to the DB engine. You can run the mysql --version command to view the MySQL client version.

When a proxy instance is used to auto assign requests to column store or row store nodes, you can run the HINT syntax to force the proxy instance to distribute SQL statements to column store nodes, regardless of the looseImciApThreshold setting.
Method: Add /* FORCE_IMCI_NODES */ before a SQL keyword.

Example:

/*FORCE_IMCI_NODES*/ SELECT COUNT(*) FROM t1;
When a proxy instance is used to auto assign requests to column store or row store nodes, you can run the HINT syntax to force the proxy instance to distribute SQL statements to row store nodes, regardless of the looseImciApThreshold setting.
Method: Add /*FORCE_SLAVE*/ or /*FORCE_MASTER*/ before a SQL keyword.

Example:

Distributing a SQL statement to a read replica:

/*FORCE_SLAVE*/ SELECT COUNT(*) FROM t1;

Distributing a SQL statement to the primary node:

/*FORCE_MASTER*/ SELECT COUNT(*) FROM t1;

Step 4: Connect to the TaurusDB Instance and Perform OLAP Queries

Connect to the TaurusDB instance through a proxy address and perform OLAP queries. For details, see Connecting to a DB Instance Through JDBC.

Enabling Request Distribution Among Row Store and Column Store Nodes for Complex OLAP Queries