Updated on 2024-05-21 GMT+08:00

Overview

Parallel query (PQ) reduces the processing time of analytical queries to satisfy the low latency requirements of enterprise-class applications. It distributes a query task to multiple CPU cores for computation to shorten the query time. Theoretically, the performance improvement of parallel query is positively correlated with the number of CPU cores. The more CPU cores are used, the higher the performance improvement is.

The following figure shows the count(*) process for a table based on parallel query. Table data is divided into blocks and distributed to multiple cores for parallel computing. Each core processes some data to obtain an intermediate count(*) result, and all the intermediate results are aggregated to obtain the final result.

Figure 1 Parallel query

Prerequisites

The engine version of GaussDB(for MySQL) is MySQL 8.0.22 or later.

Scenarios

Parallel query is mainly suitable for SELECT statements to query large tables, multiple tables, and a large amount of data. This feature does not benefit extremely short queries.

  • Lightweight analysis

    The SQL statements for report queries are complex and time-consuming. Parallel query can improve the efficiency of a single query.

  • More available system resources

    Parallel query requires more system resources. You can enable parallel query to improve resource utilization and query efficiency only when the system has a large number of CPUs, low I/O loads, and sufficient memory resources.

  • Frequent data queries

    For data-intensive queries, you can use parallel query to improve query processing efficiency, ease network traffic, and reduce pressure on compute nodes.

Both read replicas and primary nodes support parallel query. Parallel query consumes a lot of computing resources (such as CPU and memory). To ensure instance stability, parallel query does not take effect on primary nodes of instance 2.0.42.230600 and later versions by default. To use parallel query, contact customer service.