Updated on 2024-09-06 GMT+08:00

Overview

What Is Parallel Query?

Parallel query (PQ) reduces the processing time of analytical queries to satisfy the low latency requirements of enterprise-class applications. It distributes a query task to multiple CPU cores for computation to shorten the query time. Theoretically, the performance improvement of parallel query is positively correlated with the number of CPU cores. The more CPU cores are used, the higher the performance improvement is.

The following figure shows the count(*) process for a table based on parallel query. Table data is divided into blocks and distributed to multiple cores for parallel computing. Each core processes some data to obtain an intermediate count(*) result, and all the intermediate results are aggregated to obtain the final result.

Figure 1 How PQ works

Scenarios

Parallel query is mainly suitable for SELECT statements to query large tables, multiple tables, and a large amount of data. This feature does not benefit extremely short queries.

  • Lightweight analysis

    The SQL statements for report queries are complex and time-consuming. Parallel query can improve the efficiency of a single query.

  • More available system resources

    Parallel query requires more system resources. You can enable parallel query to improve resource utilization and query efficiency only when the system has a large number of CPUs, low I/O loads, and sufficient memory resources.

  • Frequent data queries

    For data-intensive queries, you can use parallel query to improve query processing efficiency, ease network traffic, and reduce pressure on compute nodes.