Updated on 2024-10-14 GMT+08:00

Query Execution Process

Figure 1 shows how the SQL engine processes a query-related SQL statement, from receiving it to executing it, and Table 1 describes each step. The steps in red are those where database administrators can optimize queries.

Figure 1 Execution process of query-related SQL statements by the SQL engine
Table 1 Execution process of query-related SQL statements by the SQL engine

Step

Description

1. Perform lexical and syntax parsing.

Converts the input SQL statement from a string into a structured representation (the stmt structure) according to the SQL grammar rules.

2. Perform semantic parsing.

Converts the structure produced by the previous step into objects that the database can recognize.

3. Rewrite the query statements.

Rewrites the output of the previous step into an equivalent structure that can be executed more efficiently.

4. Optimize the query.

Determines how the SQL statement will be executed (the execution plan) based on the output of the previous step and the database's internal statistics. For details about how the internal statistics and GUC parameters affect query optimization (the execution plan), see Optimizing Queries Using Statistics and Optimizing Queries Using GUC Parameters.

5. Execute the query.

Executes the SQL statement according to the execution plan determined in the previous step. Choosing an appropriate underlying storage mode improves query execution efficiency. For details, see Optimizing Queries Using the Underlying Storage.
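One way to observe steps 4 and 5 for a particular statement is the EXPLAIN command. The sketch below reuses the customer and store_sales query from Optimizing Queries Using GUC Parameters and assumes both tables exist. Plain EXPLAIN prints the plan chosen in step 4 without running the query; EXPLAIN ANALYZE also performs step 5 and reports actual row counts and times alongside the estimates. The exact output layout depends on the GaussDB version.

```sql
-- Step 4 only: print the execution plan chosen by the optimizer.
-- The query itself is not executed.
EXPLAIN
SELECT count(1)
FROM customer INNER JOIN store_sales ON (ss_customer_sk = c_customer_sk);

-- Steps 4 and 5: execute the query and report actual row counts and
-- timings next to the optimizer's estimates.
-- Note: the statement really runs, so avoid this on data-modifying statements.
EXPLAIN ANALYZE
SELECT count(1)
FROM customer INNER JOIN store_sales ON (ss_customer_sk = c_customer_sk);
```

A large gap between estimated and actual row counts usually indicates that the statistics used in step 4 are inaccurate or out of date.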

Optimizing Queries Using Statistics

The GaussDB optimizer is a typical cost-based optimizer (CBO). For each candidate execution plan, it estimates the number of tuples and the execution cost of every execution step. These estimates are computed with cost calculation methods from the number of tuples in each table, the column widths, the ratio of NULL values, and characteristic values such as the number of distinct values, the most common values (MCVs), and histogram bounds (HB). The optimizer then selects the plan with the lowest cost, either for the whole execution or for returning the first tuple.

These characteristic values are the statistics, and they are central to query optimization: accurate statistics help the optimizer choose the most appropriate plan. You can collect statistics for a table, or for specific columns of a table, using ANALYZE. Run ANALYZE periodically, and run it immediately after modifying a large portion of a table's data.
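The following sketch shows how statistics might be collected and inspected for the tables used in this section. It assumes the PostgreSQL-style column list syntax of ANALYZE and the PG_STATS system view, both of which GaussDB provides; check the column names of PG_STATS against the reference for your version.

```sql
-- Collect statistics for all columns of a table.
ANALYZE customer;

-- Collect statistics only for selected columns, for example the join column
-- of a large table, which is cheaper than analyzing every column.
ANALYZE store_sales (ss_customer_sk);

-- Inspect what the optimizer will use: NULL ratio, average column width,
-- number of distinct values, most common values (MCV), and histogram bounds.
SELECT attname, null_frac, avg_width, n_distinct,
       most_common_vals, histogram_bounds
FROM pg_stats
WHERE tablename = 'store_sales'
  AND attname = 'ss_customer_sk';
```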

Optimizing Queries Using GUC Parameters

The goal of query optimization is to select an efficient execution plan.

Take the following SQL statement as an example:

select count(1) 
from customer inner join store_sales on (ss_customer_sk = c_customer_sk);

When executing customer inner join store_sales, GaussDB can use a nested loop join, a merge join, or a hash join. The optimizer estimates the result set size and the execution cost of each join method based on the statistics of the customer and store_sales tables, compares the costs, and selects the method with the lowest cost.

As described above, the execution cost is estimated from cost calculation methods and statistics, so the estimate can differ from the actual cost. If the optimizer cannot estimate the actual execution cost accurately, you can tune the execution plan by setting GUC parameters.
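As a sketch of this approach, the statements below compare the plan for the example query before and after changing a join-method GUC parameter for the current session only. It assumes the enable_nestloop parameter (GaussDB also provides enable_hashjoin and enable_mergejoin for the other join methods) and a hypothetical situation in which a nested loop join was chosen because of a poor estimate. Setting such a parameter to off discourages, rather than strictly forbids, the join method.

```sql
-- Baseline: the plan the optimizer picks on its own.
EXPLAIN
SELECT count(1)
FROM customer INNER JOIN store_sales ON (ss_customer_sk = c_customer_sk);

-- Hypothetical case: the optimizer chose a nested loop join, but a hash join
-- is known to be faster. Discourage nested loops in this session only.
SET enable_nestloop = off;

-- Check that the plan has changed to the intended join method.
EXPLAIN
SELECT count(1)
FROM customer INNER JOIN store_sales ON (ss_customer_sk = c_customer_sk);

-- Restore the default so that other queries in the session are not affected.
RESET enable_nestloop;
```

Limiting the change to a session and resetting it afterwards keeps the override from affecting queries for which the optimizer's own choice is better.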

Optimizing Queries Using the Underlying Storage

GaussDB supports both row-store and column-store tables. Which storage mode to choose depends heavily on the specific service scenario. Column-store tables are recommended for computation-intensive scenarios that mainly involve joins and aggregations, and row-store tables for scenarios such as point queries and large numbers of UPDATE or DELETE operations.

Optimization methods for each storage mode are described in detail in the following sections.
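The storage mode is chosen when a table is created. The sketch below assumes a GaussDB version that supports column-store tables through the ORIENTATION storage parameter; the table names sales_col and sales_row and their columns are hypothetical.

```sql
-- Column-store table (hypothetical name): suited to joins and aggregations
-- that scan a few columns of many rows.
CREATE TABLE sales_col
(
    ss_customer_sk  INTEGER,
    ss_net_paid     DECIMAL(7,2)
)
WITH (ORIENTATION = COLUMN);

-- Row-store table (hypothetical name): suited to point queries and
-- frequent UPDATE or DELETE operations on individual rows.
CREATE TABLE sales_row
(
    ss_customer_sk  INTEGER,
    ss_net_paid     DECIMAL(7,2)
)
WITH (ORIENTATION = ROW);
```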

Optimizing Queries by Rewriting SQL Statements

In addition to the preceding methods, which improve the execution plans generated by the SQL engine, database administrators can improve SQL performance by rewriting SQL statements while preserving the original service logic, drawing on their knowledge of the database's execution mechanism and on practical experience.

This requires database administrators to understand the customer's services well and to have solid SQL expertise. The following sections describe some common SQL rewriting scenarios.
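As a small illustration of a rewrite that keeps the service logic unchanged, the two statements below both count the customers that have at least one row in store_sales (unlike the join example earlier, which counts joined rows). This pair is an assumed example rather than a documented GaussDB rewriting rule; whether either form is faster depends on the optimizer and the data, so compare their plans with EXPLAIN before adopting a rewrite.

```sql
-- Original form: IN with an uncorrelated subquery.
SELECT count(1)
FROM customer
WHERE c_customer_sk IN (SELECT ss_customer_sk FROM store_sales);

-- Equivalent rewrite: correlated EXISTS. In a WHERE clause, both forms keep a
-- customer only if a matching store_sales row exists, and neither form
-- returns a customer more than once.
SELECT count(1)
FROM customer c
WHERE EXISTS (SELECT 1
              FROM store_sales s
              WHERE s.ss_customer_sk = c.c_customer_sk);
```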