Using the Vectorized Executor for Tuning

The GaussDB database supports row executors and vectorized executors for processing row-store tables and column-store tables, respectively. Column-store tables and vectorized executors have the following advantages:

More data is read in one batch at a time, saving I/O resources.
There are a large number of records in a batch, and the CPU cache hit rate increases.
The number of function calls is small in pipeline mode.
A batch of data is processed at a time, which is efficient.

GaussDB achieves better query performance in complex analytical queries. However, column-store tables do not perform well in data insertion and update. Therefore, column-store tables cannot be used for services with frequent data insertion and update.

To improve the query performance of row-store tables in complex analytical queries, GaussDB provides vectorized executors for processing row-store tables. You can set try_vector_engine_strategy to convert query statements containing row-store tables into vectorized execution plans for execution.

The conversion is not applicable to all query scenarios. If a query statement contains operations such as expression calculation, multi-table join, and aggregation, the performance can be improved by converting the statement to a vectorized execution plan. Theoretically, converting a row-store table to a vectorized execution plan causes conversion overheads and performance deterioration. After the foregoing expression calculation, join operation, and aggregation operations are converted into vectorized execution plans, performance can be improved. The performance improvement must be higher than the overheads generated by the conversion. This determines whether the conversion is required.

Take TPCH Q1 as an example. When a row executor is used, the execution time of the scan operators is 405210 ms, and the execution time of the aggregation operation is 2618964 ms. After a vectorized executor is used, the execution time of the scan operators (SeqScan and VectorAdapter) is 470840 ms, and the execution time of the aggregation operation is 212384 ms. As such, the query performance is improved.

Execution plan of the TPCH Q1 row executor:

                                                                QUERY PLAN                                                                 
-------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=43539570.49..43539570.50 rows=6 width=260) (actual time=3024174.439..3024174.439 rows=4 loops=1)
   Sort Key: l_returnflag, l_linestatus
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=43539570.30..43539570.41 rows=6 width=260) (actual time=3024174.396..3024174.403 rows=4 loops=1)
         Group By Key: l_returnflag, l_linestatus
         ->  Seq Scan on lineitem  (cost=0.00..19904554.46 rows=590875396 width=28) (actual time=0.016..405210.038 rows=596140342 loops=1)
               Filter: (l_shipdate <= '1998-10-01 00:00:00'::timestamp without time zone)
               Rows Removed by Filter: 3897560
 Total runtime: 3024174.578 ms
(9 rows)

Execution plan of the TPCH Q1 vectorized executor:

                                                                             QUERY PLAN                                                                             
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Row Adapter  (cost=43825808.18..43825808.18 rows=6 width=298) (actual time=683224.925..683224.927 rows=4 loops=1)
   ->  Vector Sort  (cost=43825808.16..43825808.18 rows=6 width=298) (actual time=683224.919..683224.919 rows=4 loops=1)
         Sort Key: l_returnflag, l_linestatus
         Sort Method: quicksort  Memory: 3kB
         ->  Vector Sonic Hash Aggregate  (cost=43825807.98..43825808.08 rows=6 width=298) (actual time=683224.837..683224.837 rows=4 loops=1)
               Group By Key: l_returnflag, l_linestatus
               ->  Vector Adapter(type: BATCH MODE)  (cost=19966853.54..19966853.54 rows=596473861 width=66) (actual time=0.982..470840.274 rows=596140342 loops=1)
                     Filter: (l_shipdate <= '1998-10-01 00:00:00'::timestamp without time zone)
                     Rows Removed by Filter: 3897560
                     ->  Seq Scan on lineitem  (cost=0.00..19966853.54 rows=596473861 width=66) (actual time=0.364..199301.737 rows=600037902 loops=1)
 Total runtime: 683225.564 ms
(11 rows)

Parent topic: SQL Optimization

Previous topic: Checking the Implicit Conversion Performance

Next topic: Optimization Cases