Updated on 2024-06-03 GMT+08:00

Optimizing Operators

Background

A query statement needs to go through multiple operator procedures to generate the final result. Sometimes, the overall query performance deteriorates due to long execution time of certain operators, which are regarded as bottleneck operators. In this case, you need to run the EXPLAIN ANALYZE or EXPLAIN PERFORMANCE command to view the bottleneck operators, and then perform optimization.

For example, in the following execution process, the execution time of the Hashagg operator accounts for about 66% [(51016-13535)/56476 ≈ 66%] of the total execution time. Therefore, the Hashagg operator is the bottleneck operator for this query. Optimize this operator first.

Example

Example 1: Scan the base table. For queries requiring large volume of data filtering, such as point queries or queries that need range scanning, a full table scan using SeqScan will take a long time. To facilitate scanning, you can create indexes on the condition column and select IndexScan for index scanning.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
gaussdb=#  explain (analyze on,costs off) select * from t1 where c2=10004;
 id |          operation           |     A-time      | A-rows | Peak Memory | A-width 
----+------------------------------+-----------------+--------+-------------+---------
  1 | ->  Streaming (type: GATHER) | 20.040          |      5 | 85KB        | 
  2 |    ->  Seq Scan on t1        | [17.239,17.376] |      5 | [18KB,18KB] | 
(2 rows)
 
 Predicate Information (identified by plan id) 
-----------------------------------------------
   2 --Seq Scan on t1
         Filter: (c2 = 10004)
         Rows Removed by Filter: 90002
(3 rows)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
gaussdb=#  create index idx on t1(c2);
CREATE INDEX
gaussdb=#  explain (analyze on,costs off) select * from t1 where c2=10004;
 id |             operation             |    A-time     | A-rows | Peak Memory | A-width 
----+-----------------------------------+---------------+--------+-------------+---------
  1 | ->  Streaming (type: GATHER)      | 3.206         |      5 | 85KB        | 
  2 |    ->  Index Scan using idx on t1 | [0.122,0.146] |      5 | [73KB,73KB] | 
(2 rows)
 
 Predicate Information (identified by plan id) 
-----------------------------------------------
   2 --Index Scan using idx on t1
         Index Cond: (c2 = 10004)
(2 rows)

In this example, the full table scan filters large amounts of data and returns 5 records. After an index has been created on the c2 column, the scanning efficiency is significantly boosted and the duration of IndexScan is reduced from 20 ms to 3 ms.

Example 2: If NestLoop is used for joining tables with a large number of rows, the join may take a long time. In the following example, NestLoop takes 5s. If enable_mergejoin is set to off to disable merge join and enable_nestloop is set to off to disable NestLoop so that the optimizer selects hash join, the join duration is reduced to 86 ms.

gaussdb=#  explain analyze select count(*) from t2,t1 where t1.c1=t2.c2;
 id |                    operation                     |       A-time        |  A-rows  | E-rows | Peak Memory | A-width | E-width | E-costs 
----+--------------------------------------------------+---------------------+----------+--------+-------------+---------+---------+---------
  1 | ->  Aggregate                                    | 5070.296            |        1 |      1 | 14KB        |         |       8 | 2148.49
  2 |    ->  Streaming (type: GATHER)                  | 5070.219            |        2 |      2 | 81KB        |         |       8 | 2148.49
  3 |       ->  Aggregate                              | [4828.705,5062.289] |        2 |      2 | [11KB,11KB] |         |       8 | 2148.40
  4 |          ->  Nested Loop (5,6)                   | [4828.565,5062.142] |      996 |     40 | [4KB,4KB]   |         |       0 | 2148.34
  5 |             ->  Seq Scan on t1                   | [13.574,14.508]     |    90007 |  20000 | [15KB,15KB] |         |       4 | 184.00
  6 |             ->  Materialize                      | [1508.956,1579.488] | 22413670 |     20 | [35KB,36KB] |         |       4 | 14.37
  7 |                ->  Streaming(type: REDISTRIBUTE) | [55.825,56.842]     |      498 |     20 | [44KB,44KB] |         |       4 | 14.31
  8 |                   ->  Seq Scan on t2             | [0.105,0.132]       |      498 |     20 | [13KB,13KB] |         |       4 | 13.13
(8 rows)
 
 Predicate Information (identified by plan id) 
-----------------------------------------------
   4 --Nested Loop (5,6)
         Join Filter: (t2.c2 = t1.c1)
         Rows Removed by Join Filter: 22412672
(3 rows)

After the parameters are set:

gaussdb=#  set enable_mergejoin=off;
SET
gaussdb=#  set enable_nestloop=off;
SET
gaussdb=#  explain analyze select count(*) from t2,t1 where t1.c1=t2.c2;
 id |                    operation                     |     A-time      | A-rows | E-rows |  Peak Memory  | A-width | E-width | E-costs 
----+--------------------------------------------------+-----------------+--------+--------+---------------+---------+---------+---------
  1 | ->  Aggregate                                    | 92.911          |      1 |      1 | 14KB          |         |       8 | 224.45
  2 |    ->  Streaming (type: GATHER)                  | 92.855          |      2 |      2 | 81KB          |         |       8 | 224.45
  3 |       ->  Aggregate                              | [84.295,87.102] |      2 |      2 | [11KB,11KB]   |         |       8 | 224.36
  4 |          ->  Hash Join (5,6)                     | [84.171,86.966] |    996 |     40 | [6KB,6KB]     |         |       0 | 224.30
  5 |             ->  Seq Scan on t1                   | [11.885,13.103] |  90007 |  20000 | [15KB,15KB]   |         |       4 | 184.00
  6 |             ->  Hash                             | [55.895,56.072] |    498 |     21 | [292KB,292KB] | [20,20] |       4 | 14.31
  7 |                ->  Streaming(type: REDISTRIBUTE) | [55.601,55.771] |    498 |     20 | [44KB,44KB]   |         |       4 | 14.31
  8 |                   ->  Seq Scan on t2             | [0.118,0.143]   |    498 |     20 | [13KB,13KB]   |         |       4 | 13.13
(8 rows)
 
 Predicate Information (identified by plan id) 
-----------------------------------------------
   4 --Hash Join (5,6)
         Hash Cond: (t1.c1 = t2.c2)
(2 rows)

Example 3: Generally, query performance can be improved by selecting HashAgg. If Sort and GroupAgg are used for a large result set, you need to set enable_sort to off. HashAgg consumes less time than Sort and GroupAgg.

gaussdb=#  explain analyze select count(*) from t1 group by c2;
 id |             operation              |     A-time      | A-rows | E-rows | Peak Memory | E-memory |     A-width     | E-width | E-costs 
----+------------------------------------+-----------------+--------+--------+-------------+----------+-----------------+---------+---------
  1 | ->  GroupAggregate                 | 244.817         |  40000 |   5000 | 15KB        |          |                 |      12 | 2131.52
  2 |    ->  Sort                        | 156.344         |  40000 |  10000 | 5603KB      |          |                 |      12 | 2131.52
  3 |       ->  Streaming (type: GATHER) | 91.595          |  40000 |  10000 | 82KB        |          |                 |      12 | 1442.14
  4 |          ->  GroupAggregate        | [90.317,96.852] |  40000 |  10000 | [12KB,12KB] | 16MB     |                 |      12 | 973.39
  5 |             ->  Sort               | [59.775,64.724] |  90007 |  20000 | [5MB,5MB]   | 16MB     | [896220,903920] |       4 | 873.39
  6 |                ->  Seq Scan on t1  | [18.092,21.033] |  90007 |  20000 | [12KB,12KB] | 1MB      |                 |       4 | 184.00
(6 rows)

After the parameters are set:

gaussdb=#  set enable_sort=off;
SET
gaussdb=#  explain analyze select count(*) from t1 group by c2;
 id |            operation            |     A-time      | A-rows | E-rows | Peak Memory | E-memory | A-width | E-width | E-costs 
----+---------------------------------+-----------------+--------+--------+-------------+----------+---------+---------+---------
  1 | ->  HashAggregate               | 228.260         |  40000 |   5000 | 6663KB      |          |         |      12 | 752.75
  2 |    ->  Streaming (type: GATHER) | 95.506          |  40000 |  10000 | 82KB        |          |         |      12 | 752.75
  3 |       ->  HashAggregate         | [63.974,71.290] |  40000 |  10000 | [3MB,3MB]   | 16MB     | [20,20] |      12 | 284.00
  4 |          ->  Seq Scan on t1     | [17.578,21.204] |  90007 |  20000 | [12KB,12KB] | 1MB      |         |       4 | 184.00
(4 rows)