Updated on 2024-05-07 GMT+08:00

Optimizing Operators

Context

A query statement needs to go through multiple operator procedures to generate the final result. Sometimes, the overall query performance deteriorates due to long execution time of certain operators, which are regarded as bottleneck operators. In this case, you need to execute the EXPLAIN ANALYZE or EXPLAIN PERFORMANCE command to view the bottleneck operators, and then perform optimization.

For example, in the following execution process, the execution time of the Hashagg operator accounts for about 66% [(51016-13535)/56476 ≈ 66%] of the total execution time. Therefore, the Hashagg operator is the bottleneck operator for this query. Optimize this operator first.

Examples

Example 1: Scan the base table. For queries requiring large volume of data filtering, such as point queries or queries that need range scanning, a full table scan using SeqScan will take a long time. To facilitate scanning, you can create indexes on the condition column and select IndexScan for index scanning.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
gaussdb=#  explain (analyze on, costs off) select * from t1 where c1=10004;
 id |     operation      |  A-time  | A-rows | Peak Memory | A-width 
----+--------------------+----------+--------+-------------+---------
  1 | ->  Seq Scan on t1 | 2053.069 |      7 | 64KB        | 
(1 row)
 
 Predicate Information (identified by plan id) 
-----------------------------------------------
   1 --Seq Scan on t1
         Filter: (c1 = 10004)
         Rows Removed by Filter: 110000
(3 rows)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
gaussdb=#  create index idx on t1(c1);
CREATE INDEX
gaussdb=#  explain (analyze on, costs off) select * from t1 where c1=10004;
 id |           operation            | A-time | A-rows | Peak Memory | A-width 
----+--------------------------------+--------+--------+-------------+---------
  1 | ->  Index Scan using idx on t1 | 0.227  |      7 | 77KB        | 
(1 row)
 
 Predicate Information (identified by plan id) 
-----------------------------------------------
   1 --Index Scan using idx on t1
         Index Cond: (c1 = 10004)
(2 rows)

In this example, the full table scan filters large amounts of data and returns 7 records. After an index has been created on the c1 column, the scanning efficiency is significantly boosted and the duration of IndexScan is reduced from 2s to 0.2 ms.

Example 2: If NestLoop is used for joining tables with a large number of rows, the join may take a long time. In the following example, NestLoop takes 27s. If enable_mergejoin is set to off to disable merge join and enable_nestloop is set to off to disable NestLoop so that the optimizer selects hash join, the join duration is reduced to 2s.

gaussdb=#  explain analyze select count(*) from t1,t2 where t1.c1=t2.c2;
 id |          operation          |  A-time   |  A-rows  | E-rows | Peak Memory | A-width | E-width |         E-costs          
----+-----------------------------+-----------+----------+--------+-------------+---------+---------+--------------------------
  1 | ->  Aggregate               | 27544.545 |        1 |      1 | 15KB        |         |       8 | 1046289.218..1046289.228
  2 |    ->  Nested Loop (3,4)    | 27544.028 |     1992 |   3133 | 8KB         |         |       0 | 0.000..1046281.385
  3 |       ->  Seq Scan on t1    | 2043.884  |   110007 | 118967 | 61KB        |         |       4 | 0.000..157587.670
  4 |       ->  Materialize       | 6877.119  | 54783486 |    498 | 65KB        |         |       4 | 0.000..11.470
  5 |          ->  Seq Scan on t2 | 0.430     |      498 |    498 | 57KB        |         |       4 | 0.000..8.980
(5 rows)

After the parameters are set:

gaussdb=#  set enable_mergejoin=off;
SET
gaussdb=#  set enable_nestloop=off;
SET
gaussdb=#  explain analyze select count(*) from t1,t2 where t1.c1=t2.c2;
 id |          operation          |  A-time  | A-rows | E-rows | Peak Memory | A-width | E-width |        E-costs         
----+-----------------------------+----------+--------+--------+-------------+---------+---------+------------------------
  1 | ->  Aggregate               | 2103.025 |      1 |      1 | 15KB        |         |       8 | 158088.164..158088.174
  2 |    ->  Hash Join (3,4)      | 2102.536 |   1992 |   3133 | 10KB        |         |       0 | 15.205..158080.331
  3 |       ->  Seq Scan on t1    | 2063.595 | 110007 | 118967 | 61KB        |         |       4 | 0.000..157587.670
  4 |       ->  Hash              | 0.753    |    498 |    498 | 296KB       |         |       4 | 8.980..8.980
  5 |          ->  Seq Scan on t2 | 0.480    |    498 |    498 | 57KB        |         |       4 | 0.000..8.980
(5 rows)

3. Generally, query performance can be improved by selecting HashAgg. If Sort and GroupAgg are used for a large result set, you need to set enable_sort to off. HashAgg consumes less time than Sort and GroupAgg.

gaussdb=#  explain analyze select count(*) from t1 group by c1;
 id |        operation         |  A-time  | A-rows | E-rows | Peak Memory | A-width | E-width |        E-costs         
----+--------------------------+----------+--------+--------+-------------+---------+---------+------------------------
  1 | ->  GroupAggregate       | 2417.004 |  60000 |  18909 | 17KB        |         |      12 | 167616.708..168698.051
  2 |    ->  Sort              | 2304.329 | 200016 | 118967 | 26466KB     |         |       4 | 167616.708..167914.126
  3 |       ->  Seq Scan on t1 | 2125.464 | 200016 | 118967 | 58KB        |         |       4 | 0.000..157587.670
(3 rows) 

After the parameters are set:

gaussdb=#  set enable_sort=off;
SET
gaussdb=#  explain analyze select count(*) from t1 group by c1;
 id |       operation       |  A-time  | A-rows | E-rows | Peak Memory | A-width | E-width |        E-costs         
----+-----------------------+----------+--------+--------+-------------+---------+---------+------------------------
  1 | ->  HashAggregate     | 2324.062 |  60000 |  39912 | 10066KB     |         |       4 | 159297.545..159696.665
  2 |    ->  Seq Scan on t1 | 2131.319 | 200016 | 193303 | 58KB        |         |       4 | 0.000..158331.030
(2 rows)