Updated on 2025-03-13 GMT+08:00

Execution Plans

The following SQL statement in pretty mode is used as an example:

1
2
3
4
5
gaussdb=# DROP TABLE IF EXISTS t1;
gaussdb=# DROP TABLE IF EXISTS t2;
gaussdb=# CREATE TABLE t1 (c1 int, c2 int);
gaussdb=# CREATE TABLE t2 (c1 int, c2 int);
gaussdb=# SET explain_perf_mode = pretty;
Run the EXPLAIN command and the output is as follows:
gaussdb=#  EXPLAIN SELECT * FROM t1,t2 WHERE t1.c1 = t2.c2;
 id |                operation                | E-rows | E-width | E-costs
----+-----------------------------------------+--------+---------+---------
  1 | ->  Streaming (type: GATHER)            |     20 |      16 | 28.69
  2 |    ->  Hash Join (3,5)                  |     20 |      16 | 27.75
  3 |       ->  Streaming(type: REDISTRIBUTE) |     20 |       8 | 14.31
  4 |          ->  Seq Scan on t2             |     20 |       8 | 13.13
  5 |       ->  Hash                          |     21 |       8 | 13.13
  6 |          ->  Seq Scan on t1             |     20 |       8 | 13.13
(6 rows)

 Predicate Information (identified by plan id)
-----------------------------------------------
   2 --Hash Join (3,5)
         Hash Cond: (t2.c2 = t1.c1)
(2 rows)

-- Drop the table.
gaussdb=# DROP TABLE t1,t2;
  • Interpretation of the execution plan column (horizontal):
    • id: execution operator node ID.
    • operation: name of an execution operator.
    • E-rows: number of output rows estimated by each operator.
    • E-width: estimated width of an output tuple of each operator.
    • E-costs: execution cost estimated by each operator.
      • E-costs is measured by the optimizer based on an overhead unit. Usually, fetching a disk page sequence is defined as a unit. Other overhead parameters are set based on the unit.
      • The overhead of each node (specified by E-costs) includes the overheads of all its child nodes.
      • Such an overhead reflects only what the optimizer is concerned about, but does not consider the time for passing result rows to the client. Although the time may play a very important role in the actual total time, it is ignored by the optimizer, because it cannot be changed by modifying the plan.
  • Interpretation of the execution plan layers (vertical). The plan tree is from bottom to top. In pretty mode, operators at each layer are indented with ->. The bottom layer is generally a scan operator.
    • Operators id-4 and id-6 are the scan operators at the bottom layer.
      1. Seq Scan on t1

        The table scan operator scans the table t1 using Seq Scan. At this layer, data in the table t1 is read from a buffer or disk, and then transferred to the upper-layer node for hash join calculation.

      2. Seq Scan on t2

        The table scan operator scans the table t2 using Seq Scan. At this layer, data in the table t2 is read from a buffer or disk, and then transferred to the upper-layer node for calculation.

    • Layer 2: Hash

      The hash operator calculates the hash value of the operator transferred from the lower layer and generates a hash table for subsequent hash join operations.

    • Layer 2: Streaming(type: REDISTRIBUTE)

      Streaming is a special operator that implements the core data shuffle function of the distributed architecture. Streaming (type: REDISTRIBUTE) redistributes data to all DNs based on the selected column. This operator redistributes data of t2 to each DN based on the join column to prepare for t1 join. For details about the Streaming operator, see the subsequent operator description.

    • Layer 3: Hash Join (3,5)

      The hash join operator is used to join data in the t1 and t2 tables in hash join mode and output the result data. (3,5) indicates that the result of the operator whose ID is 3 is joined with the result of the operator whose ID is 5.

    • Layer 4: Streaming (type: GATHER)

      Streaming (type: GATHER) collects data from DNs for the CN.