Distributed Plan

Currently, the GaussDB optimizer can use three methods to develop statement execution policies in the distributed framework: generating a statement pushdown plan, a distributed execution plan, or a distributed execution plan for sending statements.

A statement pushdown plan pushes query statements from a CN down to DNs for execution and returns the execution results to the CN.
In a distributed execution plan, a CN compiles and optimizes query statements, generates a plan tree, and then sends the plan tree to DNs for execution. After the statements have been executed, execution results will be returned to the CN.
A distributed execution plan for sending statements pushes queries that can be pushed down (mostly base table scanning statements) to DNs for execution. Then, the plan obtains the intermediate results and sends them to the CN, on which the remaining queries are to be executed.

Statement pushdown plan:

gaussdb=#CREATE TABLE t1(c1 int,c2 int);
gaussdb=#CREATE TABLE t2(c1 int,c2 int);
gaussdb=#EXPLAIN SELECT * FROM t1;
                    QUERY PLAN
--------------------------------------------------
 Data Node Scan  (cost=0.00..0.00 rows=0 width=0)
   Node/s: All datanodes
(2 rows)

Interpretation of plan columns:

Data Node Scan
The SQL statement is pushed down to DNs for plan generation and execution.
Node/s: All datanodes
The SQL statement will be executed on all DNs.

Run the EXPLAIN VERBOSE command to view the delivered SQL statements.

gaussdb=#EXPLAIN VERBOSE SELECT * FROM t1;
                    QUERY PLAN
--------------------------------------------------
 Data Node Scan  (cost=0.00..0.00 rows=0 width=0)
   Output: t1.c1, t1.c2
   Node/s: All datanodes
   Remote query: SELECT c1, c2 FROM public.t1
(4 rows)

The execution plan contains the Remote query column, which describes the delivered SQL statement. To further view the plan after the SQL statement is pushed down to DNs, set the max_datanode_for_plan parameter. This parameter indicates the number of plans on DNs. The number of plans on DNs is determined by the smaller one between the number of DNs in the cluster and the value of this parameter.

gaussdb=#SET max_datanode_for_plan = 1;
gaussdb=#EXPLAIN VERBOSE SELECT * FROM t1;
                        QUERY PLAN
-----------------------------------------------------------
 Data Node Scan  (cost=0.00..0.00 rows=0 width=0)
   Output: t1.c1, t1.c2
   Node/s: All datanodes
   Remote query: SELECT c1, c2 FROM public.t1

 Remote SQL: SELECT c1, c2 FROM public.t1
 Datanode Name: datanode1
   Seq Scan on public.t1  (cost=0.00..1.06 rows=6 width=8)
     Output: c1, c2

(10 rows)

The plan on datanode1 is displayed.

Distributed execution plan:

If an SQL statement cannot be pushed down, a distributed execution plan is generated first.

gaussdb=# SET query_mem = '256MB';
SET
gaussdb=#EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.c1 = t1.c2;
 id |              operation               | E-rows | E-memory | E-width | E-costs
----+--------------------------------------+--------+----------+---------+---------
  1 | ->  Streaming (type: GATHER)         |     10 |          |      16 | 2.79
  2 |    ->  Nested Loop (3,5)             |     10 | 1MB      |      16 | 2.32
  3 |       ->  Streaming(type: BROADCAST) |      2 | 2MB      |       8 | 1.22
  4 |          ->  Seq Scan on t1          |      1 | 1MB      |       8 | 1.06
  5 |       ->  Seq Scan on t2             |     10 | 1MB      |       8 | 1.05
(5 rows)

 Predicate Information (identified by plan id)
-----------------------------------------------
   4 --Seq Scan on t1
         Filter: (c1 = c2)
(2 rows)

   ====== Query Summary =====
---------------------------------
 System available mem: 8193638KB
 Query Max mem: 8280473KB
 Query estimated mem: 5152KB
(3 rows)

When query_mem is set to 0, the information in Query Summary is not displayed.

A distributed execution plan contains a special streaming operator. It implements the core data shuffle function of the distributed architecture. Streaming has three types, which correspond to different data shuffle functions in the distributed architecture:

Streaming (type: GATHER): The CN collects data from DNs.
Streaming (type: REDISTRIBUTE): Data is redistributed to all the DNs based on selected columns.
Streaming (type: BROADCAST): Data on the current DN is broadcast to other DNs.

Distributed execution plan for sending statements:

If neither of the two execution plans can be generated, an execution plan for sending statements is generated.

gaussdb=#ANALYZE t1;
ANALYZE
gaussdb=# EXPLAIN SELECT * FROM t1 limit 1;
                                     QUERY PLAN
-------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.00 rows=1 width=15)
   ->  Data Node Scan on "__REMOTE_LIMIT_QUERY__"  (cost=0.00..0.00 rows=1 width=15)
         Node/s: All datanodes
(3 rows)


-- Drop the table.
gaussdb=#DROP TABLE t1,t2;

A CN with the LIMIT statement cannot simply send statements to DNs and collect data, which is inconsistent with the semantics of the LIMIT statement. The execution plan distributes the LIMIT 1 statement to all DNs. After the intermediate result is collected to the CN, the CN performs LIMIT processing.

Parent topic: Execution Plan Details

Previous topic: Execution Plans

Next topic: Execution Information