Help Center/ GaussDB/ Developer Guide(Distributed_V2.0-8.x)/ SQL Optimization/ Optimization Cases/ Case: Adjusting the GUC Parameter costbased_rewrite_rule for Cost-based Query Rewriting

Updated on 2025-05-29 GMT+08:00

View PDF

Case: Adjusting the GUC Parameter costbased_rewrite_rule for Cost-based Query Rewriting

Query rewriting allows the optimizer to select different execution paths. However, performance will not be improved after some rules rewrite the SQL statements. A rewriting error may result in a 1,000-fold performance difference. Therefore, in the query rewriting phase, the cost-based evaluation policy needs to be supported for such rules to determine whether to apply the rules to rewrite SQL statements. These rules are controlled by the GUC parameter costbased_rewrite_rule. For details about the controlled rules, see Table 1.

**Table 1** Control rules supported by GUC costbased_rewrite_rule
Parameter	Description
none	Does not use any cost-based query rewriting policy.
pullup_subquery	The cost-based query rewriting policy is enabled for the rewriting rules of simple subquery expansion.
pullup_sublink_any_exists	The cost-based query rewriting policy is enabled for the rewriting rules of non-correlated sublinks of the ANY type and related sublinks of the [NOT] EXISTS type in a single or AND condition.
pullup_not_in_sublink	The cost-based query rewriting policy is enabled for the rewriting rules of non-correlated sublinks of the NOT IN type in a single or AND condition. This parameter is mutually exclusive with the disable_pullup_not_in_sublink parameter of the GUC parameter rewrite_rule. When this parameter is enabled, the disable_pullup_not_in_sublink parameter of the GUC parameter rewrite_rule does not take effect.
pullup_expr_sublink	This parameter takes effect in the rewriting scenarios of expression sublinks, non-correlated sublinks of the ANY type, and related sublinks of [NOT] the EXISTS type in the OR condition. This parameter is mutually exclusive with the disable_pullup_expr_sublink, enable_sublink_pullup_enhanced, and magicset parameters of the GUC parameter rewrite_rule. When this parameter is enabled, the related parameters of the GUC parameter rewrite_rule do not take effect.
intargetlist	The cost-based query rewriting policy is enabled for the rewriting rules of the sublinks of related expression in the TargetList. This parameter is mutually exclusive with the intargetlist and magicset parameters of the GUC parameter rewrite_rule. When this parameter is enabled, the intargetlist and magicset parameters of the GUC parameter rewrite_rule do not take effect.
enable_sublink_pullup_enhanced	The cost-based query rewriting policy is enabled for the rewriting rules of expression sublinks in the enhanced scenario. This parameter is affected by the pullup_expr_sublink parameter. This parameter takes effect only when the pullup_expr_sublink parameter is enabled for the expression sublink rewriting scenario in the AND condition. This parameter is mutually exclusive with the enable_sublink_pullup_enhanced parameter of the GUC parameter rewrite_rule. When this parameter is enabled, the enable_sublink_pullup_enhanced parameter of the GUC parameter rewrite_rule does not take effect.

The following uses typical cases to describe the application scenarios of the query rewriting rules controlled by the preceding parameters.

In distributed scenarios, the cost-based rewriting policy preferentially selects the STREAM plan. If the STREAM plan cannot be generated before and after the rewriting, the STREAM plan is degraded to the PGXC plan. However, if the generated PGXC plan is pushed down to DNs, the cost information on the CNs is inaccurate. Therefore, when a plan is generated in this scenario, query rewriting is performed based on rules. See an example in Using Rule-based Query Rewriting when PGXC Plans Are Generated.

Preparing the Case Environment

To demonstrate rule application scenarios, you need to prepare the following table creation statements:

SET client_min_messages = warning;
SET CLIENT_ENCODING = 'UTF8';
-- Clean the environment.
DROP SCHEMA IF EXISTS costbased_rule_test cascade;
CREATE SCHEMA costbased_rule_test;
SET current_schema = costbased_rule_test;
SET enable_codegen = off;
DROP TABLE IF EXISTS costbased_rule_test.ct1;
DROP TABLE IF EXISTS costbased_rule_test.ct2;
DROP TABLE IF EXISTS costbased_rule_test.ct3;
DROP TABLE IF EXISTS costbased_rule_test.ct4;
-- Create a test table.
CREATE TABLE ct1 (a INT, b INT, c INT, d INT);
CREATE TABLE ct2 (a INT, b INT, c INT, d INT);
CREATE TABLE ct3 (a INT, b INT, c INT, d INT);
CREATE TABLE ct4 (a INT, b INT, c INT, d INT);
CREATE INDEX idx_ct1_b ON ct1(b);
CREATE INDEX idx_ct2_c ON ct2(c);
CREATE INDEX idx_ct3_c ON ct3(c);
-- Insert data.
INSERT INTO ct1 (a, b, c) VALUES (generate_series(1, 100), generate_series(200, 300), left(random()::int, 100));
INSERT INTO ct2 VALUES(1,2,3,4),(3,4,5,6);
INSERT INTO ct3 (a, b, c, d) VALUES (generate_series(1, 10), generate_series(20, 30), left(random()::int, 10), left(random()::int, 10));
-- Update statistics.
ANALYZE ct1;
ANALYZE ct2;
ANALYZE ct3;

Rewriting Rules of Non-correlated Sublinks in the Having Condition

SQL statements can be rewritten when the Having condition contains non-correlated expression sublinks with AGG. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=enable_sublink_pullup_enhanced;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=#  EXPLAIN SELECT ct2.a, SUM(ct2.b) AS value FROM ct2,ct1 WHERE ct2.a = ct1.b GROUP BY ct2.a HAVING SUM(ct2.b) >= (SELECT AVG(ct2.b) FROM ct2,ct1 WHERE ct2.a = ct1.b) ORDER BY value DESC;
                                                                    QUERY PLAN                                                       

-------------------------------------------------------------------------------------------------------------------------------------
-------------
 Streaming (type: GATHER)  (cost=5.31..5.37 rows=3 width=12)
   Merge Sort Key: (pg_catalog.sum((sum(costbased_rule_test.ct2.b)))) DESC
   Node/s: All datanodes
   ->  Sort  (cost=5.24..5.25 rows=3 width=12)
         Sort Key: (pg_catalog.sum((sum(costbased_rule_test.ct2.b)))) DESC
         ->  Nested Loop  (cost=5.13..5.23 rows=1 width=12)
               Join Filter: (((pg_catalog.sum((sum(costbased_rule_test.ct2.b)))))::numeric >= (pg_catalog.avg((avg(costbased_rule_tes
t.ct2.b)))))
               ->  HashAggregate  (cost=2.79..2.80 rows=2 width=24)
                     Group By Key: costbased_rule_test.ct2.a
                     ->  Streaming(type: REDISTRIBUTE)  (cost=2.66..2.76 rows=6 width=24)
                           Spawn on: All datanodes
                           ->  HashAggregate  (cost=2.66..2.68 rows=6 width=24)
                                 Group By Key: costbased_rule_test.ct2.a
                                 ->  Merge Join  (cost=1.46..2.16 rows=200 width=8)
                                       Merge Cond: (costbased_rule_test.ct1.b = costbased_rule_test.ct2.a)
                                       ->  Index Only Scan using idx_ct1_b on ct1  (cost=0.00..90.75 rows=10100 width=4)
                                       ->  Sort  (cost=1.29..1.30 rows=6 width=8)
                                             Sort Key: costbased_rule_test.ct2.a
                                             ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                                   Spawn on: All datanodes
                                                   ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
               ->  Materialize  (cost=2.33..2.41 rows=3 width=32)
                     ->  Aggregate  (cost=2.33..2.40 rows=3 width=36)
                           ->  Streaming(type: BROADCAST)  (cost=2.33..2.40 rows=9 width=36)
                                 Spawn on: All datanodes
                                 ->  Aggregate  (cost=2.33..2.34 rows=3 width=36)
                                       ->  Merge Join  (cost=1.46..2.16 rows=200 width=4)
                                             Merge Cond: (costbased_rule_test.ct1.b = costbased_rule_test.ct2.a)
                                             ->  Index Only Scan using idx_ct1_b on ct1  (cost=0.00..90.75 rows=10100 width=4)
                                             ->  Sort  (cost=1.29..1.30 rows=6 width=8)
                                                   Sort Key: costbased_rule_test.ct2.a
                                                   ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                                         Spawn on: All datanodes
                                                         ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(34 rows)

When costbased_rewrite_rule is set to pullup_expr_sublink,enable_sublink_pullup_enhanced, the optimizer evaluates whether to rewrite sublinks based on the cost. In this scenario, the plan generated without rewriting SQL statements is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=enable_sublink_pullup_enhanced;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink,enable_sublink_pullup_enhanced;
SET
gaussdb=# EXPLAIN SELECT ct2.a, SUM(ct2.b) AS value FROM ct2,ct1 WHERE ct2.a = ct1.b GROUP BY ct2.a HAVING SUM(ct2.b) >= (SELECT AVG(ct2.b) FROM ct2,ct1 WHERE ct2.a = ct1.b) ORDER BY value DESC;
                                                     QUERY PLAN                                                   

------------------------------------------------------------------------------------------------------------------
---
 Sort  (cost=5.35..5.35 rows=2 width=24)
   Sort Key: (pg_catalog.sum((sum(costbased_rule_test.ct2.b)))) DESC
   InitPlan 1 (returns $0)
     ->  Streaming (type: GATHER)  (cost=2.52..2.58 rows=3 width=36)
           Node/s: All datanodes
           ->  Aggregate  (cost=2.33..2.40 rows=3 width=36)
                 ->  Streaming(type: BROADCAST)  (cost=2.33..2.40 rows=9 width=36)
                       Spawn on: All datanodes
                       ->  Aggregate  (cost=2.33..2.34 rows=3 width=36)
                             ->  Merge Join  (cost=1.46..2.16 rows=200 width=4)
                                   Merge Cond: (costbased_rule_test.ct1.b = costbased_rule_test.ct2.a)
                                   ->  Index Only Scan using idx_ct1_b on ct1  (cost=0.00..90.75 rows=10100 width=
4)
                                   ->  Sort  (cost=1.29..1.30 rows=6 width=8)
                                         Sort Key: costbased_rule_test.ct2.a
                                         ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                               Spawn on: All datanodes
                                               ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
   ->  HashAggregate  (cost=2.79..2.94 rows=2 width=24)
         Group By Key: costbased_rule_test.ct2.a
         Filter: ((pg_catalog.sum((sum(costbased_rule_test.ct2.b))))::numeric >= $0)
         ->  Streaming (type: GATHER)  (cost=2.79..2.94 rows=6 width=24)
               Node/s: All datanodes
               ->  HashAggregate  (cost=2.66..2.69 rows=6 width=24)
                     Group By Key: costbased_rule_test.ct2.a
                     ->  Merge Join  (cost=1.46..2.16 rows=200 width=8)
                           Merge Cond: (costbased_rule_test.ct1.b = costbased_rule_test.ct2.a)
                           ->  Index Only Scan using idx_ct1_b on ct1  (cost=0.00..90.75 rows=10100 width=4)
                           ->  Sort  (cost=1.29..1.30 rows=6 width=8)
                                 Sort Key: costbased_rule_test.ct2.a
                                 ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                       Spawn on: All datanodes
                                       ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(32 rows)

costbased_rewrite_rule=pullup_expr_sublink,enable_sublink_pullup_enhanced is mutually exclusive with rewrite_rule=disable_pullup_expr_sublink and rewrite_rule=enable_sublink_pullup_enhanced, and the value of costbased_rewrite_rule precedes that of rewrite_rule.

Pulling Up Non-correlated Sublinks of ANY/IN Type in the AND Condition

SQL statements can be rewritten when a single or AND condition contains non-correlated sublinks of ANY/IN type. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE ct2.a IN (SELECT a FROM ct1 WHERE ct1.c > 500 AND ct1.a = 10);
                           QUERY PLAN
-----------------------------------------------------------------
 Streaming (type: GATHER)  (cost=0.19..66.72 rows=3 width=16)
   Node/s: datanode2
   ->  Nested Loop Semi Join  (cost=0.00..66.53 rows=3 width=16)
         ->  Seq Scan on ct2  (cost=0.00..1.01 rows=1 width=16)
               Filter: (a = 10)
         ->  Seq Scan on ct1  (cost=0.00..65.50 rows=1 width=4)
               Filter: ((c > 500) AND (a = 10))
(7 rows)

When costbased_rewrite_rule is set to pullup_sublink_any_exists, the optimizer evaluates whether to rewrite sublinks based on the cost. In this scenario, the plan generated without rewriting SQL statements is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_sublink_any_exists;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE ct2.a IN (SELECT a FROM ct1 WHERE ct1.c > 500 AND ct1.a = 10);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=65.59..66.66 rows=1 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct2  (cost=65.52..66.54 rows=1 width=16)
         Filter: (hashed SubPlan 1)
         SubPlan 1
           ->  Materialize  (cost=0.00..65.52 rows=9 width=4)
                 ->  Streaming(type: BROADCAST)  (cost=0.00..65.51 rows=3 width=4)
                       Spawn on: datanode2
                       ->  Seq Scan on ct1  (cost=0.00..65.50 rows=1 width=4)
                             Filter: ((c > 500) AND (a = 10))
(10 rows)

This scenario is controlled only by the GUC parameter costbased_rewrite_rule=pullup_sublink_any_exists. The GUC parameter rewrite_rule does not have related control parameters.

Pulling Up Related Sublinks of [NOT]EXISTS Type in the AND Condition

SQL statements can be rewritten when a single or AND condition contains related sublinks of [NOT]EXISTS type. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE EXISTS (SELECT 1 FROM ct3 WHERE ct3.b = ct2.c AND ct3.c < 1000);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.66..5.95 rows=2 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=4.60..5.82 rows=2 width=16)
         Join Filter: (ct2.c = ct3.b)
         ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=16)
               Spawn on: All datanodes
               ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
         ->  Materialize  (cost=4.60..4.65 rows=11 width=4)
               ->  HashAggregate  (cost=4.60..4.64 rows=11 width=4)
                     Group By Key: ct3.b
                     ->  Streaming(type: REDISTRIBUTE)  (cost=3.55..4.57 rows=33 width=4)
                           Spawn on: All datanodes
                           ->  HashAggregate  (cost=3.55..3.66 rows=33 width=4)
                                 Group By Key: ct3.b
                                 ->  Seq Scan on ct3  (cost=0.00..3.46 rows=110 width=4)
                                       Filter: (c < 1000)
(16 rows)

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_sublink_any_exists;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE EXISTS (SELECT 1 FROM ct3 WHERE ct3.b = ct2.c AND ct3.c < 1000);
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=0.06..1.26 rows=1 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct2  (cost=0.00..1.14 rows=1 width=16)
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Result  (cost=0.00..14.02 rows=330 width=4)
                 Filter: (ct3.b = ct2.c)
                 ->  Materialize  (cost=0.00..14.02 rows=330 width=4)
                       ->  Streaming(type: BROADCAST)  (cost=0.00..13.47 rows=330 width=4)
                             Spawn on: All datanodes
                             ->  Seq Scan on ct3  (cost=0.00..3.46 rows=110 width=4)
                                   Filter: (c < 1000)
(12 rows)

This scenario is controlled only by the GUC parameter costbased_rewrite_rule=pullup_sublink_any_exists. The GUC parameter rewrite_rule does not have related control parameters.

Pulling Up Non-correlated Sublinks of NOT IN Type in the AND Condition

SQL statements can be rewritten when a single or AND condition contains non-correlated sublinks of NOT IN type. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE ct1.b NOT IN (SELECT a FROM ct3);
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..5681.51 rows=9100 width=16)
   Node/s: All datanodes
   ->  Nested Loop Anti Join  (cost=0.00..5302.39 rows=9100 width=16)
         Join Filter: ((ct1.b = ct3.a) OR (ct1.b IS NULL) OR (ct3.a IS NULL))
         ->  Seq Scan on ct1  (cost=0.00..48.67 rows=10100 width=16)
         ->  Materialize  (cost=0.00..13.93 rows=330 width=4)
               ->  Streaming(type: BROADCAST)  (cost=0.00..13.38 rows=330 width=4)
                     Spawn on: All datanodes
                     ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=4)
(9 rows)

When costbased_rewrite_rule is set to pullup_not_in_sublink, the optimizer evaluates whether to rewrite sublinks based on the cost. In this scenario, the plan generated without rewriting SQL statements is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_not_in_sublink;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE ct1.b NOT IN (SELECT a FROM ct3);
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=8.96..272.42 rows=5050 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct1  (cost=4.96..62.05 rows=5050 width=16)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..4.14 rows=990 width=4)
                 ->  Streaming(type: BROADCAST)  (cost=0.00..3.59 rows=330 width=4)
                       Spawn on: All datanodes
                       ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=4)
(9 rows)

costbased_rewrite_rule=pullup_not_in_sublink is mutually exclusive with rewrite_rule=disable_pullup_not_in_sublink, and costbased_rewrite_rule precedes rewrite_rule.

Pulling Up Expression Sublinks in the AND Condition

SQL statements can be rewritten when a single or AND condition contains related sublinks of expression type. If both the rule-based and cost-based query rewriting evaluation policies are disabled, sublinks are not rewritten. The generation plan is as follows:

gaussdb=# SET rewrite_rule=disable_pullup_expr_sublink;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1) AND ct1.b > 50;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=1.07..4454.57 rows=50 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct1  (cost=0.01..4452.44 rows=50 width=16)
         Filter: ((b > 50) AND (a = (SubPlan 1)))
         SubPlan 1
           ->  Limit  (cost=1.30..1.30 rows=3 width=8)
                 ->  Sort  (cost=1.30..1.31 rows=3 width=8)
                       Sort Key: ct2.a
                       ->  Result  (cost=0.00..1.29 rows=6 width=8)
                             Filter: (ct2.b = ct1.b)
                             ->  Materialize  (cost=0.00..1.29 rows=6 width=8)
                                   ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                         Spawn on: All datanodes
                                         ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(14 rows)

When costbased_rewrite_rule is set to pullup_expr_sublink, the optimizer evaluates whether to rewrite sublinks based on the cost. In this scenario, the plan generated after SQL statements are rewritten is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=disable_pullup_expr_sublink;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1) AND ct1.b > 50;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=1.31..1.83 rows=3 width=16)
   Node/s: All datanodes
   ->  Merge Join  (cost=1.25..1.71 rows=3 width=16)
         Merge Cond: (subquery."?column?" = ct1.b)
         Join Filter: (ct1.a = subquery.a)
         ->  Sort  (cost=1.24..1.24 rows=1 width=8)
               Sort Key: subquery."?column?"
               ->  Streaming(type: REDISTRIBUTE)  (cost=1.05..1.23 rows=1 width=8)
                     Spawn on: All datanodes
                     ->  Subquery Scan on subquery  (cost=1.05..1.08 rows=1 width=8)
                           Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                           ->  WindowAgg  (cost=1.05..1.07 rows=1 width=8)
                                 row_number_filter: (row_number() OVER (PARTITION BY ct2.b) OPERATOR(pg_catalog.<=) 1::bigint)
                                 ->  Sort  (cost=1.05..1.06 rows=1 width=8)
                                       Sort Key: ct2.b
                                       ->  Streaming(type: REDISTRIBUTE)  (cost=1.01..1.04 rows=1 width=8)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=1.01..1.01 rows=1 width=8)
                                                   Filter: (b > 50)
         ->  Index Scan using idx_ct1_b on ct1  (cost=0.00..159.16 rows=10099 width=16)
               Index Cond: (b > 50)
(21 rows)

costbased_rewrite_rule=pullup_expr_sublink is mutually exclusive with rewrite_rule=disable_pullup_expr_sublink, rewrite_rule=enable_sublink_pullup_enhanced, and rewrite_rule=magicset, and costbased_rewrite_rule precedes rewrite_rule.

Rewriting SQL Statements when the Expression Sublinks in the AND Condition Meet the MAGICSET Pushdown Rules

SQL statements can be rewritten when a single or AND condition contains related sublinks of expression type and MagicSet pushdown rules can push down some filter criteria to the sublinks. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=magicset;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE  ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1) AND ct1.c=10;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=62.95..78.78 rows=3 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=62.89..78.66 rows=3 width=16)
         ->  Streaming(type: REDISTRIBUTE)  (cost=58.39..58.57 rows=1 width=8)
               Spawn on: All datanodes
               ->  Subquery Scan on subquery  (cost=58.39..58.43 rows=1 width=8)
                     Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                     ->  WindowAgg  (cost=58.39..58.42 rows=3 width=8)
                           row_number_filter: (row_number() OVER (PARTITION BY ct2.b) OPERATOR(pg_catalog.<=) 1::bigint)
                           ->  Sort  (cost=58.39..58.40 rows=3 width=8)
                                 Sort Key: ct2.b
                                 ->  Nested Loop Semi Join  (cost=0.00..58.38 rows=3 width=8)
                                       Join Filter: (ct2.b = costbased_rule_test.ct1.b)
                                       ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=8)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
                                       ->  Materialize  (cost=0.00..57.24 rows=1 width=4)
                                             ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..57.23 rows=1 width=4)
                                                   Spawn on: All datanodes
                                                   ->  Seq Scan on ct1  (cost=0.00..57.09 rows=1 width=4)
                                                         Filter: (c = 10)
         ->  Bitmap Heap Scan on ct1  (cost=4.50..20.08 rows=1 width=16)
               Recheck Cond: (b = subquery."?column?")
               Filter: ((c = 10) AND (subquery.a = a))
               ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=100 width=0)
                     Index Cond: (b = subquery."?column?")
(26 rows)

When costbased_rewrite_rule is set to pullup_expr_sublink, the optimizer evaluates whether to rewrite the MagicSet pushdown conditions of sublinks based on the cost. In this scenario, the plan generated without rewriting MagicSet is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=magicset;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE  ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1)  AND ct1.c=10;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=5.61..21.43 rows=3 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=5.54..21.31 rows=3 width=16)
         ->  Streaming(type: REDISTRIBUTE)  (cost=1.05..1.22 rows=1 width=8)
               Spawn on: All datanodes
               ->  Subquery Scan on subquery  (cost=1.05..1.08 rows=1 width=8)
                     Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                     ->  WindowAgg  (cost=1.05..1.07 rows=1 width=8)
                           row_number_filter: (row_number() OVER (PARTITION BY ct2.b) OPERATOR(pg_catalog.<=) 1::bigint)
                           ->  Sort  (cost=1.05..1.05 rows=1 width=8)
                                 Sort Key: ct2.b
                                 ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.04 rows=2 width=8)
                                       Spawn on: All datanodes
                                       ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
         ->  Bitmap Heap Scan on ct1  (cost=4.50..20.08 rows=1 width=16)
               Recheck Cond: (b = subquery."?column?")
               Filter: ((c = 10) AND (subquery.a = a))
               ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=100 width=0)
                     Index Cond: (b = subquery."?column?")
(19 rows)

Rewriting SQL Statements when the Expression Sublinks in the AND Condition Meet the WINAGG Rules

SQL statements can be rewritten when a single or AND condition contains related sublinks of expression type and WinAGG rules are met. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=enable_sublink_pullup_enhanced;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1, ct2 WHERE ct1.b = ct2.c AND ct2.a = (SELECT min(b) FROM ct1 WHERE ct1.b = ct2.c);
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=10.71..14.23 rows=1 width=36)
   Node/s: All datanodes
   ->  Subquery Scan on inner_subquery  (cost=10.64..14.11 rows=1 width=36)
         Filter: (inner_subquery.a = inner_subquery.min)
         ->  WindowAgg  (cost=10.64..13.27 rows=200 width=32)
               ->  Sort  (cost=10.64..11.02 rows=200 width=32)
                     Sort Key: ct2.c
                     ->  Streaming(type: REDISTRIBUTE)  (cost=1.46..5.22 rows=200 width=32)
                           Spawn on: All datanodes
                           ->  Merge Join  (cost=1.46..2.18 rows=200 width=32)
                                 Merge Cond: (ct1.b = ct2.c)
                                 ->  Index Scan using idx_ct1_b on ct1  (cost=0.00..150.75 rows=10100 width=16)
                                 ->  Sort  (cost=1.29..1.30 rows=3 width=16)
                                       Sort Key: ct2.c
                                       ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=16)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
(17 rows)

gaussdb=# SET rewrite_rule=enable_sublink_pullup_enhanced;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink,enable_sublink_pullup_enhanced;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1, ct2 WHERE ct1.b = ct2.c AND ct2.a = (SELECT min(b) FROM ct1 WHERE ct1.b = ct2.c);
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=3.55..5.99 rows=100 width=32)
   Node/s: All datanodes
   ->  Merge Join  (cost=1.49..1.87 rows=100 width=32)
         Merge Cond: (costbased_rule_test.ct1.b = ct2.c)
         ->  Index Scan using idx_ct1_b on ct1  (cost=0.00..150.75 rows=10100 width=16)
         ->  Sort  (cost=1.40..1.41 rows=3 width=16)
               Sort Key: ct2.c
               ->  Streaming(type: BROADCAST)  (cost=0.00..1.39 rows=3 width=16)
                     Spawn on: All datanodes
                     ->  Seq Scan on ct2  (cost=0.00..1.12 rows=1 width=16)
                           Filter: (a = (SubPlan 2))
                           SubPlan 2
                             ->  Result  (cost=0.10..0.11 rows=3 width=0)
                                   InitPlan 1 (returns $1)
                                     ->  Limit  (cost=0.00..0.10 rows=3 width=4)
                                           ->  Result  (cost=0.00..1009.68 rows=3 width=4)
                                                 Filter: (costbased_rule_test.ct1.b = ct2.c)
                                                 ->  Materialize  (cost=0.00..1009.68 rows=3 width=4)
                                                       ->  Streaming(type: BROADCAST)  (cost=0.00..959.18 rows=30300 width=4)
                                                             Spawn on: All datanodes
                                                             ->  Seq Scan on ct1  (cost=0.00..48.67 rows=10100 width=4)
                                                                   Filter: (b IS NOT NULL)
(22 rows)

costbased_rewrite_rule=pullup_expr_sublink,enable_sublink_pullup_enhanced is mutually exclusive with rewrite_rule=disable_pullup_expr_sublink, rewrite_rule=enable_sublink_pullup_enhanced, and rewrite_rule=magicset, and costbased_rewrite_rule precedes rewrite_rule.

Pulling Up Expression Sublinks in the OR Condition

SQL statements can be rewritten when an OR condition contains related sublinks of expression type. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE ct2.a = (SELECT min(b) FROM ct1 WHERE ct1.b = ct2.c) OR ct2.b = (SELECT sum(a) FROM ct3 WHERE ct3.a = ct2.c);
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=73.05..75.50 rows=3 width=16)
   Node/s: All datanodes
   ->  Nested Loop Left Join  (cost=72.99..75.38 rows=3 width=16)
         Join Filter: (ct3.a = ct2.c)
         Filter: ((ct2.a = (min((min(ct1.b))))) OR (ct2.b = (sum(ct3.a))))
         ->  Nested Loop Left Join  (cost=69.43..71.67 rows=2 width=20)
               Join Filter: (ct1.b = ct2.c)
               ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=16)
                     Spawn on: All datanodes
                     ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
               ->  Materialize  (cost=69.43..70.20 rows=101 width=8)
                     ->  HashAggregate  (cost=69.43..69.77 rows=101 width=8)
                           Group By Key: ct1.b
                           ->  Streaming(type: REDISTRIBUTE)  (cost=65.51..68.93 rows=303 width=8)
                                 Spawn on: All datanodes
                                 ->  HashAggregate  (cost=65.51..66.52 rows=303 width=8)
                                       Group By Key: ct1.b
                                       ->  Seq Scan on ct1  (cost=0.00..48.67 rows=10100 width=4)
         ->  HashAggregate  (cost=3.55..3.59 rows=12 width=12)
               Group By Key: ct3.a
               ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=4)
(21 rows)

When costbased_rewrite_rule is set to pullup_expr_sublink, the optimizer evaluates whether to rewrite sublinks based on the cost. In this scenario, the plan generated without rewriting is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE ct2.a = (SELECT min(b) FROM ct1 WHERE ct1.b = ct2.c) OR ct2.b = (SELECT sum(a) FROM ct3 WHERE ct3.a = ct2.c);
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=3.75..5.07 rows=3 width=16)
   Node/s: All datanodes
   ->  Hash Left Join  (cost=3.68..4.95 rows=3 width=16)
         Hash Cond: (ct2.c = subquery."?column?")
         Filter: ((ct2.a = (SubPlan 2)) OR (ct2.b = subquery.sum))
         ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=16)
               Spawn on: All datanodes
               ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
         ->  Hash  (cost=3.63..3.63 rows=12 width=12)
               ->  Subquery Scan on subquery  (cost=3.55..3.63 rows=12 width=12)
                     ->  HashAggregate  (cost=3.55..3.59 rows=12 width=12)
                           Group By Key: ct3.a
                           ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=4)
         SubPlan 2
           ->  Result  (cost=0.10..0.11 rows=3 width=0)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.00..0.10 rows=3 width=4)
                         ->  Result  (cost=0.00..1009.68 rows=3 width=4)
                               Filter: (costbased_rule_test.ct1.b = ct2.c)
                               ->  Materialize  (cost=0.00..1009.68 rows=3 width=4)
                                     ->  Streaming(type: BROADCAST)  (cost=0.00..959.18 rows=30300 width=4)
                                           Spawn on: All datanodes
                                           ->  Seq Scan on ct1  (cost=0.00..48.67 rows=10100 width=4)
                                                 Filter: (b IS NOT NULL)
(24 rows)

Pulling Up the Related Sublinks of the Expression Type in TargetList

SQL statements can be rewritten when TargetList contains related sublinks of expression type. If both the rule-based and cost-based query rewriting evaluation policies are disabled, sublinks are not rewritten. The generation plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT *, ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1)  FROM ct1 WHERE ct1.b >50;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..4871.89 rows=10099 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct1  (cost=0.00..4451.14 rows=10099 width=16)
         Filter: (b > 50)
         SubPlan 1
           ->  Limit  (cost=1.30..1.30 rows=3 width=8)
                 ->  Sort  (cost=1.30..1.31 rows=3 width=8)
                       Sort Key: ct2.a
                       ->  Result  (cost=0.00..1.29 rows=6 width=8)
                             Filter: (ct2.b = ct1.b)
                             ->  Materialize  (cost=0.00..1.29 rows=6 width=8)
                                   ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                         Spawn on: All datanodes
                                         ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(14 rows)

When costbased_rewrite_rule is set to intargetlist, the optimizer evaluates whether to rewrite sublinks based on the cost. In this scenario, the plan generated without rewriting is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=intargetlist;
SET
gaussdb=# EXPLAIN SELECT *, ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1)  FROM ct1 WHERE ct1.b >50;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=5.36..496.38 rows=10099 width=20)
   Node/s: All datanodes
   ->  Hash Left Join  (cost=1.36..75.63 rows=10099 width=20)
         Hash Cond: (ct1.b = subquery."?column?")
         ->  Seq Scan on ct1  (cost=0.00..57.09 rows=10099 width=16)
               Filter: (b > 50)
         ->  Hash  (cost=1.35..1.35 rows=3 width=8)
               ->  Streaming(type: BROADCAST)  (cost=1.05..1.35 rows=3 width=8)
                     Spawn on: All datanodes
                     ->  Subquery Scan on subquery  (cost=1.05..1.08 rows=1 width=8)
                           Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                           ->  WindowAgg  (cost=1.05..1.07 rows=1 width=8)
                                 row_number_filter: (row_number() OVER (PARTITION BY ct2.b) OPERATOR(pg_catalog.<=) 1::bigint)
                                 ->  Sort  (cost=1.05..1.05 rows=1 width=8)
                                       Sort Key: ct2.b
                                       ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.04 rows=2 width=8)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(18 rows)

costbased_rewrite_rule=intargetlist is mutually exclusive with rewrite_rule=ntargetlist and rewrite_rule=magicset, and costbased_rewrite_rule precedes rewrite_rule.

Rewriting SQL Statements when the Expression Sublinks in TargetList Meet the MAGICSET Pushdown Rules

SQL statements can be rewritten when TargetList contains related sublinks of expression type and MagicSet pushdown rules can push down some filter criteria to the sublinks. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=intargetlist,magicset;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT *,ct1.a = (SELECT a FROM ct2 WHERE ct2.c = ct1.c LIMIT 1) FROM ct1 WHERE ct1.b=10;
                                                                    QUERY PLAN                                                       

-------------------------------------------------------------------------------------------------------------------------------------
-------------
 Streaming (type: GATHER)  (cost=27.90..46.26 rows=100 width=20)
   Node/s: All datanodes
   ->  Nested Loop Left Join  (cost=25.83..42.13 rows=100 width=20)
         Join Filter: (subquery."?column?" = costbased_rule_test.ct1.c)
         ->  Bitmap Heap Scan on ct1  (cost=4.51..19.92 rows=100 width=16)
               Recheck Cond: (b = 10)
               ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=100 width=0)
                     Index Cond: (b = 10)
         ->  Materialize  (cost=21.33..21.64 rows=3 width=8)
               ->  Streaming(type: BROADCAST)  (cost=21.33..21.63 rows=3 width=8)
                     Spawn on: All datanodes
                     ->  Subquery Scan on subquery  (cost=21.33..21.36 rows=1 width=8)
                           Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                           ->  WindowAgg  (cost=21.33..21.35 rows=3 width=8)
                                 row_number_filter: (row_number() OVER (PARTITION BY ct2.c) OPERATOR(pg_catalog.<=) 1::bigint)
                                 ->  Sort  (cost=21.33..21.33 rows=3 width=8)
                                       Sort Key: ct2.c
                                       ->  Nested Loop  (cost=20.16..21.32 rows=2 width=8)
                                             Join Filter: (ct2.c = costbased_rule_test.ct1.c)
                                             ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=8)
                                                   Spawn on: All datanodes
                                                   ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
                                             ->  Materialize  (cost=20.16..20.18 rows=1 width=4)
                                                   ->  HashAggregate  (cost=20.16..20.17 rows=1 width=4)
                                                         Group By Key: costbased_rule_test.ct1.c
                                                         ->  Streaming(type: REDISTRIBUTE)  (cost=20.00..20.16 rows=3 width=4)
                                                               Spawn on: All datanodes
                                                               ->  HashAggregate  (cost=20.00..20.01 rows=3 width=4)
                                                                     Group By Key: costbased_rule_test.ct1.c
                                                                     ->  Bitmap Heap Scan on ct1  (cost=4.51..19.92 rows=100 width=4)
                                                                           Recheck Cond: (b = 10)
                                                                           ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=
100 width=0)
                                                                                 Index Cond: (b = 10)
(33 rows)

When costbased_rewrite_rule is set to intargetlist, the optimizer evaluates whether to rewrite the MagicSet pushdown conditions of sublinks based on the cost. In this scenario, the plan generated without rewriting MagicSet is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=intargetlist,magicset;
SET
gaussdb=# SET costbased_rewrite_rule=intargetlist;
SET
gaussdb=# EXPLAIN SELECT *,ct1.a = (SELECT a FROM ct2 WHERE ct2.c = ct1.c LIMIT 1) FROM ct1 WHERE ct1.b=10;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=7.93..25.74 rows=100 width=20)
   Node/s: All datanodes
   ->  Hash Left Join  (cost=5.87..21.62 rows=100 width=20)
         Hash Cond: (ct1.c = subquery."?column?")
         ->  Bitmap Heap Scan on ct1  (cost=4.51..19.92 rows=100 width=16)
               Recheck Cond: (b = 10)
               ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=100 width=0)
                     Index Cond: (b = 10)
         ->  Hash  (cost=1.35..1.35 rows=3 width=8)
               ->  Streaming(type: BROADCAST)  (cost=1.05..1.35 rows=3 width=8)
                     Spawn on: All datanodes
                     ->  Subquery Scan on subquery  (cost=1.05..1.08 rows=1 width=8)
                           Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                           ->  WindowAgg  (cost=1.05..1.07 rows=1 width=8)
                                 row_number_filter: (row_number() OVER (PARTITION BY ct2.c) OPERATOR(pg_catalog.<=) 1::bigint)
                                 ->  Sort  (cost=1.05..1.05 rows=1 width=8)
                                       Sort Key: ct2.c
                                       ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.04 rows=2 width=8)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(20 rows)

costbased_rewrite_rule=intargetlist is mutually exclusive with rewrite_rule=intargetlist and rewrite_rule=magicset, and costbased_rewrite_rule precedes rewrite_rule.

Rewriting SQL Statements when Simple Subqueries Exist

SQL statements can be rewritten when simple subqueries exist. If the cost-based query rewriting evaluation policy is disabled, sublinks are rewritten based on rules. The generation plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2,(SELECT ct1.* FROM ct1,ct2,ct3 WHERE ct1.c > 10 AND ct1.a = 10) ct WHERE ct.a = ct2.d;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..100.52 rows=660 width=32)
   Node/s: All datanodes
   ->  Nested Loop  (cost=0.00..73.02 rows=660 width=32)
         ->  Nested Loop  (cost=0.00..69.52 rows=110 width=16)
               ->  Streaming(type: BROADCAST)  (cost=0.00..65.78 rows=3 width=16)
                     Spawn on: datanode2
                     ->  Seq Scan on ct1  (cost=0.00..65.50 rows=1 width=16)
                           Filter: ((c > 10) AND (a = 10))
               ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=0)
         ->  Materialize  (cost=0.00..2.58 rows=6 width=16)
               ->  Streaming(type: BROADCAST)  (cost=0.00..2.57 rows=6 width=16)
                     Spawn on: All datanodes
                     ->  Nested Loop  (cost=0.00..2.30 rows=2 width=16)
                           ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=0)
                           ->  Materialize  (cost=0.00..1.29 rows=3 width=16)
                                 ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=3 width=16)
                                       Spawn on: All datanodes
                                       ->  Seq Scan on ct2  (cost=0.00..1.01 rows=1 width=16)
                                             Filter: (d = 10)
(19 rows)

When costbased_rewrite_rule is set to pullup_subquery, the optimizer evaluates whether to rewrite simple subqueries based on the cost. In this scenario, the plan generated without rewriting simple subqueries is better. The generated plan is as follows:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_subquery;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2,(SELECT ct1.* FROM ct1,ct2,ct3 WHERE ct1.c > 10 AND ct1.a = 10) ct WHERE ct.a = ct2.d;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=1.37..73.72 rows=4 width=32)
   Node/s: All datanodes
   ->  Hash Join  (cost=1.31..73.60 rows=4 width=32)
         Hash Cond: (ct1.a = costbased_rule_test.ct2.d)
         ->  Nested Loop  (cost=0.00..71.37 rows=220 width=16)
               ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=0)
               ->  Materialize  (cost=0.00..67.08 rows=6 width=16)
                     ->  Streaming(type: BROADCAST)  (cost=0.00..67.07 rows=6 width=16)
                           Spawn on: All datanodes
                           ->  Nested Loop  (cost=0.00..66.80 rows=2 width=16)
                                 ->  Streaming(type: BROADCAST)  (cost=0.00..65.78 rows=3 width=16)
                                       Spawn on: datanode2
                                       ->  Seq Scan on ct1  (cost=0.00..65.50 rows=1 width=16)
                                             Filter: ((c > 10) AND (a = 10))
                                 ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=0)
         ->  Hash  (cost=1.28..1.28 rows=6 width=16)
               ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=16)
                     Spawn on: All datanodes
                     ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
(19 rows)

This scenario is controlled only by the GUC parameter costbased_rewrite_rule=pullup_subquery. The GUC parameter rewrite_rule does not have related control parameters.

Using Rule-based Query Rewriting when PGXC Plans Are Generated

When a PGXC plan is generated, the cost of the DN execution plan is incorrectly calculated. As a result, the result of cost-based plan comparison is inaccurate, and cost-based query rewriting is degraded to rule-based query rewriting.

The following plan is better because EXISTS sublinks are pulled up after rule-based rewriting:

gaussdb=# SET enable_stream_operator=off;
SET
gaussdb=# SET costbased_rewrite_rule= pullup_sublink_any_exists;
SET
gaussdb=# EXPLAIN ANALYZE  SELECT * FROM ct3 WHERE EXISTS (SELECT 1 from ct1 WHERE ct1.a = ct3.a AND ct1.c < 1000);
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=26.02..26.66 rows=110 width=16) (actual time=23.507..23.744 rows=110 loops=1)
   Hash Cond: (ct3.a = ct1.a)
   ->  Data Node Scan on ct3 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=110 width=16) (actual time=0.741..0.817 rows=110 loops=1)
         Node/s: All datanodes
   ->  Hash  (cost=25.59..25.59 rows=34 width=4) (actual time=22.596..22.596 rows=100 loops=1)
          Buckets: 32768  Batches: 1  Memory Usage: 4kB
         ->  HashAggregate  (cost=25.25..25.59 rows=34 width=4) (actual time=22.517..22.552 rows=100 loops=1)
               Group By Key: ct1.a
               ->  Data Node Scan on ct1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=10100 width=4) (actual time=1.129..17.014 rows=10100 loops=1)
                     Node/s: All datanodes
 Total runtime: 24.440 ms
(11 rows)

The following plan should not be generated because it is actually slower even though it costs less.

gaussdb=# EXPLAIN ANALYZE  SELECT  * FROM ct3 WHERE EXISTS (SELECT /*+ no_expand */ 1 from ct1 WHERE ct1.a = ct3.a AND ct1.c < 1000);
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Data Node Scan on ct3 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=55 width=16) (actual time=8.168..120.004 rows=110 loops=1)
   Node/s: All datanodes
   Coordinator quals: (SubPlan 1)
   SubPlan 1
     ->  Data Node Scan on ct1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=101 width=0) (actual time=119.024..119.024 rows=110 loops=110)
           Node/s: All datanodes
           Coordinator quals: (a = ct3.a)
 Total runtime: 123.777 ms
(8 rows)

Parent topic: Optimization Cases

Previous topic: Case: Modifying the GUC Parameter rewrite_rule

Next topic: Case: Using Global Secondary Indexes to Accelerate Query