文档首页/ 云数据库 GaussDB/ 开发指南(分布式_V2.0-8.x)/ SQL调优指南/ 实际调优案例/ 案例:调整基于代价的查询重写GUC参数costbased_rewrite_rule
更新时间:2024-11-12 GMT+08:00
分享

案例:调整基于代价的查询重写GUC参数costbased_rewrite_rule

查询重写能让优化器选择不同的执行路径,但部分规则对SQL的改写,并不能确保输出后的都是可以提升性能的计划。一旦改写错误可能导致千倍的性能差距,因此在查询改写阶段需要对此类规则支持基于代价的评估策略,决策是否应用规则对SQL进行改写,这些规则受GUC参数costbased_rewrite_rule控制,其受控的规则如表1所示。

表1 costbased_rewrite_rule GUC支持的参数控制规则说明列表

参数名

描述

none

不使用任何基于代价的查询改写策略。

pullup_subquery

简单子查询展开改写规则开启使用基于代价的查询改写策略。

pullup_sublink_any_exists

单个或AND条件里的ANY非相关子链接和[NOT] EXISTS相关子链接改写规则开启使用基于代价的查询改写策略。

pullup_not_in_sublink

单个或AND条件里的NOT IN非相关子链接改写规则开启使用基于代价的查询改写策略。该参数与GUC参数rewrite_rule的disable_pullup_not_in_sublink参数互斥,当开启该参数时,rewrite_rule的disable_pullup_not_in_sublink参数功能不生效。

pullup_expr_sublink

表达式子链接以及OR条件中的ANY非相关子链接和[NOT] EXISTS相关子链接改写场景生效。该参数与GUC参数rewrite_rule的disable_pullup_expr_sublink、enable_sublink_pullup_enhanced、magicset参数互斥,当开启该参数时rewrite_rule的相关参数功能不生效。

intargetlist

TargetList中的相关表达式子链接改写规则开启使用基于代价的查询改写策略。该参数与GUC参数rewrite_rule的intargetlist、magicset参数互斥,当开启该参数时,rewrite_rule的intargetlist、magicset参数功能不生效。

enable_sublink_pullup_enhanced

表达式子链接改写功能增强场景使用基于代价的查询改写策略。该参数受pullup_expr_sublink参数影响,AND条件中的表达式子链接改写场景需开启pullup_expr_sublink参数,该参数功能才生效。且与GUC参数rewrite_rule的enable_sublink_pullup_enhanced 参数互斥,当开启该参数时,rewrite_rule的enable_sublink_pullup_enhanced参数功能不生效。

对于上述参数所控制的查询重写规则,以下通过典型案例说明其使用场景。

在分布式场景下,基于代价的改写策略会优先选择STREAM计划,当改写前后均无法生成STREAM计划时会退化为PGXC计划。然而如果生成的PGXC计划完全下推到DN上,则CN上的代价信息是不准确的。因此该场景下生成计划时仍然会基于规则进行查询重写。场景示例参见生成PGXC计划时退化为基于规则的查询重写

案例环境准备

为了便于规则的使用场景演示,需准备建表语句如下:

SET client_min_messages = warning;
SET CLIENT_ENCODING = 'UTF8';
--清理环境。
DROP SCHEMA IF EXISTS costbased_rule_test cascade;
CREATE SCHEMA costbased_rule_test;
SET current_schema = costbased_rule_test;
SET enable_codegen = off;
DROP TABLE IF EXISTS costbased_rule_test.ct1;
DROP TABLE IF EXISTS costbased_rule_test.ct2;
DROP TABLE IF EXISTS costbased_rule_test.ct3;
DROP TABLE IF EXISTS costbased_rule_test.ct4;
--创建测试表。
CREATE TABLE ct1 (a INT, b INT, c INT, d INT);
CREATE TABLE ct2 (a INT, b INT, c INT, d INT);
CREATE TABLE ct3 (a INT, b INT, c INT, d INT);
CREATE TABLE ct4 (a INT, b INT, c INT, d INT);
CREATE INDEX idx_ct1_b ON ct1(b);
CREATE INDEX idx_ct2_c ON ct2(c);
CREATE INDEX idx_ct3_c ON ct3(c);
--插入数据。
INSERT INTO ct1 (a, b, c) VALUES (generate_series(1, 100), generate_series(200, 300), left(random()::int, 100));
INSERT INTO ct2 VALUES(1,2,3,4),(3,4,5,6);
INSERT INTO ct3 (a, b, c, d) VALUES (generate_series(1, 10), generate_series(20, 30), left(random()::int, 10), left(random()::int, 10));
--更新统计信息。
ANALYZE ct1;
ANALYZE ct2;
ANALYZE ct3;

Having条件中的非相关子链接改写场景

Having条件中存在带有AGG的非相关表达式子链接时可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=enable_sublink_pullup_enhanced;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=#  EXPLAIN SELECT ct2.a, SUM(ct2.b) AS value FROM ct2,ct1 WHERE ct2.a = ct1.b GROUP BY ct2.a HAVING SUM(ct2.b) >= (SELECT AVG(ct2.b) FROM ct2,ct1 WHERE ct2.a = ct1.b) ORDER BY value DESC;
                                                                    QUERY PLAN                                                       

-------------------------------------------------------------------------------------------------------------------------------------
-------------
 Streaming (type: GATHER)  (cost=5.31..5.37 rows=3 width=12)
   Merge Sort Key: (pg_catalog.sum((sum(costbased_rule_test.ct2.b)))) DESC
   Node/s: All datanodes
   ->  Sort  (cost=5.24..5.25 rows=3 width=12)
         Sort Key: (pg_catalog.sum((sum(costbased_rule_test.ct2.b)))) DESC
         ->  Nested Loop  (cost=5.13..5.23 rows=1 width=12)
               Join Filter: (((pg_catalog.sum((sum(costbased_rule_test.ct2.b)))))::numeric >= (pg_catalog.avg((avg(costbased_rule_tes
t.ct2.b)))))
               ->  HashAggregate  (cost=2.79..2.80 rows=2 width=24)
                     Group By Key: costbased_rule_test.ct2.a
                     ->  Streaming(type: REDISTRIBUTE)  (cost=2.66..2.76 rows=6 width=24)
                           Spawn on: All datanodes
                           ->  HashAggregate  (cost=2.66..2.68 rows=6 width=24)
                                 Group By Key: costbased_rule_test.ct2.a
                                 ->  Merge Join  (cost=1.46..2.16 rows=200 width=8)
                                       Merge Cond: (costbased_rule_test.ct1.b = costbased_rule_test.ct2.a)
                                       ->  Index Only Scan using idx_ct1_b on ct1  (cost=0.00..90.75 rows=10100 width=4)
                                       ->  Sort  (cost=1.29..1.30 rows=6 width=8)
                                             Sort Key: costbased_rule_test.ct2.a
                                             ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                                   Spawn on: All datanodes
                                                   ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
               ->  Materialize  (cost=2.33..2.41 rows=3 width=32)
                     ->  Aggregate  (cost=2.33..2.40 rows=3 width=36)
                           ->  Streaming(type: BROADCAST)  (cost=2.33..2.40 rows=9 width=36)
                                 Spawn on: All datanodes
                                 ->  Aggregate  (cost=2.33..2.34 rows=3 width=36)
                                       ->  Merge Join  (cost=1.46..2.16 rows=200 width=4)
                                             Merge Cond: (costbased_rule_test.ct1.b = costbased_rule_test.ct2.a)
                                             ->  Index Only Scan using idx_ct1_b on ct1  (cost=0.00..90.75 rows=10100 width=4)
                                             ->  Sort  (cost=1.29..1.30 rows=6 width=8)
                                                   Sort Key: costbased_rule_test.ct2.a
                                                   ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                                         Spawn on: All datanodes
                                                         ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(34 rows)

但当设置costbased_rewrite_rule参数值为pullup_expr_sublink,enable_sublink_pullup_enhanced时,优化器会基于代价评估是否对子链接进行改写,因为此场景不进行SQL改写生成的计划更优,生成计划如下:

gaussdb=# SET rewrite_rule=enable_sublink_pullup_enhanced;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink,enable_sublink_pullup_enhanced;
SET
gaussdb=# EXPLAIN SELECT ct2.a, SUM(ct2.b) AS value FROM ct2,ct1 WHERE ct2.a = ct1.b GROUP BY ct2.a HAVING SUM(ct2.b) >= (SELECT AVG(ct2.b) FROM ct2,ct1 WHERE ct2.a = ct1.b) ORDER BY value DESC;
                                                     QUERY PLAN                                                   

------------------------------------------------------------------------------------------------------------------
---
 Sort  (cost=5.35..5.35 rows=2 width=24)
   Sort Key: (pg_catalog.sum((sum(costbased_rule_test.ct2.b)))) DESC
   InitPlan 1 (returns $0)
     ->  Streaming (type: GATHER)  (cost=2.52..2.58 rows=3 width=36)
           Node/s: All datanodes
           ->  Aggregate  (cost=2.33..2.40 rows=3 width=36)
                 ->  Streaming(type: BROADCAST)  (cost=2.33..2.40 rows=9 width=36)
                       Spawn on: All datanodes
                       ->  Aggregate  (cost=2.33..2.34 rows=3 width=36)
                             ->  Merge Join  (cost=1.46..2.16 rows=200 width=4)
                                   Merge Cond: (costbased_rule_test.ct1.b = costbased_rule_test.ct2.a)
                                   ->  Index Only Scan using idx_ct1_b on ct1  (cost=0.00..90.75 rows=10100 width=
4)
                                   ->  Sort  (cost=1.29..1.30 rows=6 width=8)
                                         Sort Key: costbased_rule_test.ct2.a
                                         ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                               Spawn on: All datanodes
                                               ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
   ->  HashAggregate  (cost=2.79..2.94 rows=2 width=24)
         Group By Key: costbased_rule_test.ct2.a
         Filter: ((pg_catalog.sum((sum(costbased_rule_test.ct2.b))))::numeric >= $0)
         ->  Streaming (type: GATHER)  (cost=2.79..2.94 rows=6 width=24)
               Node/s: All datanodes
               ->  HashAggregate  (cost=2.66..2.69 rows=6 width=24)
                     Group By Key: costbased_rule_test.ct2.a
                     ->  Merge Join  (cost=1.46..2.16 rows=200 width=8)
                           Merge Cond: (costbased_rule_test.ct1.b = costbased_rule_test.ct2.a)
                           ->  Index Only Scan using idx_ct1_b on ct1  (cost=0.00..90.75 rows=10100 width=4)
                           ->  Sort  (cost=1.29..1.30 rows=6 width=8)
                                 Sort Key: costbased_rule_test.ct2.a
                                 ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                       Spawn on: All datanodes
                                       ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(32 rows)

costbased_rewrite_rule=pullup_expr_sublink,enable_sublink_pullup_enhanced与rewrite_rule=disable_pullup_expr_sublink、rewrite_rule=enable_sublink_pullup_enhanced互斥,且costbased_rewrite_rule的值优先级高于rewrite_rule的优先级。

AND条件中ANY/IN非相关子链接提升场景

单个或AND条件里存在ANY/IN非相关子链接时可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE ct2.a IN (SELECT a FROM ct1 WHERE ct1.c > 500 AND ct1.a = 10);
                           QUERY PLAN
-----------------------------------------------------------------
 Streaming (type: GATHER)  (cost=0.19..66.72 rows=3 width=16)
   Node/s: datanode2
   ->  Nested Loop Semi Join  (cost=0.00..66.53 rows=3 width=16)
         ->  Seq Scan on ct2  (cost=0.00..1.01 rows=1 width=16)
               Filter: (a = 10)
         ->  Seq Scan on ct1  (cost=0.00..65.50 rows=1 width=4)
               Filter: ((c > 500) AND (a = 10))
(7 rows)

但当设置costbased_rewrite_rule参数值为pullup_sublink_any_exists时,则优化器会基于代价评估是否对子链接进行改写,因为此场景不进行SQL改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_sublink_any_exists;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE ct2.a IN (SELECT a FROM ct1 WHERE ct1.c > 500 AND ct1.a = 10);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=65.59..66.66 rows=1 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct2  (cost=65.52..66.54 rows=1 width=16)
         Filter: (hashed SubPlan 1)
         SubPlan 1
           ->  Materialize  (cost=0.00..65.52 rows=9 width=4)
                 ->  Streaming(type: BROADCAST)  (cost=0.00..65.51 rows=3 width=4)
                       Spawn on: datanode2
                       ->  Seq Scan on ct1  (cost=0.00..65.50 rows=1 width=4)
                             Filter: ((c > 500) AND (a = 10))
(10 rows)

该场景仅受GUC参数costbased_rewrite_rule=pullup_sublink_any_exists控制,GUC参数rewrite_rule无相关控制参数。

AND条件中[NOT]EXISTS相关子连接提升场景

单个或AND条件里存在[NOT]EXISTS相关子链接时可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE EXISTS (SELECT 1 FROM ct3 WHERE ct3.b = ct2.c AND ct3.c < 1000);
                                        QUERY PLAN                                        
------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.66..5.95 rows=2 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=4.60..5.82 rows=2 width=16)
         Join Filter: (ct2.c = ct3.b)
         ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=16)
               Spawn on: All datanodes
               ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
         ->  Materialize  (cost=4.60..4.65 rows=11 width=4)
               ->  HashAggregate  (cost=4.60..4.64 rows=11 width=4)
                     Group By Key: ct3.b
                     ->  Streaming(type: REDISTRIBUTE)  (cost=3.55..4.57 rows=33 width=4)
                           Spawn on: All datanodes
                           ->  HashAggregate  (cost=3.55..3.66 rows=33 width=4)
                                 Group By Key: ct3.b
                                 ->  Seq Scan on ct3  (cost=0.00..3.46 rows=110 width=4)
                                       Filter: (c < 1000)
(16 rows)

但当设置costbased_rewrite_rule参数值为pullup_sublink_any_exists时,则优化器会基于代价评估是否对子链接进行改写,因为此场景不进行SQL改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_sublink_any_exists;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE EXISTS (SELECT 1 FROM ct3 WHERE ct3.b = ct2.c AND ct3.c < 1000);
                                        QUERY PLAN                                         
-------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=0.06..1.26 rows=1 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct2  (cost=0.00..1.14 rows=1 width=16)
         Filter: (SubPlan 1)
         SubPlan 1
           ->  Result  (cost=0.00..14.02 rows=330 width=4)
                 Filter: (ct3.b = ct2.c)
                 ->  Materialize  (cost=0.00..14.02 rows=330 width=4)
                       ->  Streaming(type: BROADCAST)  (cost=0.00..13.47 rows=330 width=4)
                             Spawn on: All datanodes
                             ->  Seq Scan on ct3  (cost=0.00..3.46 rows=110 width=4)
                                   Filter: (c < 1000)
(12 rows)

该场景仅受GUC参数costbased_rewrite_rule=pullup_sublink_any_exists控制,GUC参数rewrite_rule无相关控制参数。

AND条件中NOT IN非相关子连接提升场景

单个或AND条件里存在NOT IN非相关子链接时可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE ct1.b NOT IN (SELECT a FROM ct3);
                                    QUERY PLAN                                     
-----------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..5681.51 rows=9100 width=16)
   Node/s: All datanodes
   ->  Nested Loop Anti Join  (cost=0.00..5302.39 rows=9100 width=16)
         Join Filter: ((ct1.b = ct3.a) OR (ct1.b IS NULL) OR (ct3.a IS NULL))
         ->  Seq Scan on ct1  (cost=0.00..48.67 rows=10100 width=16)
         ->  Materialize  (cost=0.00..13.93 rows=330 width=4)
               ->  Streaming(type: BROADCAST)  (cost=0.00..13.38 rows=330 width=4)
                     Spawn on: All datanodes
                     ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=4)
(9 rows)

但当设置costbased_rewrite_rule参数值为pullup_not_in_sublink时,则优化器会基于代价评估是否对子链接进行改写,因为此场景不进行SQL改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_not_in_sublink;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE ct1.b NOT IN (SELECT a FROM ct3);
                                     QUERY PLAN                                     
------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=8.96..272.42 rows=5050 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct1  (cost=4.96..62.05 rows=5050 width=16)
         Filter: (NOT (hashed SubPlan 1))
         SubPlan 1
           ->  Materialize  (cost=0.00..4.14 rows=990 width=4)
                 ->  Streaming(type: BROADCAST)  (cost=0.00..3.59 rows=330 width=4)
                       Spawn on: All datanodes
                       ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=4)
(9 rows)

costbased_rewrite_rule=pullup_not_in_sublink与rewrite_rule=disable_pullup_not_in_sublink互斥,且costbased_rewrite_rule的优先级高于rewrite_rule的优先级。

AND条件中表达式子链接提升场景

单个或AND条件里存在表达式类相关子链接时可以进行SQL等价改写。如果同时关闭基于规则和基于代价的查询改写评估策略,则子链接不会进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=disable_pullup_expr_sublink;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1) AND ct1.b > 50;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=1.07..4454.57 rows=50 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct1  (cost=0.01..4452.44 rows=50 width=16)
         Filter: ((b > 50) AND (a = (SubPlan 1)))
         SubPlan 1
           ->  Limit  (cost=1.30..1.30 rows=3 width=8)
                 ->  Sort  (cost=1.30..1.31 rows=3 width=8)
                       Sort Key: ct2.a
                       ->  Result  (cost=0.00..1.29 rows=6 width=8)
                             Filter: (ct2.b = ct1.b)
                             ->  Materialize  (cost=0.00..1.29 rows=6 width=8)
                                   ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                         Spawn on: All datanodes
                                         ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(14 rows)

但当设置costbased_rewrite_rule参数值为pullup_expr_sublink时,则优化器会基于代价评估是否对子链接进行改写,因为此场景进行SQL改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=disable_pullup_expr_sublink;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1) AND ct1.b > 50;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=1.31..1.83 rows=3 width=16)
   Node/s: All datanodes
   ->  Merge Join  (cost=1.25..1.71 rows=3 width=16)
         Merge Cond: (subquery."?column?" = ct1.b)
         Join Filter: (ct1.a = subquery.a)
         ->  Sort  (cost=1.24..1.24 rows=1 width=8)
               Sort Key: subquery."?column?"
               ->  Streaming(type: REDISTRIBUTE)  (cost=1.05..1.23 rows=1 width=8)
                     Spawn on: All datanodes
                     ->  Subquery Scan on subquery  (cost=1.05..1.08 rows=1 width=8)
                           Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                           ->  WindowAgg  (cost=1.05..1.07 rows=1 width=8)
                                 row_number_filter: (row_number() OVER (PARTITION BY ct2.b) OPERATOR(pg_catalog.<=) 1::bigint)
                                 ->  Sort  (cost=1.05..1.06 rows=1 width=8)
                                       Sort Key: ct2.b
                                       ->  Streaming(type: REDISTRIBUTE)  (cost=1.01..1.04 rows=1 width=8)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=1.01..1.01 rows=1 width=8)
                                                   Filter: (b > 50)
         ->  Index Scan using idx_ct1_b on ct1  (cost=0.00..159.16 rows=10099 width=16)
               Index Cond: (b > 50)
(21 rows)

costbased_rewrite_rule=pullup_expr_sublink与rewrite_rule=disable_pullup_expr_sublink、rewrite_rule=enable_sublink_pullup_enhanced、rewrite_rule=magicset互斥,且costbased_rewrite_rule的优先级高于rewrite_rule的优先级。

AND条件中表达式子链接满足MAGICSET下推改写场景

单个或AND条件里存在表达式类相关子链接,且满足MagicSet下推规则可以将部分过滤条件下推到子链接中时可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=magicset;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE  ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1) AND ct1.c=10;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=62.95..78.78 rows=3 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=62.89..78.66 rows=3 width=16)
         ->  Streaming(type: REDISTRIBUTE)  (cost=58.39..58.57 rows=1 width=8)
               Spawn on: All datanodes
               ->  Subquery Scan on subquery  (cost=58.39..58.43 rows=1 width=8)
                     Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                     ->  WindowAgg  (cost=58.39..58.42 rows=3 width=8)
                           row_number_filter: (row_number() OVER (PARTITION BY ct2.b) OPERATOR(pg_catalog.<=) 1::bigint)
                           ->  Sort  (cost=58.39..58.40 rows=3 width=8)
                                 Sort Key: ct2.b
                                 ->  Nested Loop Semi Join  (cost=0.00..58.38 rows=3 width=8)
                                       Join Filter: (ct2.b = costbased_rule_test.ct1.b)
                                       ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=8)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
                                       ->  Materialize  (cost=0.00..57.24 rows=1 width=4)
                                             ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..57.23 rows=1 width=4)
                                                   Spawn on: All datanodes
                                                   ->  Seq Scan on ct1  (cost=0.00..57.09 rows=1 width=4)
                                                         Filter: (c = 10)
         ->  Bitmap Heap Scan on ct1  (cost=4.50..20.08 rows=1 width=16)
               Recheck Cond: (b = subquery."?column?")
               Filter: ((c = 10) AND (subquery.a = a))
               ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=100 width=0)
                     Index Cond: (b = subquery."?column?")
(26 rows)

但当设置costbased_rewrite_rule参数值为pullup_expr_sublink时,则优化器会基于代价评估是否对子链接的MagicSet下推条件进行改写,因为此场景不进行MagicSet改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=magicset;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1 WHERE  ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1)  AND ct1.c=10;
                                                       QUERY PLAN                                                        
-------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=5.61..21.43 rows=3 width=16)
   Node/s: All datanodes
   ->  Nested Loop  (cost=5.54..21.31 rows=3 width=16)
         ->  Streaming(type: REDISTRIBUTE)  (cost=1.05..1.22 rows=1 width=8)
               Spawn on: All datanodes
               ->  Subquery Scan on subquery  (cost=1.05..1.08 rows=1 width=8)
                     Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                     ->  WindowAgg  (cost=1.05..1.07 rows=1 width=8)
                           row_number_filter: (row_number() OVER (PARTITION BY ct2.b) OPERATOR(pg_catalog.<=) 1::bigint)
                           ->  Sort  (cost=1.05..1.05 rows=1 width=8)
                                 Sort Key: ct2.b
                                 ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.04 rows=2 width=8)
                                       Spawn on: All datanodes
                                       ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
         ->  Bitmap Heap Scan on ct1  (cost=4.50..20.08 rows=1 width=16)
               Recheck Cond: (b = subquery."?column?")
               Filter: ((c = 10) AND (subquery.a = a))
               ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=100 width=0)
                     Index Cond: (b = subquery."?column?")
(19 rows)

costbased_rewrite_rule=pullup_expr_sublink与rewrite_rule=disable_pullup_expr_sublink、rewrite_rule=enable_sublink_pullup_enhanced、rewrite_rule=magicset参数互斥,且costbased_rewrite_rule的优先级高于rewrite_rule的优先级。

AND条件中表达式子链接满足WINAGG改写场景

单个或AND条件里存在表达式类相关子链接,且满足WinAGG规则时可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=enable_sublink_pullup_enhanced;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1, ct2 WHERE ct1.b = ct2.c AND ct2.a = (SELECT min(b) FROM ct1 WHERE ct1.b = ct2.c);
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=10.71..14.23 rows=1 width=36)
   Node/s: All datanodes
   ->  Subquery Scan on inner_subquery  (cost=10.64..14.11 rows=1 width=36)
         Filter: (inner_subquery.a = inner_subquery.min)
         ->  WindowAgg  (cost=10.64..13.27 rows=200 width=32)
               ->  Sort  (cost=10.64..11.02 rows=200 width=32)
                     Sort Key: ct2.c
                     ->  Streaming(type: REDISTRIBUTE)  (cost=1.46..5.22 rows=200 width=32)
                           Spawn on: All datanodes
                           ->  Merge Join  (cost=1.46..2.18 rows=200 width=32)
                                 Merge Cond: (ct1.b = ct2.c)
                                 ->  Index Scan using idx_ct1_b on ct1  (cost=0.00..150.75 rows=10100 width=16)
                                 ->  Sort  (cost=1.29..1.30 rows=3 width=16)
                                       Sort Key: ct2.c
                                       ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=16)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
(17 rows)

但当设置costbased_rewrite_rule参数值为pullup_expr_sublink,enable_sublink_pullup_enhanced时,则优化器会基于代价评估是否对子链接进行改写,因为此场景不进行改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=enable_sublink_pullup_enhanced;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink,enable_sublink_pullup_enhanced;
SET
gaussdb=# EXPLAIN SELECT * FROM ct1, ct2 WHERE ct1.b = ct2.c AND ct2.a = (SELECT min(b) FROM ct1 WHERE ct1.b = ct2.c);
                                                          QUERY PLAN                                                          
------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=3.55..5.99 rows=100 width=32)
   Node/s: All datanodes
   ->  Merge Join  (cost=1.49..1.87 rows=100 width=32)
         Merge Cond: (costbased_rule_test.ct1.b = ct2.c)
         ->  Index Scan using idx_ct1_b on ct1  (cost=0.00..150.75 rows=10100 width=16)
         ->  Sort  (cost=1.40..1.41 rows=3 width=16)
               Sort Key: ct2.c
               ->  Streaming(type: BROADCAST)  (cost=0.00..1.39 rows=3 width=16)
                     Spawn on: All datanodes
                     ->  Seq Scan on ct2  (cost=0.00..1.12 rows=1 width=16)
                           Filter: (a = (SubPlan 2))
                           SubPlan 2
                             ->  Result  (cost=0.10..0.11 rows=3 width=0)
                                   InitPlan 1 (returns $1)
                                     ->  Limit  (cost=0.00..0.10 rows=3 width=4)
                                           ->  Result  (cost=0.00..1009.68 rows=3 width=4)
                                                 Filter: (costbased_rule_test.ct1.b = ct2.c)
                                                 ->  Materialize  (cost=0.00..1009.68 rows=3 width=4)
                                                       ->  Streaming(type: BROADCAST)  (cost=0.00..959.18 rows=30300 width=4)
                                                             Spawn on: All datanodes
                                                             ->  Seq Scan on ct1  (cost=0.00..48.67 rows=10100 width=4)
                                                                   Filter: (b IS NOT NULL)
(22 rows)

costbased_rewrite_rule=pullup_expr_sublink,enable_sublink_pullup_enhanced与rewrite_rule=disable_pullup_expr_sublink、rewrite_rule=enable_sublink_pullup_enhanced、rewrite_rule=magicset参数互斥,且costbased_rewrite_rule的优先级高于rewrite_rule的优先级。

OR条件中表达式子连接提升场景

OR条件里存在表达式类相关子链接,可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE ct2.a = (SELECT min(b) FROM ct1 WHERE ct1.b = ct2.c) OR ct2.b = (SELECT sum(a) FROM ct3 WHERE ct3.a = ct2.c);
                                            QUERY PLAN                                             
---------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=73.05..75.50 rows=3 width=16)
   Node/s: All datanodes
   ->  Nested Loop Left Join  (cost=72.99..75.38 rows=3 width=16)
         Join Filter: (ct3.a = ct2.c)
         Filter: ((ct2.a = (min((min(ct1.b))))) OR (ct2.b = (sum(ct3.a))))
         ->  Nested Loop Left Join  (cost=69.43..71.67 rows=2 width=20)
               Join Filter: (ct1.b = ct2.c)
               ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=16)
                     Spawn on: All datanodes
                     ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
               ->  Materialize  (cost=69.43..70.20 rows=101 width=8)
                     ->  HashAggregate  (cost=69.43..69.77 rows=101 width=8)
                           Group By Key: ct1.b
                           ->  Streaming(type: REDISTRIBUTE)  (cost=65.51..68.93 rows=303 width=8)
                                 Spawn on: All datanodes
                                 ->  HashAggregate  (cost=65.51..66.52 rows=303 width=8)
                                       Group By Key: ct1.b
                                       ->  Seq Scan on ct1  (cost=0.00..48.67 rows=10100 width=4)
         ->  HashAggregate  (cost=3.55..3.59 rows=12 width=12)
               Group By Key: ct3.a
               ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=4)
(21 rows)

但当设置costbased_rewrite_rule参数值为pullup_expr_sublink时,则优化器会基于代价评估是否对子链接进行改写,因为此场景不进行改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_expr_sublink;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2 WHERE ct2.a = (SELECT min(b) FROM ct1 WHERE ct1.b = ct2.c) OR ct2.b = (SELECT sum(a) FROM ct3 WHERE ct3.a = ct2.c);
                                                 QUERY PLAN                                                 
------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=3.75..5.07 rows=3 width=16)
   Node/s: All datanodes
   ->  Hash Left Join  (cost=3.68..4.95 rows=3 width=16)
         Hash Cond: (ct2.c = subquery."?column?")
         Filter: ((ct2.a = (SubPlan 2)) OR (ct2.b = subquery.sum))
         ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=16)
               Spawn on: All datanodes
               ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
         ->  Hash  (cost=3.63..3.63 rows=12 width=12)
               ->  Subquery Scan on subquery  (cost=3.55..3.63 rows=12 width=12)
                     ->  HashAggregate  (cost=3.55..3.59 rows=12 width=12)
                           Group By Key: ct3.a
                           ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=4)
         SubPlan 2
           ->  Result  (cost=0.10..0.11 rows=3 width=0)
                 InitPlan 1 (returns $1)
                   ->  Limit  (cost=0.00..0.10 rows=3 width=4)
                         ->  Result  (cost=0.00..1009.68 rows=3 width=4)
                               Filter: (costbased_rule_test.ct1.b = ct2.c)
                               ->  Materialize  (cost=0.00..1009.68 rows=3 width=4)
                                     ->  Streaming(type: BROADCAST)  (cost=0.00..959.18 rows=30300 width=4)
                                           Spawn on: All datanodes
                                           ->  Seq Scan on ct1  (cost=0.00..48.67 rows=10100 width=4)
                                                 Filter: (b IS NOT NULL)
(24 rows)

costbased_rewrite_rule=pullup_expr_sublink与rewrite_rule=disable_pullup_expr_sublink、rewrite_rule=enable_sublink_pullup_enhanced、rewrite_rule=magicset参数互斥,且costbased_rewrite_rule的优先级高于rewrite_rule的优先级。

TargetList中相关表达式子连接提升场景

TargetList中存在表达式类相关子链接时可以进行SQL等价改写。如果同时关闭基于规则和基于代价的查询改写评估策略,子链接不进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT *, ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1)  FROM ct1 WHERE ct1.b >50;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..4871.89 rows=10099 width=16)
   Node/s: All datanodes
   ->  Seq Scan on ct1  (cost=0.00..4451.14 rows=10099 width=16)
         Filter: (b > 50)
         SubPlan 1
           ->  Limit  (cost=1.30..1.30 rows=3 width=8)
                 ->  Sort  (cost=1.30..1.31 rows=3 width=8)
                       Sort Key: ct2.a
                       ->  Result  (cost=0.00..1.29 rows=6 width=8)
                             Filter: (ct2.b = ct1.b)
                             ->  Materialize  (cost=0.00..1.29 rows=6 width=8)
                                   ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=8)
                                         Spawn on: All datanodes
                                         ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(14 rows)

但当设置costbased_rewrite_rule参数值为intargetlist时,则优化器会基于代价评估是否对子链接进行改写,因为此场景进行改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=intargetlist;
SET
gaussdb=# EXPLAIN SELECT *, ct1.a = (SELECT a FROM ct2 WHERE ct2.b = ct1.b LIMIT 1)  FROM ct1 WHERE ct1.b >50;
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=5.36..496.38 rows=10099 width=20)
   Node/s: All datanodes
   ->  Hash Left Join  (cost=1.36..75.63 rows=10099 width=20)
         Hash Cond: (ct1.b = subquery."?column?")
         ->  Seq Scan on ct1  (cost=0.00..57.09 rows=10099 width=16)
               Filter: (b > 50)
         ->  Hash  (cost=1.35..1.35 rows=3 width=8)
               ->  Streaming(type: BROADCAST)  (cost=1.05..1.35 rows=3 width=8)
                     Spawn on: All datanodes
                     ->  Subquery Scan on subquery  (cost=1.05..1.08 rows=1 width=8)
                           Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                           ->  WindowAgg  (cost=1.05..1.07 rows=1 width=8)
                                 row_number_filter: (row_number() OVER (PARTITION BY ct2.b) OPERATOR(pg_catalog.<=) 1::bigint)
                                 ->  Sort  (cost=1.05..1.05 rows=1 width=8)
                                       Sort Key: ct2.b
                                       ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.04 rows=2 width=8)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(18 rows)

costbased_rewrite_rule=intargetlist与rewrite_rule=ntargetlist、rewrite_rule=magicset互斥,且costbased_rewrite_rule的优先级高于rewrite_rule的优先级。

TargetList中表达式子链接满足MAGICSET下推改写场景

TargetList中存在表达式类相关子链接,且满足MagicSet下推规则可以将部分过滤条件下推到子链接中时可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=intargetlist,magicset;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT *,ct1.a = (SELECT a FROM ct2 WHERE ct2.c = ct1.c LIMIT 1) FROM ct1 WHERE ct1.b=10;
                                                                    QUERY PLAN                                                       

-------------------------------------------------------------------------------------------------------------------------------------
-------------
 Streaming (type: GATHER)  (cost=27.90..46.26 rows=100 width=20)
   Node/s: All datanodes
   ->  Nested Loop Left Join  (cost=25.83..42.13 rows=100 width=20)
         Join Filter: (subquery."?column?" = costbased_rule_test.ct1.c)
         ->  Bitmap Heap Scan on ct1  (cost=4.51..19.92 rows=100 width=16)
               Recheck Cond: (b = 10)
               ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=100 width=0)
                     Index Cond: (b = 10)
         ->  Materialize  (cost=21.33..21.64 rows=3 width=8)
               ->  Streaming(type: BROADCAST)  (cost=21.33..21.63 rows=3 width=8)
                     Spawn on: All datanodes
                     ->  Subquery Scan on subquery  (cost=21.33..21.36 rows=1 width=8)
                           Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                           ->  WindowAgg  (cost=21.33..21.35 rows=3 width=8)
                                 row_number_filter: (row_number() OVER (PARTITION BY ct2.c) OPERATOR(pg_catalog.<=) 1::bigint)
                                 ->  Sort  (cost=21.33..21.33 rows=3 width=8)
                                       Sort Key: ct2.c
                                       ->  Nested Loop  (cost=20.16..21.32 rows=2 width=8)
                                             Join Filter: (ct2.c = costbased_rule_test.ct1.c)
                                             ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.13 rows=2 width=8)
                                                   Spawn on: All datanodes
                                                   ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
                                             ->  Materialize  (cost=20.16..20.18 rows=1 width=4)
                                                   ->  HashAggregate  (cost=20.16..20.17 rows=1 width=4)
                                                         Group By Key: costbased_rule_test.ct1.c
                                                         ->  Streaming(type: REDISTRIBUTE)  (cost=20.00..20.16 rows=3 width=4)
                                                               Spawn on: All datanodes
                                                               ->  HashAggregate  (cost=20.00..20.01 rows=3 width=4)
                                                                     Group By Key: costbased_rule_test.ct1.c
                                                                     ->  Bitmap Heap Scan on ct1  (cost=4.51..19.92 rows=100 width=4)
                                                                           Recheck Cond: (b = 10)
                                                                           ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=
100 width=0)
                                                                                 Index Cond: (b = 10)
(33 rows)

但当设置costbased_rewrite_rule参数值为intargetlist时,则优化器会基于代价评估是否对子链接的MagicSet下推条件进行改写,因为此场景不进行MagicSet改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=intargetlist,magicset;
SET
gaussdb=# SET costbased_rewrite_rule=intargetlist;
SET
gaussdb=# EXPLAIN SELECT *,ct1.a = (SELECT a FROM ct2 WHERE ct2.c = ct1.c LIMIT 1) FROM ct1 WHERE ct1.b=10;
                                                          QUERY PLAN                                                           
-------------------------------------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=7.93..25.74 rows=100 width=20)
   Node/s: All datanodes
   ->  Hash Left Join  (cost=5.87..21.62 rows=100 width=20)
         Hash Cond: (ct1.c = subquery."?column?")
         ->  Bitmap Heap Scan on ct1  (cost=4.51..19.92 rows=100 width=16)
               Recheck Cond: (b = 10)
               ->  Bitmap Index Scan on idx_ct1_b  (cost=0.00..4.50 rows=100 width=0)
                     Index Cond: (b = 10)
         ->  Hash  (cost=1.35..1.35 rows=3 width=8)
               ->  Streaming(type: BROADCAST)  (cost=1.05..1.35 rows=3 width=8)
                     Spawn on: All datanodes
                     ->  Subquery Scan on subquery  (cost=1.05..1.08 rows=1 width=8)
                           Filter: (subquery."?column?" OPERATOR(pg_catalog.=) 1::bigint)
                           ->  WindowAgg  (cost=1.05..1.07 rows=1 width=8)
                                 row_number_filter: (row_number() OVER (PARTITION BY ct2.c) OPERATOR(pg_catalog.<=) 1::bigint)
                                 ->  Sort  (cost=1.05..1.05 rows=1 width=8)
                                       Sort Key: ct2.c
                                       ->  Streaming(type: REDISTRIBUTE)  (cost=0.00..1.04 rows=2 width=8)
                                             Spawn on: All datanodes
                                             ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=8)
(20 rows)

costbased_rewrite_rule=intargetlist与rewrite_rule=intargetlist、rewrite_rule=magicset参数互斥,且costbased_rewrite_rule的优先级高于rewrite_rule的优先级。

简单子查询展开改写场景

存在简单子查询时可以进行SQL等价改写。如果关闭基于代价的查询改写评估策略,子链接会基于规则进行改写,生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=none;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2,(SELECT ct1.* FROM ct1,ct2,ct3 WHERE ct1.c > 10 AND ct1.a = 10) ct WHERE ct.a = ct2.d;
                                            QUERY PLAN
---------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=4.00..100.52 rows=660 width=32)
   Node/s: All datanodes
   ->  Nested Loop  (cost=0.00..73.02 rows=660 width=32)
         ->  Nested Loop  (cost=0.00..69.52 rows=110 width=16)
               ->  Streaming(type: BROADCAST)  (cost=0.00..65.78 rows=3 width=16)
                     Spawn on: datanode2
                     ->  Seq Scan on ct1  (cost=0.00..65.50 rows=1 width=16)
                           Filter: ((c > 10) AND (a = 10))
               ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=0)
         ->  Materialize  (cost=0.00..2.58 rows=6 width=16)
               ->  Streaming(type: BROADCAST)  (cost=0.00..2.57 rows=6 width=16)
                     Spawn on: All datanodes
                     ->  Nested Loop  (cost=0.00..2.30 rows=2 width=16)
                           ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=0)
                           ->  Materialize  (cost=0.00..1.29 rows=3 width=16)
                                 ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=3 width=16)
                                       Spawn on: All datanodes
                                       ->  Seq Scan on ct2  (cost=0.00..1.01 rows=1 width=16)
                                             Filter: (d = 10)
(19 rows)

但当设置costbased_rewrite_rule参数值为pullup_subquery时,则优化器会基于代价评估是否对简单子查询进行改写,因为此场景不进行简单子查询改写生成的计划更优,因此生成计划如下:

gaussdb=# SET rewrite_rule=none;
SET
gaussdb=# SET costbased_rewrite_rule=pullup_subquery;
SET
gaussdb=# EXPLAIN SELECT * FROM ct2,(SELECT ct1.* FROM ct1,ct2,ct3 WHERE ct1.c > 10 AND ct1.a = 10) ct WHERE ct.a = ct2.d;
                                             QUERY PLAN
----------------------------------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=1.37..73.72 rows=4 width=32)
   Node/s: All datanodes
   ->  Hash Join  (cost=1.31..73.60 rows=4 width=32)
         Hash Cond: (ct1.a = costbased_rule_test.ct2.d)
         ->  Nested Loop  (cost=0.00..71.37 rows=220 width=16)
               ->  Seq Scan on ct3  (cost=0.00..3.37 rows=110 width=0)
               ->  Materialize  (cost=0.00..67.08 rows=6 width=16)
                     ->  Streaming(type: BROADCAST)  (cost=0.00..67.07 rows=6 width=16)
                           Spawn on: All datanodes
                           ->  Nested Loop  (cost=0.00..66.80 rows=2 width=16)
                                 ->  Streaming(type: BROADCAST)  (cost=0.00..65.78 rows=3 width=16)
                                       Spawn on: datanode2
                                       ->  Seq Scan on ct1  (cost=0.00..65.50 rows=1 width=16)
                                             Filter: ((c > 10) AND (a = 10))
                                 ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=0)
         ->  Hash  (cost=1.28..1.28 rows=6 width=16)
               ->  Streaming(type: BROADCAST)  (cost=0.00..1.28 rows=6 width=16)
                     Spawn on: All datanodes
                     ->  Seq Scan on ct2  (cost=0.00..1.01 rows=2 width=16)
(19 rows)

该场景仅受GUC参数costbased_rewrite_rule=pullup_subquery控制,GUC参数rewrite_rule无相关控制参数。

生成PGXC计划时退化为基于规则的查询重写

生成PGXC计划时,DN执行计划的代价计算不准确,故基于代价比较计划的结果不准确,基于代价的查询重写也因此退化为基于规则的查询重写。

示例如下,该计划效果更佳(基于规则改写,将带有EXISTS的子链接进行了提升)。

gaussdb=# SET enable_stream_operator=off;
SET
gaussdb=# SET costbased_rewrite_rule= pullup_sublink_any_exists;
SET
gaussdb=# EXPLAIN ANALYZE  SELECT * FROM ct3 WHERE EXISTS (SELECT 1 from ct1 WHERE ct1.a = ct3.a AND ct1.c < 1000);
                                                                     QUERY PLAN                                                                      
-----------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=26.02..26.66 rows=110 width=16) (actual time=23.507..23.744 rows=110 loops=1)
   Hash Cond: (ct3.a = ct1.a)
   ->  Data Node Scan on ct3 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=110 width=16) (actual time=0.741..0.817 rows=110 loops=1)
         Node/s: All datanodes
   ->  Hash  (cost=25.59..25.59 rows=34 width=4) (actual time=22.596..22.596 rows=100 loops=1)
          Buckets: 32768  Batches: 1  Memory Usage: 4kB
         ->  HashAggregate  (cost=25.25..25.59 rows=34 width=4) (actual time=22.517..22.552 rows=100 loops=1)
               Group By Key: ct1.a
               ->  Data Node Scan on ct1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=10100 width=4) (actual time=1.129..17.014 rows=10100 loops=1)
                     Node/s: All datanodes
 Total runtime: 24.440 ms
(11 rows)

而不应生成如下所示的计划(代价显示虽低但实际更慢)。

gaussdb=# EXPLAIN ANALYZE  SELECT  * FROM ct3 WHERE EXISTS (SELECT /*+ no_expand */ 1 from ct1 WHERE ct1.a = ct3.a AND ct1.c < 1000);
                                                                 QUERY PLAN                                                                 
--------------------------------------------------------------------------------------------------------------------------------------------
 Data Node Scan on ct3 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=55 width=16) (actual time=8.168..120.004 rows=110 loops=1)
   Node/s: All datanodes
   Coordinator quals: (SubPlan 1)
   SubPlan 1
     ->  Data Node Scan on ct1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=101 width=0) (actual time=119.024..119.024 rows=110 loops=110)
           Node/s: All datanodes
           Coordinator quals: (a = ct3.a)
 Total runtime: 123.777 ms
(8 rows)

相关文档