结构匹配查询重写
结构匹配查询重写不要求查询语句与物化视图定义完全一致,只要求查询语句的结果能够通过物化视图的结果改写得到,通过添加投影列、过滤条件、聚集列、排序列以及limit的方式,使重写后的结果与原始查询语句的结果一致。
DWS支持四种结果匹配查询重写:条件重写、投影重写、聚合重写和JOIN重写。
使用约束
- 结构匹配查询重写功能语句较为复杂,建议使用前联系技术支持。
- 只支持形式是SELECT(条件)-PROJECTION(投影)-JOIN-GROUP BY (SPJG) 形式的查询。
- 查询中不能包含非RELATION、VIEW、MATVIEW、FOREIGN TABLE的对象。
- 物化视图不能包含子查询。
- 物化视图不能包含CTE、窗口函数、集合操作、同义词、rowSecurity、for update等。
- 查询可以包含limit offset/having/grouping sets,物化视图不能包含limit offset/having/grouping sets。
- 指定分区查询时,不支持结构匹配重写。
- distinct on不支持重写。
- 支持聚合上卷的聚合类型包含:sum/count/min/max/avg/roaringbitmap。
条件重写
物化视图和查询语句中都可能会存在过滤条件。若满足以下条件可以进行条件重写:
- 查询有过滤条件,而物化视图没有过滤条件。该情况下,可以通过添加过滤条件的方式进行重写。
- 查询有过滤条件,物化视图也有过滤条件,但是查询的过滤条件所表示的范围是物化视图的过滤条件所表示的范围的子集。该情况下,同样可以通过添加过滤条件的方式,来减小物化视图数据的范围进行重写
- 查询有过滤条件,物化视图也有过滤条件,但是物化视图的过滤条件都在查询的过滤条件中。该情况下,可以通过添加查询比物化视图多出来的过滤条件来进行重写。
条件重写示例:
创建基表t1及物化视图mv1_t1,mv2_t1,mv3_t1。
1 2 3 4 | create table t1(a int, b int, c int); create materialized view mv1_t1 enable query rewrite as select c from t1; create materialized view mv2_t1 enable query rewrite as select a, b, c from t1 where a > 2; create materialized view mv3_t1 enable query rewrite as select a, b, c from t1 where a > 1 and b < 2; |
场景一:查询有过滤条件,物化视图没有过滤条件。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | explain (costs off, verbose off, nodes off) select c from t1 where c > 2; QUERY PLAN ----------------------------------------------- id | operation ----+------------------------------ 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on mv1_t1 Predicate Information (identified by plan id) --------------------------------------------- 2 --Seq Scan on mv1_t1 Filter: (c >= 3) ====== Query Summary ===== ------------------------------- System available mem: 4710400KB Query Max mem: 4710400KB Query estimated mem: 1024KB (15 rows) |
场景二:查询的过滤条件是物化视图的过滤条件所表示的范围的子集。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | explain (costs off, verbose on, nodes off) select a from t1 where a > 3; QUERY PLAN ----------------------------------------------- id | operation ----+------------------------------ 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on mv2_t1 Predicate Information (identified by plan id) --------------------------------------------- 2 --Seq Scan on mv2_t1 Filter: (a >= 4) ====== Query Summary ===== ------------------------------- System available mem: 4710400KB Query Max mem: 4710400KB Query estimated mem: 1024KB (15 rows) |
场景三:物化视图的过滤条件都在查询的过滤条件中。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | explain (costs off, verbose on, nodes off) select a, b, c from t1 where a > 1 and b < 2 and c > 1; QUERY PLAN ----------------------------------------------- id | operation ----+------------------------------ 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on mv3_t1 Predicate Information (identified by plan id) --------------------------------------------- 2 --Seq Scan on mv3_t1 Filter: (c >= 2) ====== Query Summary ===== ------------------------------- System available mem: 4710400KB Query Max mem: 4710400KB Query estimated mem: 1024KB (15 rows) |
约束场景:添加的过滤条件不在物化视图的输出列中。
1 2 | create table t1(a int, b int, c int); create materialized view mv1_t1 enable query rewrite as select c from t1; |
当查询多出来的过滤条件不是关于物化视图输出列c时,无法通过mv1_t1重写。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 | explain (costs off, verbose on, nodes off) select c from t1 where a > 0; QUERY PLAN ----------------------------------------------- id | operation ----+------------------------------ 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on t1 Predicate Information (identified by plan id) --------------------------------------------- 2 --Seq Scan on t1 Filter: (a > 0) ====== Query Summary ===== ------------------------------- System available mem: 4710400KB Query Max mem: 4710400KB Query estimated mem: 1024KB (15 rows) |
投影重写
物化视图结构匹配查询重写不要求查询和物化视图的输出列完全一致,只要满足以下场景便可重写:
- 查询的输出列是物化视图输出列的组合。
- 查询的输出列是在物化视图输出列的基础上增加函数或者表达式。
- 查询的输出列可以通过等价类替代成物化视图的输出列。
条件重写示例:
创建基表t2和物化视图mv1_t2和mv2_t2。
create table t2(a int, b int, c int); create materialized view mv1_t2 enable query rewrite as select a, c from t2; create materialized view mv2_t2 enable query rewrite as select a from t2 where a = b;
场景一:查询的输出列是物化视图输出列的组合。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | explain (costs off, verbose on, nodes off) select a + c from t2; QUERY PLAN ------------------------------------------------------- id | operation ----+---------------------------------- 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on public.mv1_t2 Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: ((a + c)) 2 --Seq Scan on public.mv1_t2 Output: (a + c) |
场景二:查询的输出列是在物化视图输出列的基础上增加函数或者表达式。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | explain (costs off, verbose on, nodes off) select nvl(a, 10) from t2; QUERY PLAN ------------------------------------------------------- id | operation ----+---------------------------------- 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on public.mv1_t2 Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: (COALESCE(a, 10)) 2 --Seq Scan on public.mv1_t2 Output: COALESCE(a, 10) |
场景三:查询的输出列可以通过等价类替代成物化视图的输出列。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 | explain (costs off, verbose on, nodes off) select b from t2 where a = b; QUERY PLAN ------------------------------------------------------- id | operation ----+---------------------------------- 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on public.mv2_t2 Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a 2 --Seq Scan on public.mv2_t2 Output: a |
聚合重写
物化视图结构匹配查询重写不要求查询和物化视图的group by列一样,只要满足以下情况便可进行重写。
- group by列的顺序没有区别,物化视图和查询的group by列相同的话,顺序不同时也可重写。
- 当物化视图没有group by列,查询有group by列时,通过给物化视图添加group by的方式进行重写。
- 当物化视图和查询都有group by列,并且查询的group by列是物化视图的子集时,可以通过上卷重新group by的方式重写,min/max/count/sum等聚集函数会被重写为min(min),max(max),sum(count),sum(sum)。
- 当物化视图group by列没有grouping sets/rollup/cube,而查询有grouping sets/rollup/cube,可以进行查询。
聚合重写示例:
创建示例表t3。
1 | CREATE TABLE t3(a int, b int, c int); |
场景一:group by列完全匹配。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 | create materialized view mv1_t3 enable query rewrite as select a,b,sum(c) from t3 group by a,b; explain (costs off, verbose on, nodes off) select sum(c) from t3 group by b,a; QUERY PLAN ------------------------------------------------------- id | operation ----+---------------------------------- 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on public.mv1_t3 Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: sum 2 --Seq Scan on public.mv1_t3 Output: sum |
场景二:物化视图没有group by,查询有group by。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | create materialized view mv2_t3 enable query rewrite as select a,b,c from t3 explain (costs off, verbose on, nodes off) select sum(a) from t3 group by b,c; QUERY PLAN ------------------------------------------------------- id | operation ----+---------------------------------------- 1 | -> HashAggregate 2 | -> Streaming (type: GATHER) 3 | -> HashAggregate 4 | -> Seq Scan on public.mv2_t3 Targetlist Information (identified by plan id) ---------------------------------------------- 1 --HashAggregate Output: pg_catalog.sum((sum(a))), b, c Group By Key: mv2_t3.b, mv2_t3.c 2 --Streaming (type: GATHER) Output: (sum(a)), b, c 3 --HashAggregate Output: sum(a), b, c Group By Key: mv2_t3.b, mv2_t3.c 4 --Seq Scan on public.mv2_t3 Output: b, c, a |
场景三:查询的group by是物化视图的group by的子集。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 | create materialized view mv3_t3 enable query rewrite as select sum(b),min(b),max(b),count(b),a,c from t3 group by a,c; explain (costs off, verbose on, nodes off) select sum(b),min(b),max(b),count(b),avg(b) from t3 group by a; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- id | operation ----+----------------------------------------- 1 | -> Streaming (type: GATHER) 2 | -> HashAggregate 3 | -> Streaming(type: REDISTRIBUTE) 4 | -> Seq Scan on public.mv3_t3 Targetlist Information (identified by plan id) ------------------------------------------------------------------------------------------------------------------------------------------------------------ 1 --Streaming (type: GATHER) Output: ((sum(sum))::bigint), (min(min)), (max(max)), ((sum(count))::bigint), ((((sum(sum))::bigint)::numeric / ((sum(count))::bigint)::numeric)), a 2 --HashAggregate Output: (sum(sum))::bigint, min(min), max(max), (sum(count))::bigint, (((sum(sum))::bigint)::numeric / ((sum(count))::bigint)::numeric), a Group By Key: mv3_t3.a 3 --Streaming(type: REDISTRIBUTE) Output: a, sum, min, max, count Distribute Key: a 4 --Seq Scan on public.mv3_t3 Output: a, sum, min, max, count |
JOIN重写
查询和物化视图存在多表关联时,除了满足其它重写条件,需要表一一对应,并且在JOIN部分需要满足关联顺序、关联类型以及关联条件的限制。
满足以下条件可以进行多表关联的重写:
- 如果关联均为内连接,则无需满足关联顺序的限制,关联条件上,如果物化视图中两列是等价类,则必须在查询中也是等价类。
等价类,即当查询中有t1.a=t2.a这种条件时,认为t1.a和t2.a等价,如果再加上t2.a=t3.a,则可认为t2.a和t3.a等价,最后t1.a,t2.a,t3.a等价,构成一个等价类。
- 如果存在外连接,并且物化视图和查询的外连接类型相同,则要求物化视图和查询外连接的内外表顺序一致,并且关联条件一致,并且可能被补null的一侧要一致。
- 如果连接类型不同,则需要满足派生重写的条件,如表1 JOIN派生重写所示。此外,连接类型不同的位置,不能是可能被补null的一侧。
- 如果表个数不同,查询涉及的表是物化视图的表的子集,并且物化视图多出来的表能够通过主外键关联消除掉,可以进行重写。
| 查询Join类型 | 物化视图 Join类型 | 补偿谓词 |
|---|---|---|
| INNER JOIN | LEFT JOIN | LEFT JOIN右侧表的列 is not null |
| ANTI JOIN | LEFT JOIN | LEFT JOIN右侧表的列 is null |
| INNER JOIN | RIGHT JOIN | RIGHT JOIN左侧表的列 is not null |
| RIGHT ANTI JOIN | RIGHT JOIN | RIGHT JOIN左侧表的列 is null |
| SEMI JOIN | INNER JOIN | 主键/unique 直接转换,无需添加补偿谓词 |
| RIGHT SEMI JOIN | INNER JOIN | 主键/unique 直接转换,无需添加补偿谓词 |
| LEFT JOIN | FULL JOIN | FULL JOIN左侧表的列 is not null |
| RIGHT JOIN | FULL JOIN | FULL JOIN右侧表的列 is not null |
| INNER JOIN | FULL JOIN | FULL JOIN两侧表的列 is not null |
关联重写示例:
创建基表join_t1和join_t2。
create table join_t1 (a1 int, b1 int, c1 int); create table join_t2 (a2 int, b2 int, c2 int);
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | create materialized view mv_join_01 enable query rewrite as select * from join_t1 join join_t2 on join_t1.a1 = join_t2.a2; explain (costs off, verbose on, nodes off) select sum(a1), b1 + c1 from join_t1 join join_t2 on join_t1.a1 = join_t2.a2 and join_t1.b1 = join_t2.b2 and join_t1.c1 = join_t2.c2 where c1 > 10 group by b1,c1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- id | operation ----+-------------------------------------------- 1 | -> HashAggregate 2 | -> Streaming (type: GATHER) 3 | -> HashAggregate 4 | -> Seq Scan on public.mv_join_01 Predicate Information (identified by plan id) --------------------------------------------------------------------------------------------------------------- 4 --Seq Scan on public.mv_join_01 Filter: ((mv_join_01.c1 >= 11) AND (mv_join_01.b1 = mv_join_01.b2) AND (mv_join_01.c1 = mv_join_01.c2)) Targetlist Information (identified by plan id) -------------------------------------------------------------- 1 --HashAggregate Output: pg_catalog.sum((sum(a1))), ((b1 + c1)), b1, c1 Group By Key: mv_join_01.b1, mv_join_01.c1 2 --Streaming (type: GATHER) Output: (sum(a1)), ((b1 + c1)), b1, c1 3 --HashAggregate Output: sum(a1), (b1 + c1), b1, c1 Group By Key: mv_join_01.b1, mv_join_01.c1 4 --Seq Scan on public.mv_join_01 Output: b1, c1, a1, b2, c2 |
场景二:物化视图和查询存在外连接。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | create materialized view mv_join_02 enable query rewrite as select * from join_t1 left join join_t2 on join_t1.a1 = join_t2.a2; explain (costs off, verbose on, nodes off) select sum(a1), b1 + c1 from join_t1 left join join_t2 on join_t1.a1 = join_t2.a2 where join_t1.b1 = join_t2.b2 and join_t1.c1 = join_t2.c2 and c1 > 10 group by b1,c1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- id | operation ----+-------------------------------------------- 1 | -> HashAggregate 2 | -> Streaming (type: GATHER) 3 | -> HashAggregate 4 | -> Seq Scan on public.mv_join_02 Predicate Information (identified by plan id) --------------------------------------------------------------------------------------------------------------- 4 --Seq Scan on public.mv_join_02 Filter: ((mv_join_02.c1 >= 11) AND (mv_join_02.b1 = mv_join_02.b2) AND (mv_join_02.c1 = mv_join_02.c2)) Targetlist Information (identified by plan id) -------------------------------------------------------------- 1 --HashAggregate Output: pg_catalog.sum((sum(a1))), ((b1 + c1)), b1, c1 Group By Key: mv_join_02.b1, mv_join_02.c1 2 --Streaming (type: GATHER) Output: (sum(a1)), ((b1 + c1)), b1, c1 3 --HashAggregate Output: sum(a1), (b1 + c1), b1, c1 Group By Key: mv_join_02.b1, mv_join_02.c1 4 --Seq Scan on public.mv_join_02 Output: b1, c1, a1, b2, c2 |
场景三:物化视图和查询的关联类型不同。
物化视图是full join,查询是right join,通过给右侧添加is not null成功重写。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 | create materialized view mv_join_03 enable query rewrite as select * from join_t1 full join join_t2 on join_t1.a1 = join_t2.a2; explain (costs off, verbose on, nodes off) select sum(a1), b1 + c1 from join_t1 right join join_t2 on join_t1.a1 = join_t2.a2 where join_t1.b1 = join_t2.b2 and join_t1.c1 = join_t2.c2 group by b1,c1; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------- id | operation ----+-------------------------------------------- 1 | -> HashAggregate 2 | -> Streaming (type: GATHER) 3 | -> HashAggregate 4 | -> Seq Scan on public.mv_join_03 Predicate Information (identified by plan id) --------------------------------------------------------------------------------------------------------------------- 4 --Seq Scan on public.mv_join_03 Filter: ((mv_join_03.a2 IS NOT NULL) AND (mv_join_03.b1 = mv_join_03.b2) AND (mv_join_03.c1 = mv_join_03.c2)) Targetlist Information (identified by plan id) -------------------------------------------------------------- 1 --HashAggregate Output: pg_catalog.sum((sum(a1))), ((b1 + c1)), b1, c1 Group By Key: mv_join_03.b1, mv_join_03.c1 2 --Streaming (type: GATHER) Output: (sum(a1)), ((b1 + c1)), b1, c1 3 --HashAggregate Output: sum(a1), (b1 + c1), b1, c1 Group By Key: mv_join_03.b1, mv_join_03.c1 4 --Seq Scan on public.mv_join_03 Output: b1, c1, a1, b2, c2 |
场景四:查询的表是物化视图的表的子集,并且物化视图中关联条件符合主外键关系。
使用场景:宽表是将业务相关的指标、维度、属性关联在一起的一张表。类似大宽表,可以基于多张表的关联创建物化视图,业务查询只涉及其中的几张表。 通过物化视图改写查询后,多表关联的查询可以达到与查询相应大宽表相同的效果。注意:导入数据时,不进行外键约束校验。业务必须确保导入的数据满足主外键约束条件,再在本场景使用物化视图重写。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 | set info_constraint_options = 'foreign_key'; alter table join_t1 add constraint pkey_join_t1 primary key(a1); alter table join_t2 add constraint fkey_join_t2 foreign key(a2) references join_t1(a1); alter table join_t2 modify a2 not null; explain (costs off, verbose on, nodes off) select a2 from join_t2; QUERY PLAN ------------------------------------------------------- id | operation ----+-------------------------------------- 1 | -> Streaming (type: GATHER) 2 | -> Seq Scan on public.mv_join_01 Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a1 2 --Seq Scan on public.mv_join_01 Output: a1 |
使用HINT指定物化视图进行重写
可以使用hint mvrewrite指定物化视图名字进行重写。mvrewrite后面可以指定多个表名,或者schema.表名。
1 2 3 4 5 6 7 8 9 10 11 12 13 | create table t1(a int,b int); create materialized view hintmv1 ENABLE QUERY REWRITE as select * from t1 where a > 1; create materialized view hintmv2 ENABLE QUERY REWRITE as select * from t1 where a > 2; explain (verbose on, costs off, nodes off) select /*+ mvrewrite (hintmv2) */ * from t1 where a>2; QUERY PLAN ------------------------------------------- Streaming (type: GATHER) Output: a, b, c -> Seq Scan on public.hintmv2 Output: a, b, c Distribute Key: a (5 rows) |