Case: Modifying the GUC Parameter rewrite_rule
rewrite_rule contains multiple query rewriting rules: magicset, uniquecheck, intargetlist, and predpush. The following describes the application scenarios of some important rules.
Preparing the Case Environment
To demonstrate rule application scenarios, you need to prepare the following table creation statements:
-- Clean the environment. DROP SCHEMA IF EXISTS rewrite_rule_guc_test CASCADE; CREATE SCHEMA rewrite_rule_guc_test; SET current_schema=rewrite_rule_guc_test; -- Create a test table. CREATE TABLE t(c1 INT, c2 INT, c3 INT, c4 INT); CREATE TABLE t1(c1 INT, c2 INT, c3 INT, c4 INT); CREATE TABLE t2(c1 INT, c2 INT, c3 INT, c4 INT);
intargetlist: Target Column Subquery Performance Improvement
The query performance can be greatly improved by converting the subquery in the target column to JOIN. The following is an example:
gaussdb=# SET rewrite_rule='none'; SET gaussdb=# EXPLAIN (verbose on, costs off) SELECT c1,(SELECT avg(c2) FROM t2 WHERE t2.c2=t1.c2) FROM t1 WHERE t1.c1<100 ORDER BY t1.c2; QUERY PLAN ----------------------------------------------- Sort Output: t1.c1, ((SubPlan 1)), t1.c2 Sort Key: t1.c2 -> Seq Scan on public.t1 Output: t1.c1, (SubPlan 1), t1.c2 Filter: (t1.c1 < 100) SubPlan 1 -> Aggregate Output: avg(t2.c2) -> Seq Scan on public.t2 Output: t2.c1, t2.c2 Filter: (t2.c2 = t1.c2) (12 rows)
Because the subquery (select avg(c2) from t2 where t2.c2=t1.c2) in the target column cannot be pulled up, execution of the subquery is triggered each time a row of data of t1 is scanned, and the query efficiency is low. If the intargetlist parameter is enabled, the subquery is converted to JOIN to improve the query performance.
gaussdb=# SET rewrite_rule='intargetlist'; SET gaussdb=# EXPLAIN (verbose on, costs off) SELECT c1,(SELECT avg(c2) FROM t2 WHERE t2.c2=t1.c2) FROM t1 WHERE t1.c1<100 ORDER BY t1.c2; QUERY PLAN ----------------------------------------------- Sort Output: t1.c1, (avg(t2.c2)), t1.c2 Sort Key: t1.c2 -> Hash Left Join Output: t1.c1, (avg(t2.c2)), t1.c2 Hash Cond: (t1.c2 = t2.c2) -> Seq Scan on public.t1 Output: t1.c1, t1.c2 Filter: (t1.c1 < 100) -> Hash Output: (avg(t2.c2)), t2.c2 -> HashAggregate Output: avg(t2.c2), t2.c2 Group By Key: t2.c2 -> Seq Scan on public.t2 Output: t2.c2 (16 rows)
uniquecheck: Performance Improvement of Subqueries Without Aggregate Functions
To ensure subquery pullup, each condition must have only one line of output. The subqueries with aggregate functions can be automatically pulled up. For subqueries without aggregate functions, as shown in the following example:
SELECT t1.c1 FROM t1 WHERE t1.c1 = (SELECT t2.c1 FROM t2 WHERE t1.c1=t2.c2);
Rewrite as follows:
SELECT t1.c1 FROM t1 JOIN (SELECT t2.c1 FROM t2 WHERE t2.c1 IS NOT NULL GROUP BY t2.c1(unique check)) tt(c1) on tt.c1=t1.c1;
Note that unique check in the preceding SQL statement indicates that t2.c1 needs to be checked. If the SQL statement is abnormal, the SQL statement cannot be directly executed. To ensure semantic equivalence, the subquery tt must ensure that each group by t2.c1 has only one line of output. Enable the uniquecheck query rewriting parameter to ensure that the query can be pulled up and equivalent. If more than one row of data is output at run time, an error is reported.
gaussdb=# SET rewrite_rule='uniquecheck'; SET gaussdb=# EXPLAIN verbose SELECT t1.c1 FROM t1 WHERE t1.c1 = (SELECT t2.c1 FROM t2 WHERE t1.c1=t2.c1); QUERY PLAN ------------------------------------------------------------------------------------- Hash Join (cost=43.36..104.40 rows=2149 distinct=[200, 200] width=4) Output: t1.c1 Hash Cond: (t1.c1 = subquery."?column?") -> Seq Scan on public.t1 (cost=0.00..31.49 rows=2149 width=4) Output: t1.c1, t1.c2 -> Hash (cost=40.86..40.86 rows=200 width=8) Output: subquery."?column?", subquery.c1 -> Subquery Scan on subquery (cost=36.86..40.86 rows=200 width=8) Output: subquery."?column?", subquery.c1 -> HashAggregate (cost=36.86..38.86 rows=200 width=4) Output: t2.c1, t2.c1 Group By Key: t2.c1 Filter: (t2.c1 IS NOT NULL) Unique Check Required -> Seq Scan on public.t2 (cost=0.00..31.49 rows=2149 width=4) Output: t2.c1 (16 rows)
Note: Because group by t2.c1 unique check occurs before the filter condition tt.c1=t1.c1, an error may be reported after the query that does not report an error is rewritten. An example is as follows:
There are tables t1 and t2. The data in the tables is as follows:
gaussdb=# SELECT * FROM t1 ORDER BY c2; c1 | c2 ----+---- 1 | 1 2 | 2 3 | 3 (3 rows) gaussdb=# SELECT * FROM t2 ORDER BY c2; c1 | c2 ----+---- 1 | 1 2 | 2 3 | 3 4 | 4 4 | 4 5 | 5 (6 rows)
Disable and enable the uniquecheck parameter for comparison. After the parameter is enabled, an error is reported.
gaussdb=# SELECT t1.c1 FROM t1 WHERE t1.c1 = (SELECT t2.c1 FROM t2 WHERE t1.c1=t2.c2) ; c1 ---- 1 2 3 (3 rows) gaussdb=# SET rewrite_rule='uniquecheck'; SET gaussdb=# SELECT t1.c1 FROM t1 WHERE t1.c1 = (SELECT t2.c1 FROM t2 WHERE t1.c1=t2.c2) ; ERROR: more than one row returned by a subquery used as an expression
Parameter lazyagg: Eliminating the Aggregation Operation in a Subquery
The aggregation operation in a subquery is eliminated to improve the query efficiency. An example is as follows:
gaussdb=# SET rewrite_rule =none; SET gaussdb=# EXPLAIN (costs off) SELECT t.c2, sum(cc) FROM (SELECT c2, sum(c3) AS cc FROM t1 GROUP BY c2) s1, t WHERE s1.c2=t.c2 GROUP BY t.c2 ORDER BY 1,2; QUERY PLAN ----------------------------------------------------- Sort Sort Key: t.c2, (sum(s1.cc)) -> HashAggregate Group By Key: t.c2 -> Hash Join Hash Cond: (t.c2 = s1.c2) -> Seq Scan on t -> Hash -> Subquery Scan on s1 -> HashAggregate Group By Key: t1.c2 -> Seq Scan on t1 (12 rows)
A subquery and the outer query have the same GROUP BY condition. The two-layer aggregation operation may lower down the query efficiency. Enable the lazyagg parameter to eliminate the aggregation operation in the subquery and improve the query performance.
gaussdb=# SET rewrite_rule = lazyagg; SET gaussdb=# EXPLAIN (costs off) SELECT t.c2, sum(cc) FROM (SELECT c2, sum(c3) AS cc FROM t1 GROUP BY c2) s1, t WHERE s1.c2=t.c2 GROUP BY t.c2 ORDER BY 1,2; QUERY PLAN ------------------------------------------ Sort Sort Key: t.c2, (sum((t1.c3)::bigint)) -> HashAggregate Group By Key: t.c2 -> Hash Join Hash Cond: (t1.c2 = t.c2) -> Seq Scan on t1 -> Hash -> Seq Scan on t (9 rows)
Pushing Conditions from the Main Query to a Subquery
Join a subquery with the aggregation operator with the main query in advance. An example is as follows:
gaussdb=# SET rewrite_rule = none; SET gaussdb=# EXPLAIN (costs off) SELECT t1 FROM t1 WHERE t1.c2 = 10 AND t1.c3 < (SELECT sum(c3) FROM t2 WHERE t1.c1 = t2.c1); QUERY PLAN --------------------------------------- Hash Join Hash Cond: (t2.c1 = t1.c1) Join Filter: (t1.c3 < (sum(t2.c3))) -> HashAggregate Group By Key: t2.c1 -> Seq Scan on t2 -> Hash -> Seq Scan on t1 Filter: (c2 = 10) (9 rows)
In this case, the join columns of the subquery are grouped and aggregated, and then the subquery is joined with the main query. This reduces repeated scanning of related sublinks and improves query efficiency. After the rewriting parameters are modified, the plan is changed as follows:
gaussdb=# SET rewrite_rule = magicset; SET gaussdb=# explain (costs off) SELECT t1 FROM t1 WHERE t1.c2 = 10 AND t1.c3 < (SELECT sum(c3) FROM t2 WHERE t1.c1 = t2.c1); QUERY PLAN --------------------------------------------------------------------- Hash Join Hash Cond: (t2.c1 = rewrite_rule_guc_test.t1.c1) Join Filter: (rewrite_rule_guc_test.t1.c3 < (sum(t2.c3))) -> HashAggregate Group By Key: t2.c1 -> Hash Join Hash Cond: (t2.c1 = rewrite_rule_guc_test.t1.c1) -> Seq Scan on t2 -> Hash -> HashAggregate Group By Key: rewrite_rule_guc_test.t1.c1 -> Seq Scan on t1 Filter: (c2 = 10) -> Hash -> Seq Scan on t1 Filter: (c2 = 10) (16 rows)
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot