Updated on 2024-06-03 GMT+08:00

Optimizing Statement Pushdown

Statement Pushdown

Currently, the GaussDB optimizer can use three methods to develop statement execution policies in the distributed framework: generating a statement pushdown plan, a distributed execution plan, or a distributed execution plan for sending statements.

  • A statement pushdown plan pushes complete query statements from a CN down to DNs for execution and returns the execution results to the CN.
  • In a distributed execution plan, a CN compiles and optimizes query statements, generates a plan tree, and then sends the plan tree to DNs for execution. After the statements have been executed, execution results will be returned to the CN.
  • A distributed execution plan for sending statements pushes queries that can be pushed down (mostly base table scanning statements) to DNs for execution. Then, the plan obtains the intermediate results and sends them to the CN, on which the remaining queries are to be executed.

The third policy sends many intermediate results from DNs to the CN for further execution. In this case, the CN performance bottleneck (in bandwidth, storage, and computing) is caused by statements that cannot be pushed down to DNs. Therefore, you are advised not to use the query statements where only the third policy applies.

Statements cannot be pushed down if they have functions that do not support pushdown or syntax that does not support pushdown. Generally, you can rewrite the execution statements to solve the problem.

Typical Scenarios of Statement Pushdown

In the GaussDB optimizer, if you want to support statement pushdown, set the GUC parameter enable_fast_query_shipping to on. Generally, no execution plan operator is displayed after the EXPLAIN statement. If the keyword "Data Node Scan on" in the execution plan is displayed in the first line (excluding the plan format), the statement has been pushed down to DNs for execution. The following describes statement pushdown and its supported scope from multiple scenarios:

  1. Pushdown of single-table query statements

    In a distributed database, to query a single table, whether the current statement can be pushed down depends on whether the CN needs to participate in calculation instead of simply collecting data. If the CN needs to further calculate the DN result, the statement cannot be pushed down. Generally, statements with keywords such as agg, windows function, limit/offset, sort, distinct cannot be pushed down.

    • Pushdown: Simple queries can be pushed down without further calculation on the CN.
      gaussdb=# explain select * from t where c1 > 1;
                                       QUERY PLAN                                 
      ----------------------------------------------------------------------------
       Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0)
         Node/s: All datanodes
      (2 rows)
    • Non-pushdown: A CN with the limit clause cannot simply send statements to DNs and collect data, which is inconsistent with the semantics of the limit clause.
      gaussdb=# explain select * from t limit 1;
                                           QUERY PLAN                                      
      -------------------------------------------------------------------------------------
       Limit  (cost=0.00..0.00 rows=1 width=12)
         ->  Data Node Scan on "__REMOTE_LIMIT_QUERY__"  (cost=0.00..0.00 rows=1 width=12)
               Node/s: All datanodes
      (3 rows)
    • Non-pushdown: A CN with the aggregate function cannot simply push down statements. Instead, it needs to further aggregate the results collected from DNs.
      gaussdb=# explain select sum(c1), count(*) from t;
                                           QUERY PLAN                                      
      -------------------------------------------------------------------------------------
       Aggregate  (cost=0.10..0.11 rows=1 width=20)
         ->  Data Node Scan on "__REMOTE_GROUP_QUERY__"  (cost=0.00..0.00 rows=20 width=4)
               Node/s: All datanodes
      (3 rows)
  1. Pushdown of multi-table query statements
    In the multi-table query scenario, whether a statement can be pushed down depends on the join condition and distribution keys. That is, if the join condition matches the distribution keys of the table, the statement can be pushed down. Otherwise, the statement cannot be pushed down. Generally, a replication table can be pushed down.
    • Create two hash distribution tables.
      gaussdb=# create table t(c1 int, c2 int, c3 int)distribute by hash(c1);
      CREATE TABLE
      gaussdb=# create table t1(c1 int, c2 int, c3 int)distribute by hash(c1);
      CREATE TABLE
    • Pushdown: The join condition meets the hash distribution key attributes of two tables.
      gaussdb=# explain select * from t1 join t on t.c1 = t1.c1;
                                       QUERY PLAN                                 
      ----------------------------------------------------------------------------
       Data Node Scan on "__REMOTE_FQS_QUERY__"  (cost=0.00..0.00 rows=0 width=0)
         Node/s: All datanodes
      (2 rows)
    • Non-pushdown: The join condition does not meet the hash distribution key attribute. That is, t1.c2 is not the distribution key of t1.
      gaussdb=# explain select * from t1 join t on t.c1 = t1.c2;
                                               QUERY PLAN                                         
      --------------------------------------------------------------------------------------------
       Hash Join  (cost=0.25..0.53 rows=20 width=24)
         Hash Cond: (t1.c2 = t.c1)
         ->  Data Node Scan on t1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=20 width=12)
               Node/s: All datanodes
         ->  Hash  (cost=0.00..0.00 rows=20 width=12)
               ->  Data Node Scan on t "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=20 width=12)
                     Node/s: All datanodes
      (7 rows)
    • Delete two hash distribution tables.
      gaussdb=# DROP TABLE t;
      DROP TABLE
      gaussdb=# DROP TABLE t1;
      DROP TABLE
  1. Collection operation and CTE pushdown

    Whether statements in the collection operation scenario can be pushed down depends on whether the left and right subqueries can be pushed down. The rules are as follows:

    • UNION ALL can be pushed down only when both the left and right branches can be pushed down and the DNs of the left and right branches are the same.
    • The collection operation can be pushed down only when the left and right branches of UNION/INTERSECT/INTERSECT ALL/EXCEPT/EXCEPT ALL can be pushed down to the same single node.
      -- The test environment contains one CN and six DNs.
      -- Create a node group. You can run the SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' command to query the DN name in the node group. In the query result, replace the DN name following WITH in the CREATE NODE GROUP statement as required.
      gaussdb=# CREATE NODE GROUP ng WITH(datanode1, datanode2, datanode3, datanode4, datanode5, datanode6);
      CREATE NODE GROUP
      -- Create a table.
      gaussdb=# CREATE TABLE t1(a int, b int, c int) DISTRIBUTE BY HASH(a) TO GROUP ng;
      CREATE TABLE
      gaussdb=# CREATE TABLE t2(a int, b int, c int) DISTRIBUTE BY HASH(a) TO GROUP ng;
      CREATE TABLE
      -- Both sides of UNION ALL can be pushed down. Therefore, UNION ALL can be pushed down.
      gaussdb=# EXPLAIN (COSTS OFF) SELECT * FROM t1 UNION ALL SELECT * FROM t2;
             QUERY PLAN
      -------------------------
       Data Node Scan
         Node/s: All datanodes
      (2 rows)
      
      -- Both sides of UNION can be pushed down, but not to a single DN. Therefore, UNION cannot be pushed down.
      gaussdb=# EXPLAIN (COSTS OFF) SELECT * FROM t1 UNION SELECT * FROM t2;
                     QUERY PLAN
      ----------------------------------------
       Streaming (type: GATHER)
         Node/s: All datanodes
         ->  HashAggregate
               Group By Key: t1.a, t1.b, t1.c
               ->  Append
                     ->  Seq Scan on t1
                     ->  Seq Scan on t2
      (7 rows)
      
      -- Both sides of UNION can be pushed down to the same DN. Therefore, UNION can be pushed down.
      gaussdb=# EXPLAIN (COSTS OFF) SELECT * FROM t1 WHERE a = 1 UNION SELECT * FROM t2 WHERE a = 1;
              QUERY PLAN
      --------------------------
       Data Node Scan
         Node/s: (ng) datanode5
      (2 rows)
      
      gaussdb=# EXPLAIN (COSTS OFF) SELECT * FROM t1 WHERE a = 1;
              QUERY PLAN
      --------------------------
       Data Node Scan
         Node/s: (ng) datanode5
      (2 rows)
      
      gaussdb=# EXPLAIN (COSTS OFF) SELECT * FROM t2 WHERE a = 3;
              QUERY PLAN
      --------------------------
       Data Node Scan
         Node/s: (ng) datanode4
      (2 rows)
      
      -- Both sides of UNION can be pushed down, but not to the same DN. Therefore, UNION cannot be pushed down.
      gaussdb=# EXPLAIN (COSTS OFF) SELECT * FROM t1 WHERE a = 1 UNION SELECT * FROM t2 WHERE a = 3;
                      QUERY PLAN
      -------------------------------------------
       Streaming (type: GATHER)
         Node/s: (GenGroup) datanode4, datanode5
         ->  Unique
               ->  Sort
                     Sort Key: t1.a, t1.b, t1.c
                     ->  Append
                           ->  Seq Scan on t1
                                 Filter: (a = 1)
                           ->  Seq Scan on t2
                                 Filter: (a = 3)
      (10 rows)

    Whether statements in the CTE scenario can be pushed down depends on whether statements in the CTE scenario can be pushed down. The rules are as follows:

    • The entire CTE can be pushed down only when statements in a CTE can be pushed down.
    • For a recursive CTE, the recursive part references its CTE. When determining whether the recursive part can be pushed down, the pushdown information of the self-referenced CTE depends on the pushdown information of the non-recursive part. That is, the non-recursive statement can be regarded as a subquery of the self-referenced CTE.
    • An entire recursive CTE can be pushed down only when statements in the recursive CTE can be pushed down, and the recursive and non-recursive statements can be pushed down to the same DN.
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      24
      25
      26
      27
      28
      29
      30
      31
      32
      33
      34
      35
      36
      37
      38
      39
      40
      41
      42
      43
      44
      45
      46
      47
      48
      49
      50
      51
      52
      53
      54
      55
      56
      57
      58
      59
      60
      61
      62
      63
      64
      65
      66
      67
      68
      69
      70
      71
      72
      73
      74
      75
      76
      77
      78
      -- Statements in the CTE can be pushed down. Therefore, the CTE can be pushed down.
      gaussdb=# EXPLAIN (COSTS OFF) WITH cte AS (SELECT * FROM t1) SELECT * FROM cte;
             QUERY PLAN
      -------------------------
       Data Node Scan
         Node/s: All datanodes
      (2 rows)
      
      -- Statements in the CTE can be pushed down to a single DN. Therefore, the CTE can be pushed down to a single DN.
      gaussdb=# EXPLAIN (COSTS OFF) WITH cte AS (SELECT * FROM t1 WHERE a = 1) SELECT * FROM cte;
              QUERY PLAN
      --------------------------
       Data Node Scan
         Node/s: (ng) datanode5
      (2 rows)
      
      -- Statements in the CTE cannot be pushed down. Therefore, the CTE cannot be pushed down.
      gaussdb=# EXPLAIN (COSTS OFF) WITH cte AS (SELECT * FROM t1 ORDER BY a ) SELECT * FROM cte;
               QUERY PLAN
      ----------------------------
       Streaming (type: GATHER)
         Node/s: All datanodes
         ->  Sort
               Sort Key: t1.a
               ->  Seq Scan on t1
      (5 rows)
      
      -- Statements in the recursive CTE can be pushed down, and the recursive and non-recursive parts can be pushed down to multiple same DNs. Therefore, the recursive CTE can be pushed down to multiple DNs.
      gaussdb=# EXPLAIN (COSTS OFF)
      WITH RECURSIVE cte AS (
      SELECT * FROM t1
      UNION ALL
      SELECT t2.* FROM t2, cte WHERE cte.a = t2.a) SELECT *FROM cte;
             QUERY PLAN
      -------------------------
       Data Node Scan
         Node/s: All datanodes
      (2 rows)
      
      -- Statements in the recursive CTE can be pushed down, and the recursive and non-recursive parts can be pushed down to a single DN. Therefore, the recursive CTE can be pushed down to the single DN.
      gaussdb=# EXPLAIN (COSTS OFF)
      WITH RECURSIVE cte AS (
      SELECT * FROM t1 WHERE a = 1
      UNION ALL
      SELECT t2.* FROM t2, cte WHERE cte.a = t2.a AND t2.a = 1) SELECT * FROM cte;
              QUERY PLAN
      --------------------------
       Data Node Scan
         Node/s: (ng) datanode5
      (2 rows)
      
      -- Statements in the recursive CTE cannot be pushed down. Therefore, the recursive CTE cannot be pushed down.
      gaussdb=# EXPLAIN (COSTS OFF)
      WITH RECURSIVE cte AS (
      SELECT * FROM t1
      UNION
      SELECT t2.* FROM t2, cte WHERE cte.a = t2.a
      ) SELECT *FROM cte;
                                    QUERY PLAN
      -----------------------------------------------------------------------
       CTE Scan on cte
         CTE cte
           ->  Recursive Union
                 ->  Data Node Scan on t1 "_REMOTE_TABLE_QUERY_"
                       Node/s: All datanodes
                 ->  Hash Join
                       Hash Cond: (cte.a = t2.a)
                       ->  WorkTable Scan on cte
                       ->  Hash
                             ->  Data Node Scan on t2 "_REMOTE_TABLE_QUERY_"
                                   Node/s: All datanodes
      (11 rows)
      gaussdb=# DROP TABLE t1;
      DROP TABLE
      gaussdb=# DROP TABLE t2;
      DROP TABLE
      gaussdb=# DROP NODE GROUP ng;
      DROP NODE GROUP
      
  1. Special scenarios

    The pushdown rules and restrictions in some special scenarios are as follows:

    • A statement cannot be pushed down if it is not a SELECT statement but contains WITH RECURSIVE CTE, common CTE, or collection operation.
    • A statement cannot be pushed down if it contains elements that cannot be pushed down (such as volatile functions, not shippable functions, agg functions, returning clauses, and triggers that cannot be pushed down).
    • If a CTE, recursive CTE, collection operation, subquery, or sublink cannot tailor a DN based on constant conditions, you can rewrite the statement to write conditions to the CTE, recursive CTE, collection operation, subquery, or sublink.
    • When a CTE, recursive CTE, or collection operation is joined with the main query, the entire statement can be pushed down only when the distribution information of the CTE, recursive CTE, or collection operation is the same as that of objects in the main query and the pushdown conditions are met.
    • In multi-table query scenarios, if a distributed table contains multiple distribution keys and the distribution keys in query conditions contain parameters, the gplan cannot be pushed down.

Checking Whether the Execution Plan Has Been Pushed Down

Perform the following procedure to quickly determine whether the execution plan can be pushed down:

  1. Set the GUC parameter enable_fast_query_shipping to off to use the distributed framework policy for the query optimizer.

    1
    SET enable_fast_query_shipping = off;
    

  2. View the execution plan.

    If the execution plan contains Data Node Scan nodes, the execution plan is a distributed execution plan for sending statements and cannot be pushed down. If the execution plan contains Streaming nodes, the SQL statements can be pushed down to DNs.

    For example:

    1
    2
    3
    4
    5
    gaussdb=# explain select
    count(ss.ss_sold_date_sk order by ss.ss_sold_date_sk)c1 
    from store_sales ss, store_returns sr 
    where 
    sr.sr_customer_sk = ss.ss_customer_sk;
    

    The execution plan is as follows, which indicates that the SQL statement cannot be pushed down.

                                  QUERY PLAN
    --------------------------------------------------------------------------
    Aggregate
    ->  Hash Join
    Hash Cond: (ss.ss_customer_sk = sr.sr_customer_sk)
    ->  Data Node Scan on store_sales "_REMOTE_TABLE_QUERY_"
    Node/s: All datanodes
    ->  Hash
    ->  Data Node Scan on store_returns "_REMOTE_TABLE_QUERY_"
    Node/s: All datanodes
    (8 rows)

Syntax That Does Not Support Pushdown

SQL syntax that does not support pushdown is described using the following table definition examples:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
gaussdb=# CREATE TABLE CUSTOMER1
(
    C_CUSTKEY     BIGINT NOT NULL
  , C_NAME        VARCHAR(25) NOT NULL
  , C_ADDRESS     VARCHAR(40) NOT NULL
  , C_NATIONKEY   INT NOT NULL
  , C_PHONE       CHAR(15) NOT NULL
  , C_ACCTBAL     DECIMAL(15,2)   NOT NULL
  , C_MKTSEGMENT  CHAR(10) NOT NULL
  , C_COMMENT     VARCHAR(117) NOT NULL
)
DISTRIBUTE BY hash(C_CUSTKEY);
gaussdb=# CREATE TABLE test_stream(a int, b float); --float does not support redistribution.
gaussdb=# CREATE TABLE sal_emp ( c1 integer[] ) DISTRIBUTE BY replication;
  • The RETURNING statement cannot be pushed down.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    gaussdb=# explain update customer1 set C_NAME = 'a' returning c_name;
                                   QUERY PLAN                                           
    ------------------------------------------------------------------
     Update on customer1  (cost=0.00..0.00 rows=30 width=187)
       Node/s: All datanodes
       Node expr: c_custkey
       ->  Data Node Scan on customer1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=187)
             Node/s: All datanodes
    (5 rows)
    
  • The ORDER BY statement cannot be pushed down in aggregate functions.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    gaussdb=# explain verbose select count ( c_custkey order by c_custkey) from customer1;
                                   
                             QUERY PLAN                                        
    ------------------------------------------------------------------ Aggregate  (cost=2.50..2.51 rows=1 width=8)
       Output: count(customer1.c_custkey ORDER BY customer1.c_custkey)
       ->  Data Node Scan on customer1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=8)
             Output: customer1.c_custkey
             Node/s: All datanodes
             Remote query: SELECT c_custkey FROM ONLY public.customer1 WHERE true
    (6 rows)
    
  • If an SQL statement contains COUNT(DISTINCT expr) and columns in COUNT(DISTINCT expr) do not support redistribution, this statement cannot be pushed down.
    1
    2
    3
    4
    5
    6
    7
    8
    9
    gaussdb=# explain verbose select count(distinct b) from test_stream;
                                              QUERY PLAN                                           
    ------------------------------------------------------------------ Aggregate  (cost=2.50..2.51 rows=1 width=8)
       Output: count(DISTINCT test_stream.b)
       ->  Data Node Scan on test_stream "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=8)
             Output: test_stream.b
             Node/s: All datanodes
             Remote query: SELECT b FROM ONLY public.test_stream WHERE true
    (6 rows)
    
  • A statement containing DISTINCT ON cannot be pushed down.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    gaussdb=# explain verbose select distinct on (c_custkey) c_custkey from customer1 order by c_custkey;
                                                QUERY PLAN                                             
    ------------------------------------------------------------------ Unique  (cost=49.83..54.83 rows=30 width=8)
       Output: customer1.c_custkey
       ->  Sort  (cost=49.83..52.33 rows=30 width=8)
             Output: customer1.c_custkey
             Sort Key: customer1.c_custkey
             ->  Data Node Scan on customer1 "_REMOTE_TABLE_QUERY_"  (cost=0.00..0.00 rows=30 width=8)
                   Output: customer1.c_custkey
                   Node/s: All datanodes
                   Remote query: SELECT c_custkey FROM ONLY public.customer1 WHERE true
    (9 rows)
    
  • A statement containing array expressions cannot be pushed down.
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    gaussdb=# explain verbose select array[c_custkey,1] from customer1 order by c_custkey;
    
                              QUERY PLAN                                                    
    ------------------------------------------------------------------ Sort  (cost=49.83..52.33 rows=30 width=8)
       Output: (ARRAY[customer1.c_custkey, 1::bigint]), customer1.c_custkey
       Sort Key: customer1.c_custkey
       ->  Data Node Scan on "__REMOTE_SORT_QUERY__"  (cost=0.00..0.00 rows=30 width=8)
             Output: (ARRAY[customer1.c_custkey, 1::bigint]), customer1.c_custkey
             Node/s: All datanodes
             Remote query: SELECT ARRAY[c_custkey, 1::bigint], c_custkey FROM ONLY public.customer1 WHERE true ORDER BY 2
    (7 rows)
    
  • Table 1 Scenarios where a statement containing WITH RECURSIVE cannot be pushed down describes scenarios where a statement containing WITH RECURSIVE cannot be pushed down in the current version, as well as the causes.
    Table 1 Scenarios where a statement containing WITH RECURSIVE cannot be pushed down

    No.

    Scenario

    Cause of Not Supporting Pushdown

    1

    The query contains foreign tables.

    LOG: SQL can't be shipped, reason: RecursiveUnion contains ForeignScan is not shippable (In this table, LOG describes the cause of not supporting pushdown.)

    In the current version, queries containing foreign tables do not support pushdown.

    2

    Multiple node groups exist.

    LOG: SQL can't be shipped, reason: With-Recursive under multi-nodegroup scenario is not shippable

    In the current version, pushdown is supported only when all base tables are stored and computed in the same node group.

    3

    UNION does not contain ALL, and deduplication is required.

    LOG: SQL can't be shipped, reason: With-Recursive does not contain "ALL" to bind recursive & none-recursive branches

    For example:

    WITH recursive t_result AS (
    SELECT dm,sj_dm,name,1 as level
    FROM test_rec_part
    WHERE sj_dm > 10
    UNION
    SELECT t2.dm,t2.sj_dm,t2.name||' > '||t1.name,t1.level+1
    FROM t_result t1
    JOIN test_rec_part t2 ON t2.sj_dm = t1.dm
    )
    SELECT * FROM t_result t;

    4

    A base table contains a system catalog.

    LOG: SQL can't be shipped, reason: With-Recursive contains system table is not shippable

    For example:

    WITH RECURSIVE x(id) AS
    (
    select count(1) from pg_class where oid=1247
    UNION ALL
    SELECT id+1 FROM x WHERE id < 5
    ), y(id) AS
    (
    select count(1) from pg_class where oid=1247
    UNION ALL
    SELECT id+1 FROM x WHERE id < 10
    )
    SELECT y.*, x.* FROM y LEFT JOIN x USING (id) ORDER BY 1;

    5

    Only the VALUES clause is used for scanning base tables. In this case, the statement can be executed on the CN, and DNs are unnecessary.

    LOG: SQL can't be shipped, reason: With-Recursive contains only values rte is not shippable

    For example:

    WITH RECURSIVE t(n) AS (
    VALUES (1)
    UNION ALL
    SELECT n+1 FROM t WHERE n < 100
    )
    SELECT sum(n) FROM t;

    6

    Only the recursion part has correlation conditions of correlated subqueries, and the non-recursion part has no correlation condition.

    LOG: SQL can't be shipped, reason: With-Recursive recursive term correlated only is not shippable

    For example:

    select  a.ID,a.Name,
    (
    with recursive cte as (
    select ID, PID, NAME from b where b.ID = 1
    union all
    select parent.ID,parent.PID,parent.NAME
    from cte as child join b as parent on child.pid=parent.id
    where child.ID = a.ID
    )
    select NAME from cte limit 1
    ) cName
    from
    (
    select id, name, count(*) as cnt
    from a group by id,name
    ) a order by 1,2;

    7

    The replicate plan is used for limit in the non-recursion part but the hash plan is used in the recursion part, resulting in conflicts.

    LOG: SQL can't be shipped, reason: With-Recursive contains conflict distribution in none-recursive(Replicate) recursive(Hash)

    For example:

    WITH recursive t_result AS (
    select * from(
    SELECT dm,sj_dm,name,1 as level
    FROM test_rec_part
    WHERE sj_dm < 10 order by dm limit 6 offset 2)
    UNION all
    SELECT t2.dm,t2.sj_dm,t2.name||' > '||t1.name,t1.level+1
    FROM t_result t1
    JOIN test_rec_part t2 ON t2.sj_dm = t1.dm
    )
    SELECT * FROM t_result t;

    8

    recursive of multiple-layers are nested. That is, a recursive is nested in the recursion part of another recursive.

    LOG: SQL can't be shipped, reason: Recursive CTE references recursive CTE "cte"

    For example:

    with recursive cte as
    (
    select * from rec_tb4 where id<4
    union all
    select h.id,h.parentID,h.name from
    (
    with recursive cte as
    (
    select * from rec_tb4 where id<4
    union all
    select h.id,h.parentID,h.name from rec_tb4 h inner join cte c on h.id=c.parentID
    )
    SELECT id ,parentID,name from cte order by parentID
    ) h
    inner join cte  c on h.id=c.parentID
    )
    SELECT id ,parentID,name from cte order by parentID,1,2,3;
Delete the tables.
1
2
3
4
5
6
gaussdb=# DROP TABLE CUSTOMER1;
DROP TABLE
gaussdb=# DROP TABLE test_stream;
DROP TABLE
gaussdb=# DROP TABLE sal_emp;
DROP TABLE

Functions That Do Not Support Pushdown

The following describes the volatility of functions. In GaussDB, every function has a volatility classification, with the possibilities being:

  • IMMUTABLE

    Indicates that the function always returns the same result if the parameter values are the same.

  • STABLE

    Indicates that the function cannot modify the database, and that within a single table scan it will consistently return the same result for the same parameter value, but its result varies by SQL statements.

  • VOLATILE

    Indicates that the function value can change in a single table scan and no optimization is performed.

The volatility of a function can be obtained by querying for its provolatile column in pg_proc. The value i indicates immutable, s indicates stable, and v indicates volatile. The valid values of the proshippable column in pg_proc are t, f, and NULL. This column and the provolatile column together describe whether a function is pushed down.

  • If the provolatile of a function is i, the function can be pushed down regardless of the value of proshippable.
  • If the provolatile of a function is s or v, the function can be pushed only if the value of proshippable is t.
  • CTEs containing random, exec_hadoop_sql, or exec_on_extension are not pushed down, because pushdown may lead to incorrect results.

When creating a customized function, you can specify the values of provolatile and proshippable. For details, see CREATE FUNCTION.

In scenarios where a function does not support pushdown, perform one of the following as required:

  • If it is a system function, replace it with a functionally equivalent one.
  • If it is a customized function, check whether its provolatile and proshippable are correctly defined.

Example: Customized Functions

Define a customized function that generates fixed output for a certain input as the immutable type.

Take the sales information of TPC Benchmark DS (TPC-DS) as an example. If you want to write a function to calculate the discount data of a product, you can define the function as follows:

1
2
3
4
CREATE FUNCTION func_percent_2 (NUMERIC, NUMERIC) RETURNS NUMERIC
AS 'SELECT $1 / $2 WHERE $2 > 0.01'
LANGUAGE SQL
VOLATILE;

Run the following statements:

1
2
SELECT func_percent_2(ss_sales_price, ss_list_price)
FROM store_sales;

The execution plan is as follows.

func_percent_2 is not pushed down, and ss_sales_price and ss_list_price are executed on a CN. In this case, a large amount of resources on the CN is consumed, and the performance deteriorates as a result.

In this example, the function generates the same output when the same input is provided. Therefore, we can modify the function to the following one:

1
2
3
4
CREATE FUNCTION func_percent_1 (NUMERIC, NUMERIC) RETURNS NUMERIC
AS 'SELECT $1 / $2 WHERE $2 > 0.01'
LANGUAGE SQL
IMMUTABLE;

Run the following statements:

1
2
SELECT func_percent_1(ss_sales_price, ss_list_price)
FROM store_sales;

The execution plan is as follows.

The func_percent_1 function is pushed down to DNs for execution.