Updated on 2024-05-20 GMT+08:00

Static Partition Pruning

For partitioned table query statements with constants in the search criteria, the search criteria contained in operators such as index scan, bitmap index scan, and index-only scan are used as pruning conditions in the optimizer phase to filter partitions. The search criteria must contain at least one partition key. For a partitioned table with a multi-column partition key, the search criteria can contain any column of the partition key.

Static pruning is supported in the following scenarios:
  1. Supported partitioning types: range partitioning, hash partitioning, and list partitioning.
  2. Supported expression types: comparison expression (<, <=, =, >=, >), logical expression, and array expression.
  • Currently, static pruning does not support subquery expressions.
  • To support partitioned table pruning, the filter condition on the partition key is forcibly converted to the partition key type when the plan is generated. This operation is different from the implicit type conversion rule. As a result, an error may be reported when the same condition is converted on the partition key, and no error is reported for non-partition keys.
  • Typical scenarios where static pruning is supported are as follows:
    1. Comparison expressions
      -- Create a partitioned table.
      gaussdb=# CREATE TABLE t1 (c1 int, c2 int)
      PARTITION BY RANGE (c1)
      (
          PARTITION p1 VALUES LESS THAN(10),
          PARTITION p2 VALUES LESS THAN(20),
          PARTITION p3 VALUES LESS THAN(MAXVALUE)
      );
      gaussdb=# SET max_datanode_for_plan = 1;
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 = 1;
                              QUERY PLAN                         
      -----------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: datanode1
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 = 1
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 = 1
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 1
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 = 1)
                 Selected Partitions:  1
      
      (15 rows)
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 < 1;
                              QUERY PLAN                         
      -----------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: All datanodes
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 < 1
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 < 1
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 1
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 < 1)
                 Selected Partitions:  1
      
      (15 rows)
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 > 11;
                              QUERY PLAN                         
      ------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: All datanodes
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 > 11
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 > 11
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 2
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 > 11)
                 Selected Partitions:  2..3
      
      (15 rows)
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 is NULL;
                                QUERY PLAN                           
      ---------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: datanode1
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 IS NULL
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 IS NULL
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 1
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 IS NULL)
                 Selected Partitions:  3
      
      (15 rows)
    2. Logical expressions
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 = 1 AND c2 = 2;
                                    QUERY PLAN                              
      ----------------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: datanode1
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 = 1 AND c2 = 2
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 = 1 AND c2 = 2
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 1
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: ((t1.c1 = 1) AND (t1.c2 = 2))
                 Selected Partitions:  1
      
      (15 rows)
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 = 1 OR c1 = 2;
                                  QUERY PLAN                              
      ---------------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: All datanodes
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 = 1 OR c1 = 2
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 = 1 OR c1 = 2
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 1
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: ((t1.c1 = 1) OR (t1.c1 = 2))
                 Selected Partitions:  1
      
      (15 rows)
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE NOT c1 = 1;
                                QUERY PLAN                           
      ---------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: All datanodes
         Remote query: SELECT c1, c2 FROM public.t1 WHERE NOT c1 = 1
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE NOT c1 = 1
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 3
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 <> 1)
                 Selected Partitions:  1..3
      
      (15 rows)
    3. Array expressions
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 IN (1, 2, 3);
                                       QUERY PLAN                                  
      ------------------------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: All datanodes
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 = ANY (ARRAY[1, 2, 3])
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 = ANY (ARRAY[1, 2, 3])
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 1
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 = ANY ('{1,2,3}'::integer[]))
                 Selected Partitions:  1
      
      (15 rows)
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 = ALL(ARRAY[1, 2, 3]);
                                        QUERY PLAN                                  
      ------------------------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: All datanodes
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 = ALL (ARRAY[1, 2, 3])
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 = ALL (ARRAY[1, 2, 3])
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 0
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 = ALL ('{1,2,3}'::integer[]))
                 Selected Partitions:  NONE
      
      (15 rows)
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 = ANY(ARRAY[1, 2, 3]);
                                        QUERY PLAN                                  
      ------------------------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: All datanodes
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 = ANY (ARRAY[1, 2, 3])
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 = ANY (ARRAY[1, 2, 3])
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 1
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 = ANY ('{1,2,3}'::integer[]))
                 Selected Partitions:  1
      
      (15 rows)
      
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 = SOME(ARRAY[1, 2, 3]);
                                        QUERY PLAN                                  
      ------------------------------------------------------------------------------
       Data Node Scan
         Output: t1.c1, t1.c2
         Node/s: All datanodes
         Remote query: SELECT c1, c2 FROM public.t1 WHERE c1 = ANY (ARRAY[1, 2, 3])
      
       Remote SQL: SELECT c1, c2 FROM public.t1 WHERE c1 = ANY (ARRAY[1, 2, 3])
       Datanode Name: datanode1
         Partition Iterator
           Output: c1, c2
           Iterations: 1
           ->  Partitioned Seq Scan on public.t1
                 Output: c1, c2
                 Filter: (t1.c1 = ANY ('{1,2,3}'::integer[]))
                 Selected Partitions:  1
      
      (15 rows)
  • Typical scenarios where static pruning is not supported are as follows:
    1. Subquery expressions
      gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT * FROM t1 WHERE c1 = ALL(SELECT c2 FROM t1 WHERE c1 > 10);
                                     QUERY PLAN                                
      -------------------------------------------------------------------------
       Streaming (type: GATHER)
         Output: public.t1.c1, public.t1.c2
         Node/s: All datanodes
         ->  Partition Iterator
               Output: public.t1.c1, public.t1.c2
               Iterations: 3
               ->  Partitioned Seq Scan on public.t1
                     Output: public.t1.c1, public.t1.c2
                     Distribute Key: public.t1.c1
                     Filter: (SubPlan 1)
                     Selected Partitions:  1..3
                     SubPlan 1
                       ->  Materialize
                             Output: public.t1.c2
                             ->  Streaming(type: BROADCAST)
                                   Output: public.t1.c2
                                   Spawn on: All datanodes
                                   Consumer Nodes: All datanodes
                                   ->  Partition Iterator
                                         Output: public.t1.c2
                                         Iterations: 2
                                         ->  Partitioned Seq Scan on public.t1
                                               Output: public.t1.c2
                                               Distribute Key: public.t1.c1
                                               Filter: (public.t1.c1 > 10)
                                               Selected Partitions:  2..3
      (26 rows)
      
      -- Clean up the environment.
      gaussdb=# DROP TABLE t1;