Full Partition-Wise Join in the SMP Scenario

The partition-wise join plan in the SMP scenario is selected based on costs. In the path generation process, estimated costs of partition-wise join and non-partition-wise join paths are compared, and a path with a lower cost is selected. To enable it, set GUC parameter enable_smp_partitionwise.

Full partition-wise join applies to two tables with the same partitioning policy. The condition for generating a full partition-wise join path is that the partition keys of the two tables are a pair of matching join keys.

Usage Specifications

Full partition-wise join usage specifications in the SMP scenario are as follows:

Level-1 hash partitioned tables and level-1 range partitioned tables are supported.
For hash partitioned tables, the same partitioning policy means that the partition key types and the number of partitions are the same.
For range partitioned tables, the same partitioning policy means that the partition key types, number of partitions, number of partition keys, and boundary values of each partition are the same.
Only stream plans are supported.
Only the scenario where the partition key and distribution key are the same is supported.
Only the scenario where the Join operator can complete calculation on a single DN is supported. That is, data of the Join operator does not cross nodes.
Hash Join and Merge Join are supported.
Seqscan, Indexscan, Indexonlyscan and Imcvscan are supported. For Indexscan and Indexonlyscan, only partitioned local indexes are supported, and the index type is B-tree or UB-tree.
Related specifications are inherited from SMP specifications. IUD operations are not supported in the SMP scenario.
The SMP function must be enabled and the value of query_dop must be greater than 1.

Examples

-- Create a hash partitioned table.
gaussdb=# CREATE TABLE hash_part
(
    a INTEGER,
    b INTEGER,
    c INTEGER
)
DISTRIBUTE BY HASH(a)
PARTITION BY HASH(a)
(
    PARTITION p1,
    PARTITION p2,
    PARTITION p3,
    PARTITION p4,
    PARTITION p5
);
CREATE TABLE

-- Use a Stream plan.
gaussdb=# SET enable_fast_query_shipping = off;
SET
gaussdb=# SET enable_stream_operator = on;
SET

-- Set query_dop to 5 to enable SMP.
gaussdb=# SET query_dop = 5;
SET

-- Disable the partition-wise join function in the SMP scenario.
gaussdb=# SET enable_smp_partitionwise = off;
SET

-- View a non-partition-wise join plan. According to the plan, after data scanning is completed using the Partition Iterator and Partitioned Seq Scan operators at two layers, data is redistributed using the Streaming(type: LOCAL REDISTRIBUTE) operator to ensure that data in the Join operator matches each other.
gaussdb=# EXPLAIN (COSTS OFF) SELECT * FROM hash_part t1, hash_part t2 WHERE t1.a = t2.a;
                                QUERY PLAN                                
--------------------------------------------------------------------------
 Streaming (type: GATHER)
   Node/s: All datanodes
   ->  Streaming(type: LOCAL GATHER dop: 1/5)
         Spawn on: All datanodes
         ->  Nested Loop
               Join Filter: (t1.a = t2.a)
               ->  Streaming(type: LOCAL REDISTRIBUTE dop: 5/5)
                     Spawn on: All datanodes
                     ->  Partition Iterator
                           Iterations: 5
                           ->  Partitioned Seq Scan on hash_part t1
                                 Selected Partitions:  1..5
               ->  Materialize
                     ->  Streaming(type: LOCAL REDISTRIBUTE dop: 5/5)
                           Spawn on: All datanodes
                           ->  Partition Iterator
                                 Iterations: 5
                                 ->  Partitioned Seq Scan on hash_part t2
                                       Selected Partitions:  1..5
(19 rows)

-- Enable the partition-wise join function in the SMP scenario.
gaussdb=# SET enable_smp_partitionwise = on;
SET

-- View the execution plan of partition-wise join. According to the plan, the partition-wise join plan eliminates the Streaming operator, that is, data does not need to be redistributed between threads, thereby reducing overheads of data transfer and improving performance of the Join operation.
gaussdb=# EXPLAIN (COSTS OFF) SELECT * FROM hash_part t1, hash_part t2 WHERE t1.a = t2.a;
                             QUERY PLAN                             
--------------------------------------------------------------------
 Streaming (type: GATHER)
   Node/s: All datanodes
   ->  Streaming(type: LOCAL GATHER dop: 1/5)
         Spawn on: All datanodes
         ->  Hash Join (Partition-wise Join)
               Hash Cond: (t1.a = t2.a)
               ->  Partition Iterator
                     Iterations: 5
                     ->  Partitioned Seq Scan on hash_part t1
                           Selected Partitions:  1..5
               ->  Hash
                     ->  Partition Iterator
                           Iterations: 5
                           ->  Partitioned Seq Scan on hash_part t2
                                 Selected Partitions:  1..5
(15 rows)

-- Drop the partitioned table.
gaussdb=# DROP TABLE hash_part;

A partition-wise join message is displayed on the right of the Join operator only for a partition-wise join plan in the SMP scenario. This message is not displayed in non-SMP scenarios.

Parent topic: Partition-Wise Join

Previous topic: Partition-Wise Join in Non-SMP Scenarios

Next topic: Partitioned Table O&M Management