Eliminating Redundant JOINs by Using Primary and Foreign Key Constraints

Use primary and foreign key constraints to optimize query plans and eliminate unnecessary JOIN operations.

Prerequisites

Only clusters with versions 9.1.0.200 and later support the removal of redundant JOINs using primary and foreign key constraints.
Enable the option to declare foreign key constraints before creating them.
1

set info_constraint_options = 'foreign_key';

Function Description

When working with large datasets in relational databases, multi-table JOINs are common and time-consuming. JOIN elimination based on primary and foreign key constraints (which contain table relationships) can significantly improve SQL execution performance by reducing unnecessary JOINs.

You can set the GUC parameter rewrite_rule to enable or disable the use of foreign key constraints to eliminate unnecessary JOIN operations. The settings are as follows:

Enable the JOIN elimination optimization function. Once enabled, the system utilizes the specified primary and foreign key constraints during SQL execution to eliminate unnecessary JOIN operations, following the JOIN elimination rules.
1

SET rewrite_rule = 'join_elimination';

Disable the join elimination optimization function.
1

SET rewrite_rule = '';

Procedure

Declare a foreign key.
Perform the JOIN elimination function to improve query performance.

Declaring a Foreign Key

Create a foreign key as well when using CREATE TABLE to create a table.
Use ALTER TABLE to create or delete a foreign key for an existing table.

DWS does not check data constraints. You need to check by yourself to ensure correct data constraint relationship between primary and foreign keys to avoid incorrect query results.
The DROP TABLE statement automatically deletes foreign keys without verification or errors.
To optimize JOIN elimination, add the NOT NULL constraint to the foreign key column. If there is no NOT NULL constraint, the optimizer automatically adds the IS NOT NULL filter condition of the foreign key column to ensure that the foreign key is not NULL. Then JOIN elimination can be optimized.

Creating a foreign key using CREATE TABLE

Create a table item with a primary key.

        
             CREATE TABLE item(a1 INT PRIMARY KEY, b1 INT) WITH (ORIENTATION=COLUMN) distribute by HASH(a1);

Run the CREATE TABLE statement to create a table store with foreign key constraints. The foreign key column a2 has the NOT NULL constraint and points to the primary key column a1 of the item table.

        
             CREATE TABLE store(a2 int NOT NULL REFERENCES item(a1), b2 int);

Creating or deleting a foreign key using ALTER TABLE

Create a table item with a primary key.

        
             CREATE TABLE item(a1 INT PRIMARY KEY, b1 INT) WITH (ORIENTATION=COLUMN) distribute by HASH(a1);

Run the CREATE TABLE statement to create a non-foreign key table store.

        
             CREATE TABLE store(a2 int NOT NULL , b2 int);

Use ALTER TABLE to add a foreign key to the store table.

        
             ALTER TABLE store ADD FOREIGN KEY (a2) REFERENCES item(a1);

Delete the foreign key from the store table using ALTER TABLE.

        
             ALTER TABLE store DROP CONSTRAINT store_a2_fkey;

Viewing the foreign key constraints on the table
DWS also offers functions for you to manually verify the validity of foreign key constraints.
1

SELECT check_foreign_key_constraint('store');

Using JOIN Elimination

JOIN elimination involves using primary and foreign key constraints to optimize query plans and eliminate unnecessary JOIN operations.

Prerequisites: join_elimination has been added to rewrite_rule.

     
           SET rewrite_rule='join_elimination';

Eliminate JOINs between two tables.

test2.a2 is a foreign key and test1.a1 is a primary key.

      
           EXPLAIN (COSTS OFF, NODES OFF) SELECT a1, a2 FROM test1, test2 WHERE a1 = a2 ORDER BY 1,2;
             QUERY PLAN              
-------------------------------------
  id |          operation           
 ----+------------------------------
   1 | ->  Streaming (type: GATHER)
   2 |    ->  Sort                  
   3 |       ->  Seq Scan on test2

      
           EXPLAIN (COSTS OFF, NODES OFF) SELECT a1, a2 FROM test1 inner join test2 on a1 = a2 ORDER BY 1,2;
             QUERY PLAN              
-------------------------------------
  id |          operation           
 ----+------------------------------
   1 | ->  Streaming (type: GATHER)
   2 |    ->  Sort                  
   3 |       ->  Seq Scan on test2

Eliminate vtest2 -> (vtest1, vtest3) in multi-table intra-joins.

vtest2 is a foreign key table, while vtest1 and vtest3 are primary key tables.

      
           EXPLAIN (COSTS OFF, NODES OFF)select vtest2.* from vtest2, vtest1, vtest3 where a2 = a1 and b2 = a3;
                  QUERY PLAN                   
-----------------------------------------------
  id |               operation                
 ----+----------------------------------------
   1 | ->  Row Adapter                        
   2 |    ->  Vector Streaming (type: GATHER)
   3 |       ->  CStore Scan on vtest2

Eliminate test2 -> test1 -> test3 in series joins.

test2 is a foreign key table and test3 is a primary key table. test1 serves as both a foreign key table and a primary key table.

      
           EXPLAIN (COSTS OFF, NODES OFF) SELECT a1, a2 FROM test1, test2, test3 WHERE a1 = a2 and b1 = a3 ORDER BY 1,2;
             QUERY PLAN              
-------------------------------------
  id |          operation           
 ----+------------------------------
   1 | ->  Streaming (type: GATHER)
   2 |    ->  Sort                  
   3 |       ->  Seq Scan on test2

      
           EXPLAIN (COSTS OFF, NODES OFF) SELECT a1, a2 FROM test2 inner join (select test1.* from test1 inner join test3 on b1 = a3) on a1 = a2;;
             QUERY PLAN              
-------------------------------------
  id |          operation           
 ----+------------------------------
   1 | ->  Streaming (type: GATHER)
   2 |    ->  Seq Scan on test2

Scenarios where elimination is not possible:

If a non-primary key column of the primary key table is referenced, the inner join cannot be eliminated (e.g., returning test1.b1).

      
           EXPLAIN (COSTS OFF, NODES OFF)
SELECT b1, a2 FROM test1, test2 WHERE a1 = a2 ORDER BY 1,2;
                  QUERY PLAN                   
-----------------------------------------------
  id |             operation             
 ----+-----------------------------------
   1 | ->  Streaming (type: GATHER)      
   2 |    ->  Sort                       
   3 |       ->  Hash Join (4,5)         
   4 |          ->  Seq Scan on test2    
   5 |          ->  Hash                 
   6 |             ->  Seq Scan on test1

 Predicate Information (identified by plan id)
 ---------------------------------------------
   3 --Hash Join (4,5)
         Hash Cond: (test2.a2 = test1.a1)