Help Center/ GaussDB(DWS)/ Performance Tuning/ Optimization Cases/ Case: Converting from NOT IN to NOT EXISTS
Updated on 2023-06-25 GMT+08:00

Case: Converting from NOT IN to NOT EXISTS

nestloop anti join must be used to implement NOT IN, while you can use Hash anti join to implement NOT EXISTS. If no NULL value exists in the JOIN column, NOT IN is equivalent to NOT EXISTS. Therefore, if you are sure that no NULL value exists, you can convert NOT IN to NOT EXISTS to generate hash joins and to improve the query performance.

Before Optimization

Create two base tables t1 and t2.

1
2
CREATE TABLE t1(a int, b int, c int not null) WITH(orientation=row);
CREATE TABLE t2(a int, b int, c int not null) WITH(orientation=row);

Run the following SQL statement to query the NOT IN execution plan:

1
EXPLAIN VERBOSE SELECT * FROM t1 WHERE t1.c NOT IN (SELECT t2.c FROM t2);

The following figure shows the statement output.

According to the returned result, nest loops are used. As the OR operation result of NULL and any value is NULL,

1
t1.c NOT IN (SELECT t2.c FROM t2)

the preceding condition expression is equivalent to:

1
t1.c <> ANY(t2.c) AND t1.c IS NOT NULL AND ANY(t2.c) IS NOT NULL 

After Optimization

The query can be modified as follows:

1
SELECT * FROM t1 WHERE NOT EXISTS (SELECT * FROM t2 WHERE t2.c = t1.c);

Run the following statement to query the execution plan of NOT EXISTS:

1
EXPLAIN VERBOSE SELECT * FROM t1 WHERE NOT EXISTS (SELECT 1 FROM t2 WHERE t2.c = t1.c);