Updated on 2025-03-13 GMT+08:00

Hash Join

Description

A hash join is an efficient join method that depends on the hash technology. During a hash join, GaussDB selects one of the two tables (usually small tables) and creates a hash table based on join conditions. The key of the hash table is the join column of the small table, and the other columns of the small table are the value. Then, the hash value of the join column is calculated for each row in the large table. After this, the hash table is searched for a matching row.

The time complexity of a hash join is O(n + m), where n and m indicate the numbers of rows in the two tables. However, the performance may degrade if the inner table is too large for the memory storage, as it will require additional disk I/O operations.

Typical Scenarios

  • The number of rows in two tables differ significantly and the join operation is performed.
  • The join columns of the two tables are equi-join (for example, using the = operator).

Examples

Example: The number of rows in two tables differ significantly and the join operation is performed.

-- Prepare data.
gaussdb=# CREATE TABLE t1(c1 number,c2 number,c3 number); 
CREATE TABLE 
gaussdb=# CREATE TABLE t2(c1 number,c2 number,c3 number); 
CREATE TABLE 
gaussdb=# INSERT INTO t1 VALUES(generate_series(1,100000), 2 ,3); 
INSERT 0 100000 
gaussdb=# INSERT INTO t2 VALUES(generate_series(1,100), 2, 3); 
INSERT 0 100

-- Execution result.
gaussdb=#  EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c2; 
                            QUERY PLAN                             
------------------------------------------------------------------ 
 Hash Join  (cost=24.58..1823.14 rows=90944 width=192) 
   Hash Cond: (t1.c1 = t2.c2) 
   ->  Seq Scan on t1  (cost=0.00..713.69 rows=28069 width=96) 
   ->  Hash  (cost=16.48..16.48 rows=648 width=96) 
         ->  Seq Scan on t2  (cost=0.00..16.48 rows=648 width=96) 
(5 rows)

-- Drop.
gaussdb=# DROP TABLE t1,t2;

In the preceding example, the output of the hash join operator is as follows.

Item

Description

Hash Join

Operator name.

Hash Cond

Join predicate of the operator hash join. In the example, the condition is that t1.c1 is equal to t2.c2. During query execution, rows that meet these conditions are included in the final result set.

Hash

Operator for creating a hash table in an inner table.