Hash Join
Description
A hash join is an efficient join method that depends on the hash technology. During a hash join, GaussDB selects one of the two tables (usually small tables) and creates a hash table based on join conditions. The key of the hash table is the join column of the small table, and the other columns of the small table are the value. Then, the hash value of the join column is calculated for each row in the large table. After this, the hash table is searched for a matching row.
The time complexity of a hash join is O(n + m), where n and m indicate the numbers of rows in the two tables. However, the performance may degrade if the inner table is too large for the memory storage, as it will require additional disk I/O operations.
Typical Scenarios
- The number of rows in two tables differ significantly and the join operation is performed.
- The join columns of the two tables are equi-join (for example, using the = operator).
Examples
Example: The number of rows in two tables differ significantly and the join operation is performed.
-- Prepare data. gaussdb=# CREATE TABLE t1(c1 number,c2 number,c3 number); CREATE TABLE gaussdb=# CREATE TABLE t2(c1 number,c2 number,c3 number); CREATE TABLE gaussdb=# INSERT INTO t1 VALUES(generate_series(1,100000), 2 ,3); INSERT 0 100000 gaussdb=# INSERT INTO t2 VALUES(generate_series(1,100), 2, 3); INSERT 0 100 -- Execution result. gaussdb=# EXPLAIN SELECT * FROM t1 JOIN t2 ON t1.c1 = t2.c2; QUERY PLAN ------------------------------------------------------------------ Hash Join (cost=24.58..1823.14 rows=90944 width=192) Hash Cond: (t1.c1 = t2.c2) -> Seq Scan on t1 (cost=0.00..713.69 rows=28069 width=96) -> Hash (cost=16.48..16.48 rows=648 width=96) -> Seq Scan on t2 (cost=0.00..16.48 rows=648 width=96) (5 rows) -- Drop. gaussdb=# DROP TABLE t1,t2;
In the preceding example, the output of the hash join operator is as follows.
Item |
Description |
---|---|
Hash Join |
Operator name. |
Hash Cond |
Join predicate of the operator hash join. In the example, the condition is that t1.c1 is equal to t2.c2. During query execution, rows that meet these conditions are included in the final result set. |
Hash |
Operator for creating a hash table in an inner table. |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot