Querying Joined Tables
Join Types
Multiple joins are necessary for accomplishing complex queries. Joins are classified into inner joins and outer joins. Each type of joins have their subtypes.
- Inner join: inner join, cross join, and natural join.
- Outer join: left outer join, right outer join, and full join.
To better illustrate the differences between these joins, the following provides some examples.
Create the sample tables student and math_score and insert data into them. Set enable_fast_query_shipping to off (on by default), that is, the query optimizer uses the distributed framework. Set explain_perf_mode to pretty (default value) to specify the EXPLAIN display format.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
CREATE TABLE student( id INTEGER, name varchar(50) ) STORE AS orc; CREATE TABLE math_score( id INTEGER, score INTEGER ) STORE AS orc; INSERT INTO student VALUES(1, 'Tom'); INSERT INTO student VALUES(2, 'Lily'); INSERT INTO student VALUES(3, 'Tina'); INSERT INTO student VALUES(4, 'Perry'); INSERT INTO math_score VALUES(1, 80); INSERT INTO math_score VALUES(2, 75); INSERT INTO math_score VALUES(4, 95); INSERT INTO math_score VALUES(6, NULL); SET enable_fast_query_shipping = off; SET explain_perf_mode = pretty; |
Inner Join
- Inner join
Description: Rows that meet join_condition in both the left and right tables are joined and output. Tuples that do not meet join_condition are not output.
Example 1: Query students' math scores.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
SELECT s.id, s.name, ms.score FROM student s JOIN math_score ms on s.id = ms.id; id | name | score ----+-------+------- 2 | Lily | 75 1 | Tom | 80 4 | Perry | 95 (3 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s JOIN math_score ms on s.id = ms.id; QUERY PLAN -------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+---------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 126 | 435.27 2 | -> Vector Streaming (type: GATHER) | 5000 | | 126 | 435.27 3 | -> Vector Sonic Hash Join (4,6) | 5000 | | 126 | 200.89 4 | -> Vector Streaming(type: BROADCAST) | 2000 | | 122 | 104.64 5 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 6 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 3 --Vector Sonic Hash Join (4,6) Hash Cond: (s.id = ms.id) 5 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 6 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 290.913 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (28 rows)
- Cross join
1
left_table CROSS JOIN right_table
Description: Each row in the left table is joined with each row in the right table. The number of final rows is the product of the number of rows on both sides. The product is also called Cartesian product.
Example 2: Cross join of the student table and the math_score table.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
SELECT s.id, s.name, ms.score FROM student s CROSS JOIN math_score ms order by id; id | name | score ----+-------+------- 1 | Tom | 80 1 | Tom | 95 1 | Tom | 75 1 | Tom | 2 | Lily | 80 2 | Lily | 95 2 | Lily | 75 2 | Lily | 3 | Tina | 80 3 | Tina | 95 3 | Tina | 75 3 | Tina | 4 | Perry | 80 4 | Perry | 95 4 | Perry | 75 4 | Perry | (16 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s CROSS JOIN math_score ms order by id; QUERY PLAN --------------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+---------------------------------------------------------+---------+---------------+---------+---------- 1 | -> Row Adapter | 1000000 | | 126 | 55463.56 2 | -> Vector Streaming (type: GATHER) | 1000000 | | 126 | 55463.56 3 | -> Vector Sort | 1000000 | 518MB(2043MB) | 126 | 54994.81 4 | -> Vector Nest Loop (5,7) | 1000000 | 1MB | 126 | 6415.89 5 | -> Vector Streaming(type: BROADCAST) | 2000 | 2MB | 122 | 104.64 6 | -> Vector Foreign Scan on student s | 1000 | 1MB | 122 | 60.00 7 | -> Vector Materialize | 1000 | 16MB | 4 | 62.50 8 | -> Vector Foreign Scan on math_score ms | 1000 | 1MB | 4 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 6 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 8 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 210.501 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (28 rows)
- Natural join
1
left_table NATURAL JOIN right_table
Description: Columns with the same name in left table and right table are joined by equi-join, and the columns with the same name are merged into one column.
Example 3: Natural join of the student table and the math_scores table, where the columns with the same name in both tables are the id columns, will be joined on equal values of the id column.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
SELECT * FROM student s NATURAL JOIN math_score ms order by id; id | name | score ----+-------+------- 1 | Tom | 80 2 | Lily | 75 4 | Perry | 95 (3 rows) EXPLAIN SELECT * FROM student s NATURAL JOIN math_score ms order by id; QUERY PLAN ----------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+------------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 126 | 582.61 2 | -> Vector Streaming (type: GATHER) | 5000 | | 126 | 582.61 3 | -> Vector Sort | 5000 | | 126 | 348.24 4 | -> Vector Sonic Hash Join (5,7) | 5000 | | 126 | 200.89 5 | -> Vector Streaming(type: BROADCAST) | 2000 | | 122 | 104.64 6 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 7 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 4 --Vector Sonic Hash Join (5,7) Hash Cond: (s.id = ms.id) 6 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 7 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 377.624 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (29 rows)
Outer Join
- Left Join
1
left_table LEFT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
Description: The result set of a left outer join includes all rows of left table, not only the joined rows. If a row in the left table does not match any row in the right table, the row will be NULL in the result set.
Example 4: A left outer join is performed between the student table and the math_scores table. In the result set, for the row in the student table where the id is 3, the corresponding data from the right table will be filled with NULL.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
SELECT s.id, s.name, ms.score FROM student s LEFT JOIN math_score ms on (s.id = ms.id) order by id; id | name | score ----+-------+------- 1 | Tom | 80 2 | Lily | 75 3 | Tina | 4 | Perry | 95 (4 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s LEFT JOIN math_score ms on (s.id = ms.id) order by id; QUERY PLAN -------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+---------------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 126 | 582.61 2 | -> Vector Streaming (type: GATHER) | 5000 | | 126 | 582.61 3 | -> Vector Sort | 5000 | | 126 | 348.24 4 | -> Vector Sonic Hash Right Join (5, 7) | 5000 | | 126 | 200.89 5 | -> Vector Streaming(type: BROADCAST) | 2000 | | 8 | 104.64 6 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 7 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 4 --Vector Sonic Hash Right Join (5, 7) Hash Cond: (ms.id = s.id) 6 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 7 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 ====== Query Summary ===== --------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 1253.487 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (29 rows)
- Right join
1
left_table RIGHT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
Description: Contrary to left outer join, the result set of right outer join includes all rows of the right table, not only the rows that match the join column. If a row in the right table does not match any row in the left table, all columns in the left table in the associated result set row are null.
Example 5: A right outer join is performed between the student table and the math_scores table. In the result set, for the row in the math_scores table where the id is 6, the corresponding data from the left table will be filled with NULL.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
SELECT ms.id, s.name, ms.score FROM student s RIGHT JOIN math_score ms on (s.id = ms.id) order by id; id | name | score ----+-------+------- 1 | Tom | 80 2 | Lily | 75 4 | Perry | 95 6 | | (4 rows) EXPLAIN SELECT ms.id, s.name, ms.score FROM student s RIGHT JOIN math_score ms on (s.id = ms.id) order by id; QUERY PLAN ----------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+------------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 126 | 582.61 2 | -> Vector Streaming (type: GATHER) | 5000 | | 126 | 582.61 3 | -> Vector Sort | 5000 | | 126 | 348.24 4 | -> Vector Sonic Hash Right Join (5, 7) | 5000 | | 126 | 200.89 5 | -> Vector Streaming(type: BROADCAST) | 2000 | | 122 | 104.64 6 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 7 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 4 --Vector Sonic Hash Right Join (5, 7) Hash Cond: (s.id = ms.id) 6 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 7 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 214.389 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (29 rows)
In a right join, Left is displayed in the join operator. This is because a right join is actually the process replacing the left table with the right table then performing left join.
- Full join
1
left_table FULL [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
Description: A full join is a combination of a left outer join and a right outer join. The result set of a full outer join includes all rows of the left table and the right table, not just the joined rows. If a row in the left table does not match any row in the right table, all columns in the right table in the associated result set row are null. If a row in the right table does not match any row in the left table, all columns in the left table in the associated result set row are null.
Example 6: A full outer join is performed between the student table and the math_scores table. In the result set, for the row in the student table where the id is 3, the corresponding data from the right table will be filled with NULL. For the row in the math_scores table where the id is 6, the corresponding data from the left table will be filled with NULL.
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
SELECT s.id, s.name, ms.id, ms.score FROM student s FULL JOIN math_score ms ON (s.id = ms.id) order by s.id; id | name | id | score ----+-------+----+------- 1 | Tom | 1 | 80 2 | Lily | 2 | 75 3 | Tina | | 4 | Perry | 4 | 95 | | 6 | (5 rows) EXPLAIN SELECT s.id, s.name, ms.id, ms.score FROM student s FULL JOIN math_score ms ON (s.id = ms.id) order by s.id; QUERY PLAN -------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+---------------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 130 | 734.79 2 | -> Vector Streaming (type: GATHER) | 5000 | | 130 | 734.79 3 | -> Vector Sort | 5000 | | 130 | 434.86 4 | -> Vector Sonic Hash Full Join (5, 7) | 5000 | | 130 | 243.95 5 | -> Vector Streaming(type: REDISTRIBUTE) | 1000 | | 122 | 103.57 6 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 7 | -> Vector Streaming(type: REDISTRIBUTE) | 1000 | | 8 | 103.57 8 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 4 --Vector Sonic Hash Full Join (5, 7) Hash Cond: (s.id = ms.id) 6 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 8 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 206.619 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (30 rows)
Differences Between the ON Condition and the WHERE Condition in Multi-Table Query
According to the preceding join syntax, except natural join and cross join, the ON condition (USING is converted to the ON condition during query parsing) is used on the join result of both the two tables. Generally, the WHERE condition is used in the query statement to restrict the query result. The ON join condition and WHERE filter condition do not contain conditions that can be pushed down to tables. The differences between ON and WHERE are as follows:
- The ON condition is used for joining two tables.
- WHERE is used to filter the result set.
To sum up, the ON condition is used when two tables are joined. After the join result set of two tables is generated, the WHERE condition is used.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot