多表连接查询
连接类型介绍
通过SQL完成各种复杂的查询,多表之间的连接是必不可少的。连接分为:内连接和外连接两大类,每大类中还可进行细分。
- 内连接:标准内连接(INNER JOIN),交叉连接(CROSS JOIN)和自然连接(NATURAL JOIN)。
- 外连接:左外连接(LEFT OUTER JOIN),右外连接(RIGHT OUTER JOIN)和全外连接(FULL JOIN)。
为了能更好的说明各种连接之间的区别,下面通过具体示例进行详细的阐述。
创建示例表student和math_score,并插入数据,设置enable_fast_query_shipping为off(默认为on)即查询优化器使用分布式框架;参数explain_perf_mode为pretty(默认值为pretty)指定explain的显示格式。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 |
CREATE TABLE student( id INTEGER, name varchar(50) ) STORE AS orc; CREATE TABLE math_score( id INTEGER, score INTEGER ) STORE AS orc; INSERT INTO student VALUES(1, 'Tom'); INSERT INTO student VALUES(2, 'Lily'); INSERT INTO student VALUES(3, 'Tina'); INSERT INTO student VALUES(4, 'Perry'); INSERT INTO math_score VALUES(1, 80); INSERT INTO math_score VALUES(2, 75); INSERT INTO math_score VALUES(4, 95); INSERT INTO math_score VALUES(6, NULL); SET enable_fast_query_shipping = off; SET explain_perf_mode = pretty; |
内连接
- 标准内连接(INNER JOIN)
说明:表示left_table和right_table中满足join_condition的行拼接在一起作为结果输出,不满足条件的元组不会输出。
示例1:查询学生的数学成绩。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
SELECT s.id, s.name, ms.score FROM student s JOIN math_score ms on s.id = ms.id; id | name | score ----+-------+------- 2 | Lily | 75 1 | Tom | 80 4 | Perry | 95 (3 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s JOIN math_score ms on s.id = ms.id; QUERY PLAN -------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+---------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 126 | 435.27 2 | -> Vector Streaming (type: GATHER) | 5000 | | 126 | 435.27 3 | -> Vector Sonic Hash Join (4,6) | 5000 | | 126 | 200.89 4 | -> Vector Streaming(type: BROADCAST) | 2000 | | 122 | 104.64 5 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 6 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 3 --Vector Sonic Hash Join (4,6) Hash Cond: (s.id = ms.id) 5 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 6 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 290.913 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (28 rows)
- 交叉连接(CROSS JOIN)
1
left_table CROSS JOIN right_table
说明:表示left_table中所有行和right_table中的所有行分别进行连接,最终结果行数等于两边行数的乘积。又称笛卡尔积。
示例2:学生表和数学成绩表的交叉连接。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
SELECT s.id, s.name, ms.score FROM student s CROSS JOIN math_score ms order by id; id | name | score ----+-------+------- 1 | Tom | 80 1 | Tom | 95 1 | Tom | 75 1 | Tom | 2 | Lily | 80 2 | Lily | 95 2 | Lily | 75 2 | Lily | 3 | Tina | 80 3 | Tina | 95 3 | Tina | 75 3 | Tina | 4 | Perry | 80 4 | Perry | 95 4 | Perry | 75 4 | Perry | (16 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s CROSS JOIN math_score ms order by id; QUERY PLAN --------------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+---------------------------------------------------------+---------+---------------+---------+---------- 1 | -> Row Adapter | 1000000 | | 126 | 55463.56 2 | -> Vector Streaming (type: GATHER) | 1000000 | | 126 | 55463.56 3 | -> Vector Sort | 1000000 | 518MB(2043MB) | 126 | 54994.81 4 | -> Vector Nest Loop (5,7) | 1000000 | 1MB | 126 | 6415.89 5 | -> Vector Streaming(type: BROADCAST) | 2000 | 2MB | 122 | 104.64 6 | -> Vector Foreign Scan on student s | 1000 | 1MB | 122 | 60.00 7 | -> Vector Materialize | 1000 | 16MB | 4 | 62.50 8 | -> Vector Foreign Scan on math_score ms | 1000 | 1MB | 4 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 6 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 8 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 210.501 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (28 rows)
- 自然连接(NATURAL JOIN)
1
left_table NATURAL JOIN right_table
说明:表示left_table和right_table中列名相同的列进行等值连接,且自动将同名列只保留一份。
示例3:学生表和数学成绩表的自然连接,两表的同名列为id列,将按照id列进行等值连接。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
SELECT * FROM student s NATURAL JOIN math_score ms order by id; id | name | score ----+-------+------- 1 | Tom | 80 2 | Lily | 75 4 | Perry | 95 (3 rows) EXPLAIN SELECT * FROM student s NATURAL JOIN math_score ms order by id; QUERY PLAN ----------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+------------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 126 | 582.61 2 | -> Vector Streaming (type: GATHER) | 5000 | | 126 | 582.61 3 | -> Vector Sort | 5000 | | 126 | 348.24 4 | -> Vector Sonic Hash Join (5,7) | 5000 | | 126 | 200.89 5 | -> Vector Streaming(type: BROADCAST) | 2000 | | 122 | 104.64 6 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 7 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 4 --Vector Sonic Hash Join (5,7) Hash Cond: (s.id = ms.id) 6 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 7 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 377.624 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (29 rows)
外连接
- 左外连接(LEFT JOIN)
1
left_table LEFT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:左外连接的结果集包括left_table的所有行,而不仅是连接列所匹配的行。如果left_table的某行在right_table中没有匹配行,则在相关联的结果集行中输出right_table的列均为空值。
示例4:学生表和数学成绩表进行左外连接,学生表中id为3的行在结果集中对应的右表数据用NULL填充。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
SELECT s.id, s.name, ms.score FROM student s LEFT JOIN math_score ms on (s.id = ms.id) order by id; id | name | score ----+-------+------- 1 | Tom | 80 2 | Lily | 75 3 | Tina | 4 | Perry | 95 (4 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s LEFT JOIN math_score ms on (s.id = ms.id) order by id; QUERY PLAN -------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+---------------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 126 | 582.61 2 | -> Vector Streaming (type: GATHER) | 5000 | | 126 | 582.61 3 | -> Vector Sort | 5000 | | 126 | 348.24 4 | -> Vector Sonic Hash Right Join (5, 7) | 5000 | | 126 | 200.89 5 | -> Vector Streaming(type: BROADCAST) | 2000 | | 8 | 104.64 6 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 7 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 4 --Vector Sonic Hash Right Join (5, 7) Hash Cond: (ms.id = s.id) 6 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 7 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 ====== Query Summary ===== --------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 1253.487 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (29 rows)
- 右外连接(RIGHT JOIN)
1
left_table RIGHT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:与左外连接相反,右外连接的结果集包括right_table的所有行,而不仅是连接列所匹配的行。如果right_table的某行在left_table中没有匹配行,则在相关联的结果集行中left_table的列均为空值。
示例5:学生表和数学成绩表进行右外连接,数学成绩表中id为6的行在结果集中对应的左表数据用NULL填充。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42
SELECT ms.id, s.name, ms.score FROM student s RIGHT JOIN math_score ms on (s.id = ms.id) order by id; id | name | score ----+-------+------- 1 | Tom | 80 2 | Lily | 75 4 | Perry | 95 6 | | (4 rows) EXPLAIN SELECT ms.id, s.name, ms.score FROM student s RIGHT JOIN math_score ms on (s.id = ms.id) order by id; QUERY PLAN ----------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+------------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 126 | 582.61 2 | -> Vector Streaming (type: GATHER) | 5000 | | 126 | 582.61 3 | -> Vector Sort | 5000 | | 126 | 348.24 4 | -> Vector Sonic Hash Right Join (5, 7) | 5000 | | 126 | 200.89 5 | -> Vector Streaming(type: BROADCAST) | 2000 | | 122 | 104.64 6 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 7 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 4 --Vector Sonic Hash Right Join (5, 7) Hash Cond: (s.id = ms.id) 6 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 7 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 214.389 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (29 rows)
对于右外连接,在join算子中却显示的是Left。这是因为,右外连接其实就是将左右表进行交互后的左外连接,所以数据库内部实现为了减少处理逻辑,会将右外连接转为左外连接。
- 全外连接(FULL JOIN)
1
left_table FULL [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:全外连接是左外连接和右外连接的综合。全外连接的结果集包括left_table和right_table的所有行,而不仅是连接列所匹配的行。如果left_table的某行在right_table中没有匹配行,则在相关联的结果集行中right_table的列均为空值。如果right_table的某行在left_table中没有匹配行,则在相关联的结果集行中left_table的列均为空值。
示例6:学生表和数学成绩表进行全外连接,学生表中id为3的行在结果集中对应的右表数据用NULL填充,数学成绩表中id为6的行在结果集中对应的左表数据用NULL填充。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
SELECT s.id, s.name, ms.id, ms.score FROM student s FULL JOIN math_score ms ON (s.id = ms.id) order by s.id; id | name | id | score ----+-------+----+------- 1 | Tom | 1 | 80 2 | Lily | 2 | 75 3 | Tina | | 4 | Perry | 4 | 95 | | 6 | (5 rows) EXPLAIN SELECT s.id, s.name, ms.id, ms.score FROM student s FULL JOIN math_score ms ON (s.id = ms.id) order by s.id; QUERY PLAN -------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+---------------------------------------------------------+--------+----------+---------+--------- 1 | -> Row Adapter | 5000 | | 130 | 734.79 2 | -> Vector Streaming (type: GATHER) | 5000 | | 130 | 734.79 3 | -> Vector Sort | 5000 | | 130 | 434.86 4 | -> Vector Sonic Hash Full Join (5, 7) | 5000 | | 130 | 243.95 5 | -> Vector Streaming(type: REDISTRIBUTE) | 1000 | | 122 | 103.57 6 | -> Vector Foreign Scan on student s | 1000 | | 122 | 60.00 7 | -> Vector Streaming(type: REDISTRIBUTE) | 1000 | | 8 | 103.57 8 | -> Vector Foreign Scan on math_score ms | 1000 | | 8 | 60.00 Predicate Information (identified by plan id) --------------------------------------------- 4 --Vector Sonic Hash Full Join (5, 7) Hash Cond: (s.id = ms.id) 6 --Vector Foreign Scan on student s Server Type: lf Total files left: 4 8 --Vector Foreign Scan on math_score ms Server Type: lf Total files left: 4 ====== Query Summary ===== -------------------------------------------------------- System available mem: 1992294KB Query Max mem: 2097152KB Query estimated mem: 2097152KB LakeFormation request time: 206.619 ms, request count: 5 Total billed bytes: 0 bytes Turbo Engine: true (30 rows)
多表查询中on条件和where条件的区别
从上面各种连接语法中可见,除自然连接和交叉连接外,其他都需要有on条件(using在查询解析过程中会被转为on条件)来限制两表连接的结果。通常在查询的语句中也都会有where条件限制查询结果。这里说的on连接条件和where过滤条件是指不含可以下推到表上的过滤条件。on和where的区别是:
- on条件是两表连接的约束条件。
- where是对两表连接后产生的结果集再次进行过滤。
简单总结就是:on条件优先于where条件,在两表进行连接时被应用;生成两表连接结果集后,再应用where条件。