GaussDB(DWS)多表连接查询
连接类型介绍
通过SQL完成各种复杂的查询,多表之间的连接是必不可少的。连接分为:内连接和外连接两大类,每大类中还可进行细分。
- 内连接:标准内连接(INNER JOIN),交叉连接(CROSS JOIN)和自然连接(NATURAL JOIN)。
- 外连接:左外连接(LEFT OUTER JOIN),右外连接(RIGHT OUTER JOIN)和全外连接(FULL JOIN)。
为了能更好的说明各种连接之间的区别,下面通过具体示例进行详细的阐述。
创建示例表student和math_score,并插入数据,设置enable_fast_query_shipping为off(默认为on)即查询优化器使用分布式框架;参数explain_perf_mode为pretty(默认值为pretty)指定explain的显示格式。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
CREATE TABLE student( id INTEGER, name varchar(50) ); CREATE TABLE math_score( id INTEGER, score INTEGER ); INSERT INTO student VALUES(1, 'Tom'); INSERT INTO student VALUES(2, 'Lily'); INSERT INTO student VALUES(3, 'Tina'); INSERT INTO student VALUES(4, 'Perry'); INSERT INTO math_score VALUES(1, 80); INSERT INTO math_score VALUES(2, 75); INSERT INTO math_score VALUES(4, 95); INSERT INTO math_score VALUES(6, NULL); SET enable_fast_query_shipping = off; SET explain_perf_mode = pretty; |
内连接
- 标准内连接(INNER JOIN)
说明:表示left_table和right_table中满足join_condition的行拼接在一起作为结果输出,不满足条件的元组不会输出。
示例1:查询学生的数学成绩。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
SELECT s.id, s.name, ms.score FROM student s JOIN math_score ms on s.id = ms.id; id | name | score ----+-------+------- 2 | Lily | 75 1 | Tom | 80 4 | Perry | 95 (3 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s JOIN math_score ms on s.id = ms.id; QUERY PLAN ---------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+-----------------------------------------+--------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 4 | | 13 | 19.47 2 | -> Hash Join (3,4) | 4 | 1MB | 13 | 11.47 3 | -> Seq Scan on math_score ms | 30 | 1MB | 8 | 10.10 4 | -> Hash | 12 | 16MB | 9 | 1.28 5 | -> Streaming(type: BROADCAST) | 12 | 2MB | 9 | 1.28 6 | -> Seq Scan on student s | 4 | 1MB | 9 | 1.01 Predicate Information (identified by plan id) --------------------------------------------- 2 --Hash Join (3,4) Hash Cond: (ms.id = s.id) ====== Query Summary ===== ------------------------------- System available mem: 1761280KB Query Max mem: 1761280KB Query estimated mem: 4400KB (19 rows)
- 交叉连接(CROSS JOIN)
1
left_table CROSS JOIN right_table
说明:表示left_table中所有行和right_table中的所有行分别进行连接,最终结果行数等于两边行数的乘积。又称笛卡尔积。
示例2:学生表和数学成绩表的交叉连接。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
SELECT s.id, s.name, ms.score FROM student s CROSS JOIN math_score ms; id | name | score ----+-------+------- 3 | Tina | 80 2 | Lily | 80 1 | Tom | 80 4 | Perry | 80 3 | Tina | 2 | Lily | 1 | Tom | 4 | Perry | 3 | Tina | 95 2 | Lily | 95 1 | Tom | 95 4 | Perry | 95 2 | Lily | 75 3 | Tina | 75 1 | Tom | 75 4 | Perry | 75 (16 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s CROSS JOIN math_score ms; QUERY PLAN ---------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+-----------------------------------------+--------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 120 | | 13 | 19.89 2 | -> Nested Loop (3,4) | 120 | 1MB | 13 | 11.89 3 | -> Seq Scan on math_score ms | 30 | 1MB | 4 | 10.10 4 | -> Materialize | 12 | 16MB | 9 | 1.30 5 | -> Streaming(type: BROADCAST) | 12 | 2MB | 9 | 1.28 6 | -> Seq Scan on student s | 4 | 1MB | 9 | 1.01 ====== Query Summary ===== ------------------------------- System available mem: 1761280KB Query Max mem: 1761280KB Query estimated mem: 4144KB (14 rows)
- 自然连接(NATURAL JOIN)
1
left_table NATURAL JOIN right_table
说明:表示left_table和right_table中列名相同的列进行等值连接,且自动将同名列只保留一份。
示例3:学生表和数学成绩表的自然连接,两表的同名列为id列,将按照id列进行等值连接。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
SELECT * FROM student s NATURAL JOIN math_score ms; id | name | score ----+-------+------- 1 | Tom | 80 4 | Perry | 95 2 | Lily | 75 (3 rows) EXPLAIN SELECT * FROM student s NATURAL JOIN math_score ms; QUERY PLAN ---------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+-----------------------------------------+--------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 4 | | 13 | 19.47 2 | -> Hash Join (3,4) | 4 | 1MB | 13 | 11.47 3 | -> Seq Scan on math_score ms | 30 | 1MB | 8 | 10.10 4 | -> Hash | 12 | 16MB | 9 | 1.28 5 | -> Streaming(type: BROADCAST) | 12 | 2MB | 9 | 1.28 6 | -> Seq Scan on student s | 4 | 1MB | 9 | 1.01 Predicate Information (identified by plan id) --------------------------------------------- 2 --Hash Join (3,4) Hash Cond: (ms.id = s.id) ====== Query Summary ===== ------------------------------- System available mem: 1761280KB Query Max mem: 1761280KB Query estimated mem: 4400KB (19 rows)
外连接
- 左外连接(LEFT JOIN)
1
left_table LEFT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:左外连接的结果集包括left_table的所有行,而不仅是连接列所匹配的行。如果left_table的某行在right_table中没有匹配行,则在相关联的结果集行中输出right_table的列均为空值。
示例4:学生表和数学成绩表进行左外连接,学生表中id为3的行在结果集中对应的右表数据用NULL填充。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
SELECT s.id, s.name, ms.score FROM student s LEFT JOIN math_score ms on (s.id = ms.id); id | name | score ----+-------+------- 3 | Tina | 1 | Tom | 80 2 | Lily | 75 4 | Perry | 95 (4 rows) EXPLAIN SELECT s.id, s.name, ms.score FROM student s LEFT JOIN math_score ms on (s.id = ms.id); QUERY PLAN ------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+--------------------------------------------+--------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 4 | | 13 | 10.26 2 | -> Hash Left Join (3, 5) | 4 | 1MB | 13 | 2.26 3 | -> Streaming(type: REDISTRIBUTE) | 4 | 2MB | 9 | 1.11 4 | -> Seq Scan on student s | 4 | 1MB | 9 | 1.01 5 | -> Hash | 4 | 16MB | 8 | 1.11 6 | -> Streaming(type: REDISTRIBUTE) | 4 | 2MB | 8 | 1.11 7 | -> Seq Scan on math_score ms | 4 | 1MB | 8 | 1.01 Predicate Information (identified by plan id) --------------------------------------------- 2 --Hash Left Join (3, 5) Hash Cond: (s.id = ms.id) ====== Query Summary ===== ------------------------------ System available mem: 901120KB Query Max mem: 901120KB Query estimated mem: 7520KB (20 rows)
- 右外连接(RIGHT JOIN)
1
left_table RIGHT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:与左外连接相反,右外连接的结果集包括right_table的所有行,而不仅是连接列所匹配的行。如果right_table的某行在left_table中没有匹配行,则在相关联的结果集行中left_table的列均为空值。
示例5:学生表和数学成绩表进行右外连接,数学成绩表中id为6的行在结果集中对应的左表数据用NULL填充。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31
SELECT ms.id, s.name, ms.score FROM student s RIGHT JOIN math_score ms on (s.id = ms.id); id | name | score ----+-------+------- 1 | Tom | 80 6 | | 4 | Perry | 95 2 | Lily | 75 EXPLAIN SELECT ms.id, s.name, ms.score FROM student s RIGHT JOIN math_score ms on (s.id = ms.id); QUERY PLAN ---------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+-----------------------------------------+--------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 30 | | 13 | 19.47 2 | -> Hash Left Join (3, 4) | 30 | 1MB | 13 | 11.47 3 | -> Seq Scan on math_score ms | 30 | 1MB | 8 | 10.10 4 | -> Hash | 12 | 16MB | 9 | 1.28 5 | -> Streaming(type: BROADCAST) | 12 | 2MB | 9 | 1.28 6 | -> Seq Scan on student s | 4 | 1MB | 9 | 1.01 Predicate Information (identified by plan id) --------------------------------------------- 2 --Hash Left Join (3, 4) Hash Cond: (ms.id = s.id) ====== Query Summary ===== ------------------------------- System available mem: 1761280KB Query Max mem: 1761280KB Query estimated mem: 5424KB (19 rows)
对于右外连接,在join算子中却显示的是Left。这是因为,右外连接其实就是将左右表进行交互后的左外连接,所以数据库内部实现为了减少处理逻辑,会将右外连接转为左外连接。
- 全外连接(FULL JOIN)
1
left_table FULL [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
说明:全外连接是左外连接和右外连接的综合。全外连接的结果集包括left_table和right_table的所有行,而不仅是连接列所匹配的行。如果left_table的某行在right_table中没有匹配行,则在相关联的结果集行中right_table的列均为空值。如果right_table的某行在left_table中没有匹配行,则在相关联的结果集行中left_table的列均为空值。
示例6:学生表和数学成绩表进行全外连接,学生表中id为3的行在结果集中对应的右表数据用NULL填充,数学成绩表中id为6的行在结果集中对应的左表数据用NULL填充。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
SELECT s.id, s.name, ms.id, ms.score FROM student s FULL JOIN math_score ms ON (s.id = ms.id); id | name | id | score ----+-------+----+------- 2 | Lily | 2 | 75 4 | Perry | 4 | 95 1 | Tom | 1 | 80 3 | Tina | | | | 6 | (5 rows) EXPLAIN SELECT s.id, s.name, ms.id, ms.score FROM student s FULL JOIN math_score ms ON (s.id = ms.id); QUERY PLAN ------------------------------------------------------------------------------------------- id | operation | E-rows | E-memory | E-width | E-costs ----+--------------------------------------------+--------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 30 | | 17 | 20.24 2 | -> Hash Full Join (3, 5) | 30 | 1MB | 17 | 12.24 3 | -> Streaming(type: REDISTRIBUTE) | 30 | 2MB | 8 | 11.06 4 | -> Seq Scan on math_score ms | 30 | 1MB | 8 | 10.10 5 | -> Hash | 4 | 16MB | 9 | 1.11 6 | -> Streaming(type: REDISTRIBUTE) | 4 | 2MB | 9 | 1.11 7 | -> Seq Scan on student s | 4 | 1MB | 9 | 1.01 Predicate Information (identified by plan id) --------------------------------------------- 2 --Hash Full Join (3, 5) Hash Cond: (ms.id = s.id) ====== Query Summary ===== ------------------------------- System available mem: 1761280KB Query Max mem: 1761280KB Query estimated mem: 6496KB (20 rows)
多表查询中on条件和where条件的区别
从上面各种连接语法中可见,除自然连接和交叉连接外,其他都需要有on条件(using在查询解析过程中会被转为on条件)来限制两表连接的结果。通常在查询的语句中也都会有where条件限制查询结果。这里说的on连接条件和where过滤条件是指不含可以下推到表上的过滤条件。on和where的区别是:
- on条件是两表连接的约束条件。
- where是对两表连接后产生的结果集再次进行过滤。
简单总结就是:on条件优先于where条件,在两表进行连接时被应用;生成两表连接结果集后,再应用where条件。