更新时间:2024-10-10 GMT+08:00

GaussDB(DWS)多表连接查询

连接类型介绍

通过SQL完成各种复杂的查询,多表之间的连接是必不可少的。连接分为:内连接和外连接两大类,每大类中还可进行细分。

  • 内连接:标准内连接(INNER JOIN),交叉连接(CROSS JOIN)和自然连接(NATURAL JOIN)。
  • 外连接:左外连接(LEFT OUTER JOIN),右外连接(RIGHT OUTER JOIN)和全外连接(FULL JOIN)。

为了能更好的说明各种连接之间的区别,下面通过具体示例进行详细的阐述。

创建示例表student和math_score,并插入数据,设置enable_fast_query_shipping为off(默认为on)即查询优化器使用分布式框架;参数explain_perf_mode为pretty(默认值为pretty)指定explain的显示格式。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
CREATE TABLE student(
  id INTEGER,
  name varchar(50)
);

CREATE TABLE math_score(
  id INTEGER,
  score INTEGER
);

INSERT INTO student VALUES(1, 'Tom');
INSERT INTO student VALUES(2, 'Lily');
INSERT INTO student VALUES(3, 'Tina');
INSERT INTO student VALUES(4, 'Perry');

INSERT INTO math_score VALUES(1, 80);
INSERT INTO math_score VALUES(2, 75);
INSERT INTO math_score VALUES(4, 95);
INSERT INTO math_score VALUES(6, NULL);

SET enable_fast_query_shipping = off;
SET explain_perf_mode = pretty;

内连接

  • 标准内连接(INNER JOIN)
    语法:
    1
    left_table [INNER] JOIN right_table [ ON join_condition | USING ( join_column )]
    

    说明:表示left_table和right_table中满足join_condition的行拼接在一起作为结果输出,不满足条件的元组不会输出。

    示例1:查询学生的数学成绩。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    SELECT s.id, s.name, ms.score FROM student s JOIN math_score ms on s.id = ms.id;
     id | name  | score
    ----+-------+-------
      2 | Lily  |    75
      1 | Tom   |    80
      4 | Perry |    95
    (3 rows)
    
    
    EXPLAIN SELECT s.id, s.name, ms.score FROM student s JOIN math_score ms on s.id = ms.id;
                                           QUERY PLAN
    ----------------------------------------------------------------------------------------
      id |                operation                | E-rows | E-memory | E-width | E-costs
     ----+-----------------------------------------+--------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)            |      4 |          |      13 | 19.47
       2 |    ->  Hash Join (3,4)                  |      4 | 1MB      |      13 | 11.47
       3 |       ->  Seq Scan on math_score ms     |     30 | 1MB      |       8 | 10.10
       4 |       ->  Hash                          |     12 | 16MB     |       9 | 1.28
       5 |          ->  Streaming(type: BROADCAST) |     12 | 2MB      |       9 | 1.28
       6 |             ->  Seq Scan on student s   |      4 | 1MB      |       9 | 1.01
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Hash Join (3,4)
             Hash Cond: (ms.id = s.id)
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 1761280KB
     Query Max mem: 1761280KB
     Query estimated mem: 4400KB
    (19 rows)
    
  • 交叉连接(CROSS JOIN)

    语法:

    1
    left_table CROSS JOIN right_table
    

    说明:表示left_table中所有行和right_table中的所有行分别进行连接,最终结果行数等于两边行数的乘积。又称笛卡尔积。

    示例2:学生表和数学成绩表的交叉连接。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    SELECT s.id, s.name, ms.score FROM student s CROSS JOIN math_score ms;
     id | name  | score
    ----+-------+-------
      3 | Tina  |    80
      2 | Lily  |    80
      1 | Tom   |    80
      4 | Perry |    80
      3 | Tina  |
      2 | Lily  |
      1 | Tom   |
      4 | Perry |
      3 | Tina  |    95
      2 | Lily  |    95
      1 | Tom   |    95
      4 | Perry |    95
      2 | Lily  |    75
      3 | Tina  |    75
      1 | Tom   |    75
      4 | Perry |    75
    (16 rows)
    
    EXPLAIN SELECT s.id, s.name, ms.score FROM student s CROSS JOIN math_score ms;
                                           QUERY PLAN
    ----------------------------------------------------------------------------------------
      id |                operation                | E-rows | E-memory | E-width | E-costs
     ----+-----------------------------------------+--------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)            |    120 |          |      13 | 19.89
       2 |    ->  Nested Loop (3,4)                |    120 | 1MB      |      13 | 11.89
       3 |       ->  Seq Scan on math_score ms     |     30 | 1MB      |       4 | 10.10
       4 |       ->  Materialize                   |     12 | 16MB     |       9 | 1.30
       5 |          ->  Streaming(type: BROADCAST) |     12 | 2MB      |       9 | 1.28
       6 |             ->  Seq Scan on student s   |      4 | 1MB      |       9 | 1.01
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 1761280KB
     Query Max mem: 1761280KB
     Query estimated mem: 4144KB
    (14 rows)
    
  • 自然连接(NATURAL JOIN)

    语法:

    1
    left_table NATURAL JOIN right_table
    

    说明:表示left_table和right_table中列名相同的列进行等值连接,且自动将同名列只保留一份。

    示例3:学生表和数学成绩表的自然连接,两表的同名列为id列,将按照id列进行等值连接。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    SELECT * FROM student s NATURAL JOIN math_score ms;
     id | name  | score
    ----+-------+-------
      1 | Tom   |    80
      4 | Perry |    95
      2 | Lily  |    75
    (3 rows)
    
    EXPLAIN SELECT * FROM student s NATURAL JOIN math_score ms;
                                           QUERY PLAN
    ----------------------------------------------------------------------------------------
      id |                operation                | E-rows | E-memory | E-width | E-costs
     ----+-----------------------------------------+--------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)            |      4 |          |      13 | 19.47
       2 |    ->  Hash Join (3,4)                  |      4 | 1MB      |      13 | 11.47
       3 |       ->  Seq Scan on math_score ms     |     30 | 1MB      |       8 | 10.10
       4 |       ->  Hash                          |     12 | 16MB     |       9 | 1.28
       5 |          ->  Streaming(type: BROADCAST) |     12 | 2MB      |       9 | 1.28
       6 |             ->  Seq Scan on student s   |      4 | 1MB      |       9 | 1.01
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Hash Join (3,4)
             Hash Cond: (ms.id = s.id)
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 1761280KB
     Query Max mem: 1761280KB
     Query estimated mem: 4400KB
    (19 rows)
    

外连接

  • 左外连接(LEFT JOIN)

    语法:

    1
    left_table LEFT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
    

    说明:左外连接的结果集包括left_table的所有行,而不仅是连接列所匹配的行。如果left_table的某行在right_table中没有匹配行,则在相关联的结果集行中输出right_table的列均为空值。

    示例4:学生表和数学成绩表进行左外连接,学生表中id为3的行在结果集中对应的右表数据用NULL填充。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    SELECT s.id, s.name, ms.score FROM student s LEFT JOIN math_score ms on (s.id = ms.id);
     id | name  | score
    ----+-------+-------
      3 | Tina  |
      1 | Tom   |    80
      2 | Lily  |    75
      4 | Perry |    95
    (4 rows)
    
    EXPLAIN SELECT s.id, s.name, ms.score FROM student s LEFT JOIN math_score ms on (s.id = ms.id);
                                            QUERY PLAN
    -------------------------------------------------------------------------------------------
      id |                 operation                  | E-rows | E-memory | E-width | E-costs
     ----+--------------------------------------------+--------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)               |      4 |          |      13 | 10.26
       2 |    ->  Hash Left Join (3, 5)               |      4 | 1MB      |      13 | 2.26
       3 |       ->  Streaming(type: REDISTRIBUTE)    |      4 | 2MB      |       9 | 1.11
       4 |          ->  Seq Scan on student s         |      4 | 1MB      |       9 | 1.01
       5 |       ->  Hash                             |      4 | 16MB     |       8 | 1.11
       6 |          ->  Streaming(type: REDISTRIBUTE) |      4 | 2MB      |       8 | 1.11
       7 |             ->  Seq Scan on math_score ms  |      4 | 1MB      |       8 | 1.01
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Hash Left Join (3, 5)
             Hash Cond: (s.id = ms.id)
    
       ====== Query Summary =====
     ------------------------------
     System available mem: 901120KB
     Query Max mem: 901120KB
     Query estimated mem: 7520KB
    (20 rows)
    
  • 右外连接(RIGHT JOIN)

    语法:

    1
    left_table RIGHT [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
    

    说明:与左外连接相反,右外连接的结果集包括right_table的所有行,而不仅是连接列所匹配的行。如果right_table的某行在left_table中没有匹配行,则在相关联的结果集行中left_table的列均为空值。

    示例5:学生表和数学成绩表进行右外连接,数学成绩表中id为6的行在结果集中对应的左表数据用NULL填充。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    SELECT ms.id, s.name, ms.score FROM student s RIGHT JOIN math_score ms on (s.id = ms.id);
     id | name  | score
    ----+-------+-------
      1 | Tom   |    80
      6 |       |
      4 | Perry |    95
      2 | Lily  |    75
    
    EXPLAIN SELECT ms.id, s.name, ms.score FROM student s RIGHT JOIN math_score ms on (s.id = ms.id);
                                           QUERY PLAN
    ----------------------------------------------------------------------------------------
      id |                operation                | E-rows | E-memory | E-width | E-costs
     ----+-----------------------------------------+--------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)            |     30 |          |      13 | 19.47
       2 |    ->  Hash Left Join (3, 4)            |     30 | 1MB      |      13 | 11.47
       3 |       ->  Seq Scan on math_score ms     |     30 | 1MB      |       8 | 10.10
       4 |       ->  Hash                          |     12 | 16MB     |       9 | 1.28
       5 |          ->  Streaming(type: BROADCAST) |     12 | 2MB      |       9 | 1.28
       6 |             ->  Seq Scan on student s   |      4 | 1MB      |       9 | 1.01
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Hash Left Join (3, 4)
             Hash Cond: (ms.id = s.id)
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 1761280KB
     Query Max mem: 1761280KB
     Query estimated mem: 5424KB
    (19 rows)
    

    对于右外连接,在join算子中却显示的是Left。这是因为,右外连接其实就是将左右表进行交互后的左外连接,所以数据库内部实现为了减少处理逻辑,会将右外连接转为左外连接。

  • 全外连接(FULL JOIN)

    语法:

    1
    left_table FULL [OUTER] JOIN right_table [ ON join_condition | USING ( join_column )]
    

    说明:全外连接是左外连接和右外连接的综合。全外连接的结果集包括left_table和right_table的所有行,而不仅是连接列所匹配的行。如果left_table的某行在right_table中没有匹配行,则在相关联的结果集行中right_table的列均为空值。如果right_table的某行在left_table中没有匹配行,则在相关联的结果集行中left_table的列均为空值。

    示例6:学生表和数学成绩表进行全外连接,学生表中id为3的行在结果集中对应的右表数据用NULL填充,数学成绩表中id为6的行在结果集中对应的左表数据用NULL填充。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    SELECT s.id, s.name, ms.id, ms.score FROM student s FULL JOIN math_score ms ON (s.id = ms.id);
     id | name  | id | score
    ----+-------+----+-------
      2 | Lily  |  2 |    75
      4 | Perry |  4 |    95
      1 | Tom   |  1 |    80
      3 | Tina  |    |
        |       |  6 |
    (5 rows)
    
    EXPLAIN SELECT s.id, s.name, ms.id, ms.score FROM student s FULL JOIN math_score ms ON (s.id = ms.id);
                                            QUERY PLAN
    -------------------------------------------------------------------------------------------
      id |                 operation                  | E-rows | E-memory | E-width | E-costs
     ----+--------------------------------------------+--------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)               |     30 |          |      17 | 20.24
       2 |    ->  Hash Full Join (3, 5)               |     30 | 1MB      |      17 | 12.24
       3 |       ->  Streaming(type: REDISTRIBUTE)    |     30 | 2MB      |       8 | 11.06
       4 |          ->  Seq Scan on math_score ms     |     30 | 1MB      |       8 | 10.10
       5 |       ->  Hash                             |      4 | 16MB     |       9 | 1.11
       6 |          ->  Streaming(type: REDISTRIBUTE) |      4 | 2MB      |       9 | 1.11
       7 |             ->  Seq Scan on student s      |      4 | 1MB      |       9 | 1.01
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Hash Full Join (3, 5)
             Hash Cond: (ms.id = s.id)
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 1761280KB
     Query Max mem: 1761280KB
     Query estimated mem: 6496KB
    (20 rows)
    

多表查询中on条件和where条件的区别

从上面各种连接语法中可见,除自然连接和交叉连接外,其他都需要有on条件(using在查询解析过程中会被转为on条件)来限制两表连接的结果。通常在查询的语句中也都会有where条件限制查询结果。这里说的on连接条件和where过滤条件是指不含可以下推到表上的过滤条件。on和where的区别是:

  • on条件是两表连接的约束条件。
  • where是对两表连接后产生的结果集再次进行过滤。

简单总结就是:on条件优先于where条件,在两表进行连接时被应用;生成两表连接结果集后,再应用where条件。