更新时间:2024-09-06 GMT+08:00

多表连接场景下DISTINCT优化

对于多表连接+DISTINCT场景,MySQL 8.0需要扫描表连接后的结果,当表连接数量多或基表数据量大时,需要扫描的数据量很大,导致执行效率很低。

为了提升DISTINCT,尤其多表连接下DISTINCT的查询效率,GaussDB(for MySQL)在执行优化器中加入了剪枝功能,可以去除不必要的扫描分支,提升查询性能。

适用场景

  • Nested Loop Inner Join + Distinct
  • Nested Loop Outer Join + Distinct

使用须知

内核版本大于等于2.0.51.240300时可使用该功能。

开启多表连接DISTINCT优化

表1 参数说明

参数名称

级别

描述

rds_nlj_distinct_optimize

Global,Session

DISTINCT优化特性开关,默认值为OFF。

  • ON:开启DISTINCT优化特性。
  • OFF:关闭DISTINCT优化特性。

除了使用上述开关来控制优化特性生效或者不生效,还可以使用HINT来实现,语法如下。

  • 开启DISTINCT优化特性

    /*+ SET_VAR(rds_nlj_distinct_optimize=ON) */

  • 关闭 DISTINCT 优化特性

    /*+ SET_VAR(rds_nlj_distinct_optimize=OFF) */

使用示例

  1. 使用如下任意方式开启DISTINCT优化特性。
    • 通过SET命令设置此开关值。
      mysql> SET rds_nlj_distinct_optimize=ON;
      Query OK, 0 rows affected (0.00 sec)
      
      mysql> SET rds_nlj_distinct_optimize=OFF;
      Query OK, 0 rows affected (0.00 sec)
    • 通过HINT方式在SQL语句中设置开关值。
      mysql> EXPLAIN ANALYZE SELECT/*+ SET_VAR(rds_nlj_distinct_optimize=ON) */ DISTINCT tt1.a FROM t1 AS tt1 JOIN t1 AS tt2 JOIN t1 AS tt3 ON tt2.a + 3 = tt3.a;
      
      mysql> EXPLAIN ANALYZE SELECT/*+ SET_VAR(rds_nlj_distinct_optimize=OFF) */ DISTINCT tt1.a FROM t1 AS tt1 JOIN t1 AS tt2 JOIN t1 AS tt3 ON tt2.a + 3 = tt3.a;
  2. 确认多表连接场景下DISTINCT优化效果。

    通过执行Explain Analyze/Explain Format=tree语句可以确认优化是否生效,执行计划出现'with distinct optimization'关键字时,说明优化生效。

    具体步骤如下:

    1. 准备数据。
      CREATE TABLE t1(a INT, KEY(a));
      INSERT INTO t1 VALUES(1),(2),(5),(6),(7),(8),(9),(11);
      ANALYZE TABLE t1;
    2. 关闭特性,执行以下SQL语句,优化器选择默认的执行计划。
      mysql> SET rds_nlj_distinct_optimize=OFF;
      Query OK, 0 rows affected (0.00 sec)
      
      mysql> EXPLAIN FORMAT=TREE SELECT DISTINCT tt1.a FROM t1 AS tt1 LEFT JOIN t1 AS tt2 ON TRUE LEFT JOIN t1 AS tt3 ON tt2.a + 3 = tt3.a\G
      *************************** 1. row ***************************
      EXPLAIN: -> Table scan on <temporary>
          -> Temporary table with deduplication  (cost=29.18 rows=64)
              -> Nested loop left join  (cost=29.18 rows=64)
                  -> Left hash join (no condition)  (cost=6.78 rows=64)
                      -> Index scan on tt1 using a  (cost=1.05 rows=8)
                      -> Hash
                          -> Index scan on tt2 using a  (cost=0.13 rows=8)
                  -> Filter: ((tt2.a + 3) = tt3.a)  (cost=0.25 rows=1)
                      -> Index lookup on tt3 using a (a=(tt2.a + 3))  (cost=0.25 rows=1)
    3. 开启特性,执行以下SQL语句,执行计划中可以看到有"with distinct optimization"关键字,说明此优化生效。
      mysql> SET rds_nlj_distinct_optimize=ON;
      Query OK, 0 rows affected (0.00 sec)
      
      mysql> EXPLAIN FORMAT=TREE SELECT DISTINCT tt1.a FROM t1 AS tt1 LEFT JOIN t1 AS tt2 ON TRUE LEFT JOIN t1 AS tt3 ON tt2.a + 3 = tt3.a\G
      *************************** 1. row ***************************
      EXPLAIN: -> Table scan on <temporary>
          -> Temporary table with deduplication  (cost=29.18 rows=64)
              -> Nested loop left join with distinct optimization  (cost=29.18 rows=64)
                  -> Left hash join (no condition)  (cost=6.78 rows=64)
                      -> Index scan on tt1 using a  (cost=1.05 rows=8)
                      -> Hash
                          -> Index scan on tt2 using a  (cost=0.13 rows=8)
                  -> Filter: ((tt2.a + 3) = tt3.a)  (cost=0.25 rows=1)
                      -> Index lookup on tt3 using a (a=(tt2.a + 3))  (cost=0.25 rows=1)

性能测试

GaussDB(for MySQL)执行耗时2.7秒完成,只需要扫描约61万行数据。相比MySQL 8.0 社区版本执行耗时约186秒,扫描数据量4400万,执行效率大大提升。

如下示例中,对7个表连接后的结果做DISTINCT,使用MySQL 8.0.30社区版本,执行耗时186秒,扫描了约4400万行数据。

GaussDB(for MySQL)执行耗时2.7秒,扫描约61万行数据。

查询语句:

select distinct ed.code,et.*
from ele_template et
left join ele_template_tenant ett on ett.template_id = et.id
left join ele_relation tm on tm.tom_id = et.id and tm.jerry_type = 'chapter'
left join ele_relation mv on mv.tom_id = tm.jerry_id and mv.jerry_type = 'variable'
left join ele_relation cv on cv.jerry_id = mv.jerry_id and cv.tom_type = 'column'
left join ele_doc_column edc on edc.id = cv.tom_id
left join ele_doc ed on ed.id = edc.doc_id
where ett.uctenantid = 'mmo0l3f8'
and ed.code = 'contract'
and et.billtype = 'contract'
order by ifnull(et.utime,et.ctime)
desc limit 0,10;

执行计划:

+----+-------------+-------+------------+--------+-------------------------+-----------------+---------+----------------------+------+----------+----------------------------------------------+
| id | select_type | table | partitions | type   | possible_keys           | key             | key_len | ref                  | rows | filtered | Extra                                        |
+----+-------------+-------+------------+--------+-------------------------+-----------------+---------+----------------------+------+----------+----------------------------------------------+
|  1 | SIMPLE      | ed    | NULL       | ref    | PRIMARY,idx_code        | idx_code        | 203     | const                |    1 |   100.00 | Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | ett   | NULL       | ref    | PRIMARY,idx_uctenanatid | idx_uctenanatid | 203     | const                |  352 |   100.00 | Using index                                  |
|  1 | SIMPLE      | et    | NULL       | eq_ref | PRIMARY,idx_billtype    | PRIMARY         | 8       | test.ett.template_id |    1 |    94.57 | Using where                                  |
|  1 | SIMPLE      | tm    | NULL       | ref    | idx_tom_id,idx_jerry_id | idx_tom_id      | 9       | test.ett.template_id |   59 |    10.00 | Using index condition; Using where; Distinct |
|  1 | SIMPLE      | mv    | NULL       | ref    | idx_tom_id,idx_jerry_id | idx_tom_id      | 9       | test.tm.jerry_id     |   59 |    10.00 | Using where; Distinct                        |
|  1 | SIMPLE      | cv    | NULL       | ref    | idx_tom_id,idx_jerry_id | idx_jerry_id    | 9       | test.mv.jerry_id     |   47 |    10.00 | Using where; Distinct                        |
|  1 | SIMPLE      | edc   | NULL       | eq_ref | PRIMARY,idx_doc_id      | PRIMARY         | 8       | test.cv.tom_id       |    1 |    50.00 | Using where; Distinct                        |
+----+-------------+-------+------------+--------+-------------------------+-----------------+---------+----------------------+------+----------+----------------------------------------------+
图1 执行耗时对比
图2 扫描行数