多表连接场景下DISTINCT优化
对于多表连接+DISTINCT场景,MySQL 8.0需要扫描表连接后的结果,当表连接数量多或基表数据量大时,需要扫描的数据量很大,导致执行效率很低。
为了提升DISTINCT,尤其多表连接下DISTINCT的查询效率,GaussDB(for MySQL)在执行优化器中加入了剪枝功能,可以去除不必要的扫描分支,提升查询性能。
适用场景
- Nested Loop Inner Join + Distinct
- Nested Loop Outer Join + Distinct
使用须知
内核版本大于等于2.0.51.240300时可使用该功能。
开启多表连接DISTINCT优化
参数名称 |
级别 |
描述 |
---|---|---|
rds_nlj_distinct_optimize |
Global,Session |
DISTINCT优化特性开关,默认值为OFF。
|
除了使用上述开关来控制优化特性生效或者不生效,还可以使用HINT来实现,语法如下。
使用示例
- 使用如下任意方式开启DISTINCT优化特性。
- 通过SET命令设置此开关值。
mysql> SET rds_nlj_distinct_optimize=ON; Query OK, 0 rows affected (0.00 sec) mysql> SET rds_nlj_distinct_optimize=OFF; Query OK, 0 rows affected (0.00 sec)
- 通过HINT方式在SQL语句中设置开关值。
mysql> EXPLAIN ANALYZE SELECT/*+ SET_VAR(rds_nlj_distinct_optimize=ON) */ DISTINCT tt1.a FROM t1 AS tt1 JOIN t1 AS tt2 JOIN t1 AS tt3 ON tt2.a + 3 = tt3.a; mysql> EXPLAIN ANALYZE SELECT/*+ SET_VAR(rds_nlj_distinct_optimize=OFF) */ DISTINCT tt1.a FROM t1 AS tt1 JOIN t1 AS tt2 JOIN t1 AS tt3 ON tt2.a + 3 = tt3.a;
- 通过SET命令设置此开关值。
- 确认多表连接场景下DISTINCT优化效果。
通过执行Explain Analyze/Explain Format=tree语句可以确认优化是否生效,执行计划出现'with distinct optimization'关键字时,说明优化生效。
具体步骤如下:
- 准备数据。
CREATE TABLE t1(a INT, KEY(a)); INSERT INTO t1 VALUES(1),(2),(5),(6),(7),(8),(9),(11); ANALYZE TABLE t1;
- 关闭特性,执行以下SQL语句,优化器选择默认的执行计划。
mysql> SET rds_nlj_distinct_optimize=OFF; Query OK, 0 rows affected (0.00 sec) mysql> EXPLAIN FORMAT=TREE SELECT DISTINCT tt1.a FROM t1 AS tt1 LEFT JOIN t1 AS tt2 ON TRUE LEFT JOIN t1 AS tt3 ON tt2.a + 3 = tt3.a\G *************************** 1. row *************************** EXPLAIN: -> Table scan on <temporary> -> Temporary table with deduplication (cost=29.18 rows=64) -> Nested loop left join (cost=29.18 rows=64) -> Left hash join (no condition) (cost=6.78 rows=64) -> Index scan on tt1 using a (cost=1.05 rows=8) -> Hash -> Index scan on tt2 using a (cost=0.13 rows=8) -> Filter: ((tt2.a + 3) = tt3.a) (cost=0.25 rows=1) -> Index lookup on tt3 using a (a=(tt2.a + 3)) (cost=0.25 rows=1)
- 开启特性,执行以下SQL语句,执行计划中可以看到有"with distinct optimization"关键字,说明此优化生效。
mysql> SET rds_nlj_distinct_optimize=ON; Query OK, 0 rows affected (0.00 sec) mysql> EXPLAIN FORMAT=TREE SELECT DISTINCT tt1.a FROM t1 AS tt1 LEFT JOIN t1 AS tt2 ON TRUE LEFT JOIN t1 AS tt3 ON tt2.a + 3 = tt3.a\G *************************** 1. row *************************** EXPLAIN: -> Table scan on <temporary> -> Temporary table with deduplication (cost=29.18 rows=64) -> Nested loop left join with distinct optimization (cost=29.18 rows=64) -> Left hash join (no condition) (cost=6.78 rows=64) -> Index scan on tt1 using a (cost=1.05 rows=8) -> Hash -> Index scan on tt2 using a (cost=0.13 rows=8) -> Filter: ((tt2.a + 3) = tt3.a) (cost=0.25 rows=1) -> Index lookup on tt3 using a (a=(tt2.a + 3)) (cost=0.25 rows=1)
- 准备数据。
性能测试
GaussDB(for MySQL)执行耗时2.7秒完成,只需要扫描约61万行数据。相比MySQL 8.0 社区版本执行耗时约186秒,扫描数据量4400万,执行效率大大提升。
如下示例中,对7个表连接后的结果做DISTINCT,使用MySQL 8.0.30社区版本,执行耗时186秒,扫描了约4400万行数据。
GaussDB(for MySQL)执行耗时2.7秒,扫描约61万行数据。
查询语句:
select distinct ed.code,et.* from ele_template et left join ele_template_tenant ett on ett.template_id = et.id left join ele_relation tm on tm.tom_id = et.id and tm.jerry_type = 'chapter' left join ele_relation mv on mv.tom_id = tm.jerry_id and mv.jerry_type = 'variable' left join ele_relation cv on cv.jerry_id = mv.jerry_id and cv.tom_type = 'column' left join ele_doc_column edc on edc.id = cv.tom_id left join ele_doc ed on ed.id = edc.doc_id where ett.uctenantid = 'mmo0l3f8' and ed.code = 'contract' and et.billtype = 'contract' order by ifnull(et.utime,et.ctime) desc limit 0,10;
执行计划:
+----+-------------+-------+------------+--------+-------------------------+-----------------+---------+----------------------+------+----------+----------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+--------+-------------------------+-----------------+---------+----------------------+------+----------+----------------------------------------------+ | 1 | SIMPLE | ed | NULL | ref | PRIMARY,idx_code | idx_code | 203 | const | 1 | 100.00 | Using index; Using temporary; Using filesort | | 1 | SIMPLE | ett | NULL | ref | PRIMARY,idx_uctenanatid | idx_uctenanatid | 203 | const | 352 | 100.00 | Using index | | 1 | SIMPLE | et | NULL | eq_ref | PRIMARY,idx_billtype | PRIMARY | 8 | test.ett.template_id | 1 | 94.57 | Using where | | 1 | SIMPLE | tm | NULL | ref | idx_tom_id,idx_jerry_id | idx_tom_id | 9 | test.ett.template_id | 59 | 10.00 | Using index condition; Using where; Distinct | | 1 | SIMPLE | mv | NULL | ref | idx_tom_id,idx_jerry_id | idx_tom_id | 9 | test.tm.jerry_id | 59 | 10.00 | Using where; Distinct | | 1 | SIMPLE | cv | NULL | ref | idx_tom_id,idx_jerry_id | idx_jerry_id | 9 | test.mv.jerry_id | 47 | 10.00 | Using where; Distinct | | 1 | SIMPLE | edc | NULL | eq_ref | PRIMARY,idx_doc_id | PRIMARY | 8 | test.cv.tom_id | 1 | 50.00 | Using where; Distinct | +----+-------------+-------+------------+--------+-------------------------+-----------------+---------+----------------------+------+----------+----------------------------------------------+