GaussDB(DWS)查询时索引失效场景解析
对表建立索引可提高数据库查询性能,但有时会出现建立了索引,但查询计划中却发现索引没有被使用的情况。针对这种情况,本文将列举几种常见的场景和优化方法。
场景一:返回结果集很大
以行存表的Seq Scan和Index Scan为例:
- Seq Scan:按照表的记录的排列顺序从头到尾依次检索扫描,每次扫描要取到所有的记录。这也是最简单最基础的扫表方式,扫描的代价比较大。
- Index Scan:对于指定的查询,先扫描一遍索引,从索引中找到符合要求的记录的位置(指针),再定位到表中具体的Page去获取,即先走索引,再读表数据。
因此,根据两种扫描方式的特点可知,多数情况下,Index Scan要比Seq Scan快。但是如果获取的结果集占所有数据的比重很大时(超过70%),这时Index Scan因为要先扫描索引再读表数据反而不如直接全表扫描的速度快。
场景二:未及时ANALYZE
ANALYZE更新表的统计信息,如果表未执行ANALYZE或最近一次执行完ANALYZE后表进行过数据量较大的增删操作,会导致统计信息不准,该场景下也可能导致查询表时没有使用索引。
优化方法:对表执行ANALYZE更新统计信息。
场景三:过滤条件使用了函数或隐式类型转化
如果在过滤条件中使用了计算、函数、隐式类型转化,都可能导致无法选择索引。
例如创建表,并在a,b,c三列上都分别创建了索引。
1
|
CREATE TABLE test(a int, b text, c date); |
- 在索引字段进行计算操作。
从下面的执行结果可以看出:where a = 101,where a = 102 - 1都使用了a列上的索引,但是where a + 1 = 102没有使用索引。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
explain verbose select * from test where a = 101; QUERY PLAN ------------------------------------------------------------------------------------------------------------ id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+------------------------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 44 | 16.27 2 | -> Index Scan using index_a on public.test | 1 | | 1MB | 44 | 8.27 Predicate Information (identified by plan id) --------------------------------------------- 2 --Index Scan using index_a on public.test Index Cond: (test.a = 101) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Node/s: dn_6005_6006 2 --Index Scan using index_a on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
explain verbose select * from test where a = 102 - 1; QUERY PLAN ------------------------------------------------------------------------------------------------------------ id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+------------------------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 44 | 16.27 2 | -> Index Scan using index_a on public.test | 1 | | 1MB | 44 | 8.27 Predicate Information (identified by plan id) --------------------------------------------- 2 --Index Scan using index_a on public.test Index Cond: (test.a = 101) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Node/s: dn_6005_6006 2 --Index Scan using index_a on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
explain verbose select * from test where a + 1 = 102; QUERY PLAN -------------------------------------------------------------------------------------------- id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+--------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 44 | 22.21 2 | -> Seq Scan on public.test | 1 | | 1MB | 44 | 14.21 Predicate Information (identified by plan id) --------------------------------------------- 2 --Seq Scan on public.test Filter: ((test.a + 1) = 102) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Node/s: All datanodes 2 --Seq Scan on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows)
优化方式:尽量使用常量代替表达式,或者常量计算尽量写在等号的右侧。
- 在索引字段上使用函数。
从下面的执行结果可以看出:在索引列上使用函数也会导致无法选择索引。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
explain verbose select * from test where to_char(c, 'yyyyMMdd') = to_char(CURRENT_DATE,'yyyyMMdd'); QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+--------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 44 | 22.28 2 | -> Seq Scan on public.test | 1 | | 1MB | 44 | 14.28 Predicate Information (identified by plan id) ------------------------------------------------------------------------------------------------------------------------------------------ 2 --Seq Scan on public.test Filter: (to_char(test.c, 'yyyyMMdd'::text) = to_char(('2022-11-30'::pg_catalog.date)::timestamp with time zone, 'yyyyMMdd'::text)) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Node/s: All datanodes 2 --Seq Scan on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
explain verbose select * from test where c = current_date; QUERY PLAN ------------------------------------------------------------------------------------------------------------ id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+------------------------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 44 | 16.27 2 | -> Index Scan using index_c on public.test | 1 | | 1MB | 44 | 8.27 Predicate Information (identified by plan id) ------------------------------------------------------------ 2 --Index Scan using index_c on public.test Index Cond: (test.c = '2022-11-30'::pg_catalog.date) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Node/s: All datanodes 2 --Index Scan using index_c on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows)
优化方法:尽量减少索引列上没有必要的函数调用。
- 数据类型隐式转化。
此类场景较常见,例如字段b的类型是text,过滤条件是where b = 2,在生成计划时,text类型会隐式转化为bigint类型,实际的过滤条件变成where b::bigint = 2,导致b列上的索引失效。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
explain verbose select * from test where b = 2; QUERY PLAN -------------------------------------------------------------------------------------------- id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+--------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 44 | 22.21 2 | -> Seq Scan on public.test | 1 | | 1MB | 44 | 14.21 Predicate Information (identified by plan id) --------------------------------------------- 2 --Seq Scan on public.test Filter: ((test.b)::bigint = 2) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Node/s: All datanodes 2 --Seq Scan on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
explain verbose select * from test where b = '2'; QUERY PLAN ------------------------------------------------------------------------------------------------------------ id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+------------------------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 44 | 16.27 2 | -> Index Scan using index_b on public.test | 1 | | 1MB | 44 | 8.27 Predicate Information (identified by plan id) --------------------------------------------- 2 --Index Scan using index_b on public.test Index Cond: (test.b = '2'::text) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Node/s: All datanodes 2 --Index Scan using index_b on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows)
优化方法:索引条件上的常量尽可能使用和索引列相同类型的常量,避免发生隐式类型转化。
场景四:用nestloop + indexscan代替hashjoin
此类语句的特征是两个表关联的时候,其中一个表上where条件过滤之后的结果集行数很小,同时,最终满足条件的结果集行数也很小。此时,使用nestloop+indexscan的效果往往要优于hashjoin。较优的执行计划如下:
可以看到第5层的Index Cond: (t1.b = t2.b)已经把join条件下推到了基表扫描上。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 |
explain verbose select t1.a,t1.b from t1,t2 where t1.b=t2.b and t2.a=4; id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+--------------------------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 26 | | | 8 | 17.97 2 | -> Nested Loop (3,5) | 26 | | 1MB | 8 | 11.97 3 | -> Streaming(type: BROADCAST) | 2 | | 2MB | 4 | 2.78 4 | -> Seq Scan on public.t2 | 1 | | 1MB | 4 | 2.62 5 | -> Index Scan using t1_b_idx on public.t1 | 26 | | 1MB | 8 | 9.05 (5 rows) Predicate Information (identified by plan id) ----------------------------------------------- 4 --Seq Scan on public.t2 Filter: (t2.a = 4) 5 --Index Scan using t1_b_idx on public.t1 Index Cond: (t1.b = t2.b) (4 rows) Targetlist Information (identified by plan id) ------------------------------------------------ 1 --Streaming (type: GATHER) Output: t1.a, t1.b Node/s: All datanodes 2 --Nested Loop (3,5) Output: t1.a, t1.b 3 --Streaming(type: BROADCAST) Output: t2.b Spawn on: datanode2 Consumer Nodes: All datanodes 4 --Seq Scan on public.t2 Output: t2.b Distribute Key: t2.a 5 --Index Scan using t1_b_idx on public.t1 Output: t1.a, t1.b Distribute Key: t1.a (15 rows) ====== Query Summary ===== --------------------------------- System available mem: 9262694KB Query Max mem: 9471590KB Query estimated mem: 5144KB (3 rows) |
如果优化器没有选择这种执行计划,可以通过以下方式优化:
1 2 3 |
set enable_index_nestloop = on; set enable_hashjoin = off; set enable_seqscan = off; |
场景五:使用hint指定索引时指定的索引方式不对
GaussDB(DWS)的plan hint当前支持指定的Scan方式有三种:tablescan、indexscan和indexonlyscan。
- tablescan:全表扫描,比如行存表的Seq Scan,列存表的CStore Scan。
- indexscan:先扫索引,再根据索引取表记录。
- indexonlyscan:覆盖索引扫描,所需的返回结果能被所扫描的索引全部覆盖。与index scan相比,index only scan所包含的字段集合,涵盖了查询语句中的字段,这样,提取出相应的index就不必再根据索引取表记录了。
因此,对于需要indexonlyscan的场景,如果hint指定了indexscan,该hint是无法生效的:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 |
explain verbose select/*+ indexscan(test)*/ b from test where b = '1'; WARNING: unused hint: IndexScan(test) QUERY PLAN ----------------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+-----------------------------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 32 | 16.27 2 | -> Index Only Scan using index_b on public.test | 1 | | 1MB | 32 | 8.27 Predicate Information (identified by plan id) -------------------------------------------------- 2 --Index Only Scan using index_b on public.test Index Cond: (test.b = '1'::text) Targetlist Information (identified by plan id) -------------------------------------------------- 1 --Streaming (type: GATHER) Output: b Node/s: All datanodes 2 --Index Only Scan using index_b on public.test Output: b Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
explain verbose select/*+ indexonlyscan(test)*/ b from test where b = '1'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+-----------------------------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 1 | | | 32 | 16.27 2 | -> Index Only Scan using index_b on public.test | 1 | | 1MB | 32 | 8.27 Predicate Information (identified by plan id) -------------------------------------------------- 2 --Index Only Scan using index_b on public.test Index Cond: (test.b = '1'::text) Targetlist Information (identified by plan id) -------------------------------------------------- 1 --Streaming (type: GATHER) Output: b Node/s: All datanodes 2 --Index Only Scan using index_b on public.test Output: b Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (24 rows) |
优化方法:使用hint时正确指定indexscan和indexonlyscan。
场景六:全文检索GIN索引
为了加速文本搜索,进行全文检索时可以创建GIN索引:
1
|
CREATE INDEX idxb ON test using gin(to_tsvector('english',b)); |
创建GIN索引时,必须使用to_tsvector的两参数版本,并且只有当查询时也使用了两参数版本,且参数值与索引中相同时,才会使用该索引:
to_tsvector()函数有两个版本,只输入一个参数的版本和输入两个参数的版本。输入一个参数时,系统默认采用default_text_search_config所指定的分词器。创建索引时必须使用to_tsvector的两参数版本,否则索引内容可能不一致。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
explain verbose select * from test where to_tsvector(b) @@ to_tsquery('cat') order by 1; QUERY PLAN ----------------------------------------------------------------------------------------------- id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+-----------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 2 | | | 44 | 22.23 2 | -> Sort | 2 | | 16MB | 44 | 14.23 3 | -> Seq Scan on public.test | 1 | | 1MB | 44 | 14.21 Predicate Information (identified by plan id) ----------------------------------------------------------- 3 --Seq Scan on public.test Filter: (to_tsvector(test.b) @@ '''cat'''::tsquery) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Merge Sort Key: test.a Node/s: All datanodes 2 --Sort Output: a, b, c Sort Key: test.a 3 --Seq Scan on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 1024KB (29 rows) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 |
explain verbose select * from test where to_tsvector('english',b) @@ to_tsquery('cat') order by 1; QUERY PLAN ------------------------------------------------------------------------------------------------------- id | operation | E-rows | E-distinct | E-memory | E-width | E-costs ----+-------------------------------------------+--------+------------+----------+---------+--------- 1 | -> Streaming (type: GATHER) | 2 | | | 44 | 20.03 2 | -> Sort | 2 | | 16MB | 44 | 12.03 3 | -> Bitmap Heap Scan on public.test | 1 | | 1MB | 44 | 12.02 4 | -> Bitmap Index Scan | 1 | | 1MB | 0 | 8.00 Predicate Information (identified by plan id) --------------------------------------------------------------------------------------- 3 --Bitmap Heap Scan on public.test Recheck Cond: (to_tsvector('english'::regconfig, test.b) @@ '''cat'''::tsquery) 4 --Bitmap Index Scan Index Cond: (to_tsvector('english'::regconfig, test.b) @@ '''cat'''::tsquery) Targetlist Information (identified by plan id) ---------------------------------------------- 1 --Streaming (type: GATHER) Output: a, b, c Merge Sort Key: test.a Node/s: All datanodes 2 --Sort Output: a, b, c Sort Key: test.a 3 --Bitmap Heap Scan on public.test Output: a, b, c Distribute Key: a ====== Query Summary ===== ------------------------------- System available mem: 3358720KB Query Max mem: 3358720KB Query estimated mem: 2048KB (32 rows) |
优化方式:查询时使用to_tsvector的两参数版本,且保证参数值与索引中相同。