更新时间:2023-03-08 GMT+08:00

GaussDB(DWS)查询时索引失效场景解析

对表建立索引可提高数据库查询性能,但有时会出现建立了索引,但查询计划中却发现索引没有被使用的情况。针对这种情况,本文将列举几种常见的场景和优化方法。

场景一:返回结果集很大

以行存表的Seq Scan和Index Scan为例:

  • Seq Scan:就是按照表的记录的排列顺序从头到尾依次检索扫描,每次扫描要取到所有的记录。这也是最简单最基础的扫表方式,扫描的代价比较大。
  • Index Scan:对于给定的查询,先扫描一遍索引,从索引中找到符合要求的记录的位置(指针),再定位到表中具体的Page去获取,即先走索引,再读表数据。

因此,根据两种扫描方式的特点可知,多数情况下,Index Scan要比Seq Scan快。但是如果获取的结果集占所有数据的比重很大时(超过70%),这时Index Scan因为要先扫描索引再读表数据反而不如直接全表扫描的速度快。

场景二:未及时ANALYZE

ANALYZE更新表的统计信息,如果表未执行ANALYZE或最近一次执行完ANALYZE后表进行过数据量较大的增删操作,会导致统计信息不准,场景下也可能导致查询表时没有使用索引。

优化方法:对表执行ANALYZE更新统计信息。

场景三:过滤条件使用了函数或隐式类型转化

如果在过滤条件中使用了计算、函数、隐式类型转化,都可能导致无法选择索引。

例如,创建表,在a,b,c三列上都分别创建了索引。

1
create table test(a int, b text, c date); 
  • 在索引字段进行计算操作。

    从下面的执行结果可以看出:where a = 101,where a = 102 - 1都使用了a列上的索引,但是where a + 1 = 102没有使用索引。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    explain verbose select * from test where a  = 101;
                                                     QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
      id |                   operation                    | E-rows | E-distinct | E-memory | E-width | E-costs
     ----+------------------------------------------------+--------+------------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)                   |      1 |            |          |      44 | 16.27
       2 |    ->  Index Scan using index_a on public.test |      1 |            | 1MB      |      44 | 8.27
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Index Scan using index_a on public.test
             Index Cond: (test.a = 101)
    
     Targetlist Information (identified by plan id)
     ----------------------------------------------
       1 --Streaming (type: GATHER)
             Output: a, b, c
             Node/s: dn_6005_6006
       2 --Index Scan using index_a on public.test
             Output: a, b, c
             Distribute Key: a
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 3358720KB
     Query Max mem: 3358720KB
     Query estimated mem: 1024KB
    (24 rows)
    
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
     explain verbose select * from test where a  = 102 - 1;
                                                     QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
      id |                   operation                    | E-rows | E-distinct | E-memory | E-width | E-costs
     ----+------------------------------------------------+--------+------------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)                   |      1 |            |          |      44 | 16.27
       2 |    ->  Index Scan using index_a on public.test |      1 |            | 1MB      |      44 | 8.27
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Index Scan using index_a on public.test
             Index Cond: (test.a = 101)
    
     Targetlist Information (identified by plan id)
     ----------------------------------------------
       1 --Streaming (type: GATHER)
             Output: a, b, c
             Node/s: dn_6005_6006
       2 --Index Scan using index_a on public.test
             Output: a, b, c
             Distribute Key: a
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 3358720KB
     Query Max mem: 3358720KB
     Query estimated mem: 1024KB
    (24 rows)
    
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    explain verbose select * from test where a + 1 = 102;
                                             QUERY PLAN
    --------------------------------------------------------------------------------------------
      id |           operation            | E-rows | E-distinct | E-memory | E-width | E-costs
     ----+--------------------------------+--------+------------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)   |      1 |            |          |      44 | 22.21
       2 |    ->  Seq Scan on public.test |      1 |            | 1MB      |      44 | 14.21
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Seq Scan on public.test
             Filter: ((test.a + 1) = 102)
    
     Targetlist Information (identified by plan id)
     ----------------------------------------------
       1 --Streaming (type: GATHER)
             Output: a, b, c
             Node/s: All datanodes
       2 --Seq Scan on public.test
             Output: a, b, c
             Distribute Key: a
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 3358720KB
     Query Max mem: 3358720KB
     Query estimated mem: 1024KB
    (24 rows)
    

    优化方式:尽量使用常量代替表达式,或者常量计算尽量写在等号的右侧。

  • 在索引字段上使用函数。

    从下面的执行结果可以看出:在索引列上使用函数也会导致无法选择索引。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    explain verbose select * from test where to_char(c, 'yyyyMMdd') = to_char(CURRENT_DATE,'yyyyMMdd');
                                                                     QUERY PLAN
    --------------------------------------------------------------------------------------------------------------------------------------------
      id |           operation            | E-rows | E-distinct | E-memory | E-width | E-costs
     ----+--------------------------------+--------+------------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)   |      1 |            |          |      44 | 22.28
       2 |    ->  Seq Scan on public.test |      1 |            | 1MB      |      44 | 14.28
    
                                                   Predicate Information (identified by plan id)
     ------------------------------------------------------------------------------------------------------------------------------------------
       2 --Seq Scan on public.test
             Filter: (to_char(test.c, 'yyyyMMdd'::text) = to_char(('2022-11-30'::pg_catalog.date)::timestamp with time zone, 'yyyyMMdd'::text))
    
     Targetlist Information (identified by plan id)
     ----------------------------------------------
       1 --Streaming (type: GATHER)
             Output: a, b, c
             Node/s: All datanodes
       2 --Seq Scan on public.test
             Output: a, b, c
             Distribute Key: a
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 3358720KB
     Query Max mem: 3358720KB
     Query estimated mem: 1024KB
    (24 rows)
    
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    explain verbose select * from test where c = current_date;
                                                     QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
      id |                   operation                    | E-rows | E-distinct | E-memory | E-width | E-costs
     ----+------------------------------------------------+--------+------------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)                   |      1 |            |          |      44 | 16.27
       2 |    ->  Index Scan using index_c on public.test |      1 |            | 1MB      |      44 | 8.27
    
            Predicate Information (identified by plan id)
     ------------------------------------------------------------
       2 --Index Scan using index_c on public.test
             Index Cond: (test.c = '2022-11-30'::pg_catalog.date)
    
     Targetlist Information (identified by plan id)
     ----------------------------------------------
       1 --Streaming (type: GATHER)
             Output: a, b, c
             Node/s: All datanodes
       2 --Index Scan using index_c on public.test
             Output: a, b, c
             Distribute Key: a
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 3358720KB
     Query Max mem: 3358720KB
     Query estimated mem: 1024KB
    (24 rows)
    

    优化方法:尽量减少索引列上没有必要的函数调用。

  • 数据类型隐式转化。

    此类场景较常见,例如字段b的类型是text,过滤条件是where b = 2,在生成计划时,text类型会隐式转化为bigint类型,实际的过滤条件变成where b::bigint = 2,导致b列上的索引失效。

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    explain verbose select * from test where b = 2;
                                             QUERY PLAN
    --------------------------------------------------------------------------------------------
      id |           operation            | E-rows | E-distinct | E-memory | E-width | E-costs
     ----+--------------------------------+--------+------------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)   |      1 |            |          |      44 | 22.21
       2 |    ->  Seq Scan on public.test |      1 |            | 1MB      |      44 | 14.21
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Seq Scan on public.test
             Filter: ((test.b)::bigint = 2)
    
     Targetlist Information (identified by plan id)
     ----------------------------------------------
       1 --Streaming (type: GATHER)
             Output: a, b, c
             Node/s: All datanodes
       2 --Seq Scan on public.test
             Output: a, b, c
             Distribute Key: a
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 3358720KB
     Query Max mem: 3358720KB
     Query estimated mem: 1024KB
    (24 rows)
    
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    explain verbose select * from test where b = '2';
                                                     QUERY PLAN
    ------------------------------------------------------------------------------------------------------------
      id |                   operation                    | E-rows | E-distinct | E-memory | E-width | E-costs
     ----+------------------------------------------------+--------+------------+----------+---------+---------
       1 | ->  Streaming (type: GATHER)                   |      1 |            |          |      44 | 16.27
       2 |    ->  Index Scan using index_b on public.test |      1 |            | 1MB      |      44 | 8.27
    
     Predicate Information (identified by plan id)
     ---------------------------------------------
       2 --Index Scan using index_b on public.test
             Index Cond: (test.b = '2'::text)
    
     Targetlist Information (identified by plan id)
     ----------------------------------------------
       1 --Streaming (type: GATHER)
             Output: a, b, c
             Node/s: All datanodes
       2 --Index Scan using index_b on public.test
             Output: a, b, c
             Distribute Key: a
    
       ====== Query Summary =====
     -------------------------------
     System available mem: 3358720KB
     Query Max mem: 3358720KB
     Query estimated mem: 1024KB
    (24 rows)
    

    优化方法:索引条件上的常量尽可能使用和索引列相同类型的常量,避免发生隐式类型转化。

场景四:用nestloop + indexscan代替hashjoin

此类语句的特征是两个表关联的时候,其中一个表上where条件过滤之后的结果集行数很小,同时,最终满足条件的结果集行数也很小。此时,使用nestloop+indexscan的效果往往要优于hashjoin。较优的执行计划如下:

可以看到第5层的Index Cond: (t1.b = t2.b)已经把join条件下推到了基表扫描上。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
explain verbose select t1.a,t1.b from t1,t2 where t1.b=t2.b and t2.a=4;
 id |                    operation                     | E-rows | E-distinct | E-memory | E-width | E-costs 
----+--------------------------------------------------+--------+------------+----------+---------+---------
  1 | ->  Streaming (type: GATHER)                     |     26 |            |          |       8 | 17.97
  2 |    ->  Nested Loop (3,5)                         |     26 |            | 1MB      |       8 | 11.97
  3 |       ->  Streaming(type: BROADCAST)             |      2 |            | 2MB      |       4 | 2.78
  4 |          ->  Seq Scan on public.t2               |      1 |            | 1MB      |       4 | 2.62
  5 |       ->  Index Scan using t1_b_idx on public.t1 |     26 |            | 1MB      |       8 | 9.05
(5 rows)

 Predicate Information (identified by plan id) 
-----------------------------------------------
   4 --Seq Scan on public.t2
         Filter: (t2.a = 4)
   5 --Index Scan using t1_b_idx on public.t1
         Index Cond: (t1.b = t2.b)
(4 rows)

 Targetlist Information (identified by plan id) 
------------------------------------------------
   1 --Streaming (type: GATHER)
         Output: t1.a, t1.b
         Node/s: All datanodes
   2 --Nested Loop (3,5)
         Output: t1.a, t1.b
   3 --Streaming(type: BROADCAST)
         Output: t2.b
         Spawn on: datanode2
         Consumer Nodes: All datanodes
   4 --Seq Scan on public.t2
         Output: t2.b
         Distribute Key: t2.a
   5 --Index Scan using t1_b_idx on public.t1
         Output: t1.a, t1.b
         Distribute Key: t1.a
(15 rows)

   ====== Query Summary =====    
---------------------------------
 System available mem: 9262694KB
 Query Max mem: 9471590KB
 Query estimated mem: 5144KB
(3 rows)

如果优化器没有选择这种执行计划,可以通过以下方式优化:

1
2
3
set enable_index_nestloop = on;
set enable_hashjoin = off;
set enable_seqscan = off;

场景五:使用hint指定索引时指定的索引方式不对

GaussDB(DWS)的plan hint当前支持指定的Scan方式有三种:tablescan、indexscan和indexonlyscan。

  • tablescan:全表扫描,比如行存表的Seq Scan,列存表的CStore Scan。
  • indexscan:先扫索引,再根据索引取表记录。
  • indexonlyscan:覆盖索引扫描,所需的返回结果能被所扫描的索引全部覆盖。与index scan相比,index only scan所包含的字段集合,涵盖了我们查询语句中的字段,这样,提取出相应的index ,就不必再根据索引取表记录了。

因此,对于需要indexonlyscan的场景,如果hint指定了indexscan,该hint是无法生效的:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
explain verbose select/*+ indexscan(test)*/ b from test where b = '1';
WARNING:  unused hint: IndexScan(test)
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
  id |                      operation                      | E-rows | E-distinct | E-memory | E-width | E-costs
 ----+-----------------------------------------------------+--------+------------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)                        |      1 |            |          |      32 | 16.27
   2 |    ->  Index Only Scan using index_b on public.test |      1 |            | 1MB      |      32 | 8.27

   Predicate Information (identified by plan id)
 --------------------------------------------------
   2 --Index Only Scan using index_b on public.test
         Index Cond: (test.b = '1'::text)

   Targetlist Information (identified by plan id)
 --------------------------------------------------
   1 --Streaming (type: GATHER)
         Output: b
         Node/s: All datanodes
   2 --Index Only Scan using index_b on public.test
         Output: b
         Distribute Key: a

   ====== Query Summary =====
 -------------------------------
 System available mem: 3358720KB
 Query Max mem: 3358720KB
 Query estimated mem: 1024KB
(24 rows)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
explain verbose select/*+ indexonlyscan(test)*/ b from test where b = '1';
                                                   QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
  id |                      operation                      | E-rows | E-distinct | E-memory | E-width | E-costs
 ----+-----------------------------------------------------+--------+------------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)                        |      1 |            |          |      32 | 16.27
   2 |    ->  Index Only Scan using index_b on public.test |      1 |            | 1MB      |      32 | 8.27

   Predicate Information (identified by plan id)
 --------------------------------------------------
   2 --Index Only Scan using index_b on public.test
         Index Cond: (test.b = '1'::text)

   Targetlist Information (identified by plan id)
 --------------------------------------------------
   1 --Streaming (type: GATHER)
         Output: b
         Node/s: All datanodes
   2 --Index Only Scan using index_b on public.test
         Output: b
         Distribute Key: a

   ====== Query Summary =====
 -------------------------------
 System available mem: 3358720KB
 Query Max mem: 3358720KB
 Query estimated mem: 1024KB
(24 rows)

优化方法:使用hint时正确指定indexscan和indexonlyscan。

场景六:全文检索gin索引

为了加速文本搜索,进行全文检索时可以创建GIN索引:

1
CREATE INDEX idxb ON test using gin(to_tsvector('english',b));

创建索引时,必须使用to_tsvector的两参数版本,并且只有当查询时也使用了两参数版本,且参数值与索引中相同时,才会使用该索引:

to_tsvector()函数有两个版本,只输一个参数的版本和输两个参数的版本。只输一个参数时,系统默认采用default_text_search_config所指定的分词器。创建索引时必须使用to_tsvector的两参数版本,否则索引内容可能不一致。

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
explain verbose select  * from test where to_tsvector(b) @@ to_tsquery('cat') order by 1;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
  id |             operation             | E-rows | E-distinct | E-memory | E-width | E-costs
 ----+-----------------------------------+--------+------------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)      |      2 |            |          |      44 | 22.23
   2 |    ->  Sort                       |      2 |            | 16MB     |      44 | 14.23
   3 |       ->  Seq Scan on public.test |      1 |            | 1MB      |      44 | 14.21

        Predicate Information (identified by plan id)
 -----------------------------------------------------------
   3 --Seq Scan on public.test
         Filter: (to_tsvector(test.b) @@ '''cat'''::tsquery)

 Targetlist Information (identified by plan id)
 ----------------------------------------------
   1 --Streaming (type: GATHER)
         Output: a, b, c
         Merge Sort Key: test.a
         Node/s: All datanodes
   2 --Sort
         Output: a, b, c
         Sort Key: test.a
   3 --Seq Scan on public.test
         Output: a, b, c
         Distribute Key: a

   ====== Query Summary =====
 -------------------------------
 System available mem: 3358720KB
 Query Max mem: 3358720KB
 Query estimated mem: 1024KB
(29 rows)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
explain verbose select  * from test where to_tsvector('english',b) @@ to_tsquery('cat') order by 1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
  id |                 operation                 | E-rows | E-distinct | E-memory | E-width | E-costs
 ----+-------------------------------------------+--------+------------+----------+---------+---------
   1 | ->  Streaming (type: GATHER)              |      2 |            |          |      44 | 20.03
   2 |    ->  Sort                               |      2 |            | 16MB     |      44 | 12.03
   3 |       ->  Bitmap Heap Scan on public.test |      1 |            | 1MB      |      44 | 12.02
   4 |          ->  Bitmap Index Scan            |      1 |            | 1MB      |       0 | 8.00

                      Predicate Information (identified by plan id)
 ---------------------------------------------------------------------------------------
   3 --Bitmap Heap Scan on public.test
         Recheck Cond: (to_tsvector('english'::regconfig, test.b) @@ '''cat'''::tsquery)
   4 --Bitmap Index Scan
         Index Cond: (to_tsvector('english'::regconfig, test.b) @@ '''cat'''::tsquery)

 Targetlist Information (identified by plan id)
 ----------------------------------------------
   1 --Streaming (type: GATHER)
         Output: a, b, c
         Merge Sort Key: test.a
         Node/s: All datanodes
   2 --Sort
         Output: a, b, c
         Sort Key: test.a
   3 --Bitmap Heap Scan on public.test
         Output: a, b, c
         Distribute Key: a

   ====== Query Summary =====
 -------------------------------
 System available mem: 3358720KB
 Query Max mem: 3358720KB
 Query estimated mem: 2048KB
(32 rows)

优化方式:查询时使用to_tsvector的两参数版本,且保证参数值与索引中相同。