Help Center/ GaussDB/ Developer Guide(Distributed_V2.0-8.x)/ SQL Optimization/ Optimization Cases/ Case: Using Global Secondary Indexes to Accelerate Query
Updated on 2025-05-29 GMT+08:00

Case: Using Global Secondary Indexes to Accelerate Query

Index-only Scanning of Global Secondary Indexes

When non-distribution keys in the base table are queried, DN pushdown query or index scanning cannot be performed, deteriorating performance problems.

  • Scenario 1: common query
     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    gaussdb=# CREATE TABLE t (a INT, b INT, c INT) DISTRIBUTE BY HASH(a);
    CREATE TABLE
    gaussdb=# INSERT INTO t VALUES(generate_series(1,1000), generate_series(1,1000), generate_series(1,1000));
    INSERT 0 1000
    gaussdb=# CREATE INDEX i ON t(b);
    CREATE INDEX
    gaussdb=# SET MAX_DATANODE_FOR_PLAN = 1;
    SET
    gaussdb=# SET ENABLE_FAST_QUERY_SHIPPING = TRUE;
    SET
    gaussdb=# SET ENABLE_LIGHT_PROXY = TRUE;
    SET
    gaussdb=# SET ENABLE_STREAM_OPERATOR = FALSE;
    SET
    gaussdb=# -- Point query. The query condition is not a distribution key. Push down all DNs and use ordinary indexes for query on DNs.
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF)  SELECT b FROM t WHERE b = 1;
                         QUERY PLAN                     
    ----------------------------------------------------
     Data Node Scan
       Output: t.b
       Node/s: All datanodes
       Remote query: SELECT b FROM public.t WHERE b = 1
    
     Remote SQL: SELECT b FROM public.t WHERE b = 1
     Datanode Name: datanode1
       [Bypass]
       Index Only Scan using i on public.t
         Output: b
         Index Cond: (t.b = 1)
    
    (12 rows)
    
    gaussdb=#  -- Sequentially scan all DNs in a range query.
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF)  SELECT b FROM t WHERE b <= 10;
                          QUERY PLAN                      
    ------------------------------------------------------
     Data Node Scan
       Output: t.b
       Node/s: All datanodes
       Remote query: SELECT b FROM public.t WHERE b <= 10
    
     Remote SQL: SELECT b FROM public.t WHERE b <= 10
     Datanode Name: datanode1
       Seq Scan on public.t
         Output: b
         Filter: (t.b <= 10)
    
    (11 rows)
    
    You can create a global secondary index to push down the preceding query to DNs and use index scanning to accelerate the query. (To use the GSI to optimize query, enable ENABLE_GSISCAN parameter and ENABLE_FAST_QUERY_SHIPPING parameter.)
    gaussdb=# DROP INDEX i;
    DROP INDEX
    gaussdb=# -- Create a global secondary index.
    gaussdb=# CREATE GLOBAL INDEX gsi_t ON t(b);
    NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'b' as the distribution column by default.
    HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
    CREATE GLOBAL INDEX
    gaussdb=#  -- Point query. The query condition falls on the distribution key of the global secondary index. A single DN is pushed down and the index is used to scan the query.
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT b FROM t WHERE b = 1;
                             QUERY PLAN                          
    -------------------------------------------------------------
     Data Node Scan
       Output: gsi_t.b
       Node/s: datanode1
       Remote query: SELECT b FROM ONLY public.gsi_t WHERE b = 1
    
     Remote SQL: SELECT b FROM ONLY public.gsi_t WHERE b = 1
     Datanode Name: datanode1
       [Bypass]
       Index Only Scan using gsi_t on public.gsi_t
         Output: b
         Index Cond: (gsi_t.b = 1)
    
    (12 rows)
    
    gaussdb=#  -- Range query. All DNs are pushed down and the index is used to scan the query.
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT b FROM t WHERE b <= 10;
                              QUERY PLAN                           
    ---------------------------------------------------------------
     Data Node Scan
       Output: gsi_t.b
       Node/s: All datanodes
       Remote query: SELECT b FROM ONLY public.gsi_t WHERE b <= 10
    
     Remote SQL: SELECT b FROM ONLY public.gsi_t WHERE b <= 10
     Datanode Name: datanode1
       [Bypass]
       Index Only Scan using gsi_t on public.gsi_t
         Output: b
         Index Cond: (gsi_t.b <= 10)
    
    (12 rows)
  • Scenario 2: Multi-table join query.
    gaussdb=# DCREATE TABLE t1(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT) DISTRIBUTE BY HASH(c1);
    CREATE TABLE
    gaussdb=# DINSERT INTO t1 VALUES (GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10));
    INSERT 0 10
    gaussdb=# DCREATE TABLE t2(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT) DISTRIBUTE BY HASH(c1);
    CREATE TABLE
    gaussdb=# DINSERT INTO t2 VALUES (3, GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10));
    INSERT 0 10
    gaussdb=#  -- Create an ordinary index.
    gaussdb=# CREATE INDEX i_t1 ON t1(c2);
    CREATE INDEX
    gaussdb=#  -- Join two tables. The STREAM plan is used for query and no DN is pushed down.
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT t1.c2 FROM t1, t2 WHERE t1.c2 = t2.c1;
                               QUERY PLAN                            
    -----------------------------------------------------------------
     Streaming (type: GATHER)
       Output: t1.c2
       Node/s: All datanodes
       ->  Hash Join
             Output: t1.c2
             Hash Cond: (t2.c1 = t1.c2)
             ->  Seq Scan on public.t2
                   Output: t2.c1, t2.c2, t2.c3, t2.c4, t2.c5
                   Distribute Key: t2.c1
             ->  Hash
                   Output: t1.c2
                   ->  Streaming(type: REDISTRIBUTE)
                         Output: t1.c2
                         Distribute Key: t1.c2
                         Spawn on: All datanodes
                         Consumer Nodes: All datanodes
                         ->  Index Only Scan using i_t1 on public.t1
                               Output: t1.c2
                               Distribute Key: t1.c1
    (19 rows)

    You can create a global secondary index on the associated column to push down the query to a DN.

    gaussdb=# : Create a GSI.
    gaussdb=# CREATE GLOBAL INDEX gsi_t1 ON t1(c2);
    NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c2' as the distribution column by default.
    HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
    CREATE GLOBAL INDEX
    gaussdb=#  -- Join two tables and use the GSI to push down all DNs.
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT t1.c2 FROM t1, t2 WHERE t1.c2 = t2.c1;
                                             QUERY PLAN                                          
    ---------------------------------------------------------------------------------------------
     Data Node Scan
       Output: gsi_t1.c2
       Node/s: All datanodes
       Remote query: SELECT gsi_t1.c2 FROM ONLY public.gsi_t1, public.t2 WHERE gsi_t1.c2 = t2.c1
    
     Remote SQL: SELECT gsi_t1.c2 FROM ONLY public.gsi_t1, public.t2 WHERE gsi_t1.c2 = t2.c1
     Datanode Name: datanode1
       Hash Join
         Output: gsi_t1.c2
         Hash Cond: (t2.c1 = gsi_t1.c2)
         ->  Seq Scan on public.t2
               Output: t2.c1, t2.c2, t2.c3, t2.c4, t2.c5
         ->  Hash
               Output: gsi_t1.c2
               ->  Index Only Scan using gsi_t1 on public.gsi_t1
                     Output: gsi_t1.c2
    
    (17 rows)
  • Scenario 3: Subquery.
    gaussdb=# CREATE TABLE t3(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT) DISTRIBUTE BY HASH(c1);
    CREATE TABLE
    gaussdb=# INSERT INTO t3 VALUES (GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10));
    INSERT 0 10
    gaussdb=# CREATE TABLE t4(c1 INT, c2 INT, c3 INT, c4 INT, c5 INT) DISTRIBUTE BY HASH(c1);
    CREATE TABLE
    gaussdb=# INSERT INTO t4 VALUES (3, GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10), GENERATE_SERIES(1,10));
    INSERT 0 10
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT t4.c2 FROM t4, (SELECT c2 FROM t3) aa WHERE aa.c2 = t4.c1;
                     QUERY PLAN                  
    ---------------------------------------------
     Streaming (type: GATHER)
       Output: t4.c2
       Node/s: All datanodes
       ->  Hash Join
             Output: t4.c2
             Hash Cond: (t3.c2 = t4.c1)
             ->  Streaming(type: REDISTRIBUTE)
                   Output: t3.c2
                   Distribute Key: t3.c2
                   Spawn on: All datanodes
                   Consumer Nodes: All datanodes
                   ->  Seq Scan on public.t3
                         Output: t3.c2
                         Distribute Key: t3.c1
             ->  Hash
                   Output: t4.c2, t4.c1
                   ->  Seq Scan on public.t4
                         Output: t4.c2, t4.c1
                         Distribute Key: t4.c1
    (19 rows)

    You can create a global secondary index to push down the query to a DN. In this case, GSI force hint may be increased in the query to use the global secondary index.

    gaussdb=# CREATE GLOBAL INDEX gsi_t3 ON t3(c2) CONTAINING(c3, c4) DISTRIBUTE BY HASH(c2);
    CREATE GLOBAL INDEX
    gaussdb=#  -- Use GSI force hint to push down all DNs.
    gaussdb=# -- GSI force hint is not applicable and pushdown cannot be performed.
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT t4.c2 FROM t4, (SELECT c2 FROM t3) aa WHERE aa.c2 = t4.c1;
                             QUERY PLAN                          
    -------------------------------------------------------------
     Streaming (type: GATHER)
       Output: t4.c2
       Node/s: All datanodes
       ->  Hash Join
             Output: t4.c2
             Hash Cond: (t4.c1 = t3.c2)
             ->  Seq Scan on public.t4
                   Output: t4.c1, t4.c2, t4.c3, t4.c4, t4.c5
                   Distribute Key: t4.c1
             ->  Hash
                   Output: t3.c2
                   ->  Index Only Scan using gsi_t3 on public.t3
                         Output: t3.c2
                         Distribute Key: t3.c2
    (14 rows)
    
    gaussdb=#  -- Use GSI force hint to push down all DNs.
    gaussdb=# EXPLAIN (VERBOSE ON, COSTS OFF) SELECT /*+gsi()*/ t4.c2 FROM t4, (SELECT c2 FROM t3) aa WHERE aa.c2 = t4.c1;
                                                            QUERY PLAN                                                         
    ---------------------------------------------------------------------------------------------------------------------------
     Data Node Scan
       Output: t4.c2
       Node/s: All datanodes
       Remote query: SELECT/*+ gsi()*/ t4.c2 FROM public.t4, (SELECT gsi_t3.c2 FROM ONLY public.gsi_t3) aa WHERE aa.c2 = t4.c1
    
     Remote SQL: SELECT/*+ gsi()*/ t4.c2 FROM public.t4, (SELECT gsi_t3.c2 FROM ONLY public.gsi_t3) aa WHERE aa.c2 = t4.c1
     Datanode Name: datanode1
       Hash Join
         Output: t4.c2
         Hash Cond: (t4.c1 = gsi_t3.c2)
         ->  Seq Scan on public.t4
               Output: t4.c1, t4.c2, t4.c3, t4.c4, t4.c5
         ->  Hash
               Output: gsi_t3.c2
               ->  Index Only Scan using gsi_t3 on public.gsi_t3
                     Output: gsi_t3.c2
    
    (17 rows)

Global Secondary Index Scanning for Table Access by Index Row ID

  • STREAM distributed plan: Create a base table of the global secondary index. During query, the global secondary index needs to be returned to the table. If the predicate hits a large number of rows (for example, range query), the STREAM distributed plan is preferred. Example:
    gaussdb=#  -- Enable the STEAM mode.
    gaussdb=#  SET ENABLE_STREAM_OPERATOR = ON;
    SET
    gaussdb=#  SET EXPLAIN_PERF_MODE=PRETTY;
    SET
    gaussdb=#  CREATE TABLE test(c1 INT, c2 INT, c3 INT);
    NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c1' as the distribution column by default.
    HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
    CREATE TABLE
    gaussdb=# -- Create a global secondary index on column c2 for the test table.
    gaussdb=#  CREATE GLOBAL INDEX gsi_test ON test(c2);
    NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c2' as the distribution column by default.
    HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
    CREATE GLOBAL INDEX
    gaussdb=#  -- Use the global secondary index for STREAM distributed plan.
    gaussdb=#  EXPLAIN(VERBOSE ON, COSTS OFF) SELECT /*+ gsitable(test gsi_test)*/ * FROM test WHERE c2 > 1;
    WARNING:  Statistics in some tables or columns(public.test.c1, public.test.c2) are not collected.
    HINT:  Do analyze for them in order to generate optimized plan.
     id |                        operation                         
    ----+----------------------------------------------------------
      1 | ->  Streaming (type: GATHER)
      2 |    ->  GSI Tid Scan on public.test
      3 |       ->  Streaming(type: REDISTRIBUTE GSI)
      4 |          ->  GSI Only Scan using gsi_test on public.test
    (4 rows)
       Predicate Information (identified by plan id)   
    ---------------------------------------------------
       4 --GSI Only Scan using gsi_test on public.test
             Index Cond: (test.c2 > 1)
    (2 rows)
      Targetlist Information (identified by plan id)   
    ---------------------------------------------------
       1 --Streaming (type: GATHER)
             Output: c1, c2, c3
             Node/s: All datanodes
       2 --GSI Tid Scan on public.test
             Output: c1, c2, c3
             Distribute Key: c1
       3 --Streaming(type: REDISTRIBUTE GSI)
             Output: xc_node_hash, ctid
             Spawn on: All datanodes
             Consumer Nodes: All datanodes
       4 --GSI Only Scan using gsi_test on public.test
             Output: xc_node_hash, ctid
             Distribute Key: c2
    (13 rows)
  • Non-STREAM distributed plan: Create a base table of the global secondary index. During query, the global secondary index needs to be returned to the table. If the predicate hits a few rows (for example, point query), the non-STREAM distributed plan is preferred. Example:
    gaussdb=#  -- Disable the STEAM mode.
    gaussdb=#  SET ENABLE_STREAM_OPERATOR = OFF;
    SET
    gaussdb=#  CREATE TABLE test(c1 INT, c2 INT, c3 INT);
    NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c1' as the distribution column by default.
    HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
    CREATE TABLE
    gaussdb=# -- Create a global secondary index on column c2 for the test table.
    gaussdb=#  CREATE GLOBAL INDEX gsi_test ON test(c2);
    NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c2' as the distribution column by default.
    HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
    CREATE GLOBAL INDEX
    gaussdb=#  -- Use the global secondary index for non-STREAM distributed plan.
    gaussdb=#  EXPLAIN(VERBOSE ON, COSTS OFF) SELECT /*+ gsitable(test gsi_test)*/ * FROM test WHERE c2 = 1;
                                        QUERY PLAN                                     
    -----------------------------------------------------------------------------------
     Select on public.test
       Output: test.c1, test.c2, test.c3
       Node/s: All datanodes
       Remote query: SELECT c1, c2, c3 FROM ONLY public.test WHERE ctid = $2
       ->  Data Node Scan on gsi_test "REMOTE_TABLE_QUERY"
             Output: gsi_test.xc_node_hash, gsi_test.ctid
             Node/s: datanode1
             Remote query: SELECT xc_node_hash, ctid FROM public.gsi_test WHERE c2 = 1
    (8 rows)