CREATE GLOBAL INDEX

Description

A global secondary index (GSI) is created on a specified table.

The GSIs allow users to define indexes that are inconsistent with the distribution of base tables. In this way, single-node plans for querying non-distributed columns of base tables and unique/primary key constraints on non-distributed columns of base tables are achieved.

You are advised to create GSIs on a base table as common indexes to improve IUD execution efficiency. In addition, after the GSIs are created on the base table, the IUD performance is affected and deteriorates.

Precautions

Similar to the base table constraints, the distribution key of GSIs does not support the UPDATE or MERGE INTO operation.
The GSI can be created only in GTM-lite mode. If a GSI is created in other GTM modes, an error is reported.
Astore does not support the creation of UB-tree other than GSI and does not support the creation of partitions for GSI.
The GSI whose distribution is the same as that of the base table cannot be created. Otherwise, an error will be reported during the execution.
CREATE GSI CONCURRENTLY can be executed only for Ustore tables. If CREATE GSI CONCURRENTLY is executed for Astore tables, a syntax error is reported. CREATE GSI CONCURRENTLY cannot be executed for expression indexes and partial indexes. Otherwise, a syntax error is reported. Online GSI rebuild is not supported.
Hash-based GSIs can be created for row-store Astore tables, Ustore tables, partitioned tables, hash bucket tables, and segment-page tables whose base tables are hash-based. Base tables cannot be replica,, level-2 partitioned tables, or list/range distribution tables. GSIs support hash-based distribution only.
GSIs cannot be created for a base table when any column name, such as ctid, xc_node_hash, xmin, xmax, tableoid (for a partitioned table), and tablebucketid (for a hash bucket table) of the base table is duplicate with an existing column name after it is suffixed with "_new$$" or "_NEW$$."
If the base table is a partitioned table that is not a hash bucket, the GSI allows a maximum of 27 columns. If the base table is a non-partitioned table that is a hash bucket, the GSI allows a maximum of 27 columns. If the base table is a partitioned hash bucket table, the GSI allows a maximum of 26 columns. If the base table is a non-partitioned table that is not a hash bucket, the GSI allows a maximum of 28 columns (including index keys and distribution keys).
For GSI creation (offline), GSI rebuild, and operations related to GSI rebuild, for example, partitioning operations (such as DROP, TRUNCATE, MERGE, SPLIT and EXCHANGE PARTITION) on a partitioned table, UPDATE DISTRIBUTED GLOBAL INDEX, ALTER TABLE involving data rebuild, as well as ALTER SET TABLESPACE and MOVE PARTITION operations on a hash bucket table, you are advised to enable the STREAM mode to achieve optimal performance. To enable the STREAM mode, set enable_stream_operator to ON and create_gsi_opt to build.
UPSERT is not supported. IUD returning is not supported in the base table where a GSI is created.
If the VACUUM FULL, CLUSTER, or REINDEX operation on a table is interrupted, the GSI on the table may turn to the UNUSABLE state. In this case, an error will be reported when the GSI is queried. You are advised to run REINDEX INDEX to rebuild the GSI.
When executing COPY or GDS to import data to a base table with a GSI, the enable_stream_operator parameter must be enabled to achieve optimal data import performance.
The following operations will invalidate GSIs: database-level REINDEX, database-level/partition-level CLUSTER, ALTER TABLE PARTITION (DROP, TRUNCATE, MERGE, SPLIT, and EXCHANGE PARTITION). If UPDATE DISTRIBUTED GLOBAL INDEX is not specified, all GSIs on the partitioned table will be invalidated. If UPDATE DISTRIBUTED GLOBAL INDEX is not specified for EXCHANGE PARTITION, all GSIs on the ordinary table will be invalidated.
TABLE ACCESS BY INDEX ROWID is based on STREAM and inherits STREAM-related constraints. Considering the STREAM communication latency, when the selectivity is too low or the number of predicate hit rows is small, the performance is not optimal. In this case, you are advised not to use TABLE ACCESS BY INDEX ROWID. Instead, you are advised to use common indexes.
In the INSERT INTO SELECT scenario, you are advised to enable the enable_stream_operator parameter to insert and execute the STREAM plan. (If the base table is a segment-page table, hash bucket table, or tamper-proof table, the CN plan is still used.) If enable_stream_operator is disabled, the execution plan goes back to the CN, and the performance is poor (similar to the performance of creating GSIs in version 503.1.0).
For INSERT, UPDATE, and DELETE, the execution plan is executed in distributed mode, which causes performance loss. In batch UPDATE/DELETE scenarios, the execution plan goes back to the CN, which causes poor performance.
GSIs can contain expressions but have the following constraints:
- Similar to the base table constraints, distribution keys cannot contain expressions. In addition, GSIs whose index column contains only an expression cannot be created either because the distribution key must be an expression. If such GSI is created, a syntax error is reported.
- Similar to common index constraints, the CONTAINING column cannot contain expressions. Otherwise, a syntax error will be reported.
- If any column name is prefixed with "expr", no GSI containing an expression can be created. Otherwise, a syntax error will be reported.

Syntax

CREATE GLOBAL [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [schema_name.]index_name ] ON table_name [ USING method ]
    ({ { column_name [ ( length ) ] | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] }[, ...] ) 
    [CONTAINING (column_name [ ,... ])] [DISTRIBUTE BY hash(column_name [ ,... ])][ WITH ( {storage_parameter = value} [, ... ] )]
    [ TABLESPACE tablespace_name ] [ VISIBLE | INVISIBLE ] [ WHERE predicate];

Parameters

UNIQUE
Creates a unique index. In this way, the system checks whether new values are unique in the index column. If the inserted or updated value causes duplicate records, an error is reported.
CONCURRENTLY
Creates an index (with ShareUpdateExclusiveLock) in a mode that does not block DML statements. When an index is created, other statements cannot access the table on which the index depends. If this keyword is specified, DML statements are not blocked during the creation.
- The CREATE GLOBAL INDEX statement can be run within a transaction, but CREATE GLOBAL INDEX CONCURRENTLY cannot.
- The built-in stream accelerates creation optimization, which is controlled by the enable_stream_operator parameter. When this parameter is enabled, online creation is performed in stream mode, improving creation performance.
- An index may fail to be created online. The possible causes are as follows: The user instructs to terminate the online creation command. The definition of the index is modified. The definition of the base table is modified and the definition of the index is affected. When an index fails to be created online, an error message is displayed, and an unusable index is left. This index is ignored by queries, but it still consumes the IUD overhead. Therefore, you are advised to use the DROP INDEX IF EXISTS syntax to delete the index and create the index online, or use the REINDEX/REBUILD syntax to rebuild the index.
schema_name
Specifies the schema name.

Value range: an existing schema name.
index_name
Specifies the name of the index to be created. No schema name can be included here; the index is always created in the same schema as its parent table.

Value range: a string. It must comply with the naming convention.
table_name
Specifies the name of the table to be indexed (optionally schema-qualified).

Value range: an existing table name.
USING method
Specifies the name of the index method to be used.

Value range: UB-tree. The multi-version B-tree index is provided. The index page contains transaction information.
column_name
Specifies the name of the column on which an index is to be created.

If the index mode supports composite indexes, multiple columns can be declared. A maximum of 28 columns can be declared for a non-partitioned base table, and a maximum of 27 columns can be declared for a partitioned base table.
column_name ( length )
Specifies the supported prefix key. For details, see column_name ( length ).

As an expression index, the prefix index has the same constraints as the expression key in GSI.
expression
Specifies an expression based on one or more columns of the table. The expression usually must be written with surrounding parentheses, as shown in the syntax. However, the parentheses can be omitted if the expression has the form of a function call. The expression index can be used to obtain fast access to data based on some transformation of the basic data. For example, an index computed on upper(col) would allow the WHERE upper(col) = 'JIM' clause to use an index. If an expression contains the IS NULL clause, the index for this expression is invalid. In this case, you are advised to create a partial index.
COLLATE collation
Assigns a collation to the column (which must be of a collatable data type). If no collation is specified, the default collation is used. You can run the select * from pg_collation command to query collation rules from the pg_collation system catalog. The default collation rule is the row starting with default in the query result.
opclass
Specifies the name of an operator class. An operator class can be specified for each column of an index. The operator class identifies the operators to be used by the index for that column.
ASC
Specifies an ascending (default) sort order.
DESC
Specifies a descending sort order.
NULLS FIRST
Specifies that null values appear before non-null values in the sort ordering. This is the default when DESC is specified.
NULLS LAST
Specifies that null values appear after non-null values in the sort ordering. This is the default when DESC is not specified.
CONTAINING
Specifies the base table attributes other than index keys contained in the GSI.
CONTAINING (column_name [, ...])
Specifies the base table attributes other than the index keys contained in the GSI.
DISTRIBUTE BY
Specifies the distribution key of the GSI, which is different from that of the base table, and only the hash distribution can be used.
DISTRIBUTE BY hash(column_name [,...])
Specifies the distribution key of the GSI.

It needs to be included in the index key.
WITH ( {storage_parameter = value} [, ... ] )
Specifies the storage parameters of the GSI.
Value range:
- FILLFACTOR
  The fill factor of an index is a percentage between 10 and 100.
  
  Value range: 10–100
- STORAGE TYPE
  Specifies the GSI storage type.
  
  Value range: USTORE
- INDEXSPLIT
  Controls the method of selecting a proper split point for UB-tree, which is the same as that for B-tree.
  
  Value range: DEFAULT or INSERTPT
- enable_tde
  Specifies the index as an encryption index. The database automatically encrypts the data in the encrypted index before storing it. Before using this parameter, ensure that the TDE function has been enabled using the GUC parameter enable_tde, the information for accessing the key service has been set using the GUC parameter tde_key_info, and the enable_tde attribute has been set for the base table. For details about how to use this parameter, see section "Transparent Data Encryption" in Feature Guide. This parameter supports only B-tree and UB-tree indexes. Other indexes such as hash indexes are not supported.
  
  Value range: on and off If enable_tde is set to on, the values of key_type, tde_cmk_id, and dek_cipher are automatically generated in the database.
  
  Default value: off
- encrypt_algo
  Specifies the algorithm for encrypting indexes. This parameter must be used together with enable_tde.
  
  Value range: a string. The value can be AES_128_CTR or SM4_CTR.
  
  Default value: null if enable_tde is not set, or AES_128_CTR if enable_tde is set.
- dek_cipher
  Specifies the DEK ciphertext. After you set the enable_tde parameter for an index, the index automatically copies the dek_cipher parameter of the base table. This parameter cannot be set or modified.
  
  Value range: a string
  
  Default value: null
- key_type
  Specifies the type of the master key. After you set the enable_tde parameter for a table, the index automatically copies the key_type parameter of the base table. This parameter cannot be set or modified.
  
  Default value: null
- cmk_id
  Specifies the ID of the master key. After you set the enable_tde parameter for a table, the index automatically copies the cmk_id parameter of the base table. This parameter cannot be set or modified.
  
  Value range: a string
  
  Default value: null
TABLESPACE tablespace_name
Specifies the tablespace for an index. If no tablespace is specified, the default tablespace is used.

Value range: an existing table name.
VISIBLE
Sets the index to be visible. This is the default option.
- VISIBLE can be set only in the ORA-compatible database (that is, sql_compatibility = 'ORA').
- When disable_keyword_options is set to "visible", this keyword cannot be used.
- This keyword is not supported in the upgrade uncommitted phase.
INVISIBLE
Sets the index to be invisible.
- INVISIBLE can be set only in the ORA-compatible database (that is, sql_compatibility = 'ORA').
- When disable_keyword_options is set to "invisible", this keyword cannot be used.
- This keyword is not supported in the upgrade uncommitted phase.
WHERE predicate
Creates a partial index. A partial index contains entries for only a portion of a table, usually a portion that is more useful for indexing than the rest of the table. For example, if you have a table that contains both billed and unbilled orders where the unbilled orders take up a small portion of the total table and yet that is an often used portion, you can improve performance by creating an index on just that portion. In addition, the WHERE clause with a UNIQUE constraint can be used to enforce uniqueness over a subset of a table.

Value range: The predicate expression can only refer to columns of the underlying table, but it can use all columns, not just the ones being indexed. Currently, subqueries and aggregate expressions are forbidden in the WHERE clause. You are advised not to use a predicate of numeric types such as int, because such types can be implicitly converted to Boolean values (non-zero values are implicitly converted to true and 0 is implicitly converted to false), which may cause unexpected results.

Examples

    
         -- Create a base table test1 that contains three columns.
gaussdb=# CREATE TABLE test1(c1 int, c2 int, c3 int);

-- Create a GSI on the c2 column of the test table containing the c3 column based on the hash distribution of the c2 column.
gaussdb=# CREATE GLOBAL INDEX idx_gsi_1 ON test1(c2) CONTAINING(c3) DISTRIBUTE BY HASH(c2);

-- Create a base table test2 that contains three columns.
gaussdb=# CREATE TABLE test2(c1 int, c2 int, c3 int);

-- Create a GSI on the c2 column of the test2 table containing the c3 column based on the hash distribution of the c2 column.
gaussdb=# CREATE GLOBAL INDEX idx_gsi_2 ON test2(c2) CONTAINING(c3);

-- Create a base table test3 that contains three columns.
gaussdb=# CREATE TABLE test3(c1 int, c2 int, c3 int);

-- Create a GSI in UNIQUE form on the c2 column of the test3 table, which is based on the hash distribution of the c2 column by default.
gaussdb=# CREATE GLOBAL UNIQUE INDEX idx_gsi_3 ON test3(c2) DISTRIBUTE BY HASH(c2);

-- Delete the index.
gaussdb=# DROP INDEX idx_gsi_1;
gaussdb=# DROP INDEX idx_gsi_2;
gaussdb=# DROP INDEX idx_gsi_3;

-- Delete the table.
gaussdb=# DROP TABLE test1;
gaussdb=# DROP TABLE test2;
gaussdb=# DROP TABLE test3;