CREATE GLOBAL INDEX

Description

Creates a global secondary index (GSI) on a specified table.

The GSIs allow users to define indexes that are inconsistent with the distribution of base tables. In this way, single-node plans for querying non-distributed columns of base tables and unique/primary key constraints on non-distributed columns of base tables are achieved.

GSI creation is not allowed during the upgrade uncommitted observation.
You are advised to create a NOT NULL constraint on the index key of at least one column of GSI to improve IUD performance.
You are advised to create common indexes in the GSI base table to improve IUD execution efficiency.
The execution plan generated when a GSI is created is to pull the base table data back to the CN and then deliver the data to the DN where the GSI is located. Therefore, the performance of a large-scale cluster with a large amount of data is much worse than that of using a common index. For example, it may take about one hour to create a GSI with tens of millions of data records on six SSDs. The GSI creation mechanism will be optimized in later versions and the current single point will be changed to parallel DNs to improve the creation efficiency.

Precautions

Similar to the base table constraints, the distribution keys of GSIs do not support the UPDATE operation.
The GSI can be created only in GTM-LITE mode. If a GSI is created in other GTM modes, an error is reported.
Common UB-tree indexes can be created. Expression indexes and some other indexes are not supported. Astore does not support UB-tree indexes other than GSIs and does not support partition creation for GSIs.
The GSI whose distribution is the same as that of the base table cannot be created. Otherwise, an error will be reported during the execution.
Online GSI creation or rebuild is not supported and a syntax error will be reported. PARALLEL is not supported, and parallel_workers will be set to 0 on the DN.
Hash-distributed GSIs can be created for row-store Astore tables and partitioned tables whose base tables are hash-distributed. The base tables cannot be replicate, hash bucket, list/range distribution, or Ustore table. GSIs of these tables do not support distribution other than hash distribution.
If the base table is a partitioned table, the GSI supports a maximum of 27 columns. If the base table is not a partitioned table, the GSI supports a maximum of 28 columns (including the index key and distribution key).
UPSERT, IUD returning on base tables with GSIs, TABLE ACCESS BY INDEX ROWID, MERGE INTO, CLUSTER, and SQL PATCH are not supported.
Operations that will invalidate the GSIs in the current version:
- VACUUM FULL
  - VACUUM FULL on a single table: All GSIs on the table are invalidated.
  - VACUUM FULL on a database: All GSIs in the database are invalidated.
- REINDEX
  - REINDEX on a single table (offline): All GSIs on the table are invalidated.
  - REINDEX on a database (offline): All GSIs in the database are invalidated.
- CLUSTER
  - CLUSTER on a single table using a common index: All GSIs on the table are invalidated.
  - CLUSTER on a single table: All GSIs on the table are invalidated.
  - CLUSTER on a database: All GSIs on the tables that have been clustered in the database are invalidated.
- COPY/GDS
  All GSIs on the table are invalidated.
- PARTITION
  MERGE PARTITION, EXCHANGE PARTITION, TRUNCATE PARTITION, DROP PARTITION and SPLIT PARTITION invalidate all GSIs on a partitioned table. EXCHANGE PARTITION invalidates all GSIs on an ordinary table.
For batch scenarios such as INSERT INTO SELECT and UPDATE/DELETE, the execution plan goes back to the CN, and the performance is poor (similar to the performance of GSI creation).
For INSERT, UPDATE, and DELETE, distributed execution plans are used, which may cause performance loss.
If _new$$ or _NEW$$ is added to a column name or ctid, xc_node_hash, xmin, xmax or tableoid (when the base table is a partitioned table), a GSI fails to be created for the base table whose column names are repeated.
If the VACUUM FULL, CLUSTER, or REINDEX operation on a table is interrupted, the GSI on the table may turn to the UNUSABLE state. In this case, an error will be reported when the GSI is queried. You are advised to run REINDEX INDEX to rebuild the GSI.

Syntax

CREATE GLOBAL [ UNIQUE ] INDEX [ [schema_name.]index_name ] ON table_name [ USING method ]
    ({ column_name [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] }[, ...] ) 
    [CONTAINING (containing_colname)] [DISTRIBUTE BY hash(dist_colnames)]
    [ TABLESPACE tablespace_name ];

Parameters

UNIQUE
Creates a unique index. In this way, the system checks whether new values are unique in the index column. If the inserted or updated value causes duplicate records, an error is reported.
schema_name
Specifies the schema name.

Value range: an existing schema name
index_name
Specifies the name of the index to be created. No schema name can be included here; the index is always created in the same schema as its parent table.

Value range: a string. It must comply with the naming convention.
table_name
Specifies the name of the table to be indexed (optionally schema-qualified).

Value range: an existing table name
USING method
Specifies the name of the index method to be used.

Value range: UB-tree. The multi-version B-tree index is provided. The index page contains transaction information.
column_name
Specifies the name of the column on which an index is to be created.

If the index mode supports multi-column indexes, multiple columns can be declared. A maximum of 28 columns can be declared for a non-partitioned base table, and a maximum of 27 columns can be declared for a partitioned base table.
COLLATE collation
Assigns a collation to the column (which must be of a collatable data type). If no collation is specified, the default collation is used. You can run the select * from pg_collation command to query collation rules from the pg_collation system catalog. The default collation rule is the row starting with default in the query result.
opclass
Specifies the name of an operator class. An operator class can be specified for each column of an index. The operator class identifies the operators to be used by the index for that column.
ASC
Specifies an ascending (default) sort order.
DESC
Specifies a descending sort order.
NULLS FIRST
Specifies that null values appear before non-null values in the sort ordering. This is the default when DESC is specified.
NULLS LAST
Specifies that null values appear after non-null values in the sort ordering. This is the default when DESC is not specified.
CONTAINING
Specifies the base table attributes other than index keys contained in the GSI.
containing_colname
Specifies the base table attributes other than the index keys contained in the GSI.
DISTRIBUTE BY
Specifies the distribution key of the GSI, which is different from that of the base table, and only the hash distribution can be used.
dist_colnames
Specifies the distribution key of the GSI.

It must be contained in column_name.
TABLESPACE tablespace_name
Specifies the tablespace for an index. If no tablespace is specified, the default tablespace is used.

Value range: an existing table name

Example

    
         -- Create a base table test that contains three columns.
gaussdb=# CREATE TABLE test(c1 int, c2 int, c3 int);

-- Create a GSI on the c2 column of the test table containing the c3 column based on the hash distribution of the c2 column.
gaussdb=#  CREATE GLOBAL INDEX idx_gsi_1 ON test(c2) CONTAINING(c3) DISTRIBUTE BY HASH(c2);

-- Create a base table test2 that contains three columns.
gaussdb=#  CREATE TABLE test2(c1 int, c2 int, c3 int);

-- Create a GSI on the c2 column of the test2 table containing the c3 column based on the hash distribution of the c2 column.
gaussdb=#  CREATE GLOBAL INDEX idx_gsi_2 ON test2(c2) CONTAINING(c3) ;

-- Create a base table test3 that contains three columns.
gaussdb=#  CREATE TABLE test3(c1 int, c2 int, c3 int);

-- Create a GSI in UNIQUE form on the c2 column of the test3 table, which is based on the hash distribution of the c2 column by default.
gaussdb=#  CREATE GLOBAL UNIQUE INDEX idx_gsi_3 ON test3(c2) DISTRIBUTE BY HASH(c2);