Updated on 2024-05-07 GMT+08:00

CREATE INDEX

Description

Defines a new index.

Indexes are primarily used to enhance database performance (though inappropriate use can result in database performance deterioration). You are advised to create indexes on:

  • Columns that are often queried
  • Join conditions. For a query on joined columns, you are advised to create a composite index on the columns. For example, for select * from t1 join t2 on t1.a=t2.a and t1.b=t2.b, you can create a composite index on columns a and b in table t1.
  • Columns having filter criteria (especially scope criteria) of a WHERE clause
  • Columns that appear after ORDER BY, GROUP BY, and DISTINCT

Partitioned tables do not support partial index creation (when indexes contain the GLOBAL or LOCAL keyword or the created index is a GLOBAL index). If the GLOBAL or LOCAL keyword is declared during index creation, the index of the corresponding type is created. Otherwise, if the partition name is specified during index creation, the LOCAL index is created. If a unique index contains non-partition keys, a GLOBAL index is created. If a unique index contains all partition keys, a LOCAL index is created. Otherwise, a GLOBAL index is created by default.

Precautions

  • Indexes consume storage and computing resources. Creating too many indexes has negative impact on database performance (especially the performance of data import. Therefore, you are advised to import the data before creating indexes). Therefore, create indexes only when they are necessary.
  • All functions and operators used in an index definition must be immutable, that is, their results must depend only on their parameters and never on any outside influence (such as the contents of another table or the current time). This restriction ensures that the behavior of the index is well-defined. To use a user-defined function in an index or WHERE clause, mark it as an immutable function.
  • Partitioned table indexes are classified into local indexes and global indexes. A local index binds to a specific partition, and a global index corresponds to the entire partitioned table.
  • A user granted with the CREATE ANY INDEX permission can create indexes in both the public and user schemas.
  • If a user-defined function is called in the expression index, the expression index function is executed based on the permission of the function creator.
  • Data of the XML type cannot be used as common indexes, unique indexes, global indexes, local indexes, or partial indexes.
  • Only B-tree and UB-tree indexes can be created online and only common, global, and local indexes are supported. Online concurrent index creation supports only Astore common indexes, global indexes, and local indexes. Ustore indexes are not supported.

Syntax

  • Create an index on a table.
    CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [schema_name.] index_name ] ON table_name [ USING method ]
        ({ { column_name [ ( length ) ] | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] }[, ...] )
        [ INCLUDE ( column_name [, ...] ) ]
        [ WITH ( {storage_parameter = value} [, ... ] ) ]
        [ TABLESPACE tablespace_name ]
        [ WHERE predicate ];
  • Create an index on a partitioned table.
    CREATE [ UNIQUE ] INDEX [ CONCURRENTLY ] [ [schema_name.] index_name ] ON table_name [ USING method ]
        ( { { column_name [ ( length ) ] | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS LAST ] } [, ...] )
        [ LOCAL [ ( { PARTITION index_partition_name | SUBPARTITION index_subpartition_name [ TABLESPACE index_partition_tablespace ] } [, ...] ) ] | GLOBAL ]
        [ INCLUDE ( column_name [, ...] ) ]
        [ WITH ( { storage_parameter = value } [, ...] ) ]
        [ TABLESPACE tablespace_name ];

Parameters

  • UNIQUE

    Creates a unique index. In this way, the system checks whether new values are unique in the index column. Attempts to insert or update data which would result in duplicate entries will generate an error.

    Currently, only B-tree indexes and UB-tree indexes support unique indexes.

  • CONCURRENTLY

    Creates an index (with ShareUpdateExclusiveLock) in a mode that does not block DML statements. When an index is created, other statements cannot access the table on which the index depends. If this keyword is specified, DML is not blocked during the creation.

    • This option can only specify a name of one index.
    • The CREATE INDEX statement can be run within a transaction, but CREATE INDEX CONCURRENTLY cannot.
    • For temporary tables, you can use CONCURRENTLY to create indexes. However, indexes are created in blocking mode because no other sessions concurrently access the temporary tables and the blocking mode is more cost-effective.
    • This keyword is specified when an index is created. For Astore, the entire table needs to be scanned twice and then built. When the table is scanned for the first time, an index is created and the read and write operations are not blocked. During the second scan, changes that have occurred since the first scan are merged and updated. For Ustore, only one scan is required to create indexes.
    • For Astore, the table needs to be scanned and built twice, and all existing transactions that may modify the table must be completed. This means that the index creation takes a longer time than normal. In addition, the CPU and I/O consumption affects other services. Although the Ustore only needs to scan the entire table once to complete the index creation, the preceding consumption still exists.
    • If an index build fails, it leaves an "unusable" index. This index is ignored by queries, but it still consumes the update overhead. In this case, you are advised to delete the index and create the index again using CONCURRENTLY, or rebuild the index by using CLUSTER, TRUNCATE, VACUUM FULL, or REINDEX. Note that if ALTER TABLE involves table and index rebuild, residual "unusable" indexes will be automatically cleared.
    • After the second scan, index creation must wait for any transaction that holds a snapshot earlier than the snapshot taken by the second scan to terminate. In addition, the ShareUpdateExclusiveLock (level 4) added during index creation conflicts with a lock whose level is greater than or equal to 4. Therefore, when such an index is created, the system is prone to hang or deadlock. Example:
      • If two sessions create an index by using CONCURRENTLY for the same table, a deadlock occurs.
      • If a session creates an index by using CONCURRENTLY for a table and another session drops a table, a deadlock occurs.
      • There are three sessions. Session 1 locks table a and does not commit it. Session 2 creates an index by using CONCURRENTLY for table b. Session 3 writes data to table a. Before the transaction of session 1 is committed, session 2 is blocked.
      • The transaction isolation level is set to repeatable read (read committed by default). Two sessions are started. Session 1 writes data to table a and does not commit it. Session 2 creates an index by using CONCURRENTLY for table b. Before the transaction of session 1 is committed, session 2 is blocked.
    • When the I/O and CPU resources are not limited, the service performance deterioration caused by online index creation can be controlled within 10%. However, in special scenarios, the service performance deterioration may exceed 10%. This is because online index creation is a long transaction that consumes a large number of I/O and CPU resources. It consumes more resources than offline index creation. The longer the online index creation transaction lasts, the greater the impact on service performance. The time for creating indexes online is positively correlated with the data volume of base tables and the data volume generated by concurrent DMLs. When the I/O and CPU resources are not limited, the time for creating indexes online is about two to six times that for creating indexes offline. However, when the number of concurrent transactions is large (> 10000 TPS) or resource contention occurs, the time may be even longer. In Astore mode, you can create indexes in parallel to shorten the index creation time. The performance of online parallel index creation increases to a certain value and becomes stable as the number of parallel worker threads increases. Compared with the performance of creating indexes in serial mode, the performance of creating indexes in parallel online is improved by about 30%. You are advised to create indexes online during off-peak hours to avoid great impact on services. Although online index creation provides the capability of uninterrupted services to some extent, it still needs to be implemented with caution.
  • schema_name

    Specifies the schema name.

    Value range: an existing schema name

  • index_name

    Specifies the name of the index to be created. The schema of the index is the same as that of the table.

    Value range: a string. It must comply with the naming convention.

  • table_name

    Specifies the name of the table to be indexed (optionally schema-qualified).

    Value range: an existing table name

  • USING method

    Specifies the name of the index method to be used.

    Value range:

    • btree: B-tree indexes store key values of data in a B+ tree structure. This structure helps users to quickly search for indexes. B-tree supports comparison query and query range. When an index is created in a Ustore table, the index is automatically changed to UB-tree.
    • ubtree: Multi-version B-tree index used only for Ustore tables. The index page contains transaction information and can be recycled. By default, the INSERTPT function is enabled for UB-tree indexes.

    Row-store tables (Astore storage engine) support the following index types: btree (default). Row-store tables (Ustore storage engine) support the following index type: ubtree.

  • column_name

    Specifies the name of the column on which an index is to be created.

    Multiple columns can be specified if the index method supports composite indexes. A global index supports a maximum of 31 columns, and other indexes support a maximum of 32 columns.

  • column_name ( length )

    Creates a prefix key index based on a column in the table. column_name indicates the column name of the prefix key, and length indicates the prefix length.

    The prefix key uses the prefix of the specified column data as the index key value, which reduces the storage space occupied by the index. Indexes can be used for partial filter and join conditions that contain prefix key columns.

    • The prefix key supports the following index methods: btree and ubtree.
    • The data type of the prefix key column must be binary or character (excluding special characters).
    • The prefix length must be a positive integer that does not exceed 2676 and cannot exceed the maximum length of the column. For the binary type, the prefix length is measured in bytes. For non-binary character types, the prefix length is measured in characters. The actual length of the key value is restricted by the internal page. If a column contains multi-byte characters or an index has multiple keys, the length of the index line may exceed the upper limit. As a result, an error is reported. Consider this situation when setting a long prefix length.
    • In the CREATE INDEX syntax, the following keywords cannot be used as prefix keys for column names: COALESCE, EXTRACT, GREATEST, LEAST, LNNVL, NULLIF, NVL, NVL2, OVERLAY, POSITION, REGEXP_LIKE, SUBSTRING, TIMESTAMPDIFF, TREAT, TRIM, XMLCONCAT, XMLELEMENT, XMLEXISTS, XMLFOREST, XMLPARSE, XMLPI, XMLROOT and XMLSERIALIZE.
    • The prefix key is a special expression key. Some constraints and restrictions that are not described are the same as those of the expression key. For details, see the description of the expression index.
  • expression

    Specifies an expression based on one or more columns of the table. The expression usually must be written with surrounding parentheses, as shown in the syntax. However, the parentheses can be omitted if the expression has the form of a function call.

    Expression can be used to obtain fast access to data based on some transformation of the basic data. For example, an index computed on upper(col) would allow the clause WHERE upper(col) = 'JIM' to use an index.

    If an expression contains IS NULL, the index for this expression is invalid. In this case, you are advised to create a partial index.

  • COLLATE collation

    Assigns a collation to the column (which must be of a collatable data type). If no collation is specified, the default collation is used. You can run the select * from pg_collation command to query collation rules from the pg_collation system catalog. The default collation rule is the row starting with default in the query result.

  • opclass

    Specifies the name of an operator class. An operator class can be specified for each column of an index. The operator class identifies the operators to be used by the index for that column. For example, a B-tree index on the type int4 would use the int4_ops class; this operator class includes comparison functions for values of type int4. In practice, the default operator class for the column's data type is sufficient. The operator class applies to data with multiple sorts. For example, users might want to sort a complex-number data type either by absolute value or by real part. They could do this by defining two operator classes for the data type and then selecting the proper class when making an index.

  • ASC

    Specifies an ascending (default) sort order.

  • DESC

    Specifies a descending sort order.

  • NULLS FIRST

    Specifies that null values appear before non-null values in the sort ordering. This is the default when DESC is specified.

  • NULLS LAST

    Specifies that null values appear after non-null values in the sort ordering. This is the default when DESC is not specified.

  • LOCAL

    Specifies that the partitioned index to be created is a local index.

  • GLOBAL

    Specifies the partitioned index to be created as a global index. If no keyword is specified, a global index is created by default.

  • INCLUDE ( column_name [, ...] )

    The optional INCLUDE clause specifies that some non-key columns are included in indexes. Non-key columns cannot be used as search criteria for accelerating index scans, and they are omitted when the unique constraints of the indexes are checked.

    An index-only scan can directly return content in the non-key columns without accessing the heap table corresponding to the indexes.

    Exercise caution when adding non-key columns as INCLUDE columns, especially for wide columns. If the size of an index tuple exceeds the maximum size allowed by the index type, data insertion fails. Note that in any case, adding non-key columns to an index increases the space occupied by the index, which may slow down the search speed.

    Currently, only UB-tree indexes access mode supports this feature. Non-key columns are stored in the index leaf tuple corresponding to the heap tuple and are not included in the tuple on the upper-layer index page.

  • WITH ( {storage_parameter = value} [, ... ] )

    Specifies the storage parameter used for an index.

    Value range:

    For indexes other than Psort, set it to FILLFACTOR. Only UB-tree indexes support INDEXSPLIT. Only non-partitioned B-tree indexes support the DEDUPLICATION parameter.
    • FILLFACTOR

      The fill factor of an index is a percentage from 10 to 100. In the scenario where a large number of concurrent insertions are performed and the key value range is dense, select a smaller fill factor when the contention of the same index page is high during the insertion.

      Value range: 10–100

    • INDEXSPLIT

      Specifies the splitting policy of UB-tree indexes. The DEFAULT policy is the same as the splitting policy of B-tree indexes. The INSERTPT policy can significantly reduce the index space usage in some scenarios.

      Value range: INSERTPT and DEFAULT

      Default value: INSERTPT

    • ACTIVE_PAGES

      Specifies the number of index pages, which may be less than the actual number of physical file pages and can be used for optimization. Currently, this parameter is valid only for the local index of the Ustore partitioned table and will be updated by VACUUM and ANALYZE (including AUTOVACUUM). You are advised not to manually set this parameter.

    • DEDUPLICATION

      Specifies whether to deduplicate and compress tuples with duplicate key values. This is an index parameter. When there are a large number of indexes with duplicate key values, enabling this parameter can effectively reduce the space occupied by indexes. This parameter does not take effect for primary key indexes and unique indexes. If non-unique indexes are used and the index key value duplication rate is low or the index key values are unique, enabling this parameter will slightly deteriorate the index insertion performance. Currently, local and global indexes of partitioned tables are not supported.

      Value range: Boolean value. The default value is the value of the enable_default_index_deduplication GUC parameter (the default value is off).

  • TABLESPACE tablespace_name

    Specifies the tablespace for an index. If no tablespace is specified, the default tablespace is used.

    Value range: an existing table name

  • WHERE predicate

    Creates a partial index. A partial index is an index that contains entries for only a portion of a table, usually a portion that is more useful for indexing than the rest of the table. For example, if you have a table that contains both billed and unbilled orders where the unbilled orders take up a small fraction of the total table and yet that is an often used portion, you can improve performance by creating an index on just that portion. In addition, WHERE with UNIQUE can be used to enforce uniqueness over a subset for a table.

    Value range: The predicate expression can only refer to columns of the underlying table, but it can use all columns, not just the ones being indexed. Currently, subqueries and aggregate expressions are forbidden in WHERE. You are advised not to use numeric types such as int for predicate, because such types can be implicitly converted to bool values (non-zero values are implicitly converted to true and 0 is implicitly converted to false), which may cause unexpected results.

    For a partitioned table index, if the created index contains the GLOBAL or LOCAL keyword or the created index is a GLOBAL index, the WHERE clause cannot be used to create an index.

  • PARTITION index_partition_name

    Specifies the name of an index partition.

    Value range: a string. It must comply with the naming convention.

  • SUBPARTITION index_subpartition_name

    Specifies the name of a level-2 index partition.

    Value range: a string. It must comply with the naming convention.

  • TABLESPACE index_partition_tablespace

    Specifies the tablespace of an index partition.

    Value range: If this parameter is not specified, the value of index_tablespace is used.

Examples

-- Create the tpcds.ship_mode_t1 table.
gaussdb=# create schema tpcds;
gaussdb=# CREATE TABLE tpcds.ship_mode_t1
(
    SM_SHIP_MODE_SK           INTEGER               NOT NULL,
    SM_SHIP_MODE_ID           CHAR(16)              NOT NULL,
    SM_TYPE                   CHAR(30)                      ,
    SM_CODE                   CHAR(10)                      ,
    SM_CARRIER                CHAR(20)                      ,
    SM_CONTRACT               CHAR(20)
) 
;

-- Create a common unique index on the SM_SHIP_MODE_SK column in the tpcds.ship_mode_t1 table.
gaussdb=# CREATE UNIQUE INDEX ds_ship_mode_t1_index1 ON tpcds.ship_mode_t1(SM_SHIP_MODE_SK);

-- Create a B-tree index on the SM_SHIP_MODE_SK column in the tpcds.ship_mode_t1 table.
gaussdb=# CREATE INDEX ds_ship_mode_t1_index4 ON tpcds.ship_mode_t1 USING btree(SM_SHIP_MODE_SK);

-- Create an expression index on the SM_CODE column in the tpcds.ship_mode_t1 table:
gaussdb=# CREATE INDEX ds_ship_mode_t1_index2 ON tpcds.ship_mode_t1(SUBSTR(SM_CODE,1 ,4));

-- Create a partial index on the SM_SHIP_MODE_SK column where SM_SHIP_MODE_SK is greater than 10 in the tpcds.ship_mode_t1 table.
gaussdb=# CREATE UNIQUE INDEX ds_ship_mode_t1_index3 ON tpcds.ship_mode_t1(SM_SHIP_MODE_SK) WHERE SM_SHIP_MODE_SK>10;

-- Create an index online on the SM_SHIP_MODE_SK column in the tpcds.ship_mode_t1 table.
gaussdb=# CREATE INDEX CONCURRENTLY ds_ship_mode_t1_index6 ON tpcds.ship_mode_t1(SM_SHIP_MODE_SK);

-- Create a prefix key index on the SM_TYPE column in the tpcds.ship_mode_t1 table.
gaussdb=# CREATE INDEX ds_ship_mode_t1_prefix_index ON tpcds.ship_mode_t1(SM_TYPE(4));

-- Rename an existing index.
gaussdb=# ALTER INDEX tpcds.ds_ship_mode_t1_index1 RENAME TO ds_ship_mode_t1_index5;

-- Set the index as unusable.
gaussdb=# ALTER INDEX tpcds.ds_ship_mode_t1_index2 UNUSABLE;

-- Rebuild an index.
gaussdb=# ALTER INDEX tpcds.ds_ship_mode_t1_index2 REBUILD;

-- Delete an existing index.
gaussdb=# DROP INDEX tpcds.ds_ship_mode_t1_index2;

-- Delete the table.
gaussdb=# DROP TABLE tpcds.ship_mode_t1;

-- Create a tablespace.
gaussdb=# CREATE TABLESPACE example1 RELATIVE LOCATION 'tablespace1/tablespace_1';
gaussdb=# CREATE TABLESPACE example2 RELATIVE LOCATION 'tablespace2/tablespace_2';
gaussdb=# CREATE TABLESPACE example3 RELATIVE LOCATION 'tablespace3/tablespace_3';
gaussdb=# CREATE TABLESPACE example4 RELATIVE LOCATION 'tablespace4/tablespace_4';
-- Create the tpcds.customer_address_p1 table.
gaussdb=# CREATE TABLE tpcds.customer_address_p1
(
    CA_ADDRESS_SK             INTEGER               NOT NULL,
    CA_ADDRESS_ID             CHAR(16)              NOT NULL,
    CA_STREET_NUMBER          CHAR(10)                      ,
    CA_STREET_NAME            VARCHAR(60)                   ,
    CA_STREET_TYPE            CHAR(15)                      ,
    CA_SUITE_NUMBER           CHAR(10)                      ,
    CA_CITY                   VARCHAR(60)                   ,
    CA_COUNTY                 VARCHAR(30)                   ,
    CA_STATE                  CHAR(2)                       ,
    CA_ZIP                    CHAR(10)                      ,
    CA_COUNTRY                VARCHAR(20)                   ,
    CA_GMT_OFFSET             DECIMAL(5,2)                  ,
    CA_LOCATION_TYPE          CHAR(20)
)
TABLESPACE example1
PARTITION BY RANGE(CA_ADDRESS_SK)
( 
   PARTITION p1 VALUES LESS THAN (3000),
   PARTITION p2 VALUES LESS THAN (5000) TABLESPACE example1,
   PARTITION p3 VALUES LESS THAN (MAXVALUE) TABLESPACE example2
)
ENABLE ROW MOVEMENT;
-- Create the partitioned table index ds_customer_address_p1_index1 without specifying the index partition name.
gaussdb=# CREATE INDEX ds_customer_address_p1_index1 ON tpcds.customer_address_p1(CA_ADDRESS_SK) LOCAL; 
-- Create the partitioned table index ds_customer_address_p1_index2 with the name of the index partition specified.
gaussdb=# CREATE INDEX ds_customer_address_p1_index2 ON tpcds.customer_address_p1(CA_ADDRESS_SK) LOCAL
(
    PARTITION CA_ADDRESS_SK_index1,
    PARTITION CA_ADDRESS_SK_index2 TABLESPACE example3,
    PARTITION CA_ADDRESS_SK_index3 TABLESPACE example4
) 
TABLESPACE example2;

-- Create a global partitioned index.
gaussdb=# CREATE INDEX ds_customer_address_p1_index3 ON tpcds.customer_address_p1(CA_ADDRESS_ID) GLOBAL;

-- If no keyword is specified, a global partitioned index is created by default.
gaussdb=# CREATE INDEX ds_customer_address_p1_index4 ON tpcds.customer_address_p1(CA_ADDRESS_ID);

-- Create the partitioned table index ds_customer_address_p1_index5 online without specifying the partitioned index name.
gaussdb=# CREATE INDEX CONCURRENTLY ds_customer_address_p1_index5 ON tpcds.customer_address_p1(CA_ADDRESS_SK) LOCAL; 

-- Create the global partitioned index ds_customer_address_p1_index6 online.
gaussdb=# CREATE INDEX CONCURRENTLY ds_customer_address_p1_index6 ON tpcds.customer_address_p1(CA_ADDRESS_ID) GLOBAL;

-- Change the tablespace of the partitioned table index CA_ADDRESS_SK_index2 to example1.
gaussdb=# ALTER INDEX tpcds.ds_customer_address_p1_index2 MOVE PARTITION CA_ADDRESS_SK_index2 TABLESPACE example1;

-- Change the tablespace of the partitioned table index CA_ADDRESS_SK_index3 to example2.
gaussdb=# ALTER INDEX tpcds.ds_customer_address_p1_index2 MOVE PARTITION CA_ADDRESS_SK_index3 TABLESPACE example2;

-- Rename a partitioned table index.
gaussdb=# ALTER INDEX tpcds.ds_customer_address_p1_index2 RENAME PARTITION CA_ADDRESS_SK_index1 TO CA_ADDRESS_SK_index4;

-- Delete the created indexes and the partitioned table.
gaussdb=# DROP INDEX tpcds.ds_customer_address_p1_index1;
gaussdb=# DROP INDEX tpcds.ds_customer_address_p1_index2;
gaussdb=# DROP TABLE tpcds.customer_address_p1;
-- Delete the tablespace.
gaussdb=# DROP TABLESPACE example1;
gaussdb=# DROP TABLESPACE example2;
gaussdb=# DROP TABLESPACE example3;
gaussdb=# DROP TABLESPACE example4;

Helpful Links

ALTER INDEX and DROP INDEX

Suggestions

  • create index

    You are advised to create indexes on:

    • Columns that are often queried
    • Join conditions. For a query on joined columns, you are advised to create a composite index on the columns. For example, for select * from t1 join t2 on t1.a=t2.a and t1.b=t2.b, you can create a composite index on columns a and b in table t1.
    • Columns having filter criteria (especially scope criteria) of a where clause
    • Columns that appear after order by, group by, and distinct
    Constraints:
    • An index of an ordinary table supports a maximum of 32 columns. A GLOBAL index of a partitioned table supports a maximum of 31 columns.
    • The size of a single index cannot exceed the size of the index page (8 KB). The size of a B-tree or UB-tree index cannot exceed one-third of the page size.
    • Partial indexes cannot be created in a partitioned table.
    • When a GLOBAL index is created on a partitioned table, the following constraints apply:
      • Expression indexes and partial indexes are not supported.
      • Only B-tree indexes are supported.
    • In the same attribute column, the local index and global index of a partition cannot coexist.
    • If the ALTER statement does not contain UPDATE GLOBAL INDEX, the original GLOBAL index is invalid. In this case, other indexes are used for query. If the ALTER statement contains UPDATE GLOBAL INDEX, the original GLOBAL index is still valid and the index function is correct.