CREATE INDEX_Indexes, Sequences, and File Directories_DDL Syntax_SQL Syntax Reference (9.1.0.x)_SQL Syntax Reference

Function

CREATE INDEX creates an index in a specified table. An index in a database is a specialized data structure designed to significantly drive data retrieval operations. An index acts like a sorted directory, allowing the database system to quickly locate specific rows without needing to scan the entire table.

An index has the following functions:

Accelerating queries
In database applications, full table scans on large datasets cause slow queries. Creating indexes resolves this by allowing the database to locate data directly, bypassing the need to scan every row. For example, an index on an id column lets the system instantly find the record where id=100, dramatically improving performance.
Optimizing sorting and grouping operations
Indexes such as B-tree indexes are ordered. When a query involves ORDER BY or GROUP BY, you can directly use the order of indexes to avoid extra sorting operations and improve efficiency.
Restricting data integrity
Some indexes (such as primary key indexes and unique indexes) can enforce unique values of columns (for example, primary keys cannot be duplicate), helping maintain data integrity.

Index Types

DWS supports the following types of indexes:

B-tree indexes store data keys using a structure like the B+ tree, enabling fast index searches. B-tree indexes support range queries, sorting, and equality queries.
GIN indexes are inverted index designed for complex data types that contain multiple searchable keys, such as arrays. It is ideally suited for multi-value matching, full-text search, and fuzzy queries.
GiST indexes apply to multi-dimensional data types (such as geometry and geography) and set data types. They apply to multi-dimensional data, spatial search, and complex scopes.
PSORT indexes are used to perform partial sort on column-store tables.

Notes:

Row-store tables: B-tree (default), GIN, and GiST indexes
Column-store tables: PSORT (default), B-tree, and GIN indexes

Partitioned tables have two index types: local and global.

Local indexes (partitioned indexes) apply to individual partitions within a partitioned table. Each local index matches one partition and shares its count and range with the table. If a table partition changes, only that specific index partition requires maintenance.
Global indexes span across all partitions, covering the entire table's data regardless of how the base table is partitioned. However, DWS currently does not support global indexes.

Indexing Methods

**Table 1** Indexing methods
Method	Description
Unique indexes	An index ensures the uniqueness of one or multiple attributes by acting as a constraint. When a table defines a unique constraint or primary key, DWS generates a unique index—possibly for multiple columns—on those specific columns to enforce this rule. In DWS, only B-tree supports creating unique indexes.
Composite indexes	A table's index can include multiple attributes. In DWS, B-tree allows multi-column indexing with up to 32 columns.
Partial indexes	Refers to an index that can be created for subsets of a table. This indexing method contains only tuples that meet condition expressions.
Expression index	An index can be built on a function or expression derived from one or more table columns. This type of index functions only if queries use the exact expression used during its creation.

Principles for Creating Indexes

Indexes are primarily used to enhance database performance (though inappropriate use can result in slower database performance). You are advised to create indexes on:

Columns that are often queried
Join conditions. For a query on joined columns, you are advised to create a composite index on the columns, for example, select * from t1 join t2 on t1.a=t2.a and t1.b=t2.b. You can create a composite index on the a and b columns of table t1.
Columns having filter criteria (especially scope criteria) of a where clause
Columns that appear after order by, group by, and distinct.
For point queries, create a B-tree index.

The index creation syntax on a partitioned table differs from that on an ordinary table. The partitioned table does not support concurrent index creation, partial index creation, or NULL FIRST.

Precautions

Indexes consume storage and computing resources. Creating too many indexes has negative impact on database performance (especially the performance of data import. Therefore, you are advised to import the data before creating indexes). Create indexes only when they are necessary.
All functions and operators used in an index definition must be immutable, that is, their results must depend only on their arguments and never on any outside influence (such as the contents of another table or the current time). This restriction ensures that the behavior of the index is well-defined. When using a user-defined function on an index or in a WHERE statement, mark it as an immutable function.
An index created on a partitioned table must include a partition column and all the partition keys.
DWS can only create local indexes on partitioned tables.
Column-store and HDFS tables support B-tree indexes. If the indexes are used, you cannot create expression and partial indexes.
Column-store tables support creating unique indexes using B-tree indexes.
Column-store and HDFS tables support PSORT indexes. If the PSORT indexes are used, you cannot create expression, partial, or unique indexes.
Column-store tables support GIN indexes, rather than partial indexes and unique indexes. If GIN indexes are used, you can create expression indexes. However, an expression in this situation cannot contain empty splitters, empty columns, or multiple columns.
Column-store indexes do not support OR query filter criteria and inlist operations.
A primary key or unique index cannot be created for a round-robin table.
Performing CREATE INDEX or REINDEX operations on a table triggers index rebuilding. During this process, data is dumped to a new data file, and once rebuilding is complete, the original file is deleted. For large tables, index rebuilding can consume a significant amount of disk space. Exercise caution when disk space is insufficient to prevent the cluster from becoming read-only.

For a table where a large amount of data needs to be added, deleted, or modified, it is recommended that the number of indexes be less than or equal to three. The maximum number of indexes is five.
Do not perform the CREATE INDEX and REINDEX operations on large tables during peak hours.
For more information about development and design specifications, see Development and Design Proposal.

Syntax

Create an index on a table.

       
            CREATE [ UNIQUE ] INDEX [ [IF NOT EXISTS] [ schema_name. ] index_name ] ON table_name [ USING method ]
    ({ { column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS { FIRST | LAST } ] }[, ...] )
    [ NULLS [ NOT ] DISTINCT | NULLS IGNORE ]
    [ COMMENT 'text' ]
    [ WITH ( {storage_parameter = value} [, ... ] ) ]
    
    [ WHERE predicate ];

Create an index for a partitioned table.

       
            CREATE [ UNIQUE ] INDEX [ [IF NOT EXISTS] [ schema_name. ] index_name ] ON table_name [ USING method ]
    ( {{ column_name | ( expression ) } [ COLLATE collation ] [ opclass ] [ ASC | DESC ] [ NULLS LAST ] }[, ...] )
    [ NULLS [ NOT ] DISTINCT | NULLS IGNORE ]
    [ COMMENT 'text' ]
    LOCAL [ ( { PARTITION index_partition_name  } [, ...] ) ]
    [ WITH ( { storage_parameter = value } [, ...] ) ]
    ;

Parameter Description

**Table 2** CREATE INDEX parameters
Parameter	Description	Value Range
UNIQUE	Causes the system to check for duplicate values in the table when the index is created (if data exists) and each time data is added. If the inserted or updated value causes duplicate records, an error is reported. Currently, only B-tree indexes of row-store tables and column-store tables support unique indexes.	-
IF NOT EXISTS	If IF NOT EXISTS is specified and no index with the same name exists, the index can be created. If an index with the same name already exists, the system will simply display a message stating that the index already exists and will not perform any additional operations during creation. When IF NOT EXISTS is used, the index name must be specified. This parameter is supported only by clusters of version 9.1.0 or later.	-
schema_name	Name of the schema where the index to be created is located. The specified schema name must be the same as the schema of the table.	-
index_name	Specifies the name of the index to be created. The schema of the index is the same as that of the table.	A string, which must comply with the identifier naming convention.
table_name	Specifies the name of the table to be indexed (optionally schema-qualified).	Name of an existing table
USING method	Specifies the name of the index method to be used. Row-based tables support the following index types: btree (default), gin, and gist. Column-based tables support the following index types: psort (default), btree, and gin.	Value range: btree: B-tree indexes store data keys using a structure like the B+ tree, enabling fast index searches. B-tree indexes support comparison and range queries. gin: GIN indexes work like reverse lists and handle data with many elements, such as arrays. gist: GiST indexes work with multidimensional data like geometry, geography, and collections. psort: PSORT indexes are used to perform partial sort on column-store tables.
column_name	Specifies the name of a column of the table.	Multiple columns can be specified if the index method supports multi-column indexes. A maximum of 32 columns can be specified.
expression	Specifies an expression based on one or more columns of the table. The expression usually must be written with surrounding parentheses, as shown in the syntax.	However, the parentheses can be omitted if the expression has the form of a function call. Expression can be used to obtain fast access to data based on some transformation of the basic data. For example, an index computed on upper(col) would allow the clause WHERE upper(col) = 'JIM' to use an index. If an expression contains IS NULL, the index for this expression is invalid. In this case, you are advised to create a partial index.
COLLATE collation	Assigns a collation to the column (which must be of a collatable data type). If no collation is specified, the default collation is used.	-
opclass	Specifies the name of an operator class.	Specifies an operator class for each column of an index. The operator class identifies the operators to be used by the index for that column. For example, a B-tree index on the type int4 would use the int4_ops class; this operator class includes comparison functions for values of type int4. In practice, the default operator class for the column's data type is sufficient. The operator class applies to data with multiple sorts. For example, we might want to sort a complex-number data type either by absolute value or by real part. We could do this by defining two operator classes for the data type and then selecting the proper class when making an index.
ASC	Indicates ascending sort order (default). This option is supported only by row storage.	-
DESC	Indicates descending sort order. This option is supported only by row storage.	-
NULLS FIRST	Specifies that nulls sort before not-null values. This is the default when DESC is specified.	-
NULLS LAST	Specifies that nulls sort after not-null values. This is the default when DESC is not specified.	-
NULLS DISTINCT NULLS NOT DISTINCT NULLS IGNORE	Specifies how NULL values of index columns in a Unique index are processed.	Default value: This parameter is left empty by default. NULL values can be inserted repeatedly. When the inserted data is compared with the original data in the table, the NULL value can be processed in any of the following ways: NULLS DISTINCT: Each NULL is treated as unique, allowing repeated insertions of NULLs. NULLS NOT DISTINCT: All NULLs are seen as identical, so only one NULL is kept in the index, preventing repeated NULL insertions; only distinct non-NULL values can be added. NULLS IGNORE: NULL values are skipped during the equivalent comparison. If all indexed columns are NULL, NULLs can be inserted repeatedly; if some columns are NULL, only new non-NULL values differing from existing ones can be added. The following table Table 4 lists the behaviors of the three processing modes.
COMMENT 'text'	Specifies the comment of an index.	-
WITH ( {storage_parameter = value} [, ... ] )	Specifies the name of an index-method-specific storage parameter. Its value can be FILLFACTOR, INVISIBLE, FASTUPDATE, or GIN_PENDING_LIST_LIMIT.	For details about the value range and usage, see Table 3.
WHERE predicate	Creates a partial index. A partial index is an index that contains entries for only a portion of a table, usually a portion that is more useful for indexing than the rest of the table. For example, if you have a table that contains both billed and unbilled orders where the unbilled orders take up a small fraction of the total table and yet that is an often used section, you can improve performance by creating an index on just that portion. Another possible application is to use WHERE with UNIQUE to enforce uniqueness over a subset of a table.	Predicate expression can refer only to columns of the underlying table, but it can use all columns, not just the ones being indexed. Presently, subquery and aggregate expressions are also forbidden in WHERE.
LOCAL	Local indexes (partitioned indexes) are created on each partition of a partitioned table. A local index corresponds to a partition of the partitioned table, and has the same number of partitions and the same partition range as the table. When the table partition changes, you only need to maintain the corresponding index partition.	-
PARTITION index_partition_name	Specifies the name of the index partition.	A string, which must comply with the naming convention.

Table 3 WITH ( {storage_parameter = value} [, ... ] ) parameter

Storage Parameter

Description

Example

FILLFACTOR

This parameter is unavailable for GIN and PSORT indexes. It indicates the fill factor of an index and its value is a percentage between 10 and 100. Index pages are packed only to the indicated percentage and the remaining space is used for subsequent INSERT and UPDATE operations, thereby reducing page splits and improving performance.

This parameter determines the percentage of space on index pages to be packed, which affects the database performance and storage space usage. You can adjust its value to reduce page splits caused by subsequent INSERT and UPDATE operations, optimizing the storage and performance of indexes.

Impact of FILLFACTOR on index performance:

Performance: When an index page is fully packed and a new record needs to be inserted, the page will be split, consuming more I/O resources and time. A lower fillfactor can reduce page splits and improve the INSERT and UPDATE performance, but will increase the storage space usage. A higher fillfactor can improve the storage space usage, but may cause more page splits and degrade the INSERT and UPDATE performance.
Storage space: A lower fillfactor increases the free space of each data page. A higher fillfactor stores data more compactly, reducing disk space usage.

How to configure FILLFACTOR:

For tables that are frequently updated, set FILLFACTOR to a small value, for example, 60 to 80, to reserve space and reduce page splits.
For tables that are rarely updated, set FILLFACTOR to a large value, for example, 90 to 100, to maximize storage space.

Create a specified B-tree index on the SM_SHIP_MODE_SK column in the tpcds. ship_mode_t1 table and set FILLFACTOR of the index to 70%.

This means that each data page is only 70% full when initially packed. The remaining 30% space is reserved for future INSERT and UPDATE operations.

          
               CREATE INDEX ds_ship_mode_t1_index6 ON tpcds.ship_mode_t1 USING btree(SM_SHIP_MODE_SK) WITH (FILLFACTOR = 70);

INVISIBLE

Controls whether the optimizer generates index scan plans.

Scenario: If you want to disable indexes temporarily during maintenance or test the impact of indexes on performance, you can use this parameter.

Value range:

ON indicates that no index scan plan is generated.
OFF indicates that an index scan plan is generated.

Default value: OFF

Create a specified B-tree index on the SM_SHIP_MODE_SK column in the tpcds.ship_mode_t1 table and set INVISIBLE to ON (ON indicates that optimizer does not generate the index scan plan.)

          
               CREATE INDEX ds_ship_mode_t1_index7 ON tpcds.ship_mode_t1 USING btree(SM_SHIP_MODE_SK) WITH (INVISIBLE = ON);

FASTUPDATE

This parameter is available only for GIN indexes. It controls whether to enable fast update for GIN indexes. Once enabled, data changes are temporarily stored in the pending list.

Value range:

ON: Fast update is enabled.
OFF: Fast update is disabled.

Default value: ON

Create a GIN index on the SM_SHIP_MODE_SK column in the tpcds. ship_mode_t1 table, enable fast update by configuring the FASTUPDATE parameter, and set the maximum pending list size of the index to 8 MB.

CREATE INDEX ds_ship_mode_t1_index8 ON tpcds.ship_mode_t1 USING gin(SM_SHIP_MODE_SK) WITH (FASTUPDATE = ON, GIN_PENDING_LIST_LIMIT = 8192);

GIN_PENDING_LIST_LIMIT

This parameter is supported only by GIN indexes. When fast update is enabled for GIN indexes, you can use this parameter to set the maximum size of the pending list of GIN indexes.

Pending List is a data structure specific to GIN indexes and is used to temporarily store index update operations. When the FASTUPDATE parameter is enabled for a GIN index, new index items are not directly written to the main index structure, but are stored in the pending list. When conditions are met, the index items are merged into the main index in batches.

Value range: 64 to INT_MAX. The unit is KB.

Default value: determined by the value of gin_pending_list_limit (4 MB by default).

**Table 4** Processing of NULL values in index columns in unique indexes
Constraint	All Index Columns Are NULL	Some Index Columns Are NULL.
NULLS DISTINCT	Can be inserted repeatedly.	Can be inserted repeatedly.
NULLS NOT DISTINCT	Cannot be inserted repeatedly.	Cannot be inserted if the non-null values are equal. Can be inserted if the non-null values are not equal.
NULLS IGNORE	Can be inserted repeatedly.	Cannot be inserted if the non-null values are equal. Can be inserted if the non-null values are not equal.

Example

Create a sample table named tpcds.ship_mode_t1.

       
            CREATE SCHEMA tpcds;
DROP TABLE IF EXISTS tpcds.ship_mode_t1;
CREATE TABLE tpcds.ship_mode_t1
(
    SM_SHIP_MODE_SK           INTEGER               NOT NULL,
    SM_SHIP_MODE_ID           CHAR(16)              NOT NULL,
    SM_TYPE                   CHAR(30)                      ,
    SM_CODE                   CHAR(10)                      ,
    SM_CARRIER                CHAR(20)                      ,
    SM_CONTRACT               CHAR(20)
) 
DISTRIBUTE BY HASH(SM_SHIP_MODE_SK);

Create a unique index on the SM_SHIP_MODE_SK column in the tpcds.ship_mode_t1 table.

       
            CREATE UNIQUE INDEX ds_ship_mode_t1_index1 ON tpcds.ship_mode_t1(SM_SHIP_MODE_SK);

Create a unique index on the SM_SHIP_MODE_SK column in the tpcds.ship_mode_t1 table and specify how to process null values.

       
            CREATE UNIQUE INDEX ds_ship_mode_t1_index5 ON tpcds.ship_mode_t1(SM_SHIP_MODE_SK) NULLS NOT DISTINCT;

Add comment to the index when creating an index on the SM_SHIP_MODE_SK column of table tpcds.ship_mode_t1.

       
            CREATE INDEX ds_ship_mode_t1_index_comment ON tpcds.ship_mode_t1(SM_SHIP_MODE_SK) COMMENT 'index';

Create a B-tree index on the SM_SHIP_MODE_SK column in the tpcds.ship_mode_t1 table.

       
            CREATE INDEX ds_ship_mode_t1_index4 ON tpcds.ship_mode_t1 USING btree(SM_SHIP_MODE_SK);

Create a specified B-tree index on the SM_SHIP_MODE_SK column in the tpcds. ship_mode_t1 table and set FILLFACTOR of the index to 70%.

       
            CREATE INDEX ds_ship_mode_t1_index6 ON tpcds.ship_mode_t1 USING btree(SM_SHIP_MODE_SK) WITH (FILLFACTOR = 70);

Create a specified B-tree index on the SM_SHIP_MODE_SK column in the tpcds.ship_mode_t1 table and set INVISIBLE to ON (ON indicates that optimizer does not generate the index scan plan. The default value is OFF).

       
            CREATE INDEX ds_ship_mode_t1_index7 ON tpcds.ship_mode_t1 USING btree(SM_SHIP_MODE_SK) WITH (INVISIBLE = ON);

Create a GIN index on the SM_SHIP_MODE_SK column in the tpcds. ship_mode_t1 table, enable fast update by configuring the FASTUPDATE parameter, and set the maximum pending list size of the index to 8 MB.

CREATE INDEX ds_ship_mode_t1_index8 ON tpcds.ship_mode_t1 USING gin(SM_SHIP_MODE_SK) WITH (FASTUPDATE = ON, GIN_PENDING_LIST_LIMIT = 8192);

Create an expression index on the SM_CODE column in the tpcds.ship_mode_t1 table.

       
            CREATE INDEX ds_ship_mode_t1_index2 ON tpcds.ship_mode_t1(SUBSTR(SM_CODE,1 ,4));

Create a partial index on the SM_SHIP_MODE_SK column where SM_SHIP_MODE_SK is greater than 10 in the tpcds.ship_mode_t1 table.

CREATE UNIQUE INDEX ds_ship_mode_t1_index3 ON tpcds.ship_mode_t1(SM_SHIP_MODE_SK) WHERE SM_SHIP_MODE_SK>10;

Create a sample table named tpcds.customer_address_p1.

       
        
          
          
            CREATE TABLE tpcds.customer_address_p1
(
    CA_ADDRESS_SK             INTEGER               NOT NULL,
    CA_ADDRESS_ID             CHAR(16)              NOT NULL,
    CA_STREET_NUMBER          CHAR(10)                      ,
    CA_STREET_NAME            VARCHAR(60)                   ,
    CA_STREET_TYPE            CHAR(15)                      ,
    CA_SUITE_NUMBER           CHAR(10)                      ,
    CA_CITY                   VARCHAR(60)                   ,
    CA_COUNTY                 VARCHAR(30)                   ,
    CA_STATE                  CHAR(2)                       ,
    CA_ZIP                    CHAR(10)                      ,
    CA_COUNTRY                VARCHAR(20)                   ,
    CA_GMT_OFFSET             DECIMAL(5,2)                  ,
    CA_LOCATION_TYPE          CHAR(20)
)
DISTRIBUTE BY HASH(CA_ADDRESS_SK)
PARTITION BY RANGE(CA_ADDRESS_SK)
( 
   PARTITION p1 VALUES LESS THAN (3000),
   PARTITION p2 VALUES LESS THAN (5000) ,
   PARTITION p3 VALUES LESS THAN (MAXVALUE) 
)
ENABLE ROW MOVEMENT;

           

         

       
      

Create the partitioned table index ds_customer_address_p1_index1 with the name of the index partition not specified.

       
            CREATE INDEX ds_customer_address_p1_index1 ON tpcds.customer_address_p1(CA_ADDRESS_SK) LOCAL;

Create the partitioned table index ds_customer_address_p1_index2 with the name of the index partition specified.

       
            CREATE INDEX ds_customer_address_p1_index2 ON tpcds.customer_address_p1(CA_ADDRESS_SK) LOCAL
(
    PARTITION CA_ADDRESS_SK_index1,
    PARTITION CA_ADDRESS_SK_index2,
    PARTITION CA_ADDRESS_SK_index3 
) 
;

Create the partitioned table index ds_customer_address_p1_index_comment and add index comments.

       
            CREATE INDEX ds_customer_address_p1_index_comment ON tpcds.customer_address_p1(CA_ADDRESS_SK) COMMENT 'index' LOCAL
(
    PARTITION CA_ADDRESS_SK_index1,
    PARTITION CA_ADDRESS_SK_index2,
    PARTITION CA_ADDRESS_SK_index3 
) 
;

Application Differences Among NULLS DISTINCT, NULLS NOT DISTINCT, and NULLS IGNORE

Examples are provided below to illustrate the differences.

Example

Assume that there is a users table that contains ID and Email (which may be NULL, indicating that no email address is provided).

Scenario 1: NULLS DISTINCT (NULL can be inserted repeatedly) (default)

       
            --Create a table.
DROP TABLE IF EXISTS users;
CREATE TABLE users (
  id INT,
  email TEXT
)
DISTRIBUTE BY HASH(email);
--Create a unique index (NULLS DISTINCT by default).
CREATE UNIQUE INDEX idx_email_unique ON users (email);
--Insert data.
INSERT INTO users (id, email) VALUES (1, NULL); -- Success
INSERT INTO users (id, email) VALUES (2, NULL); -- Success (with each NULL treated as unique)
INSERT INTO users (id, email) VALUES (3, 'a@example.com'); -- Success
INSERT INTO users (id, email) VALUES (4, 'a@example.com'); -- Failure (duplicate non-NULL values)

Click to enlarge

Scenario 2: NULLS NOT DISTINCT (NULL cannot be inserted repeatedly)

       
            --Re-create a unique index and specify NULLS NOT DISTINCT.
DROP TABLE IF EXISTS users;
CREATE TABLE users (
  id INT,
  email TEXT
)
DISTRIBUTE BY HASH(email);
CREATE UNIQUE INDEX idx_email_unique ON users (email) NULLS NOT DISTINCT;
--Insert data.
INSERT INTO users (id, email) VALUES (5, NULL); -- Success (the first NULL)
INSERT INTO users (id, email) VALUES (6, NULL); -- Failure (the second NULL, violating the unique constraint)
INSERT INTO users (id, email) VALUES (7, 'b@example.com'); -- Success
INSERT INTO users (id, email) VALUES (8, 'b@example.com'); -- Failure (duplicate non-NULL values)

Scenario 3: NULLS IGNORE (NULL values are ignored.)

       
            --Create an index and ignore NULL values.
DROP TABLE IF EXISTS users;
CREATE TABLE users (
  id INT,
  email TEXT
)
DISTRIBUTE BY HASH(email);
CREATE UNIQUE INDEX idx_email_ignore ON users (email) NULLS IGNORE;
--Insert data.
INSERT INTO users (id, email) VALUES (9, NULL); -- Success (the first NULL)
INSERT INTO users (id, email) VALUES (10, NULL); -- Success (NULL values are ignored.)
INSERT INTO users (id, email) VALUES (11, 'c@example.com'); -- Success
INSERT INTO users (id, email) VALUES (12, 'c@example.com'); -- Failure (duplicate non-NULL values)

Helpful Links

For details about how to modify and delete indexes, see ALTER INDEX and DROP INDEX.
For more index examples, see Creating and Managing Indexes.
For more examples of GIN indexes, see Best Practices of GIN Index.

CREATE INDEX

Function

Index Types

Indexing Methods

Principles for Creating Indexes

Precautions

Syntax

Parameter Description

Example

Application Differences Among NULLS DISTINCT, NULLS NOT DISTINCT, and NULLS IGNORE

Example

Helpful Links

FAQs

Feedback

Was this page helpful?