
CREATE TABLE

Description

Creates an initially empty table in the current database. The table will be owned by the creator.

Precautions

  • If an error occurs during table creation and is subsequently fixed, the system may fail to delete the empty (zero-size) disk files created before the last automatic clearance. This problem seldom occurs and does not affect the running of the database.
  • When JDBC is used, the DEFAULT value can be set through PreparedStatement.
  • A user granted the CREATE ANY TABLE permission can create tables in the public and user schemas.
  • Prefix keys are not supported in primary key indexes.
  • Foreign key constraints are not supported for distributed instances.
  • If a distribution key of the character type is used, you are advised not to use case-insensitive collations (such as utf8mb4_general_ci, utf8mb4_unicode_ci, and utf8mb4_0900_ai_ci); otherwise, data skew may occur and data cannot be evenly distributed. The utf8mb4_bin collation is recommended, as in the sketch below.
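
    For example, a minimal sketch (table and column names are hypothetical; it assumes a utf8mb4 database encoding and a distributed deployment) that pins a character distribution key to the recommended binary collation:

    gaussdb=# CREATE TABLE t_dist_demo(
        id   INT,
        code VARCHAR(32) CHARSET utf8mb4 COLLATE utf8mb4_bin  -- binary collation avoids skew from case-insensitive matching
    )DISTRIBUTE BY HASH(code);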

Syntax

  • Create a table.
    CREATE [ [LOCAL | GLOBAL] TEMPORARY | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name 
    ({ column_name data_type [ {CHARACTER SET | CHAR SET | CHARSET} charset_name ] [ compress_mode ] [ COLLATE collation_name][COMMENT [=] 'string'] [ column_constraint [ ... ] ]
        | table_constraint
        | {INDEX | KEY} [ [schema_name.] index_name ] [ USING method ] ({column_name [(length)] | (expr) [ASC | DESC]}[,...]) [[COMMENT 'string' | USING method][...]]
        | LIKE source_table [ like_option [...] ] }
        [, ... ])
    [ table_option [ [,] ... ] ]
    [ WITH ( {storage_parameter = value} [, ... ] ) ]
    [ ON COMMIT { PRESERVE ROWS | DELETE ROWS } ]
    [ TABLESPACE tablespace_name ]
    [ DISTRIBUTE BY 
        { REPLICATION 
        | HASH ( column_name [, ...] ) 
        | RANGE ( column_name [, ...] ) { SLICE REFERENCES tablename | ( slice_less_than_item [, ...] ) }
        | LIST ( column_name [, ...] ) { SLICE REFERENCES tablename | ( slice_values_item [, ...] ) }
        } ]
    [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]
    [ INTERNAL DATA data_content];   
  • Create a table based on an existing source table.
    CREATE [ [LOCAL | GLOBAL] TEMPORARY ] TABLE [IF NOT EXISTS] table_name LIKE source_table;

    By default, the attributes of the source table are copied to the target table, including the column definitions, indexes, distribution and partition methods, storage definitions, and comments. This syntax also applies to GaussDB's enhanced capabilities in distributed mode, such as GSI.

  • table_option is as follows:
    {
    [DEFAULT] {CHARACTER SET | CHAR SET | CHARSET} [=] charset_name
      | [DEFAULT] COLLATE [=] collation_name
      | COMMENT [=] 'string'
      | ENGINE [=] engine_name
      | ROW_FORMAT [=] {DEFAULT | DYNAMIC | FIXED | COMPRESSED | REDUNDANT | COMPACT}
    }
    • If COMMENT is set multiple times in the same place, only the last value takes effect.
    • If COLLATE is set multiple times in the same place, only the last value takes effect.
    • ENGINE and ROW_FORMAT are accepted without an error message, but the settings do not take effect.
    • WITH (ORIENTATION = column) is not supported.
    • WITH (HASHBUCKET = ON/OFF) is not supported.
  • column_constraint is as follows:
    [ CONSTRAINT constraint_name ]
    { NOT NULL |
      NULL |
      CHECK ( expression ) |
      DEFAULT default_expr |
      ON UPDATE now_expr |
      UNIQUE [KEY] index_parameters |
      [ PRIMARY ] KEY index_parameters
    }
  • compress_mode of a column is as follows:
    {  DICTIONARY  }
  • table_constraint is as follows:
    [ CONSTRAINT [ constraint_name ] ]
    { CHECK ( expression ) |
    UNIQUE [ INDEX | KEY ] [ index_name ] [ USING method ] ( { { column_name [ ( length ) ] | ( expression ) } [ ASC | DESC ] } [, ... ] ) index_parameters [ USING method | COMMENT 'string' ] |
      PRIMARY KEY [ index_name ] [ USING method ] ( { column_name [ ASC | DESC ] } [, ... ] ) index_parameters [ USING method | COMMENT 'string' ]
    }
  • index_parameters is as follows:
    [ WITH ( {storage_parameter = value} [,...] ) ]
    [ USING INDEX TABLESPACE tablespace_name ]
  • like_option is as follows:
    { INCLUDING | EXCLUDING }  { DEFAULTS | CONSTRAINTS | INDEXES | STORAGE | COMMENTS
       | PARTITION | RELOPTIONS | DISTRIBUTION | UPDATE | ALL }
  • Range distribution rules
    For slice_less_than_item:
    SLICE slice_name VALUES LESS THAN ({ literal | MAXVALUE } [, ...]) [ DATANODE dn_name | ( dn_name [, ... ] ) ]
    For slice_start_end_item:
    SLICE slice_name_prefix {
    { START ( literal ) END ( { literal | MAXVALUE } ) } |
    { START ( literal ) } |
    { END ( { literal | MAXVALUE } ) }
    }
  • List distribution rules are as follows:
    For slice_values_item:
    SLICE slice_name VALUES (list_values_item) [ DATANODE dn_name | ( dn_name [, ... ] ) ]
    For list_values_item:
    { DEFAULT | { expression[, ...] } }
  • now_expr is as follows:
    { CURRENT_TIMESTAMP | LOCALTIMESTAMP  | LOCALTIME | NOW() }

Parameters

  • UNLOGGED

    If this keyword is specified, the created table is an unlogged table.

    • Data written to unlogged tables is not written to WALs. If the OS or database restarts, a switchover occurs, the power supply is cut off, or the system shuts down abnormally, write operations on the unlogged table are automatically truncated and data may be lost. Contents of an unlogged table are not replicated to standby nodes, and indexes created on an unlogged table are not logged either.
    • Unlogged tables have no data on the standby node, and an error is reported when an unlogged table is queried on the standby node.

    Usage scenario: Unlogged tables do not ensure data security. Back up data before using unlogged tables; for example, back up the data before a system upgrade.

    Troubleshooting: If data is missing from the indexes of an unlogged table due to an abnormal shutdown or other unexpected operations, rebuild the affected indexes.

  • GLOBAL | LOCAL

    When creating a temporary table, you can specify GLOBAL or LOCAL before TEMPORARY. If the GLOBAL keyword is specified, "WARNING: GLOBAL is deprecated in temporary table creation." is reported. In M-compatible mode, a local temporary table is created.

  • TEMPORARY

    If TEMPORARY is specified, the created table is a temporary table.

    A temporary table is visible only in the current session and is automatically dropped at the end of the session. Therefore, you can create and use temporary tables in the current session as long as the connected CN is normal. Temporary tables are created only in the current session; if a DDL statement involves operations on temporary tables, a DDL error will be generated. Therefore, you are advised not to operate on temporary tables in DDL statements.

    • Local temporary tables are visible to the current session through the schema starting with pg_temp. Users should not delete schemas starting with pg_temp or pg_toast_temp.
    • If TEMPORARY is not specified when creating the table and the schema of the table is set to that starting with pg_temp_ in the current session, the table will be created as a temporary table.
    • If global temporary tables or indexes are being used by other sessions, do not perform ALTER or DROP on the tables or indexes.
    • The DDL of a global temporary table affects only the user data and indexes of the current session. For example, TRUNCATE, REINDEX, and ANALYZE are valid only for the current session.
    • You can set the GUC parameter max_active_global_temporary_table to determine whether to enable the global temporary table function. If max_active_global_temporary_table is set to 0, the global temporary table function is disabled.
    • A temporary table is visible only to the current session. Therefore, it cannot be used together with \parallel on.
    • Temporary tables do not support primary/standby switchover.
    • Global temporary tables do not respond to autovacuum. In persistent connection scenarios, it is advisable to use global temporary tables with the ON COMMIT DELETE ROWS clause or conduct regular vacuum operations manually. Failure to do so may result in Clogs not being recycled.
    • Global temporary tables do not support the following scenarios:
      • Global temporary sequences cannot be created. Global temporary tables of each session use shared sequences, which can only ensure uniqueness but not continuity.
      • Global temporary views cannot be created.
      • Partitioned tables cannot be created.
      • Hash bucket tables cannot be created.
      • Extended statistics are not supported.
  • IF NOT EXISTS

    Sends a notice, but does not throw an error, if a table with the same name exists.

  • table_name

    Specifies the name of the table to be created.

  • column_name

    Specifies the name of a column to be created in the target table.

  • constraint_name

    Specifies the name of the constraint specified during table creation.

  • index_name

    Index name.

    For a unique key constraint, if both constraint_name and index_name are specified, the value of index_name is used as the index name.

  • USING method

    Specifies the method of creating an index.

    For details about the value range, see "USING method" in Parameters.

    Currently, only the B-tree index method is supported. However, B-tree indexes cannot be created for Ustore tables.

  • ASC | DESC

    ASC specifies an ascending (default) sort order. DESC specifies a descending sort order.

  • expression

    Specifies an expression index constraint based on one or more columns of the table. It must be written in parentheses.

  • data_type

    Specifies the data type of a column.

  • compress_mode

    Specifies whether to compress a table column and the compression algorithm preferentially used. Row-store tables do not support compression. This option is unavailable in M-compatible databases.

    Value range: DICTIONARY

  • {CHARACTER SET | CHAR SET | CHARSET} charset_name

    Specifies the character set of a table column. If this parameter is specified separately, the collation of the table column is set to the default collation of the specified character set.

  • COLLATE collation_name

    Assigns a collation to the column (which must be of a collatable data type). If no collation is specified, the default collation is used. You can query available collations from the pg_collation system catalog (SELECT * FROM pg_collation;); the default collation is the row starting with default in the query result. The utf8mb4_bin, utf8mb4_general_ci, utf8mb4_unicode_ci, binary, gbk_chinese_ci, gbk_bin, gb18030_chinese_ci, and gb18030_bin collations are supported. For details, see Table-level Character Sets and Collations.

    • Only the character type supports the specified character set. If the BINARY character set or collation is specified, the character type is converted to the corresponding binary type. If the type mapping does not exist, an error is reported. Currently, only the mapping from the TEXT type to the BLOB type is available.
    • Except the BINARY character set and collation, only the character set that is the same as the database encoding can be specified.
    • If the character set or collation of a column is not explicitly specified and the default character set or collation of the table is specified, the character set or collation of the column is inherited from the table.
    Table 1 Supported character sets and collations

    Collation          | Character Set  | Description
    -------------------+----------------+--------------------------------------------------------------
    utf8mb4_general_ci | utf8mb4 (utf8) | The general collation is used, which is case-insensitive.
    utf8mb4_unicode_ci | utf8mb4 (utf8) | The general collation is used, which is case-insensitive.
    utf8mb4_bin        | utf8mb4 (utf8) | The binary collation is used, which is case-sensitive.
    binary             | binary         | The binary collation is used.
    gbk_chinese_ci     | gbk            | The Chinese collation is used.
    gbk_bin            | gbk            | The binary collation is used, which is case-sensitive.
    gb18030_chinese_ci | gb18030        | The Chinese collation is used.
    gb18030_bin        | gb18030        | The binary collation is used, which is case-sensitive.
    utf8mb4_0900_ai_ci | utf8mb4        | The unicode collation algorithm (UCA) rule is used, which is case-insensitive.
    utf8_general_ci    | utf8           | The general collation is used, which is case-insensitive.
    utf8_bin           | utf8           | The binary collation is used, which is case-sensitive.

  • LIKE source_table [like_option...]

    Specifies a table, known as the source table, for the target table to automatically inherit all column names, data types, constraints, and indexes.

    The target table and the source table are decoupled after creation is complete. This means that any subsequent modifications to the source table will not be propagated to the target table, and data from the target table will not be included in scans of the source table.

    • The copied columns and constraints are not merged based on the same name. If the same name is explicitly specified or is included in another LIKE clause, an error will be reported.
    • The NOT NULL constraints of the source table will be copied to the target table.
    • If INCLUDING is specified, the target table will copy the relevant content from the source table. Conversely, specifying EXCLUDING will result in the opposite behavior compared to INCLUDING.
    • If INCLUDING DEFAULTS is specified, the default values in the source table are copied to the target table.
    • If INCLUDING UPDATE is specified, the ON UPDATE now_expr attribute of the source table column is copied to the target table column.
    • If INCLUDING CONSTRAINTS is specified, the CHECK constraints in the source table are copied to the target table.
    • If INCLUDING INDEXES is specified, indexes on the source table are also created on the target table.
    • If INCLUDING STORAGE is specified, the STORAGE setting of the source table column is copied to the target table.
    • If INCLUDING COMMENTS is specified, comments for the copied columns, constraints, and indexes are copied.
    • If INCLUDING PARTITION is specified, the partition definition in the source table is copied to the target table.
    • If INCLUDING RELOPTIONS is specified, the storage parameters (WITH clauses) of the source table are copied to the target table.
    • If INCLUDING DISTRIBUTION is specified, the distribution information (including the distribution type and distribution key) in the source table is copied to the target table.
    • If neither INCLUDING nor EXCLUDING is specified, it is equivalent to specifying INCLUDING ALL. INCLUDING ALL includes the content of INCLUDING DEFAULTS, INCLUDING UPDATE, INCLUDING CONSTRAINTS, INCLUDING INDEXES, INCLUDING STORAGE, INCLUDING COMMENTS, INCLUDING PARTITION, INCLUDING RELOPTIONS, and INCLUDING DISTRIBUTION.

      Normal tables cannot be created from temporary tables.

      If the LIKE, WITH, PARTITION BY, and DISTRIBUTE BY clauses are included during table creation, the created table will use the WITH clause information, partition information, and distribution information specified in the SQL statement rather than those from the source table. If both CHARSET and COMMENT are specified, the relevant information from the source table will be used.
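
      For example, a minimal sketch (t_source is a hypothetical existing table) that copies the indexes and default values of the source table but not its comments:

      m_db=# CREATE TABLE t_copy (LIKE t_source INCLUDING INDEXES INCLUDING DEFAULTS EXCLUDING COMMENTS);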

  • COMMENT [ = ] 'string'
    • The COMMENT [ = ] 'string' clause is used to add comments to a table.
    • The COMMENT 'string' in column_constraint indicates that comments are added to a column.
    • The COMMENT 'string' in table_constraint indicates that comments are added to the indexes corresponding to the primary key and unique key.
    • A table-level comment can contain a maximum of 2048 characters, and a column-level or index-level comment can contain a maximum of 1024 characters.
    • Comments in table_constraint support only primary keys and unique keys.
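
    For example, a minimal sketch (names are hypothetical) that adds comments at the table, column, and primary key levels:

    m_db=# CREATE TABLE t_comment_demo(
        id INT COMMENT 'primary identifier',    -- column-level comment
        PRIMARY KEY (id) COMMENT 'index on id'  -- comment on the index behind the primary key
    ) COMMENT = 'demonstration table';          -- table-level comment
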
  • WITH ( { storage_parameter = value } [, ... ] )

    Specifies an optional storage parameter for a table or an index. The WITH clause used for tables can also contain OIDS=FALSE, indicating that OIDs are not allocated.

    The description of parameters is as follows:

    • FILLFACTOR

      The fill factor of a table is a percentage from 10 to 100. If the Ustore is used, the default value is 92. If the Astore is used, the default value is 100 (completely filled). When a smaller fill factor is specified, INSERT operations fill table pages only to the indicated percentage. The remaining space on each page is reserved for updating rows on that page. This gives UPDATE a chance to place the updated copy of a row on the same page, which is more efficient than placing it on a different page. For a table whose entries are never updated, setting the fill factor to 100 (complete filling) is the best choice, but in heavily updated tables a smaller fill factor would be appropriate.

      Value range: 10–100
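
      For example, a minimal sketch (the table name is hypothetical) that reserves 30% of each page for in-place updates on a heavily updated table:

      m_db=# CREATE TABLE t_fillfactor_demo(id INT) WITH (FILLFACTOR = 70);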

    • ORIENTATION

      Specifies the storage mode of table data. This parameter cannot be modified once it is set. Only row-store mode is supported in M-compatible mode.

      Value range:

      • ROW indicates that table data is stored in rows.

        Row store applies to OLTP service and scenarios with a large number of point queries or addition/deletion operations.

      Default value:

      If an ordinary tablespace is specified, the default is ROW.

    • STORAGE_TYPE

      Specifies the storage engine type. This parameter cannot be modified once it is set.

      Value range:

      • USTORE indicates that tables support the in-place update storage engine. Note that the track_counts and track_activities parameters must be enabled when the Ustore table is used. Otherwise, space bloating may occur.
      • ASTORE indicates that tables support the append-only storage engine.

      Default value:

      If ORIENTATION and STORAGE_TYPE are not specified, Ustore is used by default, indicating that the table supports the in-place update-based storage engine.
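
      For example, a minimal sketch (table names are hypothetical) that selects each storage engine explicitly:

      m_db=# CREATE TABLE t_ustore_demo(id INT) WITH (STORAGE_TYPE = USTORE);
      m_db=# CREATE TABLE t_astore_demo(id INT) WITH (STORAGE_TYPE = ASTORE);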

    • INIT_TD

      Specifies the number of TDs to be initialized when a Ustore table is created. This parameter can be modified by running the ALTER TABLE command. Note that this parameter affects the maximum size of a single tuple stored on the data page: MAX_TUPLE_SIZE = BLCKSZ - INIT_TD x TD_SIZE. For example, if you change INIT_TD from 4 to 8, the maximum size of a single tuple decreases by 4 x TD_SIZE.

      Value range: 2–128. The default value is 4.
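
      For example, with a typical 8 KB page (BLCKSZ = 8192), raising INIT_TD from 4 to 8 reserves four more TD slots per page and reduces MAX_TUPLE_SIZE by 4 x TD_SIZE. A minimal sketch (the table name is hypothetical):

      m_db=# CREATE TABLE t_inittd_demo(id INT) WITH (STORAGE_TYPE = USTORE, INIT_TD = 8);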

    • COMPRESSION

      Specifies the compression level of table data. It determines the compression ratio and time. Generally, the higher the level of compression, the higher the ratio, the longer the time; and the lower the level of compression, the lower the ratio, the shorter the time. The actual compression ratio depends on the distribution mode of table data loaded.

      Value range: The row-store table does not support compression. The default value is NO.

    • COMPRESSLEVEL

      Specifies the table data compression ratio and duration at the same compression level. This divides a compression level into sublevels, providing more choices for compression ratio and duration. As the value becomes greater, the compression ratio becomes higher and duration longer at the same compression level.

      Value range: 0 to 3. The default value is 0.

    • segment

      The data is stored in segment-page mode. This parameter supports only row-store tables. Protection against accidental deletion of or damage to the segment-page physical files (files 1 to 5) is not provided.

      Value range: on and off

      Default value: off

    • enable_tde

      Specifies that the table is an encryption table. The database automatically encrypts the data in the encryption table before storing it. Before using this parameter, ensure that TDE function has been enabled using the GUC parameter enable_tde and the information for accessing the key service has been set using the GUC parameter tde_key_info. For details about how to use this parameter, see section "Transparent Data Encryption" in Feature Guide. This parameter supports only row-store tables, segment-page tables, temporary tables, and unlogged tables.

      Value range: on and off. When enable_tde is set to on, key_type, cmk_id, and dek_cipher are automatically generated by the database and cannot be manually specified or modified.

      Default value: off
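
      For example, a minimal sketch (the table name is hypothetical; it assumes the GUC parameters enable_tde and tde_key_info have already been configured for the key service):

      m_db=# CREATE TABLE t_tde_demo(id INT) WITH (enable_tde = on, encrypt_algo = 'AES_128_CTR');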

    • parallel_workers

      Number of bgworker threads started when an index is created. For example, value 2 indicates that two bgworker threads are started to create indexes concurrently.

      Value range: [0,32], int type. The value 0 indicates that concurrent index creation is disabled.

      Default value: If this parameter is not set, the concurrent index creation function is disabled.

    • encrypt_algo

      Specifies the encryption algorithm of the encryption table. This parameter must be used together with enable_tde.

      Value range: AES_128_CTR and SM4_CTR (character string).

      Default value: the value assigned by table_algorithm in the tde_encrypt_config parameter.

    • dek_cipher

      Specifies the DEK ciphertext. After the enable_tde parameter is set for a table, the database automatically generates a data key.

      Value range: character strings.

      Default value: empty.

    • key_type

      Specifies the master key type. After the enable_tde parameter is set for a table, the database automatically obtains the master key type from the GUC parameter tde_key_info.

      Value range: character strings.

      Default value: empty.

    • cmk_id

      Specifies the master key ID. After the enable_tde parameter is set for a table, the database automatically obtains the master key ID from the GUC parameter tde_key_info.

      Value range: character strings.

      Default value: empty.

    • hasuids

      If this parameter is set to on, a unique table-level ID is allocated to a tuple when the tuple is updated.

      Value range: on and off

      Default value: off

    • collate

      Records the default collation of tables. Generally, this parameter is used only for internal storage, import, and export. You are advised not to set or modify this parameter.

      Value range: OIDs in the supported collation

      Default value: 0

    • stat_state

      Determines whether table statistics are locked. If locked, the table statistics cannot be updated.

      Value range: locked and unlock.

      Default value: unlock.

      Currently, the distributed 505.2 environment does not support the compression options COMPRESSTYPE, COMPRESS_LEVEL, COMPRESS_CHUNK_SIZE, COMPRESS_PREALLOC_CHUNKS, COMPRESS_BYTE_CONVERT, and COMPRESS_DIFF_CONVERT.

  • ON COMMIT { PRESERVE ROWS | DELETE ROWS }

    ON COMMIT determines the action taken on a temporary table at the end of each transaction block. The options are PRESERVE ROWS and DELETE ROWS.

    • PRESERVE ROWS (default): No special action is taken at the ends of transactions. The temporary table and its table data are unchanged.
    • DELETE ROWS: All rows in the temporary table will be deleted at the end of each transaction block.
  • TABLESPACE tablespace_name

    Specifies the tablespace where a table is created. If not specified, the default tablespace is used.

    Note: You need to create or delete tablespaces in a non-M-compatible database.

  • CONSTRAINT constraint_name

    Specifies the name of a column or table constraint. The optional constraint clauses specify constraints that new or updated rows must satisfy for an INSERT or UPDATE operation to succeed.

    There are two ways to define constraints:

    • A column constraint is defined as part of a column definition, and it is bound to a particular column.
    • A table constraint is not bound to a particular column and can be applied to more than one column.
  • NOT NULL

    Specifies that the column value cannot be NULL.

  • NULL

    Specifies that the column value can be NULL, which is the default value.

    This clause is only provided for compatibility with non-standard SQL databases. It is not recommended.

  • CHECK ( expression )

    Specifies an expression producing a Boolean result where the INSERT or UPDATE operation of new or updated rows can succeed only when the expression result is true or unknown; otherwise, an error is thrown and the database is not altered.

    <>NULL and !=NULL are invalid in an expression. Change them to IS NOT NULL.

  • DEFAULT default_expr
    • The DEFAULT clause is used to specify a default expression for a column. The default expression will be used in any insert operation that does not specify a value for the column. If there is no default value for a column, then the default value is NULL.
    • The following content can be specified: constants, numeric constants with positive and negative signs, and update_expr.
    • The value of update_expr can be used as the default value only for columns of the TIMESTAMP or DATETIME type, and the precision of the columns must be the same as that of update_expr.
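
    For example, a minimal sketch (names are hypothetical) in which the column precision matches the precision of the default expression, as required above:

    m_db=# CREATE TABLE t_default_demo(
        id         INT,
        created_at DATETIME(3) DEFAULT CURRENT_TIMESTAMP(3)  -- precisions must match
    );
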
  • ON UPDATE now_expr

    The ON UPDATE clause is an attribute constraint of a column.

    When an UPDATE operation is performed on a tuple in a table: if the new values of the updated columns differ from the old values, columns with this attribute that are not among the updated columns are automatically set to the current timestamp; if the new values are the same as the old values, those columns remain unchanged; if a column with this attribute is among the updated columns, it is set to the specified update value.

    • In terms of syntax, update_expr supports four keywords: CURRENT_TIMESTAMP, LOCALTIMESTAMP, LOCALTIME, and NOW(). You can also specify or not specify the precision of a keyword with parentheses. For example, ON UPDATE CURRENT_TIMESTAMP(), ON UPDATE CURRENT_TIMESTAMP(5), ON UPDATE LOCALTIMESTAMP(), and ON UPDATE LOCALTIMESTAMP(6). If the keyword does not contain parentheses or contains empty parentheses, the precision is 0. The NOW keyword cannot contain parentheses. The four types of keywords are synonyms of each other and have the same attribute effect.
    • This attribute can be specified on columns of the TIMESTAMP and DATETIME types.
    • The precision specified by this attribute must be the same as that specified by the type in the corresponding column. Otherwise, an error is reported. For example, CREATE TABLE t1 (col1 timestamp(6) ON UPDATE CURRENT_TIMESTAMP(6));. If the precision is inconsistent, ERROR: Invalid ON UPDATE clause for "col1" is reported.
    • The same column cannot be specified for this attribute and the generated column constraint at the same time.
    • This attribute cannot be specified for the partition key in a partitioned table.
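
    For example, a minimal sketch (names are hypothetical) of a column recording the last modification time; an UPDATE that changes other columns refreshes updated_at automatically:

    m_db=# CREATE TABLE t_onupdate_demo(
        id         INT,
        updated_at TIMESTAMP(6) DEFAULT CURRENT_TIMESTAMP(6) ON UPDATE CURRENT_TIMESTAMP(6)
    );
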
  • [DEFAULT] {CHARACTER SET | CHAR SET | CHARSET} [ = ] default_charset

    Specifies the default character set of a table. If this parameter is specified separately, the default collation of the table is set to the default collation of the specified character set.

  • [DEFAULT] COLLATE [ = ] default_collation

    Specifies the default collation of a table. If this parameter is specified separately, the default character set of the table is set to the character set corresponding to the specified collation.

    For details about the collation, see Character Sets and Collations.

    If the character set or collation of a table is not explicitly specified and the default character set or collation of the schema is specified, the character set or collation of the table is inherited from the schema.

  • Column-level unique constraint: UNIQUE [KEY] index_parameters

    Specifies that a group of one or more columns of a table can contain only unique values.

    UNIQUE KEY has the same semantics as UNIQUE.

  • Table-level unique constraint: UNIQUE [INDEX | KEY][ index_name ][ USING method ]( {{ column_name [ ( length ) ] | ( expression ) } [ ASC | DESC ] }[, ... ] ) index_parameters

    Specifies that a group of one or more columns of a table can contain only unique values.

    For the purpose of a unique constraint, NULL is not considered equal.

    column_name (length) indicates the prefix key. For details, see column_name ( length ).

    index_name indicates the index name.

    For a unique key constraint, if both constraint_name and index_name are specified, the value of index_name is used as the index name.
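
    For example, a minimal sketch (names are hypothetical) of a named table-level unique constraint with a prefix key:

    m_db=# CREATE TABLE t_unique_demo(
        id   INT,
        code VARCHAR(64),
        UNIQUE KEY uk_code (code(10))  -- only the first 10 characters participate in the uniqueness check
    );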

  • Column-level primary key constraint: [PRIMARY] KEY index_parameters

    Table-level primary key constraint: PRIMARY KEY [index_name] [ USING method ] ( { column_name [ ASC | DESC ] } [, ... ] ) index_parameters

    Specifies that a column or columns of a table can contain only unique and not-null values.

    Only one primary key can be specified for a table.

  • USING INDEX TABLESPACE tablespace_name

    Allows selection of the tablespace in which the index associated with a UNIQUE or PRIMARY KEY constraint will be created. If not specified, the index is created in default_tablespace. If default_tablespace is empty, the default tablespace of the database is used.

    This attribute is not supported when the version compatibility control parameter m_format_dev_version of an M-compatible database is set to 's1' or later.
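
    For example, a minimal sketch (tbs_demo is a hypothetical, pre-created tablespace; it assumes the compatibility restriction above does not apply):

    gaussdb=# CREATE TABLE t_pk_tbs_demo(
        id INT,
        CONSTRAINT pk_demo PRIMARY KEY (id) USING INDEX TABLESPACE tbs_demo
    );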

  • DISTRIBUTE BY

    Specifies how the table is distributed or replicated between nodes.

    Value range:

    • REPLICATION: Each row in the table exists on all DNs, that is, each DN has complete table data.
    • HASH ( column_name ): Each row of the table will be placed into specified DNs based on the hash value of the specified column.
    • RANGE(column_name): maps a specified column based on the range and distributes data to the corresponding DNs.
    • LIST(column_name): maps a specified column based on a specific value and distributes data to the corresponding DNs.
      • For HASH distribution, the distribution key can contain as many columns as the table itself, up to 1600. For RANGE (VALUES LESS THAN) and LIST distributions, the distribution key supports a maximum of four columns. For RANGE (START END) distribution, the distribution key supports only one column.
      • For a RANGE distribution policy using the VALUES LESS THAN clause, the distribution rules are as follows:
        • The comparison starts from the first column of the values to be inserted.
        • If the value of the first inserted column is smaller than the boundary value of the corresponding column in the current slice, the row is inserted into this slice.
        • If the value of the first inserted column is equal to the boundary value of the corresponding column in the current slice, the second inserted column is compared with the next boundary column of the slice: if it is smaller, the row is inserted into this slice; if it is equal, the comparison continues with the subsequent columns.
        • If the value of the first inserted column is greater than the boundary value of the corresponding column in the current slice, the value is compared against the next slice.
      • If a slice of a RANGE-distributed table corresponds to multiple DNs, the hash value of the distribution key is computed modulo the number of DNs to map the row to a DN. For details, see the example.
      • If a slice of a LIST-distributed table corresponds to multiple DNs, rows in the DEFAULT slice are mapped to a DN by computing the hash value of the distribution key modulo the number of DNs; for non-default slices, the values in the VALUES list are mapped to DNs in round-robin order. For details, see the example.
      • RANGE/LIST distributed tables support only scale-out, not scale-in. For details about the slice scale-out rules, contact the administrator.
    For the hash distribution, column_name supports the following data types:
    • INTEGER TYPES include TINYINT, SMALLINT, INTEGER, MEDIUMINT, BIGINT, TINYINT UNSIGNED, SMALLINT UNSIGNED, INTEGER UNSIGNED, MEDIUMINT UNSIGNED, BIGINT UNSIGNED, NUMERIC[(p[,s])], and DECIMAL[(p[,s])].
    • CHARACTER TYPES include CHAR[(n)], VARCHAR[(n)], TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT.
    • DATE/TIME TYPES include DATE, TIME[(p)], TIMESTAMP[(p)], DATETIME[(p)], and YEAR.
    For the RANGE (VALUES LESS THAN) or LIST distribution, column_name supports the following data types:
    • INTEGER TYPES include TINYINT, SMALLINT, INT, MEDIUMINT, BIGINT, TINYINT UNSIGNED, SMALLINT UNSIGNED, INTEGER UNSIGNED, MEDIUMINT UNSIGNED, BIGINT UNSIGNED, NUMERIC[(p[,s])], and DECIMAL[(p[,s])].
    • CHARACTER TYPES include CHAR[(n)], VARCHAR[(n)], BINARY[(n)], VARBINARY(n), TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT.
    • DATE/TIME TYPES include DATE, TIMESTAMP[(p)], DATETIME[(p)], and YEAR.

    For the RANGE (START END) distribution, the data type of column_name must be one of the following:

    • INTEGER TYPES include TINYINT, SMALLINT, INT, MEDIUMINT, BIGINT, TINYINT UNSIGNED, SMALLINT UNSIGNED, INTEGER UNSIGNED, MEDIUMINT UNSIGNED, BIGINT UNSIGNED, NUMERIC[(p[,s])], and DECIMAL[(p[,s])].
    • DATE/TIME TYPES include DATE, TIMESTAMP[(p)], DATETIME[(p)], and YEAR.

    When you create a table, the choice of distribution keys and partition keys has a major impact on SQL query performance. Therefore, select appropriate distribution keys and partition keys strategically.

    • Select appropriate distribution keys.
      A hash table's distribution key should distribute data evenly across DNs to prevent data skew. Determine appropriate distribution keys based on the following principles:
      1. Determine whether data is skewed.

        Connect to the database and check the number of tuples on each DN. Replace tablename with the actual name of the table to be analyzed.

        gaussdb=# SELECT a.count,b.node_name FROM (SELECT count(*) AS count,xc_node_id FROM tablename GROUP BY xc_node_id) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY a.count DESC;

        If tuple counts differ greatly (by several times or an order of magnitude) across DNs, a data skew exists. Change the distribution key based on the following principles:

      2. Rebuild a table to change its distribution keys. ALTER TABLE cannot change distribution keys. Therefore, you need to rebuild a table when changing its distribution keys.

        Principles for selecting distribution keys are as follows:

        The value of the distribution key should be discrete so that data can be evenly distributed on each DN. You can select the primary key of the table as the distribution key. For example, for a person information table, choose the ID card number column as the distribution key.

        When the above principles are met, you can select join conditions as distribution keys so that join tasks can be pushed down to DNs, reducing the amount of data transferred between DNs.

    • Select appropriate partition keys.

      In range partitioning, a table is partitioned based on ranges defined by one or more columns, with no overlap between the ranges of values assigned to different partitions. Each range has a dedicated partition for data storage.

      Choose partition keys so that query results are stored in as few partitions as possible (partition pruning), yielding consecutive I/O and better query performance.

      In actual services, time is often used to filter query targets. Therefore, you can use time as the partition key, sizing the key ranges based on the total data volume and the data volume of a single query.

    • RANGE/LIST distribution

      If no DN is specified for the shards of a RANGE/LIST distributed table, the database uses the Round Robin algorithm to allocate DNs to the shards. In addition, if RANGE/LIST distribution is used, you are advised to define as many shards as possible when creating a table for future capacity expansion. If the defined number of shards is less than the number of DNs before scale-out, data redistribution cannot be performed on new DNs. Note that the sharding rules are designed by users. In some extreme cases, scale-out may not solve the problem of insufficient storage space.

  • TO { GROUP groupname | NODE ( nodename [, ... ] ) }

    TO GROUP specifies the node group to which the table to be created belongs. TO NODE is used for internal scale-out tools.

  • INTERNAL DATA data_content

    INTERNAL DATA data_content is only for internal scale-out.

Examples

  • Temporary table
    -- Create a temporary table.
    m_db=# CREATE TEMPORARY TABLE test_t1(
        id       CHAR(7),
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    );
    
    -- Insert data in the current session.
    m_db=# INSERT INTO test_t1 VALUES ('0000009','Jack','Guangzhou','China');
    
    -- Data in a temporary table is visible only in the current session. Therefore, the table contains no data when queried in another session.
    m_db=# SELECT * FROM test_t1;
     id | name | province | country 
    ----+------+----------+---------
    (0 rows)
    -- Create a temporary table in a transaction and specify that this table is deleted when the transaction is committed.
    m_db=# START TRANSACTION;
    m_db=# CREATE TEMPORARY TABLE test_t2(
        id       CHAR(7),
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    ) ON COMMIT DELETE ROWS;
    m_db=# INSERT test_t2 VALUES('aid','aname','aprovince','China');
    m_db=# COMMIT;
    m_db=# SELECT * FROM test_t2;
     id | name | province | country
    ----+------+----------+---------
    (0 rows)
    -- Drop the table.
    m_db=# DROP TABLE test_t1;
    m_db=# DROP TABLE test_t2;
  • IF NOT EXISTS keyword

    If this keyword is used, a NOTICE is reported when a table with the same name already exists; if it is not used, an ERROR is reported. In either case, the new table is not created.

    -- Create a table named test_t3.
    m_db=# CREATE TABLE test_t3(id INT);
    -- Create the same table again without IF NOT EXISTS.
    m_db=# CREATE TABLE test_t3(id INT);
    ERROR:  relation "test_t3" already exists in schema "public"
    DETAIL:  creating new table with existing name in the same schema
    
    -- Use the IF NOT EXISTS keyword.
    m_db=# CREATE TABLE IF NOT EXISTS test_t3(id INT);
    NOTICE:  relation "test_t3" already exists, skipping
    CREATE TABLE
    
    -- Drop the table.
    m_db=# DROP TABLE test_t3;
  • Create a table using CREATE TABLE ... LIKE.
    -- Create the source table t1.
    m_db=# CREATE TABLE t1(col INT);
    CREATE TABLE
    
    m_db=# \d t1
          Table "public.t1"
     Column |  Type   | Modifiers 
    --------+---------+-----------
     col    | integer |
    
    -- Create the target table t2.
    m_db=# CREATE TABLE t2(LIKE t1);
    CREATE TABLE
    
    m_db=# \d t2
          Table "public.t2"
     Column |  Type   | Modifiers 
    --------+---------+-----------
     col    | integer | 
    
    -- Create the target table t3.
    m_db=# CREATE TABLE t3 LIKE t1;
    CREATE TABLE
    
    m_db=# \d t3
          Table "public.t3"
     Column |  Type   | Modifiers 
    --------+---------+-----------
     col    | integer | 

Examples of Creating a Table and Adding Constraints to the Table

  • NOT NULL constraints
    If no value is specified for a column with a NOT NULL constraint when data is added, an error is reported. You can add NOT NULL constraints to multiple columns in a table.
    -- Create a table and add a NOT NULL constraint to the id column.
    m_db=# CREATE TABLE test_t4(
        id       CHAR(7) NOT NULL,
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    );
    
    -- If the value of id is not specified or is NULL during data insertion, the NOT NULL constraint is triggered. As a result, the insertion fails.
    m_db=# INSERT INTO test_t4 (name,province) VALUES ('scott','Shanghai');
    ERROR:  null value in column "id" violates not-null constraint
    DETAIL:  Failing row contains (null, scott, Shanghai, China) 
    
    -- Drop the table.
    m_db=# DROP TABLE test_t4;
  • Unique constraint
    The keyword UNIQUE adds a unique constraint to a column. When data is inserted, the constraint is triggered if the inserted value duplicates an existing value in the column. Multiple NULL values are not considered duplicates. When a unique constraint is added, a unique index is automatically created. You can add unique constraints to multiple columns in a table.
    -- Create a table and add a unique constraint.
    m_db=# CREATE TABLE test_t5(
        id       CHAR(7) UNIQUE,
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    );
    
    -- You can also use the following method to manually name unique constraints and add constraints to multiple columns:
    m_db=# CREATE TABLE test_t6(
        id       CHAR(7),
        name     VARCHAR(20),
        province VARCHAR(60),                        -- Province
        country   VARCHAR(30) DEFAULT 'China',        -- Country
        CONSTRAINT unq_test_id UNIQUE (id,name)
    );
    -- When data with duplicate IDs is inserted, constraints are triggered. As a result, the insertion fails.
    m_db=# INSERT INTO test_t5(id) VALUES('0000010');
    INSERT 0 1
    m_db=# INSERT INTO test_t5(id) VALUES('0000010');
    ERROR:  duplicate key value violates unique constraint "test_t5_id_key"
    DETAIL:  Key (id)=(0000010) already exists.
    -- The constraint is not triggered when data whose id is NULL is inserted multiple times.
    m_db=# INSERT INTO test_t5(id) VALUES (NULL);
    INSERT 0 1
    m_db=# INSERT INTO test_t5(id) VALUES (NULL);
    INSERT 0 1
    
    m_db=# SELECT * FROM test_t5;
       id    | name | province | country 
    ---------+------+----------+---------
     0000010 |      |          | China
             |      |          | China
             |      |          | China
    (3 rows)
    
    -- Drop the table.
    m_db=# DROP TABLE test_t5;
    m_db=# DROP TABLE test_t6;
  • Primary key constraints

    The keyword PRIMARY KEY is used to add a primary key constraint to a column. The column must be unique and cannot be null. When a primary key constraint is added, a unique index is automatically created for the table, and a NOT NULL constraint is automatically added for the column.

    Only one primary key constraint can be defined in each table.

    -- Create a table and add a primary key constraint to the table.
    m_db=# CREATE TABLE test_t6(
        id       CHAR(7) PRIMARY KEY,
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    );
    m_db=# INSERT INTO test_t6 (id,name,province) VALUES ('0000001','july','Beijing');
    
    -- You can also use the following method to manually name a primary key constraint and add it to multiple columns:
    m_db=# CREATE TABLE test_t7(
        id       CHAR(7),
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China',        -- Country
        CONSTRAINT pk_test_t6_id PRIMARY KEY (id,name)
    );
    -- Insert data whose id is NULL to trigger the constraint.
    m_db=# INSERT INTO test_t6 (id,name,province) VALUES (NULL,'july','Beijing'); 
    ERROR:  null value in column "id" violates not-null constraint
    DETAIL:  Failing row contains (null, july, Beijing, China).
    
    -- Insert data with duplicate id values to trigger the constraint.
    m_db=# INSERT INTO test_t6 (id,name,province) VALUES ('0000001','ben','Shanghai');
    ERROR:  duplicate key value violates unique constraint "test_t6_pkey"
    DETAIL:  Key (id)=(0000001) already exists.
    -- Drop the table.
    m_db=# DROP TABLE test_t6;
    m_db=# DROP TABLE test_t7;
  • Check constraints

    The keyword CHECK adds a check constraint to a column. The check constraint must reference one or more columns in the table, and the result returned by the expression must be a Boolean value. Currently, expressions cannot contain subqueries. Both CHECK and NOT NULL constraints can be defined for the same column.

    -- Create a table and add check constraints.
    m_db=# CREATE TABLE test_t8 (
        id       CHAR(7),
        name     VARCHAR(20),
        age      INT CHECK(age > 0 AND age < 150)
    );
    
    -- You can also use the following SQL statements to name a check constraint and add the check constraint for one or more columns:
    m_db=# CREATE TABLE test_t9 (
        id       CHAR(7),
        name     VARCHAR(20),
        age      INT,
        CONSTRAINT chek_test_t8_age CHECK(age > 0 AND age < 150)
    );
    -- If a value that does not comply with the expression is inserted, the check constraint is triggered. As a result, the insertion fails.
    m_db=# INSERT INTO test_t8 (id,name,age) VALUES ('0000007','scott',200);
    ERROR:  new row for relation "test_t8" violates check constraint "test_t8_age_check"
    DETAIL:  N/A
    -- Drop the table.
    m_db=# DROP TABLE test_t8;
    m_db=# DROP TABLE test_t9;

Data Distribution Example

  • REPLICATION
    gaussdb=# CREATE TABLE test_replication(
        id       CHAR(7),
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    )DISTRIBUTE BY REPLICATION;
    
    -- Query table information.
    gaussdb=# \d+ test_replication
                                            Table "public.test_replication"
      Column  |         Type          |             Modifiers              | Storage  | Stats target | Description 
    ----------+-----------------------+------------------------------------+----------+--------------+-------------
     id       | character(7)          |                                    | extended |              | 
     name     | character varying(20) |                                    | extended |              | 
     province | character varying(60) |                                    | extended |              | 
     country  | character varying(30) | default 'China'::character varying | extended |              | 
    Has OIDs: no
    Distribute By: REPLICATION
    Location Nodes: ALL DATANODES
    Options: orientation=row, logical_repl_node=-1, compression=no
    
    -- Drop.
    gaussdb=# DROP TABLE test_replication;
  • HASH
    -- Define a hash table.
    gaussdb=# CREATE TABLE test_hash(
        id       CHAR(7),
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    )DISTRIBUTE BY HASH(id);
    
    -- Insert data.
    gaussdb=# INSERT INTO test_hash VALUES ('0000001', 'Bob',   'Shanghai', 'China'),
                                ('0000002', 'Jack',  'Beijing',  'China'), 
                                ('0000003', 'Scott', 'Beijing',  'China');
    -- View data distribution.
    gaussdb=# SELECT a.count,b.node_name 
    FROM (SELECT COUNT(*) AS count, xc_node_id FROM test_hash GROUP BY xc_node_id) a, 
         pgxc_node b 
    WHERE a.xc_node_id=b.node_id ORDER BY a.count DESC;
     
     count |     node_name     
    -------+-------------------
         2 | dn_6001_6002_6003
         1 | dn_6004_6005_6006
    (2 rows)
    
    -- Drop the table.
    gaussdb=# DROP TABLE test_hash;
  • RANGE
    -- Define a range-distributed table. (Change the DN name based on the actual situation. You can run the SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' command to query the DN name.)
    gaussdb=# CREATE TABLE test_range(
        id       INT,
        name     VARCHAR(20),
        province VARCHAR(60),                       -- Province
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    )DISTRIBUTE BY RANGE(id)(
        SLICE s1 VALUES LESS THAN (100) DATANODE dn_6001_6002_6003,
        SLICE s2 VALUES LESS THAN (200) DATANODE dn_6004_6005_6006,
        SLICE s3 VALUES LESS THAN (MAXVALUE) DATANODE dn_6007_6008_6009
    );
    
    -- Insert data.
    gaussdb=# INSERT INTO test_range VALUES (52,  'Bob',    'Beijing',   'China');
    gaussdb=# INSERT INTO test_range VALUES (100, 'Ben',    'Shanghai',  'China');
    gaussdb=# INSERT INTO test_range VALUES (150, 'Scott',  'Guangzhou', 'China');
    gaussdb=# INSERT INTO test_range VALUES (300, 'Jordan', 'Beijing',   'China');
    -- View data distribution.
    gaussdb=# SELECT a.count,b.node_name 
    FROM (SELECT COUNT(*) AS count, xc_node_id FROM test_range GROUP BY xc_node_id) a, 
         pgxc_node b 
    WHERE a.xc_node_id=b.node_id ORDER BY a.count DESC;
     count |     node_name     
    -------+-------------------
         2 | dn_6004_6005_6006
         1 | dn_6001_6002_6003
         1 | dn_6007_6008_6009
    (3 rows)
    -- Query data stored on each DN.
    gaussdb=# SELECT b.node_name, a.* 
    FROM (SELECT *, xc_node_id FROM test_range) a, 
         pgxc_node b 
    WHERE a.xc_node_id=b.node_id order by node_name;
         node_name     | id  |  name  | province  | country | xc_node_id  
    -------------------+-----+--------+-----------+---------+-------------
     dn_6001_6002_6003 |  52 | Bob    | Beijing   | China   | -1072999043
     dn_6004_6005_6006 | 100 | Ben    | Shanghai  | China   |  -564789568
     dn_6004_6005_6006 | 150 | Scott  | Guangzhou | China   |  -564789568
     dn_6007_6008_6009 | 300 | Jordan | Beijing   | China   |  1532339558
    (4 rows)
    -- Drop the table.
    gaussdb=# DROP TABLE test_range;
  • LIST
    -- Define a list-distributed table. (Change the DN name based on the actual situation. You can run the SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' command to query the DN name.)
    gaussdb=# CREATE TABLE test_list(
        id       INT,
        name     VARCHAR(20),
        country   VARCHAR(30) DEFAULT 'China'        -- Country
    )DISTRIBUTE BY LIST(country)(
        SLICE s1 VALUES ('China') DATANODE dn_6001_6002_6003,
        SLICE s2 VALUES ('USA')   DATANODE dn_6004_6005_6006,
        SLICE s3 VALUES (DEFAULT) DATANODE dn_6007_6008_6009
    );
    
    -- Insert data.
    gaussdb=# INSERT INTO test_list VALUES (1,'Scott','China');
    gaussdb=# INSERT INTO test_list VALUES (2,'Henry','USA');
    gaussdb=# INSERT INTO test_list VALUES (3,'Michael','France');
    gaussdb=# INSERT INTO test_list VALUES (4,'Jack','UK');
    -- Query data stored on each DN.
    gaussdb=# SELECT b.node_name, a.* 
    FROM (SELECT *, xc_node_id FROM test_list) a, 
         pgxc_node b 
    WHERE a.xc_node_id=b.node_id order by node_name;
         node_name     | id |  name   | country | xc_node_id  
    -------------------+----+---------+---------+-------------
     dn_6001_6002_6003 |  1 | Scott   | China   | -1072999043
     dn_6004_6005_6006 |  2 | Henry   | USA     |  -564789568
     dn_6007_6008_6009 |  3 | Michael | France  |  1532339558
     dn_6007_6008_6009 |  4 | Jack    | UK      |  1532339558
    (4 rows)
    -- Drop the table.
    gaussdb=# DROP TABLE test_list;

Helpful Links

ALTER TABLE and DROP TABLE

Suggestions

  • UNLOGGED
    • The unlogged table and its indexes do not use the WAL mechanism during data writing. Their write speed is much higher than that of ordinary tables. Therefore, they can be used for storing intermediate result sets of complex queries to improve query performance.
    • The unlogged table has no primary/standby mechanism. In case of system faults or abnormal breakpoints, data loss may occur. Therefore, the unlogged table cannot be used to store basic data.
  • TEMPORARY
    • A temporary table is visible only in the current session and is automatically dropped at the end of the current session.
  • LIKE
    • The target table automatically inherits all column names, data types, and NOT NULL constraints from the source table. After creation, the target table is independent of the source table.
  • ORIENTATION ROW
    • Creates a row-store table. Row store applies to the OLTP service, which has many interactive transactions. An interaction involves many columns in the table. Using row store can improve the efficiency.
  • DISTRIBUTE BY
    • It is recommended that a fact table or dimension table containing a large amount of data be created as a distributed table. Each row of the table will be placed into specified DNs based on the hash value of the specified column. The syntax is DISTRIBUTE BY HASH(column_name).
    • It is recommended that a dimension table containing a small amount of data be created as a replication table. Each row in the table exists on all DNs. That is, each DN has complete table data. The syntax is DISTRIBUTE BY REPLICATION.