CREATE TABLE
Description
Creates an initially empty table in the current database. The table will be owned by the creator. Row-store tables are created by default.
Precautions
- If a PRIMARY KEY constraint or UNIQUE constraint is added to a non-distribution key, a global secondary index is created by default.
- By default, HASH(column_name) is used in a distributed system. column_name is the primary key column or UNIQUE constraint column (if any) of the table, or the first column of a data type that can be used as the distribution key. The priority is as follows: primary key column > UNIQUE constraint column > first column of a data type that can be used as the distribution key. If a primary key column and multiple UNIQUE constraint columns exist and the distribution mode of the table is not specified, GSIs are created for other UNIQUE constraint columns by default after the distribution key of the table is determined based on the priority.
NOTE:
In MySQL-compatible mode (sql_compatibility set to 'MYSQL'), if a distribution key of the character type is used, you are advised not to use case-insensitive collations (such as utf8mb4_general_ci, utf8mb4_unicode_ci, and utf8mb4_0900_ai_ci). Otherwise, data skew may occur and data cannot be evenly distributed. The utf8mb4_bin collation is recommended.
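A minimal sketch (table and column names are illustrative) of pinning the recommended collation on a character-type distribution key:
gaussdb=# CREATE TABLE tbl_orders(
    order_no varchar(32) COLLATE utf8mb4_bin,  -- case-sensitive collation helps avoid data skew
    amount numeric(10,2)
) DISTRIBUTE BY HASH(order_no);
gaussdb=# DROP TABLE tbl_orders;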
- Distribution keys can be updated only when enable_update_distkey is set to on or K-means distribution is used, and constraints are met. Otherwise, distribution keys do not support the UPDATE operation.
NOTE:
The constraints on updating distribution keys are as follows:
- Distribution keys can be updated only when enable_update_distkey is set to on or K-means distribution is used.
- The UPDATE statement cannot be pushed down to DNs for execution. The PGXC plan is directly generated and is not changed based on the values before and after the distribution key is updated.
- Tables with row-level UPDATE TRIGGER are not supported. Otherwise, the execution fails and an error message is displayed. The row-level INSERT/DELETE TRIGGER does not take effect, and the update-statement-level TRIGGER is executed normally.
- Concurrent update of the same row is not supported. You need to obtain a lock first. The result (returning 0 or reporting an error) of obtaining a lock on DNs depends on the setting of the GUC parameter concurrent_dml_mode. An error will be reported in the following cases: (1) The error message "update distribute column conflict" is displayed. (2) When the time spent for obtaining a lock exceeds the threshold, an error message is displayed, indicating that the lock times out.
- Tables with global secondary indexes (GSIs) are not supported. Otherwise, an error is reported.
- Only Hash or K-means distribution is supported. LIST/RANGE distributed tables are not supported. Otherwise, an error is reported.
- MERGE INTO and UPSERT cannot be used to update distribution keys. Otherwise, an error is reported.
- GTM_FREE is not supported. Otherwise, an error is reported.
- UPDATE RETURNING is not supported. Otherwise, an error is reported.
- Statements with joined tables are not supported. Otherwise, an error is reported.
- UPDATE+LIMIT is not supported. Otherwise, an error is reported.
- If an error occurs during table creation, after it is fixed, the system may fail to delete the empty disk files (whose size is 0) created before the last automatic clearance. This problem seldom occurs and does not affect the running of the database.
- When JDBC is used, the DEFAULT value can be set through PreparedStatement.
- Row-store tables do not support foreign keys as table-level constraints.
- A user with the CREATE ANY TABLE permission can create tables in the public and user schemas. To create a table that contains serial columns, you must also obtain the CREATE ANY SEQUENCE permission to create sequences.
- The XML type cannot be used as a primary key or foreign key.
- If you add a row-level expression when adding or changing an ILM policy for a data object, note that the row-level expression supports only the functions listed in the whitelist. For details about the whitelist function list, see Row Expression Function Whitelist.
NOTE:
Compatibility parameters may affect the output of some functions supported by the ON(EXPR) row-level expression of the ILM policy. For example, after b_format_version and b_format_dev_version are set to '5.7' and 's2' respectively for the upper function in MySQL-compatible mode, uppercase conversion will be unavailable.
NOTICE:
If an unlimited number of tables are created in the database, CNs may be affected as follows:
- Resource exhaustion: Each table occupies certain disk space. Unlimited table creation will occupy a large amount of memory and disk space, which may exhaust CN resources. As a result, the system breaks down or becomes unstable.
- Performance deterioration: Unlimited table creation causes a large number of I/O operations and CPU computing, and the metadata information of the database becomes large, which may deteriorate the CN performance, including operations such as insert, query, update, and deletion. As a result, the system responds slowly or cannot meet service requirements.
- Security issues: Excessive tables make database management and maintenance difficult. Creating unlimited tables may cause security issues such as data leakage or data loss. Database stability decreases, causing immeasurable loss to enterprises.
Therefore, plan the number and size of tables in the database properly to avoid unlimited table creation and ensure system stability, reliability, and security.
- The number of table constraints cannot exceed 32,767.
Syntax
- Create a table.
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name
    { ( { column_name data_type [ CHARACTER SET | CHARSET charset ] [ compress_mode ] [ COLLATE collation ] [ column_constraint [ ... ] ]
        | table_constraint
        | LIKE source_table [ like_option [...] ] }
        [, ... ] )
    | LIKE source_table }
    [ table_option [ [ , ] ... ] ]
    [ htap_option ]
    [ WITH ( {storage_parameter = value} [, ... ] ) ]
    [ ON COMMIT { PRESERVE ROWS | DELETE ROWS } ]
    [ ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER n { day | month | year } OF NO MODIFICATION [ ON ( EXPR ) ] ]
    [ TABLESPACE tablespace_name ]
    [ DISTRIBUTE BY { REPLICATION
                    | HASH ( column_name [, ...] )
                    | MURMURHASH ( expression )
                    | KMEANS ( column_name ) distance_type
                    | RANGE ( column_name [, ...] ) { TO SLICEGROUP slicegroupname | SLICE REFERENCES tablename | ( slice_less_than_item [, ...] ) | ( slice_start_end_item [, ...] ) }
                    | LIST ( column_name [, ...] ) { SLICE REFERENCES tablename | ( slice_values_item [, ...] ) } } ]
    [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ];
- table_option is as follows:
{ COMMENT [ = ] 'string'
| AUTO_INCREMENT [ = ] value
| [ DEFAULT ] CHARACTER SET | CHARSET [ = ] default_charset
| [ DEFAULT ] COLLATE [ = ] default_collation
| ENGINE [ = ] { InnoDB | 'InnoDB' | "InnoDB" } }
- htap_option is as follows:
{ COLVIEW [ PRIORITY { HIGH | LOW | NONE } ]
| NOCOLVIEW [ PRIORITY { HIGH | LOW | NONE } ] }
- column_constraint is as follows:
[ CONSTRAINT constraint_name ]
{ NOT NULL
| NULL
| CHECK ( expression )
| DEFAULT default_expr
| ON UPDATE update_expr
| AUTO_INCREMENT
| COMMENT 'string'
| COLVIEW
| NOCOLVIEW
| UNIQUE [KEY] index_parameters
| PRIMARY KEY index_parameters
| ENCRYPTED WITH ( COLUMN_ENCRYPTION_KEY = column_encryption_key, ENCRYPTION_TYPE = encryption_type_value )
| REFERENCES reftable [ ( refcolumn ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] }
[ DEFERRABLE | NOT DEFERRABLE | INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
- compress_mode of a column is as follows:
{ DELTA | PREFIX | DICTIONARY | NUMSTR | NOCOMPRESS }
- table_constraint is as follows:
[ CONSTRAINT [ constraint_name ] ]
{ CHECK ( expression )
| UNIQUE [ index_name ] [ USING method ] ( { { column_name [ ( length ) ] | ( expression ) } [ ASC | DESC ] } [, ... ] ) index_parameters
| PRIMARY KEY [ USING method ] ( { column_name [ ASC | DESC ] } [, ... ] ) index_parameters }
[ DEFERRABLE | NOT DEFERRABLE | INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
{ [ COMMENT 'string' ] [ ... ] }
- like_option is as follows:
{ INCLUDING | EXCLUDING } { DEFAULTS | GENERATED | CONSTRAINTS | INDEXES | STORAGE | COMMENTS | PARTITION | RELOPTIONS | DISTRIBUTION | UPDATE | ILM | ALL }
- index_parameters is as follows:
[ WITH ( {storage_parameter = value} [, ... ] ) ] [ USING INDEX TABLESPACE tablespace_name ]
- distance_type is as follows:
{ L2 | COSINE | HAMMING }
- Range distribution rules
For slice_less_than_item:
SLICE name VALUES LESS THAN ( { expression | MAXVALUE } [, ...] ) [ DATANODE datanode_name | ( datanode_name_list [, ... ] ) ]
For slice_start_end_item:
SLICE name { { START ( expression ) END ( expression ) EVERY ( expression ) }
           | { START ( literal ) END ( { literal | MAXVALUE } ) }
           | { START ( literal ) }
           | { END ( { literal | MAXVALUE } ) } }
- The LIST distribution rule slice_values_item is as follows:
[ ( SLICE name VALUES ( expression [, ... ] ) [ DATANODE datanode_name | ( datanode_name_list ) ] [, ... ] )
| ( SLICE name VALUES ( DEFAULT ) [ DATANODE datanode_name | ( datanode_name_list ) ] ) ]
- update_expr is as follows:
{ CURRENT_TIMESTAMP | LOCALTIMESTAMP | NOW() }
Parameters
- UNLOGGED
If this keyword is specified, the created table is an unlogged table. Data written to unlogged tables is not written to the WALs, which makes them considerably faster than ordinary tables. However, an unlogged table is automatically truncated after conflicts, OS restart, database restart, switchover, power-off, or abnormal shutdown, incurring data loss risks. Contents of an unlogged table are also not replicated to standby nodes. Any indexes created on an unlogged table are not automatically logged as well.
Usage scenario: Unlogged tables do not guarantee data safety. Back up the data before relying on unlogged tables, for example, before a system upgrade.
Troubleshooting: If data is missing in the indexes of unlogged tables due to some unexpected operations such as an unclean shutdown, users should rebuild the indexes with errors.
- GLOBAL | LOCAL
When creating a temporary table, you can specify the GLOBAL or LOCAL keyword before TEMP or TEMPORARY. If the keyword GLOBAL is specified, GaussDB creates a global temporary table. Otherwise, it creates a local temporary table.
- TEMPORARY | TEMP
If TEMP or TEMPORARY is specified, the created table is a temporary table. Temporary tables are classified into global temporary tables and local temporary tables. If the keyword GLOBAL is specified when a temporary table is created, the table is a global temporary table. Otherwise, the table is a local temporary table.
The metadata of the global temporary table is visible to all sessions. After the sessions end, the metadata still exists. The user data, indexes, and statistics of a session are isolated from those of another session. Each session can only view or modify the data committed by itself. Global temporary tables can be created using ON COMMIT PRESERVE ROWS or ON COMMIT DELETE ROWS. The former creates session-level tables, where user data is automatically cleared when a session ends; the latter creates transaction-level tables, where user data is automatically cleared when a COMMIT or ROLLBACK operation is performed. If the ON COMMIT option is not specified during table creation, the session level is used by default. Different from local temporary tables, you can specify a schema when creating a global temporary table.
A local temporary table is visible only in the current session and is automatically dropped at the end of the session. Therefore, you can create and use temporary tables in the current session as long as the connected CN in the session is normal. Temporary tables are created only in the current session; if a DDL statement involves operations on temporary tables, a DDL error will be generated, so you are advised not to operate on temporary tables in DDL statements. TEMP is equivalent to TEMPORARY. A usage sketch follows the notes below.
NOTICE:
- Local temporary tables are visible to the current session through schemas whose names start with pg_temp. Users should not delete schemas whose names start with pg_temp or pg_toast_temp.
- If TEMPORARY or TEMP is not specified when you create a table but its schema is set to that starting with pg_temp in the current session, the table will be created as a temporary table.
- If global temporary tables and indexes are being used by other sessions, do not perform ALTER or DROP (except the ALTER INDEX index_name REBUILD command).
- The DDL of a global temporary table affects only the user data and indexes of the current session. For example, TRUNCATE, REINDEX, and ANALYZE are valid only for the current session.
- You can set the GUC parameter max_active_global_temporary_table to determine whether to enable the global temporary table function. If max_active_global_temporary_table is set to 0, the global temporary table function is disabled.
- A temporary table is visible only to the current session. Therefore, it cannot be used together with \parallel on.
- Temporary tables do not support DN faults or primary/standby switchovers.
- Global temporary tables do not respond to automatic vacuum. In long link scenarios, you are advised to use global temporary tables of ON COMMIT DELETE ROWS or manually execute VACUUM periodically. Otherwise, Clogs may not be recycled. (You can set vacuum_gtt_defer_check_age to generate an alarm if global temporary tables are not cleared for a long time.)
- Global temporary tables do not support the following scenarios:
- Global temporary sequences cannot be created. Global temporary tables of each session use shared sequences, which can only ensure uniqueness but not continuity.
- Global temporary views cannot be created.
- Partitioned tables cannot be created.
- Hash bucket tables cannot be created.
- Extended statistics are not supported.
- The ON COMMIT DROP attribute is not supported.
- GSIs are not supported.
- M compatibility is not supported.
- Reading on the distributed standby node is not supported.
- Stream plans are not supported.
- Scaling is not supported, because scaling rollback may cause data loss and the data will be cleared after scaling.
- If a DN switchover, DN fault, or CN fault occurs, data may be lost. (Global temporary tables do not have Xlogs, so data is not synchronized between primary and standby nodes.)
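The following sketch (table names are illustrative) contrasts the two kinds of temporary tables:
-- Global temporary table whose user data is cleared at COMMIT or ROLLBACK.
gaussdb=# CREATE GLOBAL TEMPORARY TABLE gtt_stage(
    id int,
    payload text
) ON COMMIT DELETE ROWS;
-- Local temporary table, visible only in the current session and dropped when it ends.
gaussdb=# CREATE TEMPORARY TABLE ltt_stage(
    id int
);
gaussdb=# DROP TABLE gtt_stage;
gaussdb=# DROP TABLE ltt_stage;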
- IF NOT EXISTS
Sends a notice, but does not throw an error, if a table with the same name exists.
- [ ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER n { day | month | year } OF NO MODIFICATION [ ON ( EXPR )]]
When creating a table, you can use ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW to add an advanced row-store compression policy. For example, CREATE TABLE t1 (a int) ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER 3 DAY OF NO MODIFICATION ON (a != 0) creates table t1 with an advanced compression policy that targets rows that have not been modified for three days and that satisfy a != 0.
- AFTER n { day | month | year } OF NO MODIFICATION: indicates rows that have not been modified for n days, months, or years.
- ON (EXPR): indicates the row-level expression, which is used to determine whether a row is hot or cold.
NOTE:
Compatibility parameters may affect the output of some functions supported by the ON(EXPR) row-level expression of the ILM policy. For example, after b_format_version and b_format_dev_version are set to '5.7' and 's2' respectively for the upper function in MySQL-compatible mode, uppercase conversion will be unavailable.
- table_name
Specifies the name of the table to be created.
NOTICE:
Some processing logic of materialized views determines whether a table is the log table of a materialized view or a table associated with a materialized view based on the table name prefix. Therefore, do not create a table whose name prefix is mlog_ or matviewmap_. Otherwise, some functions of the table are affected.
- column_name
Specifies the name of a column to be created in the new table.
- data_type
Specifies the data type of the column.
- compress_mode
Specifies whether to compress a table column. The option specifies the algorithm preferentially used by table columns. This parameter is available only to column-store compression.
Value range: DELTA, PREFIX, DICTIONARY, NUMSTR, and NOCOMPRESS
- CHARACTER SET | CHARSET charset
Specifies the character set of a table column. If this parameter is specified separately, the collation of the table column is set to the default collation of the specified character set.
This syntax is supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- COLLATE collation
Assigns a collation to the column (which must be of a collatable data type). If no collation is specified, the default collation is used. You can run the SELECT * FROM pg_collation command to query collation rules from the pg_collation system catalog. The default collation rule is the row starting with default in the query result. In a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL'), utf8mb4_bin, utf8mb4_general_ci, utf8mb4_unicode_ci, binary, gbk_chinese_ci, gbk_bin, gb18030_chinese_ci, and gb18030_bin are also supported.
NOTE:
- Only the character type supports the specified character set. If the binary character set or collation is specified, the character type is converted to the corresponding binary type. If the type mapping does not exist, an error is reported. Currently, only the mapping from the TEXT type to the BLOB type is available.
- Except for the binary character set and collation, only a character set that is the same as the database encoding can be specified.
- If the character set or collation of a column is not explicitly specified and the default character set or collation of the table is specified, the character set or collation of the column is inherited from the table. If the default character set or collation of a table does not exist, the character set and collation of table columns inherit the character set and collation of the current database when b_format_behavior_compat_options is set to 'default_collation'.
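For instance, a hedged sketch in MySQL-compatible mode (assuming the database encoding is utf8mb4; names are illustrative):
gaussdb=# CREATE TABLE tbl_collate_demo(
    name varchar(50) CHARSET utf8mb4 COLLATE utf8mb4_general_ci  -- case-insensitive comparisons on this column
);
gaussdb=# DROP TABLE tbl_collate_demo;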
Table 1 Character sets and collations supported in MySQL-compatible mode (sql_compatibility set to 'MYSQL')

Collation           | Character Set  | Description
--------------------|----------------|---------------------------------------------------------------
utf8mb4_general_ci  | utf8mb4 (utf8) | The general collation is used, which is case-insensitive.
utf8mb4_unicode_ci  | utf8mb4 (utf8) | The Unicode collation is used, which is case-insensitive.
utf8mb4_bin         | utf8mb4 (utf8) | The binary collation is used, which is case-sensitive.
binary              | binary         | The binary collation is used.
gbk_chinese_ci      | gbk            | The Chinese collation is used.
gbk_bin             | gbk            | The binary collation is used, which is case-sensitive.
gb18030_chinese_ci  | gb18030        | The Chinese collation is used.
gb18030_bin         | gb18030        | The binary collation is used, which is case-sensitive.
utf8mb4_0900_ai_ci  | utf8mb4        | The unicode collation algorithm (UCA) rule is used, which is case-insensitive.
utf8_general_ci     | utf8           | The general collation is used, which is case-insensitive.
utf8_bin            | utf8           | The binary collation is used, which is case-sensitive.
- LIKE source_table [ like_option ... ]
Specifies a table from which the new table automatically inherits all column names, their data types, and their NOT NULL constraints, as well as default expressions declared as serial.
The target table and the source table are decoupled after creation is complete. Changes to the source table will not be applied to the target table, and it is not possible to include data of the target table in scans of the source table.
The copied columns and constraints are not merged with similarly named columns and constraints. If the same name is specified explicitly or in another LIKE clause, an error is reported.
- The default expressions are copied from the source table to the target table only if INCLUDING DEFAULTS is specified. The serial column is not copied. The default behavior is to exclude default expressions, resulting in the copied columns in the new table having default values NULL.
- If INCLUDING UPDATE is specified, the ON UPDATE CURRENT_TIMESTAMP attribute of the source table column is copied to the target table column. By default, the generated expression is not copied.
- The CHECK constraints are copied from the source table to the target table only when INCLUDING CONSTRAINTS is specified. Other types of constraints are never copied to the target table. NOT NULL constraints are always copied to the target table. These rules also apply to column constraints and table constraints.
- Any indexes on the source table will not be created on the target table, unless the INCLUDING INDEXES clause is specified.
- STORAGE settings for the source column definitions are copied only if INCLUDING STORAGE is specified. The default behavior is to exclude STORAGE settings.
- If INCLUDING COMMENTS is specified, comments for the copied columns, constraints, and indexes are copied. The default behavior is to exclude comments.
- If INCLUDING PARTITION is specified, the partition definitions of the source table are copied to the target table, and the target table no longer uses the PARTITION BY clause. By default, the partition definition of the source table is not copied.
- If INCLUDING RELOPTIONS is specified, the target table copies the storage parameters (that is, the WITH clause) of the source table. The default behavior is to exclude the storage parameters of the source table.
- If INCLUDING DISTRIBUTION is specified, the distribution information of the source table, including the distribution type and keys, is copied to the target table, and the target table no longer uses the DISTRIBUTE BY clause. The default behavior is to exclude the distribution information of the source table. For K-means distributed tables, the center point information (PGXC_SLICE) of the source table is not copied. You need to import data again and create center points.
- If INCLUDING ILM is specified, the ILM policy information of a source table is copied to the target table. To copy the ILM policy information of partition objects in the source table, specify INCLUDING PARTITION.
- INCLUDING ALL contains the content of INCLUDING DEFAULTS, INCLUDING UPDATE, INCLUDING CONSTRAINTS, INCLUDING INDEXES, INCLUDING STORAGE, INCLUDING COMMENTS, INCLUDING PARTITION, INCLUDING RELOPTIONS, INCLUDING DISTRIBUTION, and INCLUDING ILM.
NOTICE:
- If the source table contains a sequence with the serial, bigserial, or smallserial data type, or a column in the source table is a sequence by default and the sequence is created for this table (created by using CREATE SEQUENCE ... OWNED BY), these sequences will not be copied to the target table, and another sequence specific to the target table will be created. This is different from earlier versions. To share a sequence between the source table and target table, create a shared sequence (do not use OWNED BY) and set a column in the source table to this sequence.
- You are advised not to set a column in the source table to the sequence specific to another table especially when the table is distributed in specific node groups, because doing so may result in CREATE TABLE ... LIKE execution failures. In addition, doing so may cause the sequence to become invalid in the source sequence because the sequence will also be deleted from the source table when it is deleted from the table that the sequence is specific to. To share a sequence among multiple tables, you are advised to create a shared sequence for them.
- The CREATE TABLE table_name LIKE source_table syntax is supported only when sql_compatibility is set to 'MYSQL' (MySQL-compatible database), b_format_version is set to 5.7, and b_format_dev_version is set to s2.
- In a MySQL-compatible database, if b_format_version is set to 5.7 and b_format_dev_version is set to s2, INCLUDING and EXCLUDING cannot be specified. In this case, it is equivalent to specifying INCLUDING ALL by default.
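A brief sketch of the LIKE clause (names are illustrative; options as listed above):
gaussdb=# CREATE TABLE tbl_src(
    id int PRIMARY KEY,
    info text DEFAULT 'none'
);
gaussdb=# CREATE TABLE tbl_copy (LIKE tbl_src INCLUDING DEFAULTS INCLUDING INDEXES);
gaussdb=# DROP TABLE tbl_copy;
gaussdb=# DROP TABLE tbl_src;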
- AUTO_INCREMENT [ = ] value
This clause specifies an initial value for an auto-increment column. The value must be a positive number and cannot exceed 2^127 – 1.
NOTICE:
This clause takes effect only when sql_compatibility is set to 'MYSQL'.
- COMMENT [ = ] 'string'
- The COMMENT [ = ] 'string' clause is used to add comments to a table.
- The COMMENT 'string' in column_constraint indicates that comments are added to a column.
- The COMMENT 'string' in table_constraint indicates that comments are added to the indexes corresponding to the primary key and unique key.
NOTICE:
- This clause is valid only when sql_compatibility is set to 'MYSQL'.
- A table-level comment can contain a maximum of 2048 characters, and a column-level or index-level comment can contain a maximum of 1024 characters.
- Comments in table_constraint support only primary keys and unique keys.
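An illustrative sketch in MySQL-compatible mode showing the three comment levels:
gaussdb=# CREATE TABLE tbl_comment_demo(
    id int COMMENT 'column-level comment',
    PRIMARY KEY (id) COMMENT 'index-level comment'
) COMMENT = 'table-level comment';
gaussdb=# DROP TABLE tbl_comment_demo;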
- ENGINE
Supported in MySQL-compatible mode and used only for syntax adaptation. Only InnoDB can be set and no actual effect is achieved.
NOTICE:
The ENGINE syntax cannot be used in the CREATE TABLE table_name LIKE source_table syntax.
- COLVIEW | NOCOLVIEW
Specifies whether to create IMCV table metadata for a table, column, or partition. To create an IMCV table, you should enable the enable_htap parameter.
- PRIORITY { HIGH | LOW | NONE }
Specifies the priority for loading data into the IMCV table. The priority can be high, low, or default.
- WITH ( { storage_parameter = value } [, ... ] )
Specifies an optional storage parameter for a table or an index. The WITH clause used for tables can also contain OIDS=FALSE to specify that rows of the new table should not contain OIDs. The WITH clause does not support the OIDS=TRUE syntax.
NOTE:
When defining a column of the arbitrary-precision NUMERIC type, specify the precision p and scale s. If precision and scale are not specified, the value is displayed as entered.
The description of parameters is as follows:
- FILLFACTOR
The fill factor of a table is a percentage from 10 to 100. 100 (complete filling) is the default value. If the Ustore is used, the default value is 92. When a smaller fill factor is specified, INSERT operations fill table pages only to the indicated percentage. The remaining space on each page is reserved for updating rows on that page. This gives UPDATE a chance to place the updated copy of a row on the same page, which is more efficient than placing it on a different page. For a table whose entries are never updated, setting the fill factor to 100 (complete filling) is the best choice, but in heavily updated tables a smaller fill factor would be appropriate.
Value range: 10 to 100
- ORIENTATION
Specifies the storage mode of table data. This parameter cannot be modified once it is set.
Value range:
- ROW indicates that table data is stored in rows.
Row store applies to the OLTP service, which has many interactive transactions. An interaction involves many columns in the table. Using row store can improve the efficiency.
Default value:
If an ordinary tablespace is specified, the default is ROW.
- STORAGE_TYPE
Specifies the storage engine type. This parameter cannot be modified once it is set.
Value range:
- USTORE indicates that tables support the in-place update storage engine. Note that the track_counts and track_activities parameters must be enabled when the Ustore table is used. Otherwise, space bloating may occur.
- ASTORE indicates that tables support the append-only storage engine.
Default value:
If this parameter is not specified, the enable_default_ustore_table parameter determines the storage engine mode, which is inplace-update storage by default.
- INIT_TD
Specifies the number of TDs to be initialized when a Ustore table is created. This parameter can be modified by running the ALTER TABLE command. Note that this parameter affects the maximum size of a single tuple stored on the data page: MAX_TUPLE_SIZE = BLCKSZ – INIT_TD * TD_SIZE. For example, if you change INIT_TD from 4 to 8, the maximum size of a single tuple decreases by 4 * TD_SIZE.
Value ranges: 2–128. The default value is 4.
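For example (a sketch; the value 8 is arbitrary), a Ustore table can reserve more TD slots at creation time, trading away maximum tuple size:
gaussdb=# CREATE TABLE tbl_ustore_demo(
    id int
) WITH (STORAGE_TYPE = USTORE, INIT_TD = 8);
gaussdb=# DROP TABLE tbl_ustore_demo;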
- COMPRESSION
Specifies the compression level of the table data, which determines the compression ratio and time. Generally, a higher compression level yields a higher compression ratio but takes longer, while a lower level compresses less but is faster. The actual compression ratio depends on the distribution mode of the table data loaded. This parameter is available only to column-store compression.
Value range: YES or NO for row-store tables.
Default value: NO
- COMPRESSLEVEL
Specifies the table data compression ratio and duration at the same compression level. This divides a compression level into sublevels, providing more choices for compression ratio and duration. As the value becomes greater, the compression ratio becomes higher and duration longer at the same compression level.
Value range: 0 to 3
Default value: 0
- segment
The data is stored in segment-page mode. This parameter supports only row-store tables. Protection against accidental deletion and damage of physical files 1 to 5 is not supported.
Value range: on and off
Default value: off
- hashbucket
Creates a hash table that uses buckets. This parameter supports only row-store tables.
Value range: on and off
Default value: off
NOTICE:
- For details about constraints on hash bucket tables, see section "Online Scale-Out Based on Hash Bucket Tables" in Administrator Guide.
- When creating a hash bucket table, ensure that the current database is bound to the default node group of the cluster. For details about how to bind a database to a group, see ALTER DATABASE.
- In current version, DDL operations on hash bucket tables are affected. Therefore, you are advised not to frequently perform DDL operations on hash bucket tables.
- Hash bucket tables are bound to segmented paging storage, that is, when hashbucket is set to on, segment is set to on.
- You are advised not to create a small table (ordinary table: < 8 MB; partitioned table: < 8 MB x number of partitions) as a hash bucket table; otherwise, sequential scans can be about twice as slow as on an ordinary table.
- hasuids
If this parameter is set to on, a unique table-level ID is allocated to a tuple when the tuple is updated.
Value range: on and off
Default value: off
- logical_repl_node
Name of the DN that returns logical logs to the CN during logical decoding of a distributed replication table. For a replication table, if this parameter is not specified, the first node in the node group where the table is located is used by default. When the RESET operation is performed on this option, the node is reset to the first node of the node group where the table is located.
Value range: a string
Default value: For the non-replication table, this parameter is empty by default. For the replication table, this parameter is set to the name of the first node by default.
- skip_logical_deparse
Instructs the kernel to skip the table during logical decoding. You are advised not to use this parameter; if it is used improperly, logical decoding content may be missing.
Value range: on and off
Default value: off
- STAT_STATE
Determines whether table statistics are locked. If locked, the table statistics cannot be updated. This parameter is invalid in distributed mode.
Value range: locked and unlock
Default value: unlock
- statistic_granularity
Records the default partition_mode when the table analyzes statistics. For details about partition_mode, see Parameters. This parameter is invalid for non-partitioned tables.
Value range: See the value range of partition_mode.
Default value: AUTO
- distribute_accu_rate
Specifies the data distribution accuracy threshold for K-means distributed tables; redistribution is performed automatically when the cluster data distribution accuracy falls below this threshold.
Value range: 0 to 1
Default value: 0.9
- autovacuum_enabled
Specifies whether the autovacuum function is enabled for the table.
Value range: on and off
Default value: on
- autovacuum_vacuum_threshold
Specifies the minimum number of tuples required to be updated or deleted in the table to trigger VACUUM for the autovacuum function. It takes effect only for Astore tables.
Value range: 0 to 2147483647
The default value is –1, which is the same as the value of the GUC parameter autovacuum_vacuum_threshold.
- autovacuum_analyze_threshold
Specifies the minimum number of tuples required to be inserted, updated, or deleted in the table to trigger ANALYZE for the autovacuum function.
Value range: 0 to 2147483647
The default value is –1, which is the same as the value of the GUC parameter autovacuum_analyze_threshold.
- autovacuum_vacuum_scale_factor
Specifies the ratio of tuples required to be inserted, updated or deleted in the table to trigger VACUUM for the autovacuum function. It takes effect only for Astore tables.
Value range: 0.0 to 100.0
The default value is –1, which is the same as the value of the GUC parameter autovacuum_vacuum_scale_factor.
- autovacuum_analyze_scale_factor
Specifies the ratio of tuples required to be inserted, updated or deleted in the table to trigger ANALYZE for the autovacuum function.
Value range: 0.0 to 100.0
The default value is –1, which is the same as the value of the GUC parameter autovacuum_analyze_scale_factor.
- autovacuum_freeze_min_age
Specifies the minimum age of a row version for the autovacuum function. Only rows older than the minimum age are frozen.
Value range: 0 to 1000000000
The default value is –1, which is the same as the value of the GUC parameter vacuum_freeze_min_age.
- autovacuum_freeze_max_age
Specifies the number of transactions after which the VACUUM operation is forcibly performed for the pg_class.relfrozenxid column in the table for the autovacuum function. The system starts the AUTOVACUUM process even if autovacuum is disabled. The autovacuum operation also allows old files to be deleted from the pg_clog/ subdirectory. It takes effect only for Astore tables.
Value range: 100000 to 2000000000
The default value is –1, which is the same as the value of the GUC parameter autovacuum_freeze_max_age.
- autovacuum_freeze_table_age
Specifies the amount of time that a table remains unchanged if it is marked as not requiring autovacuum for the autovacuum function. It takes effect only for Astore tables.
Value range: 0 to 2000000000
The default value is –1, which is the same as the value of the GUC parameter vacuum_freeze_table_age.
- enable_update_distkey
Determines whether the table supports the UPDATE operation on distribution keys. Distribution keys can be updated only when this parameter is set to on or K-means distribution is used, and constraints are met. Otherwise, distribution keys do not support the UPDATE operation.
Value range: on and off
Default value: off
- WITHOUT OIDS
It is equivalent to WITH(OIDS=FALSE).
- ON COMMIT { PRESERVE ROWS | DELETE ROWS }
ON COMMIT determines the action taken on a temporary table when a transaction is committed. Currently, PRESERVE ROWS and DELETE ROWS are supported.
- PRESERVE ROWS (default): No special action is taken at the ends of transactions. The temporary table and its table data are unchanged.
- DELETE ROWS: All rows in the temporary table will be deleted at the end of each transaction block.
- TABLESPACE tablespace_name
Specifies the tablespace where the new table is created. If not specified, the default tablespace is used.
- DISTRIBUTE BY
Specifies how the table is distributed or replicated between nodes.
Value range:
- REPLICATION: Each row in the table exists on all DNs, that is, each DN has complete table data.
- HASH ( column_name ): Each row of the table will be placed into specified DNs based on the hash value of the specified column.
- MURMURHASH ( diskey_expr ): Hash operations are performed on specified columns. The MURMURHASH algorithm is used to distribute data to corresponding DNs through mapping. diskey_expr can be a single column name or a column name modified by the lpad_s(string text, length int [, fill text]) function. When this function is used, only the first parameter can be a column name, for example, lpad_s(a,10,'0'). When lpad_s is used as a distribution key expression, the third parameter cannot be omitted.
- KMEANS(column_name): performs K-means clustering on a specified column and distributes data to the DN where the nearest center point is located.
- RANGE(column_name): maps a specified column based on the range and distributes data to the corresponding DNs.
- LIST(column_name): maps a specified column based on a specific value and distributes data to the corresponding DNs.
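A hedged sketch of two distribution modes (DN names are hypothetical and depend on the cluster topology):
gaussdb=# CREATE TABLE tbl_hash_demo(
    id int,
    name varchar(20)
) DISTRIBUTE BY HASH(id);
gaussdb=# CREATE TABLE tbl_range_demo(
    id int,
    d date
) DISTRIBUTE BY RANGE(id)(
    SLICE s1 VALUES LESS THAN (1000) DATANODE datanode1,
    SLICE s2 VALUES LESS THAN (MAXVALUE) DATANODE datanode2
);
gaussdb=# DROP TABLE tbl_hash_demo;
gaussdb=# DROP TABLE tbl_range_demo;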
NOTE:
- For HASH distribution, the maximum number of distribution keys is the same as that of columns. A maximum of 1600 distribution keys are supported. For MURMURHASH distribution, the distribution key supports only one column (an expression or a column in a table). For RANGE (VALUE LESS THAN) and LIST distributions, the distribution key supports a maximum of four columns. For RANGE (START END) distribution, the distribution key supports only one column.
- Only the MURMURHASH distribution supports an expression as the distribution key. The distribution key expression supports only the lpad_s(string text, length int [, fill text]) function.
- The restrictions on MURMURHASH distribution are as follows:
- The stream plan, FQS plan, and PGXC plan are not supported.
- Scaling, hash bucket tables, GSIs, and materialized views are not supported.
- When two hash tables are joined and the distribution key has an INNER JOIN condition, the FQS plan is not supported.
- A node group (user-defined node group) must be specified for a MURMURHASH distributed table, and the node group must be a non-installation node group.
- Tables cannot be created using CREATE TABLE AS.
- The distribution key of the MURMURHASH distributed table must contain the NOT NULL constraint.
- For a RANGE distribution policy using the VALUE LESS THAN clause, the distribution rules are as follows:
- The comparison starts from the first column of values to be inserted.
- If the value of the inserted first column is smaller than the boundary value of the current column in the local slice, the values are directly inserted.
- If the value of the inserted first column is equal to the boundary value of the current column in the local slice, compare the value of the inserted second column with the boundary value of the next column in the local slice. If the value of the inserted second column is smaller than the boundary value of the next column in the local slice, the values are directly inserted. If they are equal, the comparison of the next columns between the source and target continues.
- If the value of the inserted first column is greater than the boundary value of the current column in the local slice, compare the value with that in the next slice.
- If a shard of a RANGE distributed table corresponds to multiple DNs, the hash value of the distribution key is calculated and taken modulo the number of DNs to map the row to a DN. For details, see the example.
- If a shard of a LIST distributed table corresponds to multiple DNs, for the default shard, the hash value of the distribution key is calculated and taken modulo the number of DNs to map the row to a DN. For non-default shards, the round-robin method is used to map values in the values list to DNs. For details, see the example.
- The RANGE/LIST distributed tables support only scale-out but not scale-in. For details about the slice scale-out rule, contact the administrator.
- In a K-means distributed table, only the vector type can be used as a distribution key. Only a single column can be used as a distribution key. GSI is not supported. In the current version, after K-means distributed tables are scaled in or out, you need to manually call system functions gs_vector_distrib_update_centroids and gs_vector_distrib_redistribute_new_centroids to recalculate and distribute center points.
- For a replication table, the following sufficient and necessary conditions must be met for pushing down:
- Different types of window functions have different sufficient and necessary conditions for pushing down. There are four types:
- Unconditional pushdown is supported for functions such as the RANK series functions and RATIO_TO_REPORT functions. These window functions return consistent values for rows with the same PARTITION BY and ORDER BY values, regardless of the data relative sequence.
- Pushdown is supported when the projected columns exclude fields outside of PARTITION BY and ORDER BY or when PARTITION BY and ORDER BY form a primary key together, for example, window functions like ROW_NUMBER and NTILE. Otherwise, when the projection column has the same value (but other column values are different), due to issues with relative sequence, the results returned by the window function on different DNs are different. As a result, data is inconsistent.
- For the LAG and LEAD functions, pushdown is supported when offset is 0, because the result then depends only on the current row and not on the relative data order. If offset is greater than 0, tuples with the same value may yield different (possibly NULL) results because their relative order differs across DNs.
- Pushdown is supported when the fields in the parameter expression are PARTITION BY columns, ORDER BY columns, or PARTITION BY and ORDER BY columns that form a primary key, for example, functions such as FIRST_VALUE, LAST_VALUE, and NTH_VALUE.
- If the query of the replication table contains system columns and volatile functions, the query is not pushed down.
- In the context of concatenating aggregate functions, the sufficient and necessary condition for pushdown is that the columns to be concatenated match the ORDER BY columns, or the ORDER BY columns form a primary key.
- If the UPDATE or DELETE statement uses the WHERE CURRENT OF cursor_name syntax, the query is not pushed down.
For the hash distribution, column_name supports the following data types:
- Integer types: TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL
- Character types: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, and TEXT
- Date/time types: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, INTERVAL, and SMALLDATETIME
For the MURMURHASH distribution, the data type of the column name in diskey_expr must be one of the following:
For the RANGE (VALUES LESS THAN) or LIST distribution, column_name supports the following data types:
- Integer types: TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL
- Character types: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, and TEXT
- Date/time types: DATE, TIMESTAMP, and TIMESTAMPTZ
For the RANGE (START END) distribution, the data type of column_name must be one of the following:
- Integer types: TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL
- Date/time types: DATE, TIMESTAMP, and TIMESTAMPTZ
NOTE:
When you create a table, the choice of distribution keys and partition keys has a major impact on SQL query performance. Therefore, select them strategically.
- Select appropriate distribution keys.
A hash table's distribution key should evenly distribute data on each DN to prevent skewing the data or distributing it unevenly across DNs. Determine appropriate distribution keys based on the following principles:
- Determine whether data is skewed.
Connect to the database and check the number of tuples on each DN. Replace tablename with the actual name of the table to be analyzed.
gaussdb=# SELECT a.count,b.node_name FROM (SELECT count(*) AS count,xc_node_id FROM tablename GROUP BY xc_node_id) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY a.count DESC;
If tuple numbers vary greatly (several times or tenfold) on each DN, a data skew occurs. Change the data distribution key based on the following principles:
- Rebuild a table to change its distribution keys. ALTER TABLE cannot change distribution keys. Therefore, you need to rebuild a table when changing its distribution keys.
Principles for selecting distribution keys are as follows:
The value of the distribution key should be discrete so that data can be evenly distributed on each DN. You can select the primary key of the table as the distribution key. For example, for a person information table, choose the ID card number column as the distribution key.
When the above principles are met, you can select join conditions as distribution keys so that join tasks can be pushed down to DNs, reducing the amount of data transferred between them.
- Select appropriate partition keys.
In range partitioning, a table is partitioned based on ranges defined by one or more columns, with no overlap between the ranges of values assigned to different partitions. Each range has a dedicated partition for data storage.
Choose partition keys so that query results fall into the same partition or as few partitions as possible (partition pruning); the resulting consecutive I/O improves query performance.
In actual services, time is often used to filter query objects. You can therefore use a time column as the partition key, sizing the key ranges based on the total data volume and the data volume of a single query.
- RANGE/LIST distribution
If no DN is specified for the shards of a RANGE/LIST distributed table, the database uses the Round Robin algorithm to allocate DNs to the shards. In addition, if RANGE/LIST distribution is used, you are advised to define as many shards as possible when creating a table for future capacity expansion. If the defined number of shards is less than the number of DNs before scale-out, data redistribution cannot be performed on new DNs. Note that the sharding rules are designed by users. In some extreme cases, scale-out may not solve the problem of insufficient storage space.
- TO { GROUP groupname | NODE ( nodename [, ... ] ) }
TO GROUP specifies the node group to which the table to be created belongs. TO NODE is used for internal scale-out tools.
- CONSTRAINT [constraint_name]
Specifies the name of a column or table constraint. The optional constraint clauses specify constraints that new or updated rows must satisfy for an INSERT or UPDATE operation to succeed.
There are two ways to define constraints:
- A column constraint is defined as part of a column definition, and it is bound to a particular column.
- A table constraint is not bound to a particular column but can apply to more than one column.
NOTICE:
For a table constraint, constraint_name is optional in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL'). In other databases, constraint_name must be added.
- NOT NULL
Forbids NULL values in columns.
- NULL
Allows the column to contain NULL values. This is the default setting.
This clause is only provided for compatibility with non-standard SQL databases. It is not recommended.
- CHECK ( expression )
Specifies an expression producing a Boolean result where the INSERT or UPDATE operation of rows can succeed only when the expression result is true or unknown; otherwise, an error is thrown and the database is not altered.
A CHECK constraint specified as a column constraint should reference only the column's value, while an expression in a table constraint can reference multiple columns.
NOTE:
<>NULL and !=NULL are invalid in an expression. Change them to IS NOT NULL.
- DEFAULT default_expr
Assigns a default data value to a column. The value can be any variable-free expressions. (Subqueries and cross-references to other columns in the current table are not allowed.) The data type of the default expression must match the data type of the column.
The default expression will be used in any insert operation that does not specify a value for the column. If there is no default value for a column, then the default value is NULL.
- ON UPDATE update_expr
The ON UPDATE clause is an attribute constraint of a column.
When an UPDATE operation is performed on a tuple in a table, the behavior is as follows: if the new values of the updated columns differ from the old values in the table, columns that carry this attribute but are not among the updated columns are automatically set to the current timestamp; if the new values equal the old values, those columns remain unchanged. If a column with this attribute is itself among the updated columns, its value is set according to the specified update value.
NOTE:
- This attribute can be specified in a MYSQL-compatible database of version 5.7 (that is, sql_compatibility is set to 'MYSQL', b_format_version is set to '5.7', and b_format_dev_version is set to 's1') and in a PG-compatible database (this syntax is not supported during upgrade observation).
- In terms of syntax, update_expr supports three keywords: CURRENT_TIMESTAMP, LOCALTIMESTAMP, and NOW(). You can also specify or not specify the precision of a keyword with parentheses. The three types of keywords are synonyms of each other and have the same attribute effect. For example, ON UPDATE CURRENT_TIMESTAMP(), ON UPDATE CURRENT_TIMESTAMP(5), ON UPDATE LOCALTIMESTAMP(), and ON UPDATE LOCALTIMESTAMP(6). If the keyword does not contain parentheses or contains empty parentheses, the precision is 0. The NOW keyword cannot contain parentheses. In PG-compatible mode, ON UPDATE CURRENT_TIMESTAMP(), ON UPDATE LOCALTIMESTAMP(), and ON UPDATE NOW(5) are not supported.
- This attribute can be specified only for columns of the following types: timestamp, datetime, date, time without time zone, smalldatetime, and abstime.
- The CREATE TABLE AS syntax does not inherit the column attributes.
- The CREATE TABLE LIKE syntax can use INCLUDING UPDATE or EXCLUDING UPDATE to inherit or exclude a constraint. In PG-compatible mode, only EXCLUDING UPDATE can be used to exclude a constraint. The INCLUDING ILM option is added to copy the ILM policy information of the old table. This option is used together with the INCLUDING PARTITION option to copy the policy information of partition objects in the old table.
- The precision specified by this attribute can be different from the precision specified by the type in the corresponding column. After the column value is updated through this attribute, the minimum precision is displayed. For example, CREATE TABLE t1 (col1 timestamp(6) ON UPDATE CURRENT_TIMESTAMP(3));. If the UPDATE syntax triggers the attribute to take effect, three decimal places in the value of col1 are displayed after the update.
- The same column cannot be specified for this attribute and the generated column constraint at the same time.
- This attribute cannot be specified for the partition key in a partitioned table.
- In distributed scenarios, this attribute cannot be specified for distribution keys and primary keys.
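A minimal sketch of the attribute in MySQL-compatible mode (names are illustrative):
gaussdb=# CREATE TABLE tbl_onupdate_demo(
    id int,
    modified timestamp DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
-- An UPDATE that changes id also refreshes modified to the current timestamp.
gaussdb=# UPDATE tbl_onupdate_demo SET id = id + 1;
gaussdb=# DROP TABLE tbl_onupdate_demo;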
- AUTO_INCREMENT
Specifies an auto-increment column.
Currently, an auto-increment column is a local auto-increment column by default. It increases independently on each DN, the DNs do not affect one another, and the values are therefore not globally unique.
If the value of this column is not specified (or the value of this column is set to 0, NULL, or DEFAULT), the value of this column is automatically increased by the auto-increment counter.
If this column is inserted or updated to a value greater than the current auto-increment counter, the auto-increment counter is updated to this value after the command is executed successfully.
The initial auto-increment value is set by the AUTO_INCREMENT [ = ] value clause. If it is not set, the default value 1 is used.
NOTE:
- The auto-increment column can be specified only when sql_compatibility is set to 'MYSQL'.
- The data type of the auto-increment column can only be integer, 4-byte or 8-byte floating point, or Boolean.
An error occurs if auto-increment continues after the value reaches the maximum of the column's data type.
- Each table can have only one auto-increment column.
- It is recommended that the local auto-increment column be the first column of a non-global secondary index. Otherwise, errors may occur when some operations are performed on a table that contains auto-increment columns, for example, ALTER TABLE EXCHANGE PARTITION.
- The DEFAULT value cannot be specified for an auto-increment column.
- The expression of the CHECK constraint cannot contain auto-increment columns.
- You can specify that the auto-increment column can be NULL. If it is not specified, the auto-increment column contains the NOT NULL constraint by default.
- When a table containing a local auto-increment column is created, a sequence that depends on the column is created on each DN as an auto-increment counter. You are not allowed to modify or delete the sequence using sequence-related functions. You can view the value of the sequence. Do not make other sequences depend on or join with this auto-increment column.
- The auto-increment and refresh operations of the auto-increment counter are not rolled back.
- Before data is inserted into a table, 0 or NULL triggers auto-increment. After data is inserted or updated to a table, the auto-increment counter is updated. If an error is reported after auto-increment, and data is not inserted or updated to the table, the auto-increment counter does not roll back. Subsequent insert statements trigger auto-increment based on the auto-increment counter. As a result, the values of the auto-increment columns in the table are discontinuous.
- If you insert or import reserved auto-increment cache values in batches, the values in the auto-increment column may be discontinuous. For details, see the description of the auto_increment_cache parameter.
- An auto-increment column cannot be used as a distribution key.
- Currently, local temporary tables do not support auto-increment columns.
- DISTRIBUTE BY REPLICATION does not support auto-increment columns.
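An illustrative sketch (MySQL-compatible mode; since an auto-increment column cannot be a distribution key, the table is hashed on another column):
gaussdb=# CREATE TABLE tbl_autoinc_demo(
    id int AUTO_INCREMENT,
    name varchar(20)
) AUTO_INCREMENT = 100 DISTRIBUTE BY HASH(name);
-- id starts from 100 and increases independently on each DN.
gaussdb=# INSERT INTO tbl_autoinc_demo(name) VALUES ('a');
gaussdb=# DROP TABLE tbl_autoinc_demo;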
- UNIQUE [KEY] index_parameters
Specifies that a group of one or more columns of a table can contain only unique values.
For the purpose of a UNIQUE constraint, two NULL values are not considered equal.
UNIQUE KEY can be used only when sql_compatibility is set to 'MYSQL', which has the same semantics as UNIQUE.
- UNIQUE [ index_name ][ USING method ] ( { {column_name | ( expression ) } [ ASC | DESC ] }[, ... ] ) index_parameters
Specifies that a group of one or more columns of a table can contain only unique values.
For the purpose of a UNIQUE constraint, two NULL values are not considered equal.
index_name indicates the index name.
NOTICE:
- The index_name parameter is supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- For a unique key constraint, if both constraint_name and index_name are specified, the value of index_name is used as the index name.
- PRIMARY KEY index_parameters
PRIMARY KEY [ USING method ] ( { column_name [ ASC | DESC ] } [, ... ] ) index_parameters
Specifies that a column or columns of a table can contain only unique (non-duplicate) and non-NULL values.
Only one primary key can be specified for a table.
- REFERENCES
The current version does not support the REFERENCES clause.
- USING method
Specifies the name of the index method to be used.
For details about the value range, see the USING method in Parameters.
NOTICE:
- The USING method is supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- In MySQL-compatible mode, if USING method is not specified, the default index method is B-tree for Astore or UB-tree for Ustore.
- For constraints that create a global secondary index by default, the underlying index of the GSI uses UB-tree storage even if the user specifies B-tree.
- If the storage mode of a table is Ustore and the constraint in the SQL statement is specified as USING BTREE, the underlying layer automatically creates the constraint as USING UBTREE.
- ASC | DESC
ASC specifies an ascending (default) sort order. DESC specifies a descending sort order.
NOTICE:
ASC|DESC is supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- expression
Specifies an expression index constraint based on one or more columns of the table. It must be written in parentheses.
NOTICE:
Expression indexes in the UNIQUE constraint are supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- DEFERRABLE | NOT DEFERRABLE
Determines whether the constraint can be deferred. A constraint that is not deferrable will be checked immediately after every command. Checking of constraints that are deferrable can be postponed until the end of the transaction using the SET CONSTRAINTS command. NOT DEFERRABLE is the default value. Currently, only UNIQUE and PRIMARY KEY constraints accept this clause. All the other constraints are not deferrable.
- INITIALLY IMMEDIATE | INITIALLY DEFERRED
If a constraint is deferrable, this clause specifies the default time to check the constraint.
- If the constraint is INITIALLY IMMEDIATE (default value), it is checked after each statement.
- If the constraint is INITIALLY DEFERRED, it is checked only at the end of the transaction.
The constraint check time can be altered using the SET CONSTRAINTS statement.
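A minimal sketch of deferring a constraint check to the end of a transaction with SET CONSTRAINTS; the table and constraint names are illustrative:
gaussdb=# CREATE TABLE t_defer(
    id int,
    CONSTRAINT con_defer UNIQUE (id) DEFERRABLE INITIALLY IMMEDIATE
);
gaussdb=# BEGIN;
gaussdb=# SET CONSTRAINTS con_defer DEFERRED;
-- The duplicate value is not rejected here; the check is postponed.
gaussdb=# INSERT INTO t_defer VALUES (1);
gaussdb=# INSERT INTO t_defer VALUES (1);
-- The unique violation is reported when the transaction ends.
gaussdb=# COMMIT;
gaussdb=# DROP TABLE t_defer;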
- USING INDEX TABLESPACE tablespace_name
Specifies a tablespace in which an index associated with a UNIQUE or PRIMARY KEY constraint will be created. If not specified, the index is created in default_tablespace. If default_tablespace is empty, the default tablespace of the database is used.
- ENCRYPTION_TYPE = encryption_type_value
Specifies the encryption type in the ENCRYPTED WITH constraint. The value of encryption_type_value can be DETERMINISTIC or RANDOMIZED.
- [DEFAULT] CHARACTER SET | CHARSET [ = ] default_charset
Specifies the default character set of a table. If you specify only a character set, the default collation of the table is set to the default collation of the specified character set.
This syntax is supported only when sql_compatibility is set to 'MYSQL'.
- [DEFAULT] COLLATE [ = ] default_collation
Specifies the default collation of a table. If you specify only a collation, the default character set of the table is set to the character set corresponding to the specified collation.
This syntax is supported only when sql_compatibility is set to 'MYSQL'. For details about the collation, see Table 1.
NOTE:
If the character set or collation of a table is not explicitly specified, it is inherited from the schema when the schema has a default character set or collation. If the schema has neither, and b_format_behavior_compat_options contains 'default_collation', the table inherits the character set and collation of the current database.
Examples
- Create an ordinary table.
gaussdb=# CREATE TABLE tbl_test1(
    id int,
    name varchar(50),
    province varchar(60),                  -- Province
    country varchar(60) DEFAULT 'China'    -- Country
);
gaussdb=# DROP TABLE tbl_test1;
- The WITH clause adds storage parameters to a table or index.
-- Create a table and specify fill factors.
gaussdb=# CREATE TABLE tbl_test2(
    id int,
    name varchar(50),
    province varchar(60),                  -- Province
    country varchar(60) DEFAULT 'China'    -- Country
) WITH (FILLFACTOR = 70);
-- Create a table and specify a storage engine.
gaussdb=# CREATE TABLE tbl_test3(
    id int,
    name varchar(50),
    province varchar(60),                  -- Province
    country varchar(60) DEFAULT 'China'    -- Country
) WITH (STORAGE_TYPE = ASTORE);
-- Drop.
gaussdb=# DROP TABLE tbl_test2;
gaussdb=# DROP TABLE tbl_test3;
- Use a temporary table.
-- Create a temporary table and specify that its rows are deleted when the transaction is committed.
gaussdb=# CREATE TEMPORARY TABLE test_t2(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China'    -- Country
) ON COMMIT DELETE ROWS;
gaussdb=# DROP TABLE test_t2;
- Specify a character set and collation during table creation.
-- Create a database.
gaussdb=# CREATE DATABASE testdb1 ENCODING = 'UTF8';
gaussdb=# \c testdb1
-- Create table t1. Set the default character set of t1 to utf8mb4 and the default collation to utf8mb4_bin. The c1 column uses the table defaults; the c2 column uses the utf8mb4 character set and the utf8mb4_unicode_ci collation.
testdb1=# CREATE TABLE t1(c1 text, c2 text charset utf8mb4 collate utf8mb4_unicode_ci) charset utf8mb4 collate utf8mb4_bin;
-- Drop.
testdb1=# DROP TABLE t1;
testdb1=# \c postgres
gaussdb=# DROP DATABASE testdb1;
- IF NOT EXISTS keyword
If this keyword is used, a NOTICE is reported when a table with the same name already exists. If it is not used, an ERROR is reported. In either case, the new table is not created.
-- Create a table named test_t3.
gaussdb=# CREATE TABLE test_t3(id INT);
-- Creating it again reports an error.
gaussdb=# CREATE TABLE test_t3(id INT);
ERROR:  Relation test_t3 already exists in schema public.
DETAIL:  Creating new table with existing name in the same schema.
-- Use the IF NOT EXISTS keyword.
gaussdb=# CREATE TABLE IF NOT EXISTS test_t3(id INT);
NOTICE:  Relation test_t3 already exists, skipping.
CREATE TABLE
-- Drop the table.
gaussdb=# DROP TABLE test_t3;
- Specifying a tablespace during table creation
-- Create a tablespace.
gaussdb=# CREATE TABLESPACE ds_tbs1 RELATIVE LOCATION 'tablespace/tablespace_1';
-- Specify the tablespace when creating a table.
gaussdb=# CREATE TABLE test(id CHAR(7), name VARCHAR(20)) TABLESPACE ds_tbs1;
-- Drop the table and tablespace.
gaussdb=# DROP TABLE test;
gaussdb=# DROP TABLESPACE ds_tbs1;
- Specifying the AUTO_INCREMENT column during table creation
-- The local auto-increment column is not used as the distribution key. The value starts from 10 on each DN.
-- (Query the DN names by running SELECT node_name FROM pgxc_node WHERE node_type = 'D' and change them based on the actual situation.)
gaussdb=# CREATE TABLE local_autoinc(col int AUTO_INCREMENT, col1 int)
AUTO_INCREMENT = 10
DISTRIBUTE BY LIST(col1)(
    SLICE s1 VALUES (1) DATANODE datanode1,
    SLICE s2 VALUES (2) DATANODE datanode2
);
-- You are advised to use the auto-increment column as the first column of the index to create an index.
gaussdb=# CREATE INDEX local_autoinc_ai ON local_autoinc(col);
-- Distribute data to DN1. The value NULL triggers auto-increment, and the auto-increment value is 10.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(NULL,1);
-- Distribute data to DN2. The value 0 triggers auto-increment, and the auto-increment value is 10.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(1 - 1,2);
-- Distribute data to DN1. The value 100 does not trigger auto-increment. After data is inserted successfully, the auto-increment counter is updated to 100.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(100,1);
-- Distribute data to DN1. The value 0 triggers auto-increment, and the auto-increment value is 101.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(0,1);
-- Distribute data to DN2. The value DEFAULT triggers auto-increment, and the auto-increment value is 11.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(DEFAULT,2);
gaussdb=# SELECT col,col1 FROM local_autoinc ORDER BY 2,1;
 col | col1
-----+------
  10 |    1
 100 |    1
 101 |    1
  10 |    2
  11 |    2
(5 rows)
-- Drop.
gaussdb=# DROP TABLE local_autoinc;
- Creating a table using CREATE TABLE ... LIKE
-- Create the source table t1.
gaussdb=# CREATE TABLE t1(col INT);
CREATE TABLE
gaussdb=# \d t1
     Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 col    | integer |

-- Create the target table t2.
gaussdb=# CREATE TABLE t2(LIKE t1);
CREATE TABLE
gaussdb=# \d t2
     Table "public.t2"
 Column |  Type   | Modifiers
--------+---------+-----------
 col    | integer |

-- Drop.
gaussdb=# DROP TABLE t1,t2;
Examples of Creating a Table and Adding Constraints to the Table
- NOT NULL constraints
If no value is specified for a column with a NOT NULL constraint when data is added, an error is reported. You can add NOT NULL constraints to multiple columns in a table.
-- Create a table and add a NOT NULL constraint to the id column.
gaussdb=# CREATE TABLE test_t4(
    id CHAR(7) NOT NULL,
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China'    -- Country
);
-- If the value of id is not specified or is NULL during data insertion, the NOT NULL constraint is triggered. As a result, the insertion fails.
gaussdb=# INSERT INTO test_t4 (name,province) VALUES ('scott','Shanghai');
ERROR:  Null value in column id violates not-null constraint.
DETAIL:  Failing row contains (null, scott, Shanghai, China).
-- Drop the table.
gaussdb=# DROP TABLE test_t4;
- Unique constraint
The keyword UNIQUE adds a UNIQUE constraint to a column. When data is inserted, the constraint is triggered if the value duplicates an existing value in the column. Multiple NULL values are not considered duplicates. When a UNIQUE constraint is added, a unique index is automatically created. You can add UNIQUE constraints to multiple columns in a table.
-- Create a table to add UNIQUE constraints.
gaussdb=# CREATE TABLE test_t5(
    id CHAR(7) UNIQUE USING INDEX TABLESPACE pg_default,
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China'    -- Country
);
-- You can also use the following method to name a UNIQUE constraint and add constraints for multiple columns:
gaussdb=# CREATE TABLE test_t6(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China',   -- Country
    CONSTRAINT unq_test_id UNIQUE (id,name)
);
-- When data with duplicate IDs is inserted, constraints are triggered. As a result, the insertion fails.
gaussdb=# INSERT INTO test_t5(id) VALUES('0000010');
INSERT 0 1
gaussdb=# INSERT INTO test_t5(id) VALUES('0000010');
ERROR:  Duplicate key value violates unique constraint "test_t5_id_key".
DETAIL:  Key (id)=(0000010) already exists.
-- The constraint is not triggered when data whose id is NULL is inserted multiple times.
gaussdb=# INSERT INTO test_t5(id) VALUES (NULL);
INSERT 0 1
gaussdb=# INSERT INTO test_t5(id) VALUES (NULL);
INSERT 0 1
-- Drop the tables.
gaussdb=# DROP TABLE test_t5;
gaussdb=# DROP TABLE test_t6;
- Primary key constraints
The keyword PRIMARY KEY adds a primary key constraint to one or more columns: the values must be unique and cannot be NULL. When a PRIMARY KEY constraint is added, a unique index is automatically created for the table, and a NOT NULL constraint is automatically added to each primary key column.
Only one PRIMARY KEY constraint can be defined in each table.
-- Create a table and add a PRIMARY KEY constraint to the table.
gaussdb=# CREATE TABLE test_t6(
    id CHAR(7) PRIMARY KEY,
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China'    -- Country
);
gaussdb=# INSERT INTO test_t6 (id,name,province) VALUES ('0000001','july','Beijing');
-- You can also use the following method to name a PRIMARY KEY constraint and add it to multiple columns:
gaussdb=# CREATE TABLE test_t7(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China',   -- Country
    CONSTRAINT pk_test_t6_id PRIMARY KEY (id,name)
);
-- Insert data whose id is NULL to trigger the constraint.
gaussdb=# INSERT INTO test_t6 (id,name,province) VALUES (NULL,'july','Beijing');
ERROR:  Null value in column id violates not-null constraint.
DETAIL:  Failing row contains (null, july, Beijing, China).
-- Insert data with a duplicate id value to trigger the constraint.
gaussdb=# INSERT INTO test_t6 (id,name,province) VALUES ('0000001','ben','Shanghai');
ERROR:  Duplicate key value violates unique constraint "test_t6_pkey"
DETAIL:  Key (id)=(0000001) already exists.
-- Drop the tables.
gaussdb=# DROP TABLE test_t6;
gaussdb=# DROP TABLE test_t7;
- Check constraints
The keyword CHECK adds a check constraint to a column. The check constraint must reference one or more columns of the table, and the expression must return a Boolean value. Currently, the expression cannot contain subqueries. CHECK and NOT NULL constraints can both be defined on the same column.
-- Create a table and add check constraints.
gaussdb=# CREATE TABLE test_t8 (
    id CHAR(7),
    name VARCHAR(20),
    age INT CHECK(age > 0 AND age < 150)
);
-- Or run the following SQL statement to manually name a check constraint and add it for one or more columns:
gaussdb=# CREATE TABLE test_t9 (
    id CHAR(7),
    name VARCHAR(20),
    age INT,
    CONSTRAINT chek_test_t8_age CHECK(age > 0 AND age < 150)
);
-- If a value that does not comply with the expression is inserted, the check constraint is triggered. As a result, the insertion fails.
gaussdb=# INSERT INTO test_t8 (id,name,age) VALUES ('0000007','scott',200);
ERROR:  New row for relation test_t8 violates check constraint test_t8_age_check.
DETAIL:  N/A
-- Drop the tables.
gaussdb=# DROP TABLE test_t8;
gaussdb=# DROP TABLE test_t9;
Data Distribution Example
- REPLICATION
gaussdb=# CREATE TABLE test_replication(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China'    -- Country
) DISTRIBUTE BY REPLICATION;
-- Query table information.
gaussdb=# \d+ test_replication
                                          Table "public.test_replication"
  Column  |         Type          |             Modifiers              | Storage  | Stats target | Description
----------+-----------------------+------------------------------------+----------+--------------+-------------
 id       | character(7)          |                                    | extended |              |
 name     | character varying(20) |                                    | extended |              |
 province | character varying(60) |                                    | extended |              |
 country  | character varying(30) | default 'China'::character varying | extended |              |
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: ALL DATANODES
Options: orientation=row, logical_repl_node=-1, compression=no, storage_type=USTORE, segment=off
-- Drop.
gaussdb=# DROP TABLE test_replication;
- HASH
-- Define a hash table.
gaussdb=# CREATE TABLE test_hash(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China'    -- Country
) DISTRIBUTE BY HASH(id);
-- Insert data.
gaussdb=# INSERT INTO test_hash VALUES
('0000001', 'Bob', 'Shanghai', 'China'),
('0000002', 'Jack', 'Beijing', 'China'),
('0000003', 'Scott', 'Beijing', 'China');
-- View data distribution.
gaussdb=# SELECT a.count,b.node_name
FROM (SELECT COUNT(*) AS count, xc_node_id FROM test_hash GROUP BY xc_node_id) a, pgxc_node b
WHERE a.xc_node_id=b.node_id
ORDER BY a.count DESC;
 count |     node_name
-------+-------------------
     2 | dn_6001_6002_6003
     1 | dn_6004_6005_6006
-- Drop the table.
gaussdb=# DROP TABLE test_hash;
- MURMURHASH
-- The test environment contains one CN and six DNs.
-- Create a node group. You can run the SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' command to query the DN names in the node group. In the query result, replace the DN names following WITH in the CREATE NODE GROUP statement as required.
gaussdb=# CREATE NODE GROUP NG1 WITH(datanode1, datanode2, datanode3, datanode4, datanode5, datanode6);
-- Define tables using MURMURHASH.
gaussdb=# CREATE TABLE test_murmurhash1 (a int NOT NULL, b int) DISTRIBUTE BY MURMURHASH(a) TO GROUP NG1;
gaussdb=# CREATE TABLE test_murmurhash2 (a int NOT NULL, b int) DISTRIBUTE BY MURMURHASH(lpad_s(a,10,'0')) TO GROUP NG1;
-- Insert data.
gaussdb=# INSERT INTO test_murmurhash1 VALUES(0,1);
gaussdb=# INSERT INTO test_murmurhash2 VALUES(1,2);
-- Query data.
gaussdb=# SELECT * FROM test_murmurhash1;
 a | b
---+---
 0 | 1
(1 row)

gaussdb=# SELECT * FROM test_murmurhash2;
 a | b
---+---
 1 | 2
(1 row)

-- Drop the tables.
gaussdb=# DROP TABLE test_murmurhash1;
gaussdb=# DROP TABLE test_murmurhash2;
-- Drop the node group.
gaussdb=# DROP NODE GROUP NG1;
- RANGE
-- Define a range-distributed table. (Change the DN names based on the actual situation. You can run the SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' command to query them.)
gaussdb=# CREATE TABLE test_range(
    id INT,
    name VARCHAR(20),
    province VARCHAR(60),                  -- Province
    country VARCHAR(30) DEFAULT 'China'    -- Country
) DISTRIBUTE BY RANGE(id)(
    SLICE s1 VALUES LESS THAN (100) DATANODE dn_6001_6002_6003,
    SLICE s2 VALUES LESS THAN (200) DATANODE dn_6004_6005_6006,
    SLICE s3 VALUES LESS THAN (MAXVALUE) DATANODE dn_6007_6008_6009
);
-- Insert data.
gaussdb=# INSERT INTO test_range VALUES (52, 'Bob', 'Beijing', 'China');
gaussdb=# INSERT INTO test_range VALUES (100, 'Ben', 'Shanghai', 'China');
gaussdb=# INSERT INTO test_range VALUES (150, 'Scott', 'Guangzhou', 'China');
gaussdb=# INSERT INTO test_range VALUES (300, 'Jordan', 'Beijing', 'China');
-- View data distribution.
gaussdb=# SELECT a.count,b.node_name
FROM (SELECT COUNT(*) AS count, xc_node_id FROM test_range GROUP BY xc_node_id) a, pgxc_node b
WHERE a.xc_node_id=b.node_id
ORDER BY a.count DESC;
 count |     node_name
-------+-------------------
     2 | dn_6004_6005_6006
     1 | dn_6001_6002_6003
     1 | dn_6007_6008_6009
(3 rows)
-- Query data stored on each DN.
gaussdb=# SELECT b.node_name, a.* FROM (SELECT *, xc_node_id FROM test_range) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY node_name;
     node_name     | id  |  name  | province  | country | xc_node_id
-------------------+-----+--------+-----------+---------+-------------
 dn_6001_6002_6003 |  52 | Bob    | Beijing   | China   | -1072999043
 dn_6004_6005_6006 | 100 | Ben    | Shanghai  | China   |  -564789568
 dn_6004_6005_6006 | 150 | Scott  | Guangzhou | China   |  -564789568
 dn_6007_6008_6009 | 300 | Jordan | Beijing   | China   |  1532339558
(4 rows)
-- Drop the table.
gaussdb=# DROP TABLE test_range;
- LIST
-- Define a list-distributed table. (Change the DN names based on the actual situation. You can run the SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' command to query them.)
gaussdb=# CREATE TABLE test_list(
    id INT,
    name VARCHAR(20),
    country VARCHAR(30) DEFAULT 'China'    -- Country
) DISTRIBUTE BY LIST(country)(
    SLICE s1 VALUES ('China') DATANODE dn_6001_6002_6003,
    SLICE s2 VALUES ('USA') DATANODE dn_6004_6005_6006,
    SLICE s3 VALUES (DEFAULT) DATANODE dn_6007_6008_6009
);
-- Insert data.
gaussdb=# INSERT INTO test_list VALUES (1,'Scott','China');
gaussdb=# INSERT INTO test_list VALUES (2,'Henry','USA');
gaussdb=# INSERT INTO test_list VALUES (3,'Michael','France');
gaussdb=# INSERT INTO test_list VALUES (4,'Jack','UK');
-- Query data stored on each DN.
gaussdb=# SELECT b.node_name, a.* FROM (SELECT *, xc_node_id FROM test_list) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY node_name;
     node_name     | id |  name   | country | xc_node_id
-------------------+----+---------+---------+-------------
 dn_6001_6002_6003 |  1 | Scott   | China   | -1072999043
 dn_6004_6005_6006 |  2 | Henry   | USA     |  -564789568
 dn_6007_6008_6009 |  3 | Michael | France  |  1532339558
 dn_6007_6008_6009 |  4 | Jack    | UK      |  1532339558
(4 rows)
-- Drop the table.
gaussdb=# DROP TABLE test_list;
Suggestions
- UNLOGGED
- The unlogged table and its indexes do not use the WAL mechanism during data writing. Their write speed is much higher than that of ordinary tables. Therefore, they can be used for storing intermediate result sets of complex queries to improve query performance.
- The unlogged table has no primary/standby mechanism. In case of system faults or abnormal breakpoints, data loss may occur. Therefore, unlogged tables must not be used to store data that has to be durable.
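For illustration, a minimal sketch of using an unlogged table for an intermediate result set; the names are illustrative:
gaussdb=# CREATE UNLOGGED TABLE tmp_agg(
    id int,
    total numeric
);
-- Writes skip the WAL mechanism, so they are faster but not protected against faults.
gaussdb=# INSERT INTO tmp_agg VALUES (1, 100.0);
gaussdb=# DROP TABLE tmp_agg;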
- TEMPORARY | TEMP
- A temporary table is automatically dropped at the end of a session.
- The temporary table is visible only to the current CN.
- LIKE
- The target table automatically inherits all column names, data types, and NOT NULL constraints from the source table. After creation, the target table is independent of the source table.
- LIKE INCLUDING DEFAULTS
- Default expressions are copied from the source table to the target table only if INCLUDING DEFAULTS is specified. The default behavior is to exclude default expressions, so the copied columns in the target table have NULL defaults.
- LIKE INCLUDING CONSTRAINTS
- CHECK constraints are copied from the source table to the target table only when INCLUDING CONSTRAINTS is specified; other types of constraints are never copied. NOT NULL constraints are always copied. These rules apply to both column constraints and table constraints.
- LIKE INCLUDING INDEXES
- Any indexes on the source table will not be created on the target table, unless the INCLUDING INDEXES clause is specified.
- LIKE INCLUDING STORAGE
- STORAGE settings for the copied column definitions are copied only if INCLUDING STORAGE is specified. The default behavior is to exclude STORAGE settings.
- LIKE INCLUDING COMMENTS
- If INCLUDING COMMENTS is specified, comments for the copied columns, constraints, and indexes are copied. The default behavior is to exclude comments.
- LIKE INCLUDING PARTITION
- If INCLUDING PARTITION is specified, the partition definitions of the source table are copied to the target table, and the target table no longer uses the PARTITION BY clause. By default, the partition definition of the source table is not copied.
- LIKE INCLUDING RELOPTIONS
- If INCLUDING RELOPTIONS is specified, the storage parameters of the source table (that is, its WITH clause) are copied to the target table. The default behavior is to exclude the storage parameters of the source table.
- LIKE INCLUDING DISTRIBUTION
- If INCLUDING DISTRIBUTION is specified, the distribution information of the source table, including the distribution type and key, is copied to the target table, and the target table no longer uses the DISTRIBUTE BY clause. The default behavior is to exclude the distribution information of the source table.
- LIKE INCLUDING ALL
- INCLUDING ALL contains the content of INCLUDING DEFAULTS, INCLUDING CONSTRAINTS, INCLUDING INDEXES, INCLUDING STORAGE, INCLUDING COMMENTS, INCLUDING PARTITION, INCLUDING RELOPTIONS, INCLUDING DISTRIBUTION, and INCLUDING ILM.
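For illustration, a minimal sketch contrasting a bare LIKE with INCLUDING options; the names are illustrative:
gaussdb=# CREATE TABLE like_src(
    id int DEFAULT 0,
    name varchar(20) CHECK (name <> '')
);
-- Copies only column names, data types, and NOT NULL constraints; the default and CHECK constraint are not copied.
gaussdb=# CREATE TABLE like_plain(LIKE like_src);
-- Also copies the default expression and the CHECK constraint.
gaussdb=# CREATE TABLE like_full(LIKE like_src INCLUDING DEFAULTS INCLUDING CONSTRAINTS);
gaussdb=# DROP TABLE like_src, like_plain, like_full;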
- ORIENTATION ROW
- Creates a row-store table. Row store is suitable for OLTP services, which involve many interactive transactions, each touching many columns of the table. Using row store improves efficiency in such scenarios.
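For illustration, a minimal sketch of explicitly requesting row storage (row storage is also the default); the names are illustrative:
gaussdb=# CREATE TABLE t_row_store(
    id int,
    name varchar(20)
) WITH (ORIENTATION = ROW);
gaussdb=# DROP TABLE t_row_store;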
- DISTRIBUTE BY
- It is recommended that a fact table or dimension table containing a large amount of data be created as a distributed table. Each row of the table is placed on a specific DN based on the hash value of the specified column. The syntax is DISTRIBUTE BY HASH(column_name).
- It is recommended that a dimension table containing a small amount of data be created as a replication table. Each row in the table exists on all DNs. That is, each DN has complete table data. The syntax is DISTRIBUTE BY REPLICATION.