CREATE TABLE
Description
Creates an initially empty table in the current database. The table will be owned by the creator. Row-store tables are created by default.
Precautions
- If a primary key constraint or unique constraint is added to a non-distribution key, a global secondary index is created by default.
- By default, HASH(column_name) is used in a distributed system, where column_name is the primary key column or unique constraint column (if any) of the table, or otherwise the first column whose data type can be used as the distribution key. The priority is: primary key column > unique constraint column > first column of a data type that can be used as the distribution key. If a primary key column and multiple unique constraint columns exist and the distribution mode of the table is not specified, the distribution key is determined based on this priority, and GSIs are then created for the other unique constraint columns by default.
- The distribution keys can be updated only when enable_update_distkey is set to on and constraints are met. Otherwise, distribution keys do not support the UPDATE operation.
The constraints on updating distribution keys are as follows (see the sketch after this list):
- Distribution keys can be updated only when enable_update_distkey is set to on.
- The UPDATE statement cannot be pushed down to DNs for execution. The PGXC plan is directly generated and is not changed based on the values before and after the distribution key is updated.
- Tables with a row-level UPDATE trigger are not supported; otherwise, the execution fails and an error message is displayed. Row-level INSERT/DELETE triggers do not take effect, and statement-level UPDATE triggers are executed normally.
- Concurrent update of the same row is not supported. You need to obtain a lock first. The result (returning 0 or reporting an error) of obtaining a lock on DNs depends on the setting of the GUC parameter concurrent_dml_mode. An error will be reported in the following cases: (1) The error message "update distribute column conflict" is displayed. (2) When the time spent for obtaining a lock exceeds the threshold, an error message is displayed, indicating that the lock times out.
- Tables with global secondary indexes (GSIs) are not supported. Otherwise, an error is reported.
- Only Hash distribution is supported. LIST/RANGE distribution tables are not supported. Otherwise, an error is reported.
- MERGE INTO and UPSERT cannot be used to update distribution keys. Otherwise, an error is reported.
- GTM_FREE is not supported. Otherwise, an error is reported.
- UPDATE RETURNING is not supported. Otherwise, an error is reported.
- Statements with joined tables are not supported. Otherwise, an error is reported.
- UPDATE+LIMIT is not supported. Otherwise, an error is reported.
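A minimal sketch under these constraints (the table and column names are illustrative; assumes a distributed cluster):
gaussdb=# CREATE TABLE upd_key_demo(a int, b int) WITH (enable_update_distkey = on) DISTRIBUTE BY HASH(a);
gaussdb=# INSERT INTO upd_key_demo VALUES (1, 10);
-- Allowed: enable_update_distkey is on and none of the restrictions above (GSIs, row-level UPDATE triggers, joins, LIMIT, RETURNING, and so on) apply.
gaussdb=# UPDATE upd_key_demo SET a = 2 WHERE b = 10;
gaussdb=# DROP TABLE upd_key_demo;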
- If an error occurs during table creation, after it is fixed, the system may fail to delete the empty disk files (whose size is 0) created before the last automatic clearance. This problem seldom occurs and does not affect the running of the database.
- When JDBC is used, the DEFAULT value can be set through PrepareStatement.
- Row-store tables do not support foreign keys as table-level constraints.
- A user granted with the CREATE ANY TABLE permission can create tables in the public and user schemas. To create a table that contains serial columns, you must also obtain the CREATE ANY SEQUENCE permission to create sequences.
- The XML type cannot be used as a primary key or foreign key.
- If you add a row-level expression when adding or changing an ILM policy for a data object, note that the row-level expression supports only the functions listed in the whitelist. For details about the whitelist function list, see Row Expression Function Whitelist.
If an unlimited number of tables are created in the database, CNs may be affected as follows:
- Resource exhaustion: Each table occupies certain disk space. Unlimited table creation will occupy a large amount of memory and disk space, which may exhaust CN resources. As a result, the system breaks down or becomes unstable.
- Performance deterioration: Unlimited table creation causes a large number of I/O operations and CPU computing, and the metadata information of the database becomes large, which may deteriorate the CN performance, including operations such as insert, query, update, and deletion. As a result, the system responds slowly or cannot meet service requirements.
- Security issues: Excessive tables make database management and maintenance difficult. Creating unlimited tables may cause security issues such as data leakage or data loss. Database stability decreases, causing immeasurable loss to enterprises.
Therefore, plan the number and size of tables in the database properly to avoid unlimited table creation and ensure system stability, reliability, and security.
- The number of table constraints cannot exceed 32,767.
Syntax
- Create a table.
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE [ IF NOT EXISTS ] table_name
    { ( { column_name data_type [ CHARACTER SET | CHARSET charset ] [ compress_mode ] [ COLLATE collation ] [ column_constraint [ ... ] ]
        | table_constraint
        | LIKE source_table [ like_option [...] ] } [, ... ] )
    | LIKE source_table }
    [ table_option [ [ , ] ... ] ]
    [ WITH ( { storage_parameter = value } [, ... ] ) ]
    [ ON COMMIT { PRESERVE ROWS | DELETE ROWS } ]
    [ COMPRESS | NOCOMPRESS ]
    [ ILM ADD POLICY ROW STORE { COMPRESS ADVANCED } { ROW } AFTER n { day | month | year } OF { NO MODIFICATION } [ ON ( EXPR ) ] ]
    [ TABLESPACE tablespace_name ]
    [ DISTRIBUTE BY { REPLICATION
                    | HASH ( column_name [, ...] )
                    | MURMURHASH ( expression )
                    | RANGE ( column_name [, ...] ) { SLICE REFERENCES tablename | ( slice_less_than_item [, ...] ) | ( slice_start_end_item [, ...] ) }
                    | LIST ( column_name [, ...] ) { SLICE REFERENCES tablename | ( slice_values_item [, ...] ) } } ]
    [ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ];
- table_option is as follows:
{ COMMENT [ = ] 'string'
| AUTO_INCREMENT [ = ] value
| [ DEFAULT ] CHARACTER SET | CHARSET [ = ] default_charset
| [ DEFAULT ] COLLATE [ = ] default_collation
| ENGINE [ = ] { InnoDB | 'InnoDB' | "InnoDB" } }
- column_constraint is as follows:
[ CONSTRAINT constraint_name ]
{ NOT NULL
| NULL
| CHECK ( expression )
| DEFAULT default_expr
| ON UPDATE update_expr
| GENERATED ALWAYS AS ( generation_expr ) [STORED]
| AUTO_INCREMENT
| COMMENT 'string'
| UNIQUE [KEY] index_parameters
| PRIMARY KEY index_parameters
| ENCRYPTED WITH ( COLUMN_ENCRYPTION_KEY = column_encryption_key, ENCRYPTION_TYPE = encryption_type_value )
| REFERENCES reftable [ ( refcolumn ) ] [ MATCH FULL | MATCH PARTIAL | MATCH SIMPLE ] [ ON DELETE action ] [ ON UPDATE action ] }
[ DEFERRABLE | NOT DEFERRABLE | INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
- compress_mode of a column is as follows:
{ DELTA | PREFIX | DICTIONARY | NUMSTR | NOCOMPRESS }
- table_constraint is as follows:
[ CONSTRAINT [ constraint_name ] ]
{ CHECK ( expression )
| UNIQUE [ index_name ] [ USING method ] ( { { column_name [ ( length ) ] | ( expression ) } [ ASC | DESC ] } [, ... ] ) index_parameters
| PRIMARY KEY [ USING method ] ( { column_name [ ASC | DESC ] } [, ... ] ) index_parameters }
[ DEFERRABLE | NOT DEFERRABLE | INITIALLY DEFERRED | INITIALLY IMMEDIATE ]
{ [ COMMENT 'string' ] [ ... ] }
- like_option is as follows:
{ INCLUDING | EXCLUDING } { DEFAULTS | GENERATED | CONSTRAINTS | INDEXES | STORAGE | COMMENTS | PARTITION | RELOPTIONS | DISTRIBUTION | UPDATE | ALL }
- index_parameters is as follows:
[ WITH ( { storage_parameter = value } [, ... ] ) ]
[ USING INDEX TABLESPACE tablespace_name ]
- Range distribution rules
For slice_less_than_item:
SLICE name VALUES LESS THAN ( { expression | MAXVALUE } [, ...] ) [ DATANODE datanode_name | ( datanode_name_list [, ... ] ) ]
For slice_start_end_item:
SLICE name { { START ( expression ) END ( expression ) EVERY ( expression ) }
           | { START ( literal ) END ( { literal | MAXVALUE } ) }
           | { START ( literal ) }
           | { END ( { literal | MAXVALUE } ) } }
- The LIST distribution rule slice_values_item is as follows:
[ ( SLICE name VALUES ( expression [, ... ] ) [ DATANODE datanode_name | ( datanode_name_list ) ] [, ... ] )
| ( SLICE name VALUES (DEFAULT) [ DATANODE datanode_name | ( datanode_name_list ) ] ) ]
- update_expr is as follows:
{ CURRENT_TIMESTAMP | LOCALTIMESTAMP | NOW() }
Parameters
- UNLOGGED
If this keyword is specified, the created table is an unlogged table. Data written to unlogged tables is not written to the WALs, which makes them considerably faster than ordinary tables. However, an unlogged table is automatically truncated after conflicts, OS restart, database restart, switchover, power-off, or abnormal shutdown, incurring data loss risks. Contents of an unlogged table are also not replicated to standby nodes. Any indexes created on an unlogged table are not automatically logged as well.
Usage scenario: Unlogged tables do not ensure data security. Users can back up data before using unlogged tables; for example, users should back up the data before a system upgrade.
Troubleshooting: If data is missing in the indexes of unlogged tables due to some unexpected operations such as an unclean shutdown, users should rebuild the indexes with errors.
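For example, a minimal sketch (the table name is illustrative):
-- Writes to this table skip the WALs; its contents may be lost after a crash or switchover.
gaussdb=# CREATE UNLOGGED TABLE unlogged_demo(id int, payload text);
gaussdb=# DROP TABLE unlogged_demo;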
- GLOBAL | LOCAL
When creating a temporary table, you can specify the GLOBAL or LOCAL keyword before TEMP or TEMPORARY. Currently, the two keywords are accepted only for compatibility with the SQL standard. A local temporary table will be created by GaussDB regardless of whether GLOBAL or LOCAL is specified.
- TEMPORARY | TEMP
If TEMP or TEMPORARY is specified, the created table is a temporary table. A temporary table is automatically dropped at the end of the current session. Therefore, you can create and use temporary tables in the current session as long as the connected CN in the session is normal. Temporary tables are created only in the current session. If a DDL statement involves operations on temporary tables, a DDL error will be generated. Therefore, you are advised not to perform operations on temporary tables in DDL statements. TEMP is equivalent to TEMPORARY.
- Temporary tables are visible to the current session through a schema starting with pg_temp. Therefore, do not delete schemas starting with pg_temp or pg_toast_temp.
- If TEMPORARY or TEMP is not specified when you create a table but its schema is set to that starting with pg_temp in the current session, the table will be created as a temporary table.
- A temporary table is visible only to the current session. Therefore, it cannot be used together with \parallel on.
- Temporary tables do not support DN faults or primary/standby switchovers.
- IF NOT EXISTS
Sends a notice, but does not throw an error, if a table with the same name exists.
- [ ILM ADD POLICY ROW STORE { COMPRESS ADVANCED } { ROW } AFTER n { day | month | year } OF { NO MODIFICATION } [ ON ( EXPR )]]
When creating a table, you can use ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW to add an advanced compression policy for row store. For example, CREATE TABLE t1 (a int) ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER 3 DAY OF NO MODIFICATION ON (a != 0) creates table t1 and adds an advanced compression policy that applies to rows that have not been modified for three days and that satisfy a != 0.
- AFTER n { day | month | year } OF NO MODIFICATION: indicates the rows that are not modified in n days, months, or years.
- ON (EXPR): indicates the row-level expression, which is used to determine whether a row is hot or cold.
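The example from the text, written out as a statement:
-- Compress rows that have not been modified for three days and that satisfy a != 0.
gaussdb=# CREATE TABLE t1 (a int) ILM ADD POLICY ROW STORE COMPRESS ADVANCED ROW AFTER 3 DAY OF NO MODIFICATION ON (a != 0);
gaussdb=# DROP TABLE t1;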
- table_name
Specifies the name of the table to be created.
Some processing logic of materialized views determines whether a table is the log table of a materialized view or a table associated with a materialized view based on the table name prefix. Therefore, do not create a table whose name prefix is mlog_ or matviewmap_. Otherwise, some functions of the table are affected.
- column_name
Specifies the name of a column to be created in the new table.
- data_type
Specifies the data type of the column.
- compress_mode
Specifies whether to compress a table column. The option specifies the algorithm preferentially used by table columns. Row-store tables do not support compression.
Value range: DELTA, PREFIX, DICTIONARY, NUMSTR, and NOCOMPRESS
- CHARACTER SET | CHARSET charset
Specifies the character set of a table column. If this parameter is specified separately, the collation of the table column is set to the default collation of the specified character set.
This syntax is supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- COLLATE collation
Assigns a collation to the column (which must be of a collatable data type). If no collation is specified, the default collation is used. You can run the SELECT * FROM pg_collation command to query collation rules from the pg_collation system catalog. The default collation rule is the row starting with default in the query result. In a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL'), utf8mb4_bin, utf8mb4_general_ci, utf8mb4_unicode_ci, binary, gbk_chinese_ci, gbk_bin, gb18030_chinese_ci, and gb18030_bin are also supported.
- Only the character type supports the specified character set. If the binary character set or collation is specified, the character type is converted to the corresponding binary type. If the type mapping does not exist, an error is reported. Currently, only the mapping from the TEXT type to the BLOB type is available.
- Except the binary character set and collation, only the character set that is the same as the database encoding can be specified.
- If the character set or collation of a column is not explicitly specified and the default character set or collation of the table is specified, the character set or collation of the column is inherited from the table. If the default character set or collation of a table does not exist, the character set and collation of table columns inherit the character set and collation of the current database when b_format_behavior_compat_options contains 'default_collation'.
Table 1 Character sets and collations supported in MySQL-compatible mode (sql_compatibility set to 'MYSQL')

| Collation | Character Set | Description |
|---|---|---|
| utf8mb4_general_ci | utf8mb4 (utf8) | The general collation is used, which is case-insensitive. |
| utf8mb4_unicode_ci | utf8mb4 (utf8) | The general collation is used, which is case-insensitive. |
| utf8mb4_bin | utf8mb4 (utf8) | The binary collation is used, which is case-sensitive. |
| binary | binary | The binary collation is used. |
| gbk_chinese_ci | gbk | The Chinese collation is used. |
| gbk_bin | gbk | The binary collation is used, which is case-sensitive. |
| gb18030_chinese_ci | gb18030 | The Chinese collation is used. |
| gb18030_bin | gb18030 | The binary collation is used, which is case-sensitive. |
| utf8mb4_0900_ai_ci | utf8mb4 | The Unicode collation algorithm (UCA) is used, which is case-insensitive. |
| utf8_general_ci | utf8 | The general collation is used, which is case-insensitive. |
| utf8_bin | utf8 | The binary collation is used, which is case-sensitive. |
- LIKE source_table [ like_option ... ]
Specifies a table from which the new table automatically inherits all column names, their data types, and their not-null constraints, as well as the default expression declared as serial.
The new table and the original table are decoupled after creation is complete. Changes to the original table will not be applied to the new table, and it is not possible to include data of the new table in scans of the original table.
The copied columns and constraints are not merged with similarly named columns and constraints. If the same name is specified explicitly or in another LIKE clause, an error is reported.
- The default expressions are copied from the original table to the new table only if INCLUDING DEFAULTS is specified. The serial column is not copied. The default behavior is to exclude default expressions, resulting in the copied columns in the new table having default values NULL.
- If INCLUDING UPDATE is specified, the ON UPDATE CURRENT_TIMESTAMP attribute of the source table column is copied to the new table column. By default, the generated expression is not copied.
- The CHECK constraints are copied from the original table to the new table only when INCLUDING CONSTRAINTS is specified. Other types of constraints are never copied to the new table. Not-null constraints are always copied to the new table. These rules also apply to column constraints and table constraints.
- Any indexes on the original table will not be created on the new table, unless the INCLUDING INDEXES clause is specified.
- STORAGE settings for the source column definitions are copied only if INCLUDING STORAGE is specified. The default behavior is to exclude STORAGE settings.
- If INCLUDING COMMENTS is specified, comments for the copied columns, constraints, and indexes are copied. The default behavior is to exclude comments.
- If INCLUDING PARTITION is specified, the partition definitions of the source table are copied to the new table, and the new table no longer uses the PARTITION BY clause. By default, the partition definition of the source table is not copied.
- If INCLUDING RELOPTIONS is specified, the new table copies the storage parameters (that is, the WITH clause) of the source table. The default behavior is to exclude the storage parameters of the original table.
- If INCLUDING DISTRIBUTION is specified, the distribution information of the original table, including the distribution type and key, is copied to the new table, and the new table no longer uses the DISTRIBUTE BY clause. The default behavior is to exclude the distribution information of the original table.
- INCLUDING ALL contains the content of INCLUDING DEFAULTS, INCLUDING UPDATE, INCLUDING CONSTRAINTS, INCLUDING INDEXES, INCLUDING STORAGE, INCLUDING COMMENTS, INCLUDING PARTITION, INCLUDING RELOPTIONS, and INCLUDING DISTRIBUTION.
- If the original table contains a sequence with the serial, bigserial, or smallserial data type, or a column in the original table is a sequence by default and the sequence is created for this table (created by using CREATE SEQUENCE ... OWNED BY), these sequences will not be copied to the new table, and another sequence specific to the new table will be created. This is different from earlier versions. To share a sequence between the source table and new table, create a shared sequence (do not use OWNED BY) and set a column in the source table to this sequence.
- You are advised not to set a column in the source table to the sequence specific to another table especially when the table is distributed in specific node groups, because doing so may result in CREATE TABLE ... LIKE execution failures. In addition, doing so may cause the sequence to become invalid in the source sequence because the sequence will also be deleted from the source table when it is deleted from the table that the sequence is specific to. To share a sequence among multiple tables, you are advised to create a shared sequence for them.
- The CREATE TABLE table_name LIKE source_table syntax is supported only when sql_compatibility is set to 'MYSQL' (MySQL-compatible database), b_format_version is set to 5.7, and b_format_dev_version is set to s2.
- In a MySQL-compatible database, if b_format_version is set to 5.7 and b_format_dev_version is set to s2, INCLUDING and EXCLUDING cannot be specified. In this case, it is equivalent to specify INCLUDING ALL by default.
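A minimal sketch of like_option usage (the table names are illustrative; note the compatibility restrictions above):
gaussdb=# CREATE TABLE t_src(col1 int DEFAULT 1, col2 text);
-- Copy the column definitions together with their default expressions.
gaussdb=# CREATE TABLE t_copy(LIKE t_src INCLUDING DEFAULTS);
gaussdb=# DROP TABLE t_src, t_copy;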
- AUTO_INCREMENT [ = ] value
This clause specifies an initial value for an auto-increment column. The value must be a positive number and cannot exceed 2^127 - 1.
This clause takes effect only when sql_compatibility is set to 'MYSQL'.
- COMMENT [ = ] 'string'
- The COMMENT [ = ] 'string' clause is used to add comments to a table.
- The COMMENT 'string' in column_constraint indicates that comments are added to a column.
- The COMMENT 'string' in table_constraint indicates that comments are added to the indexes corresponding to the primary key and unique key.
- This clause is valid only when sql_compatibility is set to 'MYSQL'.
- A table-level comment can contain a maximum of 2048 characters, and a column-level or index-level comment can contain a maximum of 1024 characters.
- Comments in table_constraint support only primary keys and unique keys.
- ENGINE
Supported in MySQL-compatible mode and used only for syntax adaptation. Only InnoDB can be set, and it has no actual effect.
The ENGINE syntax cannot be used in the CREATE TABLE table_name LIKE source_table syntax.
- WITH ( { storage_parameter = value } [, ... ] )
Specifies an optional storage parameter for a table or an index. The WITH clause used for tables can also contain OIDS=FALSE to specify that rows of the new table should not contain OIDs.
When using arbitrary-precision NUMERIC to define a column, specify the precision p and scale s. If the precision and scale are not specified, the value is displayed as entered.
The description of parameters is as follows:
- FILLFACTOR
The fill factor of a table is a percentage from 10 to 100. 100 (complete filling) is the default value. If the Ustore is used, the default value is 92. When a smaller fill factor is specified, INSERT operations fill table pages only to the indicated percentage. The remaining space on each page is reserved for updating rows on that page. This gives UPDATE a chance to place the updated copy of a row on the same page, which is more efficient than placing it on a different page. For a table whose entries are never updated, setting the fill factor to 100 (complete filling) is the best choice, but in heavily updated tables a smaller fill factor would be appropriate.
Value range: 10–100
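For example, a minimal sketch that reserves 30% of each page for in-page updates (the table name is illustrative):
gaussdb=# CREATE TABLE fill_demo(id int, val int) WITH (FILLFACTOR = 70);
gaussdb=# DROP TABLE fill_demo;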
- ORIENTATION
Specifies the storage mode of table data. This parameter cannot be modified once it is set.
Value range:
- ROW indicates that table data is stored in rows.
Row store applies to the OLTP service, which has many interactive transactions. An interaction involves many columns in the table. Using row store can improve the efficiency.
Default value:
If an ordinary tablespace is specified, the default is ROW.
- STORAGE_TYPE
Specifies the storage engine type. This parameter cannot be modified once it is set.
Value range:
- USTORE indicates that tables support the in-place update storage engine. Note that the track_counts and track_activities parameters must be enabled when the Ustore table is used. Otherwise, space bloating may occur.
- ASTORE indicates that tables support the append-only storage engine.
Default value:
If ORIENTATION and STORAGE_TYPE are not specified, ASTORE is used by default, indicating that the table supports the append-only storage engine.
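A minimal sketch combining the two parameters (the table name is illustrative):
-- Row store with the in-place update (Ustore) engine.
gaussdb=# CREATE TABLE ustore_demo(id int) WITH (ORIENTATION = ROW, STORAGE_TYPE = USTORE);
gaussdb=# DROP TABLE ustore_demo;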
- INIT_TD
Specifies the number of TDs to be initialized when a Ustore table is created. This parameter can be modified by running the ALTER TABLE command. Note that this parameter affects the maximum size of a single tuple stored on the data page: MAX_TUPLE_SIZE = BLCKSZ - INIT_TD x TD_SIZE. For example, if you change INIT_TD from 4 to 8, the maximum size of a single tuple decreases by 4 x TD_SIZE.
Value ranges: 2–128. The default value is 4.
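A minimal sketch (the table name is illustrative):
gaussdb=# CREATE TABLE td_demo(id int) WITH (STORAGE_TYPE = USTORE, INIT_TD = 8);
-- MAX_TUPLE_SIZE = BLCKSZ - INIT_TD x TD_SIZE, so a larger INIT_TD leaves less room per tuple.
gaussdb=# DROP TABLE td_demo;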
- COMPRESSION
Specifies the compression level of table data, which determines the compression ratio and time. Generally, a higher compression level yields a higher compression ratio and a longer compression time, and a lower level yields a lower ratio and a shorter time. The actual compression ratio depends on the distribution pattern of the loaded table data. Row-store tables do not support compression.
Value range: For row-store tables, the valid values are YES and NO.
Default value: NO
- COMPRESSLEVEL
Specifies the table data compression ratio and duration at the same compression level. This divides a compression level into sublevels, providing more choices for compression ratio and duration. As the value becomes greater, the compression ratio becomes higher and duration longer at the same compression level.
Value range: 0 to 3
Default value: 0
- segment
The data is stored in segment-page mode. This parameter supports only row-store tables; column-store tables are not supported. Protection against unauthorized deletion of or damage to physical files 1 to 5 is not supported.
Value range: on and off
Default value: off
- hashbucket
Creates a hash table that uses buckets. This parameter supports only row-store tables.
Value range: on and off
Default value: off
- For details about constraints on hash bucket tables, see section "Online Scale-Out Based on Hash Bucket Tables" in Administrator Guide.
- When creating a hash bucket table, ensure that the current database is bound to the default node group of the cluster. For details about how to bind a database to a group, see ALTER DATABASE.
- In the current version, DDL operations on hash bucket tables are affected. Therefore, you are advised not to frequently perform DDL operations on hash bucket tables.
- Hash bucket tables are bound to segment-page storage, that is, when hashbucket is set to on, segment is set to on.
- You are advised not to create a small table (ordinary table: < 8 MB; partitioned table: < 8 MB x number of partitions) as a hash bucket table, because its sequential scan performance can be twice as slow as that of an ordinary table.
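A minimal sketch (assumes the current database is bound to the default node group, as required above; the table name is illustrative):
gaussdb=# CREATE TABLE hb_demo(id int, val int) WITH (hashbucket = on);  -- implies segment = on
gaussdb=# DROP TABLE hb_demo;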
- enable_tde
Indicates that the table is an encrypted table. The database automatically encrypts the data in the encryption table before storing it. Before using this parameter, ensure that TDE has been enabled using the GUC parameter enable_tde and the information for accessing the key service has been set using the GUC parameter tde_key_info. For details about how to use this parameter, see "Transparent Data Encryption" in Feature Guide. This parameter applies only to row-store tables, segment-page tables, hash bucket tables, temporary tables, and unlogged tables.
Value range: on and off
When enable_tde is set to on, key_type, tde_cmk_id, and dek_cipher are automatically generated by the database and cannot be manually specified or modified.
Default value: off
- encrypt_algo
Specifies the encryption algorithm of the encryption table. This parameter must be used together with enable_tde.
Value range: a string. The value can be AES_128_CTR or SM4_CTR.
Default value: null if enable_tde is not set, or AES_128_CTR if enable_tde is set.
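A hedged sketch (assumes TDE has been enabled through the GUC parameter enable_tde and that tde_key_info is configured, as required above; the table name is illustrative):
gaussdb=# CREATE TABLE tde_demo(id int, secret text) WITH (enable_tde = on, encrypt_algo = 'AES_128_CTR');
gaussdb=# DROP TABLE tde_demo;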
- parallel_workers
Number of bgworker threads started when an index is created. For example, value 2 indicates that two bgworker threads are started to create indexes concurrently.
Value range: [0,32], int type. The value 0 indicates that concurrent index creation is disabled.
Default value: If this parameter is not set, the concurrent index creation function is disabled.
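For example, a minimal sketch (the table name is illustrative):
-- Start two bgworker threads when building indexes on this table.
gaussdb=# CREATE TABLE pw_demo(id int) WITH (parallel_workers = 2);
gaussdb=# DROP TABLE pw_demo;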
- dek_cipher
Specifies the DEK ciphertext. After a user sets the enable_tde parameter for a table, the database automatically generates a data key.
Value range: a string
Default value: null
- key_type
Specifies the type of the master key. After the enable_tde parameter is set for a table, the database automatically obtains the master key type from the GUC parameter tde_key_info.
Value range: a string
Default value: null
- cmk_id
Specifies the ID of the master key. After the enable_tde parameter is set for a table, the database automatically obtains the master key ID from the GUC parameter tde_key_info.
Value range: a string
Default value: null
- hasuids
If this parameter is set to on, a unique table-level ID is allocated to a tuple when the tuple is updated.
Value range: on and off
Default value: off
- logical_repl_node
Specifies the name of the DN that returns logical logs to the CN during logical decoding of a distributed replication table. For a replication table, if this parameter is not specified, the first node in the node group where the table is located is used by default. When the RESET operation is performed on this option, logical_repl_node is reset to the first node of the node group where the table is located.
Value range: a string
Default value: For the non-replication table, this parameter is empty by default. For the replication table, this parameter is set to the name of the first node by default.
- STAT_STATE
Determines whether table statistics are locked. If locked, the table statistics cannot be updated. This parameter is invalid in distributed mode.
Value range: locked and unlock
Default value: unlock
- statistic_granularity
Records the default partition_mode when the table analyzes statistics. For details about partition_mode, see ANALYZE | ANALYSE. This parameter is invalid for non-partitioned tables.
Value range: See the value range of partition_mode.
Default value: AUTO
- enable_update_distkey
Determines whether the table supports the UPDATE operation on distribution keys. The distribution keys can be updated only when this attribute of the table is set to on and constraints are met. Otherwise, distribution keys do not support the UPDATE operation.
Value range: on and off
Default value: off
For details about the constraints on updating distributed columns, see Precautions in this section.
- WITHOUT OIDS
It is equivalent to WITH(OIDS=FALSE).
- ON COMMIT { PRESERVE ROWS | DELETE ROWS }
ON COMMIT determines what to do when you commit a temporary table creation operation. Currently, PRESERVE ROWS and DELETE ROWS are supported.
- PRESERVE ROWS (default): No special action is taken at the ends of transactions. The temporary table and its table data are unchanged.
- DELETE ROWS: All rows in the temporary table will be deleted at the end of each transaction block.
- COMPRESS | NOCOMPRESS
If you specify COMPRESS in the CREATE TABLE statement, the compression feature is triggered in case of a bulk insert operation. If this feature is enabled, a scan is performed for all tuple data within the page to generate a dictionary and then the tuple data is compressed and stored. If NOCOMPRESS is specified, the table is not compressed. Row-store tables do not support compression.
Default value: NOCOMPRESS, that is, tuple data is not compressed before storage.
- TABLESPACE tablespace_name
Specifies the tablespace where the new table is created. If not specified, the default tablespace is used.
- DISTRIBUTE BY
Specifies how the table is distributed or replicated between DNs.
Value range:
- REPLICATION: Each row in the table exists on all DNs, that is, each DN has complete table data.
- HASH ( column_name ): Each row of the table will be placed into specified DNs based on the hash value of the specified column.
- MURMURHASH ( diskey_expr ): Hash operations are performed on specified columns. The MURMURHASH algorithm is used to distribute data to corresponding DNs through mapping. diskey_expr can be a single column name or a column name modified by the lpad_s(string text, length int [, fill text]) function. When this function is used, only the first parameter can be a column name, for example, lpad_s(a,10,'0'). When lpad_s is used as a distribution key expression, the third parameter cannot be omitted.
- RANGE(column_name): maps a specified column based on the range and distributes data to the corresponding DNs.
- LIST(column_name): maps a specified column based on a specific value and distributes data to the corresponding DNs.
- For HASH distribution, the maximum number of distribution keys is the same as that of columns. A maximum of 1600 distribution keys are supported. For MURMURHASH distribution, the distribution key supports only one column (an expression or a column in a table). For RANGE (VALUE LESS THAN) and LIST distributions, the distribution key supports a maximum of four columns. For RANGE (START END) distribution, the distribution key supports only one column.
- Only the MURMURHASH distribution supports an expression as the distribution key. The distribution key expression supports only the lpad_s(string text, length int [, fill text]) function.
- The restrictions on MURMURHASH distribution are as follows (a sketch follows this list):
- The stream plan, FQS plan, and PGXC plan are not supported.
- Scaling, hash bucket tables, GSIs, and materialized views are not supported.
- When two hash tables are joined and the distribution key has an INNER JOIN condition, the FQS plan is not supported.
- A node group (user-defined node group) must be specified for a MURMURHASH distributed table, and the node group must be a non-installation node group.
- Tables cannot be created using CREATE TABLE AS.
- The distribution key of the MURMURHASH distributed table must contain the NOT NULL constraint.
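A hedged sketch of a MURMURHASH distributed table (assumes a user-defined, non-installation node group named ng1 already exists; all names are illustrative):
gaussdb=# CREATE TABLE murmur_demo(a int NOT NULL, b text)
DISTRIBUTE BY MURMURHASH(a)
TO GROUP ng1;
gaussdb=# DROP TABLE murmur_demo;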
- For a RANGE distribution policy using the VALUE LESS THAN clause, the distribution rules are as follows:
- The comparison starts from the first column of values to be inserted.
- If the value of the inserted first column is smaller than the boundary value of the current column in the local slice, the values are directly inserted.
- If the value of the inserted first column is equal to the boundary value of the current column in the local slice, compare the value of the inserted second column with the boundary value of the next column in the local slice. If the value of the inserted second column is smaller than the boundary value of the next column in the local slice, the values are directly inserted. If they are equal, the comparison of the next columns between the source and target continues.
- If the value of the inserted first column is greater than the boundary value of the current column in the local slice, compare the value with that in the next slice.
- If a shard of a RANGE distributed table corresponds to multiple DNs, the hash value of the distribution key is calculated and a modulo operation is performed on the number of DNs to map the row to a DN. For details, see the example.
- If a shard of a LIST distributed table corresponds to multiple DNs, for the default shard, the hash value of the distribution key is calculated and a modulo operation is performed on the number of DNs to map the row to a DN. For non-default shards, the round robin method is used to map the values in the VALUES list to DNs. For details, see the example.
- The RANGE/LIST distribution tables support only scale-out but not scale-in. For details about the slice scale-out rule, contact the administrator.
For the hash distribution, column_name supports the following data types:
- Integer types: TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL
- Character types: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, and TEXT
- Date/time types: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, INTERVAL, and SMALLDATETIME
For the MURMURHASH distribution, the data type of the column name in diskey_expr must be one of the following:
For the RANGE (VALUES LESS THAN) or LIST distribution, column_name supports the following data types:
- Integer types: TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL
- Character types: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, and TEXT
- Date/time types: DATE, TIMESTAMP, and TIMESTAMPTZ
For the RANGE (START END) distribution, the data type of column_name must be one of the following:
- Integer types: TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL
- Date/time types: DATE, TIMESTAMP, and TIMESTAMPTZ
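A hedged sketch of RANGE distribution using VALUES LESS THAN slices (the DN names datanode1 and datanode2 are illustrative; query pgxc_node for the actual names in your cluster):
gaussdb=# CREATE TABLE range_demo(id int, name varchar(20))
DISTRIBUTE BY RANGE(id)(
    SLICE s1 VALUES LESS THAN (100) DATANODE datanode1,
    SLICE s2 VALUES LESS THAN (MAXVALUE) DATANODE datanode2
);
gaussdb=# DROP TABLE range_demo;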
When you create a table, the choice of distribution keys and partition keys has a major impact on SQL query performance. Therefore, select appropriate distribution keys and partition keys strategically.
- Select appropriate distribution keys.
A hash table's distribution key should evenly distribute data on each DN to prevent skewing the data or distributing it unevenly across DNs. Determine appropriate distribution keys based on the following principles:
- Determine whether data is skewed.
Connect to the database and check the number of tuples on each DN. Replace tablename with the actual name of the table to be analyzed.
gaussdb=# SELECT a.count,b.node_name FROM (SELECT count(*) AS count,xc_node_id FROM tablename GROUP BY xc_node_id) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY a.count DESC;
If tuple numbers vary greatly (several times or tenfold) across DNs, a data skew occurs. Change the data distribution key based on the following principles:
- Rebuild a table to change its distribution keys. ALTER TABLE cannot change distribution keys. Therefore, you need to rebuild a table when changing its distribution keys.
Principles for selecting distribution keys are as follows:
The value of the distribution key should be discrete so that data can be evenly distributed on each DN. You can select the primary key of the table as the distribution key. For example, for a person information table, choose the ID card number column as the distribution key.
When the above principles are met, you can select join conditions as distribution keys so that join tasks can be pushed down to DNs, reducing the amount of data transferred between DNs.
- Select appropriate partition keys.
In range partitioning, a table is partitioned based on ranges defined by one or more columns, with no overlap between the ranges of values assigned to different partitions. Each range has a dedicated partition for data storage.
Modify partition keys to make the query result stored in the same or least partitions (partition pruning). Obtain consecutive I/O to improve the query performance.
In actual services, time is used to filter query objects. Therefore, you can use time as a partition key, and change the key value based on the total data volume and data volume of a single query.
- RANGE/LIST distribution
If no DN is specified for the shards of a RANGE/LIST distributed table, the database uses the Round Robin algorithm to allocate DNs to the shards. In addition, if RANGE/LIST distribution is used, you are advised to define as many shards as possible when creating a table for future capacity expansion. If the defined number of shards is less than the number of DNs before scale-out, data redistribution cannot be performed on new DNs. Note that the sharding rules are designed by users. In some extreme cases, scale-out may not solve the problem of insufficient storage space.
- TO { GROUP groupname | NODE ( nodename [, ... ] ) }
TO GROUP specifies the node group to which the table to be created belongs. TO NODE is used for internal scale-out tools.
- CONSTRAINT [constraint_name]
Specifies the name of a column or table constraint. The optional constraint clauses specify constraints that new or updated rows must satisfy for an INSERT or UPDATE operation to succeed.
There are two ways to define constraints:
- A column constraint is defined as part of a column definition, and it is bound to a particular column.
- A table constraint is not bound to a particular column but can apply to more than one column.
For a table constraint, constraint_name is optional in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL'). In other databases, constraint_name must be added.
- NOT NULL
Forbids NULL values in columns.
- NULL
Allows the column to contain NULL values. This is the default setting.
This clause is only provided for compatibility with non-standard SQL databases. It is not recommended.
- CHECK ( expression )
Specifies an expression producing a Boolean result where the INSERT or UPDATE operation of rows can succeed only when the expression result is TRUE or UNKNOWN; otherwise, an error is thrown and the database is not altered.
A CHECK constraint specified as a column constraint should reference only the column's value, while an expression in a table constraint can reference multiple columns.
<>NULL and !=NULL are invalid in an expression. Change them to IS NOT NULL.
- DEFAULT default_expr
Assigns a default data value to a column. The value can be any variable-free expressions. (Subqueries and cross-references to other columns in the current table are not allowed.) The data type of the default expression must match the data type of the column.
The default expression will be used in any insert operation that does not specify a value for the column. If there is no default value for a column, then the default value is NULL.
- ON UPDATE update_expr
The ON UPDATE clause is an attribute constraint of a column.
When an UPDATE operation is performed on a tuple in a table: if the new values of the updated columns differ from the old values, columns that have this attribute but are not among the updated columns are automatically set to the current timestamp; if the new values equal the old values, those columns remain unchanged; if a column with this attribute is itself among the updated columns, its value is updated according to the specified update value.
- This attribute can be specified only in MySQL 5.7 (that is, sql_compatibility is set to 'MYSQL', b_format_version is set to '5.7', and b_format_dev_version is set to 's1').
- In terms of syntax, update_expr supports three keywords: CURRENT_TIMESTAMP, LOCALTIMESTAMP, and NOW(). You can also specify or not specify the precision of a keyword with parentheses. For example, ON UPDATE CURRENT_TIMESTAMP(), ON UPDATE CURRENT_TIMESTAMP(5), ON UPDATE LOCALTIMESTAMP(), and ON UPDATE LOCALTIMESTAMP(6). If the keyword does not contain parentheses or contains empty parentheses, the precision is 0. The NOW keyword cannot contain parentheses. The three types of keywords are synonyms of each other and have the same attribute effect.
- This attribute can be specified only for columns of the following types: timestamp, datetime, date, time without time zone, smalldatetime, and abstime.
- The CREATE TABLE AS syntax does not inherit the column attributes.
- The CREATE TABLE LIKE syntax can use INCLUDING UPDATE or EXCLUDING UPDATE to inherit or exclude a constraint. The LIKE syntax is inherited from the LIKE syntax of PostgreSQL. Currently, the ILM policy information of the old table cannot be copied.
- The precision specified by this attribute can be different from the precision specified by the type in the corresponding column. After the column value is updated through this attribute, the minimum precision is displayed. For example, CREATE TABLE t1 (col1 timestamp(6) ON UPDATE CURRENT_TIMESTAMP(3));. If the UPDATE syntax triggers the attribute to take effect, three decimal places in the value of col1 are displayed after the update.
- The same column cannot be specified for this attribute and the generated column constraint at the same time.
- This attribute cannot be specified for the partition key in a partitioned table.
- In distributed scenarios, this attribute cannot be specified for distribution keys and primary keys.
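A hedged sketch of the attribute (requires the MySQL 5.7 compatibility settings listed above; the names are illustrative):
gaussdb=# CREATE TABLE on_update_demo(
    id int,
    val int,
    mtime timestamp ON UPDATE CURRENT_TIMESTAMP
);
-- If an UPDATE changes val but does not set mtime, mtime is refreshed to the current timestamp automatically.
gaussdb=# DROP TABLE on_update_demo;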
- AUTO_INCREMENT
Specifies an auto-increment column.
Currently, an auto-increment column is a local auto-increment column by default. The auto-increment values increase independently on each DN and do not affect one another, so the values are not globally unique.
If the value of this column is not specified (or the value of this column is set to 0, NULL, or DEFAULT), the value of this column is automatically increased by the auto-increment counter.
If this column is inserted or updated to a value greater than the current auto-increment counter, the auto-increment counter is updated to this value after the command is executed successfully.
The initial auto-increment value is set by the AUTO_INCREMENT [ = ] value clause. If it is not set, the default value 1 is used.
- The auto-increment column can be specified only when sql_compatibility is set to 'MYSQL'.
- The data type of the auto-increment column can only be integer, 4-byte or 8-byte floating point, or Boolean.
An error occurs if the auto-increment continues after an auto-increment value reaches the maximum value of a column data type.
- Each table can have only one auto-increment column.
- It is recommended that the local auto-increment column be the first column of a non-global secondary index. Otherwise, errors may occur when some operations are performed on a table that contains auto-increment columns, for example, ALTER TABLE EXCHANGE PARTITION.
- The DEFAULT value cannot be specified for an auto-increment column.
- The expression of the CHECK constraint cannot contain auto-increment columns.
- You can specify that the auto-increment column can be NULL. If it is not specified, the auto-increment column contains the NOT NULL constraint by default.
- When a table containing a local auto-increment column is created, a sequence that depends on the column is created on each DN as an auto-increment counter. You are not allowed to modify or delete the sequence using sequence-related functions. You can view the value of the sequence.
- The auto-increment and refresh operations of the auto-increment counter are not rolled back.
- Before data is inserted into a table, 0 or NULL triggers auto-increment. After data is inserted or updated to a table, the auto-increment counter is updated. If an error is reported after auto-increment, and data is not inserted or updated to the table, the auto-increment counter does not roll back. Subsequent insert statements trigger auto-increment based on the auto-increment counter. As a result, the values of the auto-increment columns in the table are discontinuous.
- If you insert or import reserved auto-increment cache values in batches, the values in the auto-increment column may be discontinuous. For details, see the description of the auto_increment_cache parameter.
- An auto-increment column cannot be used as a distribution key.
- Currently, local temporary tables do not support auto-increment columns.
- DISTRIBUTE BY REPLICATION does not support auto-increment columns.
- UNIQUE [KEY] index_parameters
Specifies that a group of one or more columns of a table can contain only unique values.
For the purpose of a unique constraint, null is not considered equal.
UNIQUE KEY can be used only when sql_compatibility is set to 'MYSQL', which has the same semantics as UNIQUE.
- UNIQUE [ index_name ][ USING method ] ( { {column_name | ( expression ) } [ ASC | DESC ] }[, ... ] ) index_parameters
Specifies that a group of one or more columns of a table can contain only unique values.
For the purpose of a unique constraint, null is not considered equal.
index_name indicates the index name.
- The index_name parameter is supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- For a unique key constraint, if both constraint_name and index_name are specified, the value of index_name is used as the index name.
- PRIMARY KEY index_parameters
PRIMARY KEY [ USING method ] ( { column_name [ ASC | DESC ] } [, ... ] ) index_parameters
Specifies that a column or columns of a table can contain only unique (non-duplicate) and non-null values.
Only one primary key can be specified for a table.
- REFERENCES
The current version does not support the REFERENCES clause.
- USING method
Specifies the name of the index method to be used.
For details about the value range, see the USING method in Parameters.
- The USING method is supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- In MySQL-compatible mode, if USING method is not specified, the default index method is B-tree for Astore or UB-tree for Ustore.
- For the constraint of building a global secondary index by default, the bottom layer of the global secondary index uses UB-tree storage, even if the storage mode specified by the user is B-tree.
- If the storage mode of a table is Ustore and the constraint in the SQL statement is specified as USING BTREE, the underlying layer automatically creates the constraint as USING UBTREE.
- ASC | DESC
ASC specifies an ascending (default) sort order. DESC specifies a descending sort order.
ASC|DESC is supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- expression
Specifies an expression index constraint based on one or more columns of the table. It must be written in parentheses.
Expression indexes in the UNIQUE constraint are supported only in a MySQL-compatible database (that is, sql_compatibility set to 'MYSQL').
- DEFERRABLE | NOT DEFERRABLE
Determines whether the constraint can be deferred. A constraint that is not deferrable will be checked immediately after every command. Checking of constraints that are deferrable can be postponed until the end of the transaction using the SET CONSTRAINTS command. NOT DEFERRABLE is the default value. Currently, only UNIQUE and PRIMARY KEY constraints accept this clause. All the other constraints are not deferrable.
- INITIALLY IMMEDIATE | INITIALLY DEFERRED
If a constraint is deferrable, this clause specifies the default time to check the constraint.
- If the constraint is INITIALLY IMMEDIATE (default value), it is checked after each statement.
- If the constraint is INITIALLY DEFERRED, it is checked only at the end of the transaction.
The constraint check time can be altered using the SET CONSTRAINTS statement.
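A minimal sketch of a deferrable constraint whose check time is changed with SET CONSTRAINTS (the names are illustrative):
gaussdb=# CREATE TABLE defer_demo(id int, CONSTRAINT defer_demo_uk UNIQUE (id) DEFERRABLE INITIALLY DEFERRED);
gaussdb=# BEGIN;
gaussdb=# SET CONSTRAINTS defer_demo_uk IMMEDIATE;  -- check after each statement in this transaction instead of at commit
gaussdb=# COMMIT;
gaussdb=# DROP TABLE defer_demo;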
- USING INDEX TABLESPACE tablespace_name
Allows selection of the tablespace in which the index associated with a UNIQUE or PRIMARY KEY constraint will be created. If not specified, the index is created in default_tablespace. If default_tablespace is empty, the default tablespace of the database is used.
- ENCRYPTION_TYPE = encryption_type_value
For the encryption type in the ENCRYPTED WITH constraint, the value of encryption_type_value is DETERMINISTIC or RANDOMIZED.
- [DEFAULT] CHARACTER SET | CHARSET [ = ] default_charset
Specifies the default character set of a table. If you specify a character set separately, the default collation of the table is set to the default collation of the specified character set.
This syntax is supported only when sql_compatibility is set to 'MYSQL'.
- [DEFAULT] COLLATE [ = ] default_collation
Specifies the default collation of a table. If you specify a collation separately, the default character set of the table is set to the character set corresponding to the specified collation.
This syntax is supported only when sql_compatibility is set to 'MYSQL'. For details about the collation, see Table 1.
If the character set or collation of a table is not explicitly specified and the default character set or collation of the schema is specified, the character set or collation of the table is inherited from the schema. If the default character set or collation of a schema does not exist, the character set and collation of the table inherit the character set and collation of the current database when b_format_behavior_compat_options contains 'default_collation'.
Examples
- Temporary table
-- Create a temporary table and specify that all its rows are deleted when the transaction is committed.
gaussdb=# CREATE TEMPORARY TABLE test_t2(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                -- Province
    country VARCHAR(30) DEFAULT 'China'  -- Country
) ON COMMIT DELETE ROWS;
gaussdb=# DROP TABLE test_t2;
- Specifying the character set and collation during table creation
-- Create a database.
gaussdb=# CREATE DATABASE testdb1 ENCODING = 'UTF8';
gaussdb=# \c testdb1
-- Create table t1. Its default character set is utf8mb4 and its default collation is utf8mb4_bin.
-- Column c1 uses the table defaults; column c2 uses character set utf8mb4 and collation utf8mb4_unicode_ci.
testdb1=# CREATE TABLE t1(c1 text, c2 text charset utf8mb4 collate utf8mb4_unicode_ci) charset utf8mb4 collate utf8mb4_bin;
-- Delete.
testdb1=# DROP TABLE t1;
testdb1=# \c postgres
gaussdb=# DROP DATABASE testdb1;
- IF NOT EXISTS keyword
If this keyword is used and a table with the same name already exists, a NOTICE is reported; if it is not used, an ERROR is reported. In either case, the new table is not created.
gaussdb=# CREATE TABLE test_t3(id INT);  -- Create a table named test_t3.
-- Creating a table with the same name again reports an error.
gaussdb=# CREATE TABLE test_t3(id INT);
ERROR: relation "test_t3" already exists in schema "public"
DETAIL: creating new table with existing name in the same schema
-- Use the IF NOT EXISTS keyword.
gaussdb=# CREATE TABLE IF NOT EXISTS test_t3(id INT);
NOTICE: relation "test_t3" already exists, skipping
CREATE TABLE
-- Drop the table.
gaussdb=# DROP TABLE test_t3;
- Specifying a tablespace during table creation
-- Create a tablespace.
gaussdb=# CREATE TABLESPACE ds_tbs1 RELATIVE LOCATION 'tablespace/tablespace_1';
-- Specify the tablespace when creating a table.
gaussdb=# CREATE TABLE test(id CHAR(7), name VARCHAR(20)) TABLESPACE ds_tbs1;
-- Delete the table and tablespace.
gaussdb=# DROP TABLE test;
gaussdb=# DROP TABLESPACE ds_tbs1;
- Specifying the AUTO_INCREMENT column during table creation
-- The local auto-increment column is not used as the distribution key. The value starts from 10 on each DN.
-- (Query the DN names by running SELECT node_name FROM pgxc_node WHERE node_type = 'D' and adjust them as needed.)
gaussdb=# CREATE TABLE local_autoinc(col int AUTO_INCREMENT, col1 int)
AUTO_INCREMENT = 10
DISTRIBUTE BY LIST(col1)(
    SLICE s1 VALUES (1) DATANODE datanode1,
    SLICE s2 VALUES (2) DATANODE datanode2
);
-- You are advised to use the auto-increment column as the first column of an index.
gaussdb=# CREATE INDEX local_autoinc_ai ON local_autoinc(col);
-- Distribute data to DN1. The value NULL triggers auto-increment; the auto-increment value is 10.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(NULL,1);
-- Distribute data to DN2. The value 0 triggers auto-increment; the auto-increment value is 10.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(1 - 1,2);
-- Distribute data to DN1. The value 100 does not trigger auto-increment. After the insertion succeeds, the auto-increment counter is updated to 100.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(100,1);
-- Distribute data to DN1. The value 0 triggers auto-increment; the auto-increment value is 101.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(0,1);
-- Distribute data to DN2. The value DEFAULT triggers auto-increment; the auto-increment value is 11.
gaussdb=# INSERT INTO local_autoinc(col, col1) VALUES(DEFAULT,2);
gaussdb=# SELECT col,col1 FROM local_autoinc ORDER BY 2,1;
 col | col1
-----+------
  10 |    1
 100 |    1
 101 |    1
  10 |    2
  11 |    2
(5 rows)
-- Delete.
gaussdb=# DROP TABLE local_autoinc;
- Creating a table using CREATE TABLE ... LIKE
-- Create the source table t1.
gaussdb=# CREATE TABLE t1(col INT);
CREATE TABLE
gaussdb=# \d t1
      Table "public.t1"
 Column |  Type   | Modifiers
--------+---------+-----------
 col    | integer |

-- Create the target table t2.
gaussdb=# CREATE TABLE t2(LIKE t1);
CREATE TABLE
gaussdb=# \d t2
      Table "public.t2"
 Column |  Type   | Modifiers
--------+---------+-----------
 col    | integer |

-- Delete.
gaussdb=# DROP TABLE t1,t2;
Examples of Creating a Table and Adding Constraints to the Table
- Non-null constraints
If no value is specified for a column with a NOT NULL constraint when data is added, an error is reported. You can add NOT NULL constraints to multiple columns in a table.
-- Create a table and add a NOT NULL constraint to the id column.
gaussdb=# CREATE TABLE test_t4(
    id CHAR(7) NOT NULL,
    name VARCHAR(20),
    province VARCHAR(60),                -- Province
    country VARCHAR(30) DEFAULT 'China'  -- Country
);
-- If the value of id is not specified or is NULL during insertion, the NOT NULL constraint is triggered and the insertion fails.
gaussdb=# INSERT INTO test_t4 (name,province) VALUES ('scott','Shanghai');
ERROR: null value in column "id" violates not-null constraint
DETAIL: Failing row contains (null, scott, Shanghai, China)
-- Drop the table.
gaussdb=# DROP TABLE test_t4;
- Unique constraint
The keyword UNIQUE adds a unique constraint to a column. When data is inserted, the constraint is violated if the column value already exists; multiple NULL values are not considered duplicates. When a unique constraint is added, a unique index is automatically created. You can add unique constraints to multiple columns in a table.
-- Create a table to add unique constraints. gaussdb=# CREATE TABLE test_t5( id CHAR(7) UNIQUE USING INDEX TABLESPACE pg_default, -- You can specify a tablespace or use the default tablespace. name VARCHAR(20), province VARCHAR(60), -- Province country VARCHAR(30) DEFAULT 'China' -- Country ); -- You can also use the following method to manually name a unique constraint and add constraints for multiple columns: gaussdb=# CREATE TABLE test_t6( id CHAR(7), name VARCHAR(20), province VARCHAR(60), -- Province country VARCHAR(30) DEFAULT 'China', -- Country CONSTRAINT unq_test_id UNIQUE (id,name) );
-- When data with duplicate IDs is inserted, constraints are triggered. As a result, the insertion fails. gaussdb=# INSERT INTO test_t5(id) VALUES('0000010'); INSERT 0 1 gaussdb=# INSERT INTO test_t5(id) VALUES('0000010'); ERROR: duplicate key value violates unique constraint "test_t5_id_key" DETAIL: Key (id)=(0000010) already exists.
-- The constraint is not triggered when data whose id is NULL is inserted multiple times.
gaussdb=# INSERT INTO test_t5(id) VALUES (NULL);
INSERT 0 1
gaussdb=# INSERT INTO test_t5(id) VALUES (NULL);
INSERT 0 1
gaussdb=# SELECT * FROM test_t5;
   id    | name | province | country
---------+------+----------+---------
 0000010 |      |          | China
         |      |          | China
         |      |          | China
(3 rows)

-- Drop the tables.
gaussdb=# DROP TABLE test_t5;
gaussdb=# DROP TABLE test_t6;
- Primary key constraints
The keyword PRIMARY KEY adds a primary key constraint to one or more columns: the values must be unique and cannot be NULL. When a primary key constraint is added, a unique index is automatically created for the table, and a not-null constraint is automatically added to each primary key column.
Only one primary key constraint can be defined in each table.
-- Create a table and add a primary key constraint to it.
gaussdb=# CREATE TABLE test_t6(
    id CHAR(7) PRIMARY KEY,
    name VARCHAR(20),
    province VARCHAR(60),                 -- Province
    country VARCHAR(30) DEFAULT 'China'   -- Country
);
gaussdb=# INSERT INTO test_t6 (id,name,province) VALUES ('0000001','july','Beijing');
-- You can also use the following method to manually name a primary key constraint and add a constraint that spans multiple columns:
gaussdb=# CREATE TABLE test_t7(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                 -- Province
    country VARCHAR(30) DEFAULT 'China',  -- Country
    CONSTRAINT pk_test_t6_id PRIMARY KEY (id,name)
);
-- Insert data whose id is NULL to trigger the constraint.
gaussdb=# INSERT INTO test_t6 (id,name,province) VALUES (NULL,'july','Beijing');
ERROR:  null value in column "id" violates not-null constraint
DETAIL:  Failing row contains (null, july, Beijing, China).
-- Insert data with duplicate id values to trigger the constraint.
gaussdb=# INSERT INTO test_t6 (id,name,province) VALUES ('0000001','ben','Shanghai');
ERROR:  duplicate key value violates unique constraint "test_t6_pkey"
DETAIL:  Key (id)=(0000001) already exists.
-- Drop the table.
gaussdb=# DROP TABLE test_t6;
gaussdb=# DROP TABLE test_t7;
- Check constraints
The keyword CHECK adds a check constraint to a column. The check constraint must reference one or more columns in the table, and the result returned by the expression must be a Boolean value. Currently, expressions cannot contain subqueries. Both check and non-null constraints can be defined for the same column.
-- Create a table and add check constraints.
gaussdb=# CREATE TABLE test_t8 (
    id CHAR(7),
    name VARCHAR(20),
    age INT CHECK(age > 0 AND age < 150)
);
-- You can also use the following SQL statements to name a check constraint and add the check constraint for one or more columns:
gaussdb=# CREATE TABLE test_t9 (
    id CHAR(7),
    name VARCHAR(20),
    age INT,
    CONSTRAINT chek_test_t8_age CHECK(age > 0 AND age < 150)
);
-- If a value that does not comply with the expression is inserted, the check constraint is triggered. As a result, the insertion fails.
gaussdb=# INSERT INTO test_t8 (id,name,age) VALUES ('0000007','scott',200);
ERROR:  new row for relation "test_t8" violates check constraint "test_t8_age_check"
DETAIL:  N/A
-- Drop the table.
gaussdb=# DROP TABLE test_t8;
gaussdb=# DROP TABLE test_t9;
Data Distribution Examples
- REPLICATION
gaussdb=# CREATE TABLE test_replication(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                 -- Province
    country VARCHAR(30) DEFAULT 'China'   -- Country
)DISTRIBUTE BY REPLICATION;
-- Query table information.
gaussdb=# \d+ test_replication
                                       Table "public.test_replication"
  Column  |         Type          |             Modifiers              | Storage  | Stats target | Description
----------+-----------------------+------------------------------------+----------+--------------+-------------
 id       | character(7)          |                                    | extended |              |
 name     | character varying(20) |                                    | extended |              |
 province | character varying(60) |                                    | extended |              |
 country  | character varying(30) | default 'China'::character varying | extended |              |
Has OIDs: no
Distribute By: REPLICATION
Location Nodes: ALL DATANODES
Options: orientation=row, logical_repl_node=-1, compression=no

-- Drop the table.
gaussdb=# DROP TABLE test_replication;
- HASH
-- Define a hash table.
gaussdb=# CREATE TABLE test_hash(
    id CHAR(7),
    name VARCHAR(20),
    province VARCHAR(60),                 -- Province
    country VARCHAR(30) DEFAULT 'China'   -- Country
)DISTRIBUTE BY HASH(id);
-- Insert data.
gaussdb=# INSERT INTO test_hash VALUES
('0000001', 'Bob', 'Shanghai', 'China'),
('0000002', 'Jack', 'Beijing', 'China'),
('0000003', 'Scott', 'Beijing', 'China');
-- View data distribution.
gaussdb=# SELECT a.count,b.node_name FROM (SELECT COUNT(*) AS count, xc_node_id FROM test_hash GROUP BY xc_node_id) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY a.count DESC;
 count |     node_name
-------+-------------------
     2 | dn_6001_6002_6003
     1 | dn_6004_6005_6006
(2 rows)
-- Drop the table.
gaussdb=# DROP TABLE test_hash;
- MURMURHASH
-- The test environment contains one CN and six DNs.
-- Create a node group. You can run SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' to query the DN names. In the query result, replace the DN names following WITH in the CREATE NODE GROUP statement as required.
gaussdb=# CREATE NODE GROUP NG1 WITH(datanode1, datanode2, datanode3, datanode4, datanode5, datanode6);
-- Define tables distributed by MURMURHASH.
gaussdb=# CREATE TABLE test_murmurhash1 (a int NOT NULL, b int) DISTRIBUTE BY MURMURHASH(a) TO GROUP NG1;
gaussdb=# CREATE TABLE test_murmurhash2 (a int NOT NULL, b int) DISTRIBUTE BY MURMURHASH(lpad_s(a,10,'0')) TO GROUP NG1;
-- Insert data.
gaussdb=# INSERT INTO test_murmurhash1 VALUES(0,1);
gaussdb=# INSERT INTO test_murmurhash2 VALUES(1,2);
-- Query data.
gaussdb=# SELECT * FROM test_murmurhash1;
 a | b
---+---
 0 | 1
(1 row)

gaussdb=# SELECT * FROM test_murmurhash2;
 a | b
---+---
 1 | 2
(1 row)

-- Drop the tables.
gaussdb=# DROP TABLE test_murmurhash1;
gaussdb=# DROP TABLE test_murmurhash2;
-- Drop the node group.
gaussdb=# DROP NODE GROUP NG1;
- RANGE
-- Define a range-distributed table. (Change the DN names based on the actual situation. You can run SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' to query the DN names.)
gaussdb=# CREATE TABLE test_range(
    id INT,
    name VARCHAR(20),
    province VARCHAR(60),                 -- Province
    country VARCHAR(30) DEFAULT 'China'   -- Country
)DISTRIBUTE BY RANGE(id)(
    SLICE s1 VALUES LESS THAN (100) DATANODE dn_6001_6002_6003,
    SLICE s2 VALUES LESS THAN (200) DATANODE dn_6004_6005_6006,
    SLICE s3 VALUES LESS THAN (MAXVALUE) DATANODE dn_6007_6008_6009
);
-- Insert data.
gaussdb=# INSERT INTO test_range VALUES (52, 'Bob', 'Beijing', 'China');
gaussdb=# INSERT INTO test_range VALUES (100, 'Ben', 'Shanghai', 'China');
gaussdb=# INSERT INTO test_range VALUES (150, 'Scott', 'Guangzhou', 'China');
gaussdb=# INSERT INTO test_range VALUES (300, 'Jordan', 'Beijing', 'China');
-- View data distribution.
gaussdb=# SELECT a.count,b.node_name FROM (SELECT COUNT(*) AS count, xc_node_id FROM test_range GROUP BY xc_node_id) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY a.count DESC;
 count |     node_name
-------+-------------------
     2 | dn_6004_6005_6006
     1 | dn_6001_6002_6003
     1 | dn_6007_6008_6009
(3 rows)
-- Query data stored on each DN.
gaussdb=# SELECT b.node_name, a.* FROM (SELECT *, xc_node_id FROM test_range) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY node_name;
     node_name     | id  |  name  | province  | country | xc_node_id
-------------------+-----+--------+-----------+---------+-------------
 dn_6001_6002_6003 |  52 | Bob    | Beijing   | China   | -1072999043
 dn_6004_6005_6006 | 100 | Ben    | Shanghai  | China   |  -564789568
 dn_6004_6005_6006 | 150 | Scott  | Guangzhou | China   |  -564789568
 dn_6007_6008_6009 | 300 | Jordan | Beijing   | China   |  1532339558
(4 rows)
-- Drop the table.
gaussdb=# DROP TABLE test_range;
- LIST
-- Define a list-distributed table. (Change the DN names based on the actual situation. You can run SELECT node_name FROM PGXC_NODE WHERE node_type = 'D' to query the DN names.)
gaussdb=# CREATE TABLE test_list(
    id INT,
    name VARCHAR(20),
    country VARCHAR(30) DEFAULT 'China'   -- Country
)DISTRIBUTE BY LIST(country)(
    SLICE s1 VALUES ('China') DATANODE dn_6001_6002_6003,
    SLICE s2 VALUES ('USA') DATANODE dn_6004_6005_6006,
    SLICE s3 VALUES (DEFAULT) DATANODE dn_6007_6008_6009
);
-- Insert data.
gaussdb=# INSERT INTO test_list VALUES (1,'Scott','China');
gaussdb=# INSERT INTO test_list VALUES (2,'Henry','USA');
gaussdb=# INSERT INTO test_list VALUES (3,'Michael','France');
gaussdb=# INSERT INTO test_list VALUES (4,'Jack','UK');
-- Query data stored on each DN.
gaussdb=# SELECT b.node_name, a.* FROM (SELECT *, xc_node_id FROM test_list) a, pgxc_node b WHERE a.xc_node_id=b.node_id ORDER BY node_name;
     node_name     | id |  name   | country | xc_node_id
-------------------+----+---------+---------+-------------
 dn_6001_6002_6003 |  1 | Scott   | China   | -1072999043
 dn_6004_6005_6006 |  2 | Henry   | USA     |  -564789568
 dn_6007_6008_6009 |  3 | Michael | France  |  1532339558
 dn_6007_6008_6009 |  4 | Jack    | UK      |  1532339558
(4 rows)
-- Drop the table.
gaussdb=# DROP TABLE test_list;
Suggestions
- UNLOGGED
- Data written to an unlogged table and its indexes is not recorded in write-ahead logs (WAL), so the write speed is much higher than that of ordinary tables. Unlogged tables can therefore be used to store intermediate result sets of complex queries and improve query performance.
- An unlogged table has no primary/standby mechanism, so data may be lost after a system fault or abnormal interruption. Therefore, unlogged tables cannot be used to store basic data. A minimal sketch follows.
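The following sketch shows the keyword in context; the table name tmp_stage is illustrative only.
-- Hypothetical example: stage intermediate results in an unlogged table.
gaussdb=# CREATE UNLOGGED TABLE tmp_stage(
    id INT,
    total NUMERIC
);
gaussdb=# DROP TABLE tmp_stage;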
- TEMPORARY | TEMP
- A temporary table is automatically dropped at the end of a session.
- The temporary table is visible only to the current CN. A minimal sketch follows.
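The following sketch shows the keyword in context; the table name tmp_session is illustrative only.
-- Hypothetical example: the table is dropped automatically when the session ends.
gaussdb=# CREATE TEMPORARY TABLE tmp_session(id INT);
gaussdb=# INSERT INTO tmp_session VALUES (1);
-- After the session is closed, tmp_session no longer exists.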
- LIKE
- The new table automatically inherits all column names, data types, and not-null constraints from the source table. After creation, the new table is independent of the source table.
- LIKE INCLUDING DEFAULTS
- Default expressions are copied from the source table to the new table only if INCLUDING DEFAULTS is specified. The default behavior is to exclude default expressions, so the copied columns in the new table have NULL default values. A minimal sketch follows.
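The following sketch contrasts the two behaviors; the table names are illustrative only.
-- Hypothetical example: src has a default expression on col.
gaussdb=# CREATE TABLE src(col INT DEFAULT 1);
-- Without INCLUDING DEFAULTS, col in dst1 has a NULL default value.
gaussdb=# CREATE TABLE dst1(LIKE src);
-- With INCLUDING DEFAULTS, col in dst2 keeps DEFAULT 1.
gaussdb=# CREATE TABLE dst2(LIKE src INCLUDING DEFAULTS);
gaussdb=# DROP TABLE src,dst1,dst2;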
- LIKE INCLUDING CONSTRAINTS
- The CHECK constraints are copied from the original table to the new table only when INCLUDING CONSTRAINTS is specified. Other types of constraints are never copied to the new table. Not-null constraints are always copied to the new table. These rules also apply to column constraints and table constraints.
- LIKE INCLUDING INDEXES
- Any indexes on the original table will not be created on the new table, unless the INCLUDING INDEXES clause is specified.
- LIKE INCLUDING STORAGE
- STORAGE settings for the copied column definitions are copied only if INCLUDING STORAGE is specified. The default behavior is to exclude STORAGE settings.
- LIKE INCLUDING COMMENTS
- If INCLUDING COMMENTS is specified, comments for the copied columns, constraints, and indexes are copied. The default behavior is to exclude comments.
- LIKE INCLUDING PARTITION
- If INCLUDING PARTITION is specified, the partition definitions of the source table are copied to the new table, and the new table can no longer use the PARTITION BY clause. By default, the partition definitions of the source table are not copied. A sketch follows.
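The following sketch assumes a range-partitioned source table; the table and partition names are illustrative only.
-- Hypothetical example: create a range-partitioned source table.
gaussdb=# CREATE TABLE part_src(id INT)
PARTITION BY RANGE(id)(
    PARTITION p1 VALUES LESS THAN (100),
    PARTITION p2 VALUES LESS THAN (MAXVALUE)
);
-- The new table copies the partition definitions, so no PARTITION BY clause is written.
gaussdb=# CREATE TABLE part_dst(LIKE part_src INCLUDING PARTITION);
gaussdb=# DROP TABLE part_src,part_dst;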
- LIKE INCLUDING RELOPTIONS
- If INCLUDING RELOPTIONS is specified, the new table copies the storage parameters (that is, the WITH clause) of the source table. The default behavior is to exclude the storage parameters of the source table.
- LIKE INCLUDING DISTRIBUTION
- If INCLUDING DISTRIBUTION is specified, the distribution information of the source table, including the distribution type and key, is copied to the new table, and the new table can no longer use the DISTRIBUTE BY clause. The default behavior is to exclude the distribution information of the source table.
- LIKE INCLUDING ALL
- INCLUDING ALL contains the content of INCLUDING DEFAULTS, INCLUDING CONSTRAINTS, INCLUDING INDEXES, INCLUDING STORAGE, INCLUDING COMMENTS, INCLUDING PARTITION, INCLUDING RELOPTIONS, and INCLUDING DISTRIBUTION, as shown in the sketch below.
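The following sketch copies a complete table definition in one step; the table names are illustrative only.
-- Hypothetical example: all_dst copies defaults, constraints, indexes, storage settings,
-- comments, partition definitions, storage parameters, and distribution information from all_src.
gaussdb=# CREATE TABLE all_src(
    id INT DEFAULT 0,
    name VARCHAR(20),
    CHECK (id >= 0)
);
gaussdb=# CREATE TABLE all_dst(LIKE all_src INCLUDING ALL);
gaussdb=# DROP TABLE all_src,all_dst;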
- ORIENTATION ROW
- Creates a row-store table. Row store applies to OLTP services with many interactive transactions, where each interaction involves many columns in the table; row store is more efficient in such cases. A minimal sketch follows.
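The following sketch requests row storage explicitly through the ORIENTATION storage parameter (row store is also the default); the table name is illustrative only.
-- Hypothetical example: explicitly create a row-store table.
gaussdb=# CREATE TABLE row_tbl(id INT, name VARCHAR(20)) WITH (ORIENTATION = ROW);
gaussdb=# DROP TABLE row_tbl;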
- DISTRIBUTE BY
- It is recommended that a fact table or dimension table containing a large amount of data be created as a distributed table. Each row of the table is placed on a specific DN based on the hash value of the specified column. The syntax is DISTRIBUTE BY HASH(column_name).
- It is recommended that a dimension table containing a small amount of data be created as a replication table. Each row of the table exists on all DNs; that is, each DN has complete table data. The syntax is DISTRIBUTE BY REPLICATION. Both recommendations are combined in the sketch below.
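The following sketch pairs a hash-distributed fact table with a replicated dimension table; the table and column names are illustrative only.
-- Hypothetical example: a large fact table is hash-distributed across DNs.
gaussdb=# CREATE TABLE fact_sales(order_id INT, region_id INT, amount NUMERIC) DISTRIBUTE BY HASH(order_id);
-- A small dimension table is replicated to every DN, so joins with the fact table avoid data shuffling.
gaussdb=# CREATE TABLE dim_region(region_id INT, region_name VARCHAR(30)) DISTRIBUTE BY REPLICATION;
gaussdb=# DROP TABLE fact_sales,dim_region;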