Updated on 2024-05-07 GMT+08:00

CREATE TABLE

Function

CREATE TABLE creates a time series table in the current database. The table will be owned by the user who created it.

The stream data warehouse provides DDL statements for creating a time series table. To create a time series table that stores data based on key values, the DDL statement needs to include the dimension attribute tstag, indicator attribute tsfield, and time attribute tstime. The time series database (TSDB) allows you to specify the time to live (TTL) of data and the period for creating partitions (PERIOD) to automatically create or delete partitions. In addition, orientation needs to be set to timeseries in the table creation statement.

Precautions

  • To create a time series table, you must have the USAGE permission on schema cstore.
  • All attributes of a time series table, except the time attribute, must be specified to either a dimension(TSTAG) or an indicator (TSFIELD).
  • If PARTITION BY is specified explicitly, only tstime can be used as the partition key.
  • If an index column is deleted using DROP COLUMN, the remaining index columns will be used to rebuild the index. If all the index columns are deleted, the first 10 tag columns will be used to rebuild the index.
  • Time series tables do not support UPDATE, UPSERT, primary keys, or PCKs.
  • Each time series table is bound to a tag table. The OIDs and index OIDs of the tag table are recorded in the reltoastrelid and reltoastidxid columns of the pg_class table, respectively.
  • By default, the first 10 tag columns of a tag table are used to create an index.
  • Tag tables cannot be queried on CNs. The table size returned in a query contains the tag table size.
  • The kvtype column setting in the statement for creating a non-time series table does not take effect.

Syntax Format

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
CREATE TABLE [ IF NOT EXISTS ] table_name
({ column_name data_type [ kv_type ] 
    | LIKE source_table [like_option [...] ] }
}
    [, ... ])
[ WITH ( {storage_parameter = value}  [, ... ] ) ]

[ TABLESPACE tablespace_name ]
[ DISTRIBUTE BY  HASH ( column_name [,...])]
[ TO { GROUP groupname | NODE ( nodename [, ... ] ) } ]
[ PARTITION BY { 
        {RANGE (partition_key) ( partition_less_than_item [, ... ] )} 
 } [ { ENABLE | DISABLE } ROW MOVEMENT ] ]; 
The options for LIKE are as follows:
{ INCLUDING | EXCLUDING } { DEFAULTS | CONSTRAINTS | INDEXES | STORAGE | COMMENTS | PARTITION | RELOPTIONS | DISTRIBUTION | ALL }

Parameter description

  • IF NOT EXISTS

    Does not throw an error if a table with the same name exists. A notice is issued in this case.

  • table_name

    Specifies the name of the table to be created.

  • column_name

    Specifies the name of a column to be created in the new table.

  • data_type

    Specifies the data type of the column.

  • kv_type

    kv_type attributes of columns, including a dimension attribute (TSTAG), an indicator attribute (TSFIELD), and a time attribute (TSTIME).

    One and only one TSTIME attribute must be specified. Columns of the TSTIME type cannot be deleted. At least one of the TSTAG and TSFIELD columns must be specified, or an error will be reported during table creation.

    The TSTAG column supports the text, char, bool, int, and big int types.

    The TSTime column supports the timestamp with time zone and timestamp without time zone types. It also supports the date type in databases compatible with the Oracle syntax. If time zone-related operations are involved, select a time type with time zone.

    For details about the data types supported by the TSFIELD column, see Data Types Supported by TSFIELD.

  • LIKE source_table [like_option...]

    Specifies a table from which the new table automatically copies all column names and their data types.

    The new table and the original table are decoupled after creation is complete. Changes to the original table will not be applied to the new table, and scans on the original table will not be performed on the data of the new table.

    Columns copied by LIKE are not merged with the same name. If the same name is specified explicitly or in another LIKE clause, an error will be reported.

    A time series table only inherits from another time series table.

  • WITH( { storage_parameter = value } [, ...] )

    Specifies an optional storage parameter for a table.

    • ORIENTATION

      Specifies the storage mode (time series, row-store, or column-store) of table data. This parameter cannot be modified once it is set.

      Options:
      • TIMESERIES indicates that the data is stored in time series.
      • COLUMN indicates that the data is stored in columns.
      • ROW indicates that table data is stored in rows.
    Default value: ROW
    • COMPRESSION

      Specifies the compression level of the table data. It determines the compression ratio and time. Generally, a higher compression level indicates a higher compression ratio and a longer compression time, and vice versa. The actual compression ratio depends on the distribution characteristics of loading table data.

      Options:
      • The valid values for time series tables and column-store tables are YES/NO and LOW/MIDDLE/HIGH, and the default is LOW.
      • The valid values for row-store tables are YES and NO, and the default is NO.
    • COMPRESSLEVEL

      Specifies table data compression rate and duration at the same compression level. This divides a compression level into sub-levels, providing you with more choices for compression ratio and duration. As the value becomes greater, the compression rate becomes higher and duration longer at the same compression level. The parameter is only valid for time series tables and column-store tables.

      Value range: 0 to 3. The default value is 0.

    • MAX_BATCHROW

      Specifies the maximum number of rows in a storage unit during data loading. The parameter is only valid for time series tables and column-store tables.

      Value range: 10000 to 60000

      Default value: 60000

    • PARTIAL_CLUSTER_ROWS

      Specifies the number of records to be partially clustered for storage during data loading. The parameter is only valid for time series tables and column-store tables.

      Value range: 600000 to 2147483647

    • enable_delta

      Specifies whether to enable delta tables in time series tables. The parameter is only valid for time series tables and column-store tables.

      Default value: on

    • SUB_PARTITION_COUNT

      Specifies the number of level-2 partitions in a time series table. This parameter specifies the number of level-2 partitions during data import. This parameter is configured during table creation and cannot be modified after table creation. You are not advised to set the default value, which may affect the import and query performance.

      Value range: 1 to 1024. The default value is 32.

    • DELTAROW_THRESHOLD

      Specifies the maximum number of rows (SUB_PARTITION_COUNT * DELTAROW_THRESHOLD) to be imported to the delta table when a time series table is imported. This parameter is valid only if enable_delta has been enabled. The parameter is only valid for time series tables and column-store tables.

      Value range: 0 to 60000

      Default value: 10000

    • COLVERSION

      Specifies the version of a storage format. The parameter is only valid for time series tables and column-store tables. You cannot switch between different storage formats in time series tables. The time series table supports only version 2.0.

      Options:

      1.0: Each column in a column-store table is stored in a separate file. The file name is relfilenode.C1.0, relfilenode.C2.0, relfilenode.C3.0, or similar.

      2.0: All the columns of a time series or column-store table are combined and stored in a file. The file is named relfilenode.C1.0.

      Default value: 2.0

    • TTL

      Schedules the partition deletion tasks in a time series table. By default, partitions are not deleted.

      Value range:

      1 hour ~ 100 years

    • PERIOD

      Schedules the tasks to create partitions in a time series table. If TTL has been configured, PERIOD cannot be greater than TTL.

      Value range:

      1 hour to 100 years. The default value is 1 day.

  • TABLESPACE tablespace_name

    Specifies the tablespace where the new table is created. If not specified, default tablespace is used.

  • DISTRIBUTE BY

    Specifies how the table is distributed or replicated between DNs.

    Options:

    HASH (column_name): Each row of the table will be placed into all the DNs based on the hash value of the specified column.

    By default, the time series table is distributed based on all tag columns.

  • TO { GROUP groupname | NODE ( nodename [, ... ] ) }

    TO GROUP specifies the Node Group in which the table is created. Currently, it cannot be used for HDFS tables. TO NODE is used for internal scale-out tools.

  • PARTITION BY

    Specifies the initial partition of a time series table. The partition keys of the time series table must be in the TSTIME column.

  • TTL indicates the data storage period of a table. Data that exceeds the TTL period will be deleted. PERIOD indicates that data is partitioned by time. The partition size may affect the query performance. In this partitioning mode, a partition will be created at the interval specified by PERIOD. The values of TTL and PERIOD are of the interval type, for example, 1 hour, 1 day, 1 week, 1 month, 1 year, and 1 month 2 day 3 hour.
  • orientation of storage_parameter specifies the storage mode. The key-value storage is supported only when orientation is set to timeseries.
  • You do not need to manually specify DISTRIBUTE BY and PARTITION BY for a time series table. By default, data is distributed based on all tag columns, the TSTIME is used as the partition key, and a partitioned table with the automatic partition management function is created.

Examples

Create a simple time series table.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
CREATE TABLE IF NOT EXISTS CPU(
scope_name text TSTag,
server_ip text TSTag,
group_path text TSTag,
time timestamptz TSTime,
 idle numeric TSField,
system numeric TSField,
 util numeric TSField,
vcpu_num numeric TSField,
guest numeric TSField,
iowait numeric TSField,
 users numeric TSField) with (orientation=TIMESERIES) distribute by hash(scope_name);

CREATE TABLE CPU1(
idle numeric TSField,
IO numeric TSField,
scope text TSTag,
IP text TSTag,
time timestamp TSTime
) with (TTL='7 days', PERIOD='1 day', orientation=TIMESERIES);

CREATE TABLE CPU2 (LIKE CPU INCLUDING ALL);