Updated on 2025-07-22 GMT+08:00

CREATE TABLE AS

Function

CREATE TABLE AS creates a table based on the results of a query. It is applicable to data migration or replication scenarios.

CREATE TABLE AS creates a table and fills the table with SELECT query results. The table columns have the names and data types associated with the output columns of SELECT Except that you can override the SELECT output column names by giving an explicit list of new column names.

CREATE TABLE AS queries once the source table and writes data in the new table. The query result view changes when the source table changes. In contrast, a view re-evaluates its defining SELECT statement whenever it is queried.

Precautions

  • This command cannot be used to create a partitioned table.
  • If an error occurs when you create a table, after the system is recovered, the system probably cannot automatically clear the created disk file whose size is not 0. This problem seldom occurs.
  • The UNLOGGED table uses no primary/standby mechanism. In the case of system faults or abnormal breakpoints, data loss may occur. Therefore, the UNLOGGED table cannot be used to store basic data.
  • The UNLOGGED table does not ensure safe data. Users can back up data before using the table; for example, users should back up the data before a system upgrade.

Syntax

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
CREATE [ [ GLOBAL | LOCAL | VOLATILE ] { TEMPORARY | TEMP } | UNLOGGED ] TABLE table_name
    [ (column_name [, ...] ) ]
    [ WITH ( {storage_parameter = value} [, ... ] ) ]
    [ COMPRESS | NOCOMPRESS ]
    
    [ DISTRIBUTE BY { REPLICATION | ROUNDROBIN | { [HASH ] ( column_name ) } } ]
    
    [ COMMENT [=] 'text' ]
    AS query
    [ WITH [ NO ] DATA ];

Parameter Description

Table 1 CREATE TABLE AS parameters

Parameter

Description

Value Range

GLOBAL | LOCAL | VOLATILE

Specifies the type of a temporary table.

For details, see Parameter Description.

TEMPORARY | TEMP

If TEMP or TEMPORARY is specified, the created table is a temporary table.

-

UNLOGGED

Specifies that the table is created as an unlogged table. Data written to unlogged tables is not written to the write-ahead log, which makes them considerably faster than ordinary tables. However, they are not crash-safe: an unlogged table is automatically truncated after a crash or unclean shutdown. The contents of an unlogged table are also not replicated to standby servers. Any indexes created on an unlogged table are automatically unlogged as well.

If data is missing in the indexes of unlogged tables due to some unexpected operations such as an unclean shutdown, users should re-create the indexes with errors.

Starting from version 9.1.0, UNLOGGED tables are automatically saved in the pg_unlogged tablespace and cannot be moved or assigned to other tablespaces. After an earlier version is upgraded to 9.1.0, the UNLOGGED table created in the earlier version is still stored in the original tablespace.

If the instance restarts unexpectedly, the UNLOGGED table will be reset, which can impact the instance's recovery time objective (RTO). Starting from version 9.1.0, there is a script called switch_unlogged_tablepace.py that can move unlogged tables to optimize the recovery time objective (RTO). This script works together with the GUC parameter enable_unlogged_tablespace_compat.

table_name

Specifies the name of the table to be created.

A string, which must comply with the naming convention.

column_name

Specifies the name of a column to be created in the new table.

A string, which must comply with the naming convention.

WITH ( storage_parameter [= value] [, ... ] )

The WITH clause sets the storage options for a table or its index.

For details about the parameters, see Table 2.

COMPRESS | NOCOMPRESS

  • If the keyword COMPRESS is specified, the compression feature is triggered when data is inserted into tables in batches. If this feature is enabled, a scan is performed for all tuple data within the page to generate a dictionary and then the tuple data is compressed and stored.
  • If NOCOMPRESS is specified, the table is not compressed.

Default value: NOCOMPRESS, tuple data is not compressed before storage.

-

DISTRIBUTE BY

Specifies how the table is distributed or replicated between DNs.

  • REPLICATION: Each row in the table exists on all DNs, that is, each DN has complete table data.
  • ROUNDROBIN: Each row in the table is sent to each DN in sequence. This distribution policy prevents data skew. However, data distribution nodes are random. As a result, there is a higher probability that table redistribution is triggered during computing. This distribution policy is recommended for large tables with severe column skew. This value is supported only in 8.1.2 or later.
  • HASH (column_name): Each row of the table will be placed into all the DNs based on the hash value of the specified column.

Default value: determined by the GUC parameter default_distribution_mode For details about how to select the default value of DISTRIBUTE BY, see the notice below this table.

The following data types can be used as distribution columns:
  • Integer types: TINYINT, SMALLINT, INT, BIGINT, and NUMERIC/DECIMAL
  • Character types: CHAR, BPCHAR, VARCHAR, VARCHAR2, NVARCHAR2, and TEXT
  • Date/time types: DATE, TIME, TIMETZ, TIMESTAMP, TIMESTAMPTZ, INTERVAL, and SMALLDATETIME

COMMENT [=] 'text'

The COMMENT clause can specify table comments during table creation.

-

AS query

Indicates a SELECT or VALUES command, or an EXECUTE command that runs a prepared SELECT, or VALUES query.

-

[ WITH [ NO ] DATA ]

Specifies whether the data produced by the query should be copied into the new table. By default, the data is copied. If the NO parameter is used, the data is not copied.

-

DISTRIBUTE BY parameter description:

  • When DISTRIBUTE BY HASH (column_name) is specified, the primary key and its unique index must contain the column_name column.
  • When DISTRIBUTE BY HASH (column_name) in a referenced table is specified, the foreign key of the reference table must contain the column_name column.
  • When default_distribution_mode is set to roundrobin, the default value of DISTRIBUTE BY is selected according to the following rules:
    1. If the primary key or unique constraint is included during table creation, hash distribution is selected. The distribution column is the column corresponding to the primary key or unique constraint.
    2. If the primary key or unique constraint is not included during table creation, round-robin distribution is selected.
  • When default_distribution_mode is set to hash, the default value of DISTRIBUTE BY is selected according to the following rules:
    1. If the primary key or unique constraint is included during table creation, hash distribution is selected. The distribution column is the column corresponding to the primary key or unique constraint.
    2. If the primary key or unique constraint is not included during table creation but there are columns whose data types can be used as distribution columns, hash distribution is selected. The distribution column is the first column whose data type can be used as a distribution column.
    3. If the primary key or unique constraint is not included during table creation and no column whose data type can be used as a distribution column exists, round-robin distribution is selected.
Table 2 storage_parameter parameters

storage_parameter

Parameter

Description

Value Range

FILLFACTOR

The fillfactor of a table is a percentage between 10 and 100. When a smaller fillfactor is specified, INSERT operations pack table pages only to the indicated percentage. The remaining space on each page is reserved for updating rows on that page. This gives UPDATE a chance to place the updated copy of a row on the same page, which is more efficient than placing it on a different page. For a table whose records are never updated, setting the fillfactor to 100 (complete packing) is the appropriate choice, but in heavily updated tables smaller fillfactors are appropriate.

The default value is 100 (filled to capacity).

The parameter is only valid for row–store tables.

10~100

ORIENTATION

Specifies the storage mode (row-store, column-store) for table data. This parameter cannot be modified once it is set.

Default value: ROW (row-store)

  • ROW indicates that table data is stored in rows.

    ROW applies to OLTP service, which has many interactive transactions. An interaction involves many columns in the table. Using ROW can improve the efficiency.

  • COLUMN indicates that the data is stored in columns.

    COLUMN applies to the data warehouse service, which has a large amount of aggregation computing, and involves a few column operations.

COMPRESSION

Specifies the compression level of the table data. It determines the compression ratio and time. Generally, the higher the level of compression, the higher the ratio, the longer the time, and the lower the level of compression, the lower the ratio, the shorter the time. The actual compression ratio depends on the distribution characteristics of loading table data.

NOTE:

Currently, row-store table compression is not supported.

This parameter is available only for column-store tables. Value: LOW (default value), MIDDLE, or HIGH.

MAX_BATCHROW

Specifies the maximum of a storage unit during data loading process. The parameter is only valid for column-store tables.

Default value: 60000

10000 to 60000

PARTIAL_CLUSTER_ROWS

Specifies the number of records to be partial cluster stored during data loading process. The parameter is only valid for column-store tables.

Default value: 4,200,000

600000 to 2147483647

enable_delta

Specifies whether to enable delta tables in column-store tables. The parameter is only valid for column-store tables.

Using column-store tables with delta tables is not recommended. This may cause disk bloat and performance deterioration due to delayed merge.

Default value: off

on or off

COLVERSION

Specifies the version of the column-store format. You can switch between different storage formats.

Default value: 2.0

NOTE:

When creating a column-store table, set COLVERSION to 2.0. Compared with the 1.0 storage format, the performance is significantly improved:

  1. The time required for creating a column-store wide table is significantly reduced.
  2. In the Roach data backup scenario, the backup time is significantly reduced.
  3. The build and catch up time is greatly reduced.

Valid value:

  • 1.0: Each column in a column-store table is stored in a separate file. The file name is relfilenode.C1.0, relfilenode.C2.0, relfilenode.C3.0, or similar.
  • 2.0: All columns of a column-store table are combined and stored in a file. The file is named relfilenode.C1.0.

SKIP_FPI_HINT

Indicates whether to skip the hint bits operation when the full-page writes (FPW) log needs to be written during sequential scanning.

Default value: false
NOTE:

If SKIP_FPI_HINT is set to true and the checkpoint operation is performed on a table, no Xlog will be generated when the table is sequentially scanned. This applies to intermediate tables that are queried less frequently, reducing the size of Xlogs and improving query performance.

true or false

Examples

Create the customer table.

1
2
3
4
5
6
7
8
9
DROP TABLE IF EXISTS customer;
CREATE TABLE customer
(    
    C_CUSTKEY     BIGINT NOT NULL CONSTRAINT C_CUSTKEY_pk PRIMARY KEY  , 
    C_NAME        VARCHAR(25)  , 
    C_ADDRESS     VARCHAR(40)  , 
    C_NATIONKEY   INT NOT NULL  CHECK (C_NATIONKEY > 0)  
)
DISTRIBUTE BY HASH(C_CUSTKEY);

Create the store_returns_t1 table and insert numbers that are greater than 4795 in the C_CUSTKEY column of the CUSTOMER table:

1
CREATE TABLE store_returns_t1 AS SELECT * FROM CUSTOMER WHERE C_CUSTKEY > 4795;

Copy store_returns_t1 to create the store_returns_t2 table:

1
CREATE TABLE store_returns_t2 AS table store_returns_t1;

Helpful Links

CREATE TABLE, SELECT