CREATE TABLE
Syntax
CREATE [EXTERNAL]1,2 TABLE [IF NOT EXISTS] [catalog_name.][db_name.]table_name (
{ [col_name data_type [COMMENT col_comment] [ WITH3 ( property_name = expression [, ...] )3 ]
| LIKE3 existing_table_name [ {INCLUDING | EXCLUDING}3 PROPERTIES]3 } [, ...])
[[PARTITIONED BY1 (col_name data_type, ...)] [SORT BY1 (column [, column ...])] ]1
[COMMENT 'table_comment']
[ WITH ( property_name = expression [, ...] ) ]3
[[PARTITIONED BY2 (col_name data_type, ...)] [CLUSTERED BY2 (col_name, col_name, ...) [SORTED BY2 (col_name, col_name, ...)] INTO num_buckets BUCKETS]2 ]2
[[STORED AS file_format]
[LOCATION 'hdfs_path']
[TBLPROPERTIES (orc_table_property = value [, ...] ) ] ]1,2
Remarks
Expressions or keywords that share the same sequence number (1, 2, or 3) can be used together. Expressions or keywords with different sequence numbers cannot be used in the same statement.
- bucket_count can be set as a session property. Its default value is -1, indicating that bucket_count is not set. During table creation, if the session's bucket_count is -1 and the number of buckets is not specified in the table creation statement, the default value 16 is used.
- The default storage location of external tables is /user/hive/warehouse.
- Setting "transactional=true" enables a table to support atomicity, consistency, isolation, and durability (ACID). However, once a table is defined as a transactional table, it cannot be downgraded to a non-transactional table by setting "transactional=false".
The transactional property must be a boolean value. If a quoted string such as transactional='true' or transactional='0' is supplied, no type conversion is performed, and one of the following exceptions is thrown:
Cannot convert ['true'] to boolean
Cannot convert ['0'] to boolean
- By default, data cannot be inserted into a managed table whose external property is set to true. To enable such writes, add the custom Hive property hive.non-managed-table-writes-enabled=true. For details, see Precautions.
- The MPPDB restricts database identifiers to a maximum length of 63 characters. An identifier that exceeds this length is automatically truncated to 63 characters.
- Table creation is not supported in cross-domain scenarios.
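As an illustration of the transactional remark above, the following is a minimal sketch of creating an ACID table (the table name txn_orders is a hypothetical example; the property must be a bare boolean, not a quoted string):

```sql
-- Hypothetical example: create a transactional (ACID) table.
-- transactional must be an unquoted boolean; transactional = 'true'
-- (a string) would fail with "Cannot convert ['true'] to boolean".
CREATE TABLE txn_orders (
  orderkey bigint,
  orderstatus varchar
)
WITH (format = 'ORC', transactional = true);
```

Once created this way, the table cannot be reverted to a non-transactional table by setting transactional = false.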
Description
CREATE TABLE creates a new, empty table with the specified columns. To create a table with data, use the CREATE TABLE AS statement. When the optional IF NOT EXISTS clause is used, no error is reported if the table already exists.
The WITH clause can be used to set properties for a newly created table or a single column, such as the storage location of the table and whether the table is an external table.
The LIKE clause copies all column definitions from an existing table into the new table. Multiple LIKE clauses can be specified, allowing columns to be copied from several tables. If INCLUDING PROPERTIES is specified, all table properties are copied to the new table; when a property name in the WITH clause matches a copied property name, the value from the WITH clause takes precedence. The default behavior is EXCLUDING PROPERTIES. INCLUDING PROPERTIES can be specified for at most one table.
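The LIKE behavior described above can be sketched as follows; orders is the example table created later on this page, and copy_orders is a hypothetical name:

```sql
-- Copy the column definitions and table properties of orders;
-- the orc_compress value in the WITH clause overrides the copied one.
CREATE TABLE copy_orders (
  note varchar,
  LIKE orders INCLUDING PROPERTIES
)
WITH (orc_compress = 'SNAPPY');
```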
PARTITIONED BY can be used to specify the partition columns. CLUSTERED BY can be used to specify the bucket columns. SORT BY and SORTED BY can be used to sort the specified bucket columns. BUCKETS can be used to specify the number of buckets. EXTERNAL can be used to create external tables. STORED AS can be used to specify the file storage format. LOCATION can be used to specify the storage path in HDFS.
To view the table properties supported by the current data source, run the following command:
SELECT * FROM system.metadata.table_properties;
The following table lists the query result when the catalog is hive.
| Catalog Name | Attribute Name | Default Value | Type | Description |
|---|---|---|---|---|
| hive | avro_schema_url | - | varchar | URI pointing to the Avro schema for the table |
| hive | bucket_count | 0 | integer | Number of buckets |
| hive | bucketed_by | [] | array(varchar) | Bucketing columns |
| hive | bucketing_version | - | integer | Bucketing version |
| hive | csv_escape | - | varchar | CSV escape character |
| hive | csv_quote | - | varchar | CSV quote character |
| hive | csv_separator | - | varchar | CSV separator character |
| hive | external_location | - | varchar | File system location URI for the external table |
| hive | format | ORC | varchar | Hive storage format for the table. Possible values: [ORC, PARQUET, AVRO, RCBINARY, RCTEXT, SEQUENCEFILE, JSON, TEXTFILE, CSV] |
| hive | orc_compress | GZIP | varchar | Compression codec used. Possible values: [NONE, SNAPPY, LZ4, ZSTD, GZIP, ZLIB] |
| hive | orc_compress_size | 262144 | bigint | ORC compression block size |
| hive | orc_row_index_stride | 10000 | integer | Number of ORC row index strides |
| hive | orc_stripe_size | 67108864 | bigint | ORC stripe size |
| hive | orc_bloom_filter_columns | [] | array(varchar) | ORC Bloom filter index columns |
| hive | orc_bloom_filter_fpp | 0.05 | double | ORC Bloom filter false positive probability |
| hive | partitioned_by | [] | array(varchar) | Partition columns |
| hive | sorted_by | [] | array(varchar) | Bucket sorting columns |
| hive | textfile_skip_footer_line_count | - | integer | Number of footer lines to skip |
| hive | textfile_skip_header_line_count | - | integer | Number of header lines to skip |
| hive | transactional | false | boolean | Whether the transactional property is enabled |
| systemremote | configs | configurations | varchar | Configurations for the current query |
| systemremote | init_query_id | initqueryid | varchar | Initial query ID |
| systemremote | initiator_format | initiator format | varchar | Column metadata from the initiator |
Example
- Create a new table orders, using the WITH clause to specify the storage format, the storage location, and whether the table is an external table.
The auto.purge parameter can be used to specify whether to clear related data when data removal operations (such as DROP, DELETE, INSERT OVERWRITE, and TRUNCATE TABLE) are performed.
- If it is set to true, metadata and data files are cleared.
- If it is set to false, only metadata is cleared and data files are moved to the HDFS trash bin. The default value is false. You are advised not to change the value. Otherwise, deleted data cannot be restored.
CREATE TABLE orders (
  orderkey bigint,
  orderstatus varchar,
  totalprice double,
  orderdate date
)
WITH (format = 'ORC', location = '/user', orc_compress = 'ZLIB', external = true, "auto.purge" = 'false');

-- You can run the DESC FORMATTED statement to view details about table creation.
desc formatted orders;

 Describe Formatted Table
------------------------------------------------------------------------------
 # col_name        data_type        comment
 orderkey          bigint
 orderstatus       varchar
 totalprice        double
 orderdate         date

 # Detailed Table Information
 Database:         default
 Owner:            admintest
 LastAccessTime:   0
 Location:         hdfs://hacluster/user
 Table Type:       EXTERNAL_TABLE

 # Table Parameters:
   EXTERNAL                TRUE
   auto.purge              false
   orc.compress.size       262144
   orc.compression.codec   ZLIB
   orc.row.index.stride    10000
   orc.stripe.size         67108864
   presto_query_id         20220812_084110_00050_srknk@default@HetuEngine
   presto_version          1.2.0-h0.cbu.mrs.320.r1-SNAPSHOT
   transient_lastDdlTime   1660293670

 # Storage Information
 SerDe Library:    org.apache.hadoop.hive.ql.io.orc.OrcSerde
 InputFormat:      org.apache.hadoop.hive.ql.io.orc.OrcInputFormat
 OutputFormat:     org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat
 Compressed:       No
 Num Buckets:      -1
 Bucket Columns:   []
 Sort Columns:     []
 Storage Desc Params:
   serialization.format    1
(1 row)
- If the new_orders table does not exist, create it and add a table comment and column comments:
CREATE TABLE IF NOT EXISTS new_orders (
  orderkey bigint,
  orderstatus varchar,
  totalprice double COMMENT 'Price in cents.',
  orderdate date
)
COMMENT 'A table to keep track of orders.';

insert into new_orders values
(202011181113, 'online', 9527, date '2020-11-11'),
(202011181114, 'online', 666, date '2020-11-11'),
(202011181115, 'online', 443, date '2020-11-11'),
(202011181115, 'offline', 2896, date '2020-11-11');
- Create the bigger_orders table using the column definition of the orders table:
CREATE TABLE bigger_orders (
  another_orderkey bigint,
  LIKE orders,
  another_orderdate date
);

SHOW CREATE TABLE bigger_orders;

 Create Table
---------------------------------------------------------------------
 CREATE TABLE hive.default.bigger_orders (
    another_orderkey bigint,
    orderkey bigint,
    orderstatus varchar,
    totalprice double,
    orderdate date,
    another_orderdate date
 )
 WITH (
    external = false,
    format = 'ORC',
    location = 'hdfs://hacluster/user/hive/warehouse/bigger_orders',
    orc_compress = 'GZIP',
    orc_compress_size = 262144,
    orc_row_index_stride = 10000,
    orc_stripe_size = 67108864
 )
(1 row)
Example of creating a table using syntax 1:
CREATE EXTERNAL TABLE hetu_test (
  orderkey bigint,
  orderstatus varchar,
  totalprice double,
  orderdate date
)
PARTITIONED BY (ds int)
SORT BY (orderkey, orderstatus)
COMMENT 'test'
STORED AS ORC
LOCATION '/user'
TBLPROPERTIES (orc_compress = 'SNAPPY', orc_compress_size = 6710422, orc_bloom_filter_columns = 'orderstatus,totalprice');
Example of creating a table using syntax 2:
CREATE EXTERNAL TABLE hetu_test1 (
  orderkey bigint,
  orderstatus varchar,
  totalprice double,
  orderdate date
)
COMMENT 'test'
PARTITIONED BY (ds int)
CLUSTERED BY (orderkey, orderstatus)
SORTED BY (orderkey, orderstatus)
INTO 16 BUCKETS
STORED AS ORC
LOCATION '/user'
TBLPROPERTIES (orc_compress = 'SNAPPY', orc_compress_size = 6710422, orc_bloom_filter_columns = 'orderstatus,totalprice');
Example of creating a table using syntax 3:
CREATE TABLE hetu_test2 (
  orderkey bigint,
  orderstatus varchar,
  totalprice double,
  orderdate date,
  ds int
)
COMMENT 'This table is in Hetu syntax'
WITH (
  partitioned_by = ARRAY['ds'],
  bucketed_by = ARRAY['orderkey', 'orderstatus'],
  sorted_by = ARRAY['orderkey', 'orderstatus'],
  bucket_count = 16,
  orc_compress = 'SNAPPY',
  orc_compress_size = 6710422,
  orc_bloom_filter_columns = ARRAY['orderstatus', 'totalprice'],
  external = true,
  format = 'orc',
  location = '/user'
);
Table Configuration
| Catalog Name | ORC Table Parameter | Default Value | Type | Description |
|---|---|---|---|---|
| hive | orc_compress | GZIP | varchar | High-level compression algorithm (one of NONE, ZLIB, SNAPPY, GZIP, ZSTD, LZ4) |
| hive | orc_compress_size | 262144 | long | Number of bytes in each compressed block |
| hive | orc_stripe_size | 67108864 | long | Number of bytes in each stripe |
| hive | orc_row_index_stride | 10000 | long | Number of rows between index entries |
| hive | orc_bloom_filter_columns | '' | varchar | Comma-separated list of column names for which a Bloom filter should be created |
| hive | orc_bloom_filter_fpp | 0.05 | double | False positive probability of the Bloom filter (must be greater than 0.0 and less than 1.0) |
Example:
CREATE TABLE format1 (
  orderkey bigint,
  orderstatus varchar,
  totalprice double,
  orderdate date
)
TBLPROPERTIES (orc_compress = 'SNAPPY', orc_compress_size = 6710422, orc_bloom_filter_columns = 'orderstatus,totalprice');

CREATE TABLE format2 (
  orderkey bigint,
  orderstatus varchar,
  totalprice double,
  orderdate date
)
TBLPROPERTIES (orc_compress = 'LZ4', orc_stripe_size = 33554432, orc_bloom_filter_fpp = 0.1, orc_row_index_stride = 5000, orc_compress_size = 6710422, orc_bloom_filter_columns = 'orderstatus,totalprice');
Show create table format2;

 Create Table
-----------------------------------------------------------------------
 CREATE TABLE hive.default.format2 (
    orderkey bigint,
    orderstatus varchar,
    totalprice double,
    orderdate date
 )
 WITH (
    external = false,
    format = 'ORC',
    location = '/user/hive/warehouse/format2',
    orc_bloom_filter_columns = ARRAY['orderstatus','totalprice'],
    orc_bloom_filter_fpp = 1E-1,
    orc_compress = 'LZ4',
    orc_compress_size = 6710422,
    orc_row_index_stride = 5000,
    orc_stripe_size = 33554432
 )
(1 row)
Create a partitioned table.
--Create a schema.
CREATE SCHEMA hive.web WITH (location = 'hdfs://hacluster/user');

--Create a partitioned table.
CREATE TABLE hive.web.page_views (
  view_time timestamp,
  user_id bigint,
  page_url varchar,
  ds date,
  country varchar
)
WITH (
  format = 'ORC',
  partitioned_by = ARRAY['ds', 'country'],
  bucketed_by = ARRAY['user_id'],
  bucket_count = 50
);

--Insert an empty partition.
CALL system.create_empty_partition(
  schema_name => 'web',
  table_name => 'page_views',
  partition_columns => ARRAY['ds', 'country'],
  partition_values => ARRAY['2020-07-17', 'US']);

CALL system.create_empty_partition(
  schema_name => 'web',
  table_name => 'page_views',
  partition_columns => ARRAY['ds', 'country'],
  partition_values => ARRAY['2020-07-18', 'US']);

--View the partitions.
SELECT * FROM hive.web."page_views$partitions";
     ds     | country
------------|---------
 2020-07-18 | US
 2020-07-17 | US

--Insert data.
insert into hive.web.page_views values(timestamp '2020-07-17 23:00:15', bigint '15141', 'www.local.com', date '2020-07-17', 'US');
insert into hive.web.page_views values(timestamp '2020-07-18 23:00:15', bigint '18148', 'www.local.com', date '2020-07-18', 'US');

--Query data.
select * from hive.web.page_views;
        view_time        | user_id |   page_url    |     ds     | country
-------------------------|---------|---------------|------------|---------
 2020-07-17 23:00:15.000 |   15141 | www.local.com | 2020-07-17 | US
 2020-07-18 23:00:15.000 |   18148 | www.local.com | 2020-07-18 | US