Select a synchronized table on the left and perform operations on its columns. The operations include order by, key columns, distributed by, partition by, data_model, buckets, replication_num, and enable_persistent_index. Multiple operations are separated by semicolons (;).
For details about the syntax, see
Table 3.
Table 3 Operation syntax
Operation Type |
Syntax |
order by |
order by (column1, column2) or order by column1,column2 |
key columns |
key columns (column1, column2) or key columns column1,column2 |
distributed by |
distributed by (column1, column2) buckets 3
NOTE:
buckets is optional. If it is not set, the default value is used.
|
partition by |
There are expression partitions and list partitions. For details, see the partition syntax example. |
data_model |
Specifies the table type. The value can be primary key, duplicate key, or unique key.
Syntax:
data_model=primary key, data_model=duplicate key, or data_model=unique key |
replication_num |
replication_num=3
NOTE:
The value cannot exceed the number of BE nodes, or the verification fails.
|
enable_persistent_index |
Specifies whether to make the index persistent. Syntax:
enable_persistent_index=true or enable_persistent_index=false |
Combined scenario |
data_model=duplicate key;key columns column1, column2; |
Partition syntax example:
You only need to set a partition expression (time function expression or lis expression) when creating a table. During data import, an HTAP instance automatically creates partitions based on the data and the rule defined in the partition expression.
Partitioning based on a time function expression: If data is often queried and managed based on a continuous date range, you only need to specify a partition column of the date type (DATE or DATETIME) and a partition granularity (year, month, day, or hour) in the time function expression. An HTAP instance automatically creates partitions and sets the start time and end time of the partitions based on the imported data and partition expression.
Syntax:
PARTITION BY expression
...
[ PROPERTIES( 'partition_live_number' = 'xxx' ) ]
expression ::=
{ date_trunc ( <time_unit> , <partition_column> ) |
time_slice ( <partition_column> , INTERVAL <N> <time_unit> [ , boundary ] ) }
Table 4 Parameter description
Parameter |
Mandatory |
Description |
expression |
Yes |
Currently, only the date_trunc and time_slice functions are supported. If you use time_slice, you do not need to configure the boundary parameter because this parameter can only be set to floor by default. |
time_unit |
Yes |
Partition granularity. Currently, the value can only be hour, day, month, or year. If the partition granularity is hour, the partition columns can only be of the DATETIME type. |
partition_column |
Yes |
Partition column.
- Only the date type (DATE or DATETIME) is supported. If date_trunc is used, the partition column can be of the DATE or DATETIME type. If time_slice is used, the partition column can only be of the DATETIME type. The value of the partition column can be NULL.
- If the partition column is of the DATE type, the value range is from 0000-01-01 to 9999-12-31. If the partition column is of the DATETIME type, the value range is from 0000-01-01 01:01:01 to 9999-12-31 23:59:59.
- Currently, only one partition column can be specified.
|
Example: If you often query data by day, you can use the partition expression date_trunc (), set the partition column to event_day, and set the partition granularity to day during table creation. In this way, data is automatically partitioned based on dates when being imported. Data of the same day is stored in the same partition. Partition pruning can significantly improve queries.
CREATE TABLE site_access1 (
event_day DATETIME NOT NULL,
site_id INT DEFAULT '10',
city_code VARCHAR(100),
user_name VARCHAR(32) DEFAULT '',
pv BIGINT DEFAULT '0'
)
DUPLICATE KEY(event_day, site_id, city_code, user_name)
PARTITION BY date_trunc('day', event_day)
DISTRIBUTED BY HASH(event_day, site_id);
Partitioning based on the column expression: If you often query and manage data based on enumerated values, you only need to specify the column representing the type as the partition column. An HTAP instance automatically divides and creates partitions based on the partition column value of the imported data.
Syntax:
PARTITION BY expression
...
[ PROPERTIES( 'partition_live_number' = 'xxx' ) ]
expression ::=
( <partition_columns> )
partition_columns ::=
<column>, [ <column> [,...] ]
Table 5 Parameter description
Parameter |
Mandatory |
Description |
partition_columns |
Yes |
Partition columns.
- The value can be a Character (BINARY is not supported), Date, Integer, or Boolean value. The value cannot be NULL.
- After the import, a partition automatically created can contain only one value of each partition column. If multiple values of each partition column need to be contained, use list partitioning.
|
Example: If you often query the equipment room billing details by date range and city, you can use a partition expression to specify the date and city as the partition columns when creating a table. In this way, data of the same date and city is grouped into the same partition, and partition pruning can be used to significantly accelerate queries.
CREATE TABLE t_recharge_detail1 (
id bigint,
user_id bigint,
recharge_money decimal(32,2),
city varchar(20) not null,
dt varchar(20) not null
)
DUPLICATE KEY(id)
PARTITION BY (dt,city)
DISTRIBUTED BY HASH(`id`);
List partitioning
Data is partitioned based on a list of enumerated values that you explicitly define. You need to explicitly list the enumerated values contained in each list partition, and the values do not need to be consecutive.
List partitioning is suitable for storing columns where there are a small number of enumerated values and querying and managing data based on the enumerated values. For example, a column indicates a geographical location, status, or category. Each value of a column represents an independent category. Data is partitioned based on the enumerated values of columns to improve query performance and data management. List partitioning is especially suitable for scenarios where a partition needs to contain multiple values of each partition column. For example, the city column in a table indicates the city that an individual is from, and you often query and manage data by state and city. You can use the city column as the partition column for list partitioning when creating a table, and specify that data of multiple cities in the same state is stored in the same partition PARTITION pCalifornia VALUES IN ("Los Angeles","San Francisco","San Diego"), this feature accelerates queries and data management.
Partitions must be created during table creation. Partitions cannot be automatically created during data import. If the table does not contain the partitions corresponding to the data, an error is reported.
Syntax:
PARTITION BY LIST (partition_columns)(
PARTITION <partition_name> VALUES IN (value_list)
[, ...]
)
partition_columns::=
<column> [,<column> [, ...] ]
value_list ::=
value_item [, value_item [, ...] ]
value_item ::=
{ <value> | ( <value> [, <value>, [, ...] ] ) }
Table 6 Parameter description
Parameter |
Mandatory |
Description |
partition_columns |
Yes |
Partition columns.
The value can be a Character (except BINARY), Date (DATE and DATETIME), Integer, or Boolean value. The value cannot be NULL. |
partition_name |
Yes |
Partition name.
You are advised to set proper partition names to distinguish data categories in different partitions. |
value_list |
Yes |
List of enumerated values of partition columns in a partition. |
Example 1: If you often query the equipment room billing details by state or city, you can specify the city column as the partition column and specify that the cities in each partition belong to the same state. In this way, you can quickly query data of a specific state or city and manage data by state or city.
CREATE TABLE t_recharge_detail2 (
id bigint,
user_id bigint,
recharge_money decimal(32,2),
city varchar(20) not null,
dt varchar(20) not null
)
DUPLICATE KEY(id)
PARTITION BY LIST (city) (
PARTITION pCalifornia VALUES IN ("Los Angeles","San Francisco","San Diego"), --: These cities belong to the same state.
PARTITION pTexas VALUES IN ("Houston","Dallas","Austin")
)
DISTRIBUTED BY HASH(`id`);
Example 2: If you often query the equipment room billing details by date range and state or city, you can specify the date and city as the partition columns when creating a table. In this way, data of a specific date and a specific state or city is grouped into the same partition, to accelerate queries and data management.
CREATE TABLE t_recharge_detail4 (
id bigint,
user_id bigint,
recharge_money decimal(32,2),
city varchar(20) not null,
dt varchar(20) not null
) ENGINE=OLAP
DUPLICATE KEY(id)
PARTITION BY LIST (dt,city) (
PARTITION p202204_California VALUES IN (
("2022-04-01", "Los Angeles"),
("2022-04-01", "San Francisco"),
("2022-04-02", "Los Angeles"),
("2022-04-02", "San Francisco")
),
PARTITION p202204_Texas VALUES IN (
("2022-04-01", "Houston"),
("2022-04-01", "Dallas"),
("2022-04-02", "Houston"),
("2022-04-02", "Dallas")
)
)
DISTRIBUTED BY HASH(`id`);