Help Center/ GaussDB(DWS)/ Best Practices/ Data Development/ Cutting Partition Maintenance Costs for the E-commerce and IoT Industries by Leveraging Automatic Partition Management Feature
Updated on 2025-09-08 GMT+08:00

Cutting Partition Maintenance Costs for the E-commerce and IoT Industries by Leveraging Automatic Partition Management Feature

Scenarios

For partition tables whose partition columns are time, the automatic partition management function can be added to automatically create partitions and delete expired partitions, reducing partition table maintenance costs and improving query performance. To facilitate data query and maintenance, the time column is often used as the partition column of a partitioned table that stores time-related data, such as e-commerce order information and real-time IoT data. When the time-related data is imported to a partitioned table, the table should have partitions of the corresponding time ranges. Common partition tables do not automatically create new partitions or delete expired partitions. Therefore, maintenance personnel need to periodically create new partitions and delete expired partitions, leading to increased O&M costs.

Addressing this, DWS introduces the automatic partition management feature. You can set the table-level parameters period and ttl to enable the automatic partition management function, which automatically creates partitions and deletes expired partitions, reducing partitioned table maintenance costs and improving query performance.

The automatic partition management also works with both time-based and non-time-based columns like INT, BIGINT, VARCHAR, and TEXT. This expands the function's usefulness and flexibility.

period: interval for automatically creating partitions. The default value is 1 day. The value range is 1 hour ~ 100 years.

ttl: time for automatically eliminate partitions. The value range is 1 hour ~ 100 years. Partition elimination occurs when nowtime - Partition boundary > ttl, resulting in the removal of qualifying partitions.

time_format:

If the partition column is VARCHAR, TEXT, INT, or BIGINT, set the table-level parameter time_format to specify the time format. This tells the system how to parse the time value in the partition column for automatic partition management. You can set time_format only if the partition key is INT4, INT8, VARCHAR, or TEXT, and a period is specified.

Here are the time_format options and restrictions for each type of partition column:

Table 1 Time format supported by the VARCHAR/TEXT type

Format

Description

Sample Input

YYYY

Use four digits for the year (0000-9999).

2024

MM

Use two digits for the month (01-12).

05

DD

Use two digits for the date (01–31).

17

HH24

Use two digits for the hour (00–23).

14

MI

Use two digits for the minute (00–59).

32

SS

Use two digits for the second (00–59).

45

  • The precision can be accurate to seconds.
  • The entered content cannot contain letters, such as MONTH, AM, and PM.
  • The time format must be in descending order, for example, YYYYMMDDHH24MISS.
Table 2 Time formats supported by the INT/BIGINT type

Format

Description

Sample Input

YYYY

Use four digits for the year (0000-9999).

2024

MM

Use two digits for the month (01-12).

05

DD

Use two digits for the date (01–31).

17

HH24

Use two digits for the hour (00–23).

14

  • The precision can be accurate to hours.
  • The input content cannot contain non-numeric elements.
  • The time format must be in descending order, for example, YYYYMMDDHH24.

Automatic Partition Creation Rules

  • Automatic partition creation

    One or more partitions are automatically created at the interval specified by period to make the maximum partition boundary time greater than nowTime + 30 x period. As long as there is an automatically created partition, real-time data will not fail to be imported within the next 30 periods.

    Figure 1 Automatic partition creation
  • Automatically deleting expired partitions

    Partitions whose boundary time is earlier than nowTime-ttl are considered expired partitions. The automatic partition management function traverses all partitions and deletes expired partitions after each period. If all partitions are expired partitions, the system retains one partition and truncates the table.

Constraints

When using the partition management function, ensure that the following requirements are met:
  • It cannot be used on midrange servers or acceleration clusters.
  • It can be used in clusters of version 8.1.3 or later.
  • It can only be used for row-store range partitioned tables, column-store range partitioned tables, time series tables, and cold and hot tables.
  • The partition key must be unique. The supported data types include TIMESTAMP, TIMESTAMPTZ, DATE, and INT, BIGINT, VARCHAR, and TEXT added in 9.1.0.200.
  • The maxvalue partition is not supported.
  • The value of (nowTime - boundaryTime)/period must be less than the maximum number of partitions. nowTime indicates the current time, and boundaryTime indicates the earliest partition boundary time.
  • The values of period and ttl range from 1 hour to 100 years. In addition, in a database compatible with Teradata or MySQL, if the partition key type is date, the value of period cannot be less than 1day.
  • The table-level parameter ttl cannot exist independently. You must set period in advance or at the same time, and the value of ttl must be greater than or equal to that of period.
  • During online cluster scale-out, partitions cannot be automatically added. Partitions reserved each time partitions are added will ensure that services are not affected.
  • The time_format option cannot be modified using SET. When period is reset (indicating that automatic partitioning is disabled and a message is displayed), you can reset this option.

Creating an ECS

For details, see Purchasing an ECS. After purchasing an ECS, log in to the ECS by referring to Logging In to a Linux ECS.

When creating an ECS, ensure that the ECS is in the same region, AZ, and VPC subnet as the stream data warehouse. Select the OS used by the gsql client (CentOS 7.6 is used as an example) as the ECS OS, and select using passwords to log in.

Creating a cluster

  1. Create a cluster on the DWS console. For details, see Creating a DWS Storage-Compute Coupled Cluster.

Using the gsql CLI Client to Connect to a Cluster

  1. Remotely log in to the Linux server where gsql is to be installed as user root, and run the following command in the Linux command window to download the gsql client:

    1
    wget https://obs.ap-southeast-1.myhuaweicloud.com/dws/download/dws_client_8.1.x_redhat_x64.zip --no-check-certificate
    

  2. Decompress the client.

    1
    cd <Path_for_storing_the_client> unzip dws_client_8.1.x_redhat_x64.zip
    

    Where,

    • <Path_for_storing_the_client>: Replace it with the actual path.
    • dws_client_8.1.x_redhat_x64.zip: This is the client tool package name of RedHat x64. Replace it with the actual name.

  3. Configure the DWS client.

    1
    source gsql_env.sh
    

    If the following information is displayed, the gsql client is successfully configured:

    1
    All things done.
    

  4. Use the gsql client to connect to a DWS database (using the password you defined when creating the cluster).

    1
    gsql -d gaussdb -p 8000 -h 192.168.0.86 -U dbadmin -W password -r
    

    If the following information is displayed, the connection succeeded:

    1
    gaussdb=>
    

Automatic partition management

The partition management function is bound to the table-level parameters period and ttl. Automatic partition creation is enabled with the enabling of period, and automatic partition deletion is enabled with the enabling of ttl. 30 seconds after period or ttl is set, the automatic partition creation or deletion works for the first time.

You can enable the partition management function in either of the following ways:

  • Specify period and ttl when creating a table.

    This way is applicable when you create a partition management table. There are two syntaxes for creating a partition management table. One specifies partitions, and the other does not.

    If partitions are specified when a partition management table is created, the syntax rules are the same as those for creating a common partitioned table. The only difference is that the syntax specifies the table-level parameters period and ttl.

    The following example shows how to create a partition management table CPU1 and specify partitions.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    -- Time type
    CREATE TABLE CPU1(
        id integer,
        IP text,
        time timestamp
    ) with (TTL='7 days',PERIOD='1 day')
    partition by range(time)
    (
        PARTITION P1 VALUES LESS THAN('2023-02-13 16:32:45'),
        PARTITION P2 VALUES LESS THAN('2023-02-15 16:48:12')
    );
    
    -- INT type
    CREATE TABLE CPU1(
        id integer,
        IP text,
        time integer
    ) with (TTL='7 days',PERIOD='1 day', TIME_FORMAT='YYYYMMDD')
    partition by range(time)
    (
        PARTITION P1 VALUES LESS THAN('20230213'),
        PARTITION P2 VALUES LESS THAN('20230215')
    );
    
    -- VARCHAR type
    CREATE TABLE CPU1(
        id integer,
        IP text,
        time varchar
    ) with (TTL='7 days',PERIOD='1 day', TIME_FORMAT='YYYY-MM-DD HH24:MI:SS')
    partition by range(time)
    (
        PARTITION P1 VALUES LESS THAN('2023-02-13 16:32:45'),
        PARTITION P2 VALUES LESS THAN('2023-02-15 16:48:12')
    );
    

    For partitioned tables with INT, BIGINT, VARCHAR, or TEXT types, if the automatic partitioning feature is on, the system adds missing partitions within the TTL range. This is based on the TTL setting and the existing minimum partition boundary (min_bound).

    The rules depend on the relationship between min_bound and the current time (cur_time), and are categorized as follows:

    Condition

    Description

    min_bound > cur_time + 29 * period

    The current minimum partition size is sufficient. The system sees no need for additional partitions and will not create them automatically.

    min_bound > cur_time and min_bound < cur_time + 29 * period

    The system advances partitions until the earliest partition boundary is before cur_time - ttl.

    min_bound < cur_time and min_bound > cur_time - ttl

    The system advances partitions until the earliest partition boundary is before cur_time - ttl.

    min_bound < cur_time - ttl

    The current minimum partition is outside the TTL range and will soon be removed. Thus, it will not be moved forward.

    • cur_time indicates the current system time.
    • period indicates the period of automatic partitioning.
    • The automatic completion logic ensures that partitions exist completely in a valid time window, facilitating data import and query.

    When creating a partition management table, you can specify only the partition key but not partitions. In this case, two default partitions will be created with period as the partition time range. The boundary time of the first default partition is the first hour, day, week, month, or year past the current time. The time unit is selected based on the maximum unit of PERIOD. The boundary time of the second default partition is the boundary time of the first partition plus PERIOD. Assume that the current time is 2023-02-17 16:32:45, and the boundary of the first default partition is described in the following table.

    Table 3 Description of the period parameter

    period

    Maximum PERIOD Unit

    Boundary of First Default Partition

    1hour

    Hour

    2023-02-17 17:00:00

    1day

    Day

    2023-02-18 00:00:00

    1month

    Month

    2023-03-01 00:00:00

    13months

    Year

    2024-01-01 00:00:00

    For INT, BIGINT, VARCHAR, and TEXT partition tables, if no partition is set, the system automatically creates initial partitions when the automatic partition management is turned on. The creation rules are as follows:

    • Add two partitions forward (compared with the current time).
    • Add ttl/period partitions backward (calculated based on the lifecycle ttl and partition period).

    Run the following command to create the partition management table CPU2 with no partitions specified:

    1
    2
    3
    4
    5
    6
    CREATE TABLE CPU2(
        id integer,
        IP text,
        time timestamp
    ) with (TTL='7 days',PERIOD='1 day')
    partition by range(time);
    
  • Run the ALTER TABLE RESET command to set period and ttl.

    This method is used to add the partition management function to an ordinary partitioned table that meets the partition management constraints.

    • Run the following command to create an ordinary partition table CPU3:
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      11
      12
      13
      14
      15
      16
      17
      18
      19
      20
      21
      22
      23
      -- Time type
      CREATE TABLE CPU3(
          id integer,
          IP text,
          time timestamp
      ) 
      partition by range(time)
      (
          PARTITION P1 VALUES LESS THAN('2023-02-14 16:32:45'),
          PARTITION P2 VALUES LESS THAN('2023-02-15 16:56:12')
      );
      
      -- VARCHAR type
      CREATE TABLE CPU3(
          id integer,
          IP text,
          time varchar
      ) 
      partition by range(time)
      (
          PARTITION P1 VALUES LESS THAN('2023-02-13 16:32:45'),
          PARTITION P2 VALUES LESS THAN('2023-02-15 16:48:12')
      );
      
    • To enable the automatic partition creation and deletion functions, run the following command:
      1
      ALTER TABLE CPU3 SET (PERIOD='1 day',TTL='7 days');
      
    • To enable only the automatic partition creation function, run the following command:
      1
      ALTER TABLE CPU3 SET (PERIOD='1 day');
      
    • To enable only the automatic partition deletion function, run the following command (If automatic partition creation is not enabled in advance, the operation will fail):
      1
      ALTER TABLE CPU3 SET (TTL='7 days');
      
    • Modify the period and ttl parameters to modify the partition management function.
      1
      ALTER TABLE CPU3 SET (TTL='10 days',PERIOD='2 days');
      
  • Disabling the partition management function

    You can run the ALTER TABLE RESET command to delete the table-level parameters period and ttl to disable the partition management function.

    • The period cannot be deleted separately with TTL.
    • The time series table does not support ALTER TABLE RESET.
    • Run the following command to disable the automatic partition creation and deletion functions:
      1
      ALTER TABLE CPU1 RESET (PERIOD,TTL);
      
    • To disable only the automatic partition deletion, run the following command:
      1
      ALTER TABLE CPU3 RESET (TTL);
      
    • To disable only the automatic partition creation function, run the following command (If the table contains the ttl parameter, the operation will fail):
      1
      ALTER TABLE CPU3 RESET (PERIOD);
      
    • For partition tables of the INT, BIGINT, VARCHAR, and TEXT types, disable the TIME_FORMAT option as prompted.
      1
      ALTER TABLE CPU3 RESET (TIME_FORMAT);