Help Center/ GaussDB(DWS)/ Best Practices/ Advanced Features/ Best Practices for Automatic Partition Management
Updated on 2024-03-13 GMT+08:00

Best Practices for Automatic Partition Management

Scenarios

For partition tables whose partition columns are time, the automatic partition management function can be added to automatically create partitions and delete expired partitions, reducing partition table maintenance costs and improving query performance. To facilitate data query and maintenance, the time column is often used as the partition column of a partitioned table that stores time-related data, such as e-commerce order information and real-time IoT data. When the time-related data is imported to a partitioned table, the table should have partitions of the corresponding time ranges. Common partition tables do not automatically create new partitions or delete expired partitions. Therefore, maintenance personnel need to periodically create new partitions and delete expired partitions, leading to increased O&M costs.

Addressing this, GaussDB(DWS) introduces the automatic partition management feature. You can set the table-level parameters period and ttl to enable the automatic partition management function, which automatically creates partitions and deletes expired partitions, reducing partition table maintenance costs and improving query performance.

period: interval for automatically creating partitions. The default value is 1 day. The value range is 1 hour ~ 100 years.

ttl: time for automatically eliminate partitions. The value range is 1 hour ~ 100 years. The partition elimination policy is based on the condition that nowtime - partition boundary > ttl. Partitions that meet this condition will be eliminated.

  • Automatic partition creation

    One or more partitions are automatically created at the interval specified by period to make the maximum partition boundary time greater than nowTime + 30 x period. As long as there is an automatically created partition, real-time data will not fail to be imported within the next 30 periods.

    Figure 1 Automatic partition creation
  • Automatically deleting expired partitions

    Partitions whose boundary time is earlier than nowTime-ttl are considered expired partitions. The automatic partition management function traverses all partitions and deletes expired partitions after each period. If all partitions are expired partitions, the system retains one partition and truncates the table.

Constraints

When using the partition management function, ensure that the following requirements are met:
  • It cannot be used on midrange servers, acceleration clusters, or stand-alone clusters.
  • It can be used in clusters of version 8.1.3 or later.
  • It can only be used for row-store range partitioned tables, column-store range partitioned tables, time series tables, and cold and hot tables.
  • The partition key must be unique and its type must be timestamp, timestamptz, or date.
  • The maxvalue partition is not supported.
  • The value of (nowTime - boundaryTime)/period must be less than the maximum number of partitions. nowTime indicates the current time, and boundaryTime indicates the earliest partition boundary time.
  • The values of period and ttl range from 1 hour to 100 years. In addition, in a database compatible with Teradata or MySQL, if the partition key type is date, the value of period cannot be less than 1day.
  • The table-level parameter ttl cannot exist independently. You must set period in advance or at the same time, and the value of ttl must be greater than or equal to that of period.
  • During online cluster scale-out, partitions cannot be automatically added. Partitions reserved each time partitions are added will ensure that services are not affected.

Creating an ECS

For details, see "Creating an ECS" in the lastic Cloud Server User Guide. When the ECS is created, log in to the ECS. For details, see "Remotely Logging In to a Linux ECS Using a Password (SSH)".

When creating an ECS, ensure that the ECS is in the same region, AZ, and VPC subnet as the stream data warehouse. Select the OS used by the gsql client (CentOS 7.6 is used as an example) as the ECS OS, and select using passwords to log in.

Creating a cluster

  1. Log in to the Huawei Cloud management console.
  2. Choose Service List > Analytics > Data Warehouse Service. On the page that is displayed, click Create Cluster in the upper right corner.
  3. Configure the parameters according to Table 1.

    Table 1 Software configuration

    Parameter

    Configuration

    Region

    Select EU-Dublin.

    NOTE:

    EU-Dublin is used as an example. You can select other regions as required. Ensure that all operations are performed in the same region.

    AZ

    AZ2

    Product

    Standard data warehouse

    CPU Architecture

    X86

    Node Flavor

    dws2.m6.4xlarge.8 (16 vCPUs | 128 GB | 2000 GB SSD)

    NOTE:

    If this flavor is sold out, select other AZs or flavors.

    Nodes

    3

    Cluster Name

    dws-demo

    Administrator Account

    dbadmin

    Administrator Password

    -

    Confirm Password

    -

    Database Port

    8000

    VPC

    vpc-default

    Subnet

    subnet-default(192.168.0.0/24)

    Security Group

    Automatic creation

    EIP

    Buy now

    Bandwidth

    1Mbit/s

    Advanced Settings

    Default

  4. Confirm the information, click Next, and then click Submit.
  5. Wait about 6 minutes. After the cluster is created, click next to the cluster name. On the displayed cluster information page, record the value of Public Network Address.

Using the gsql CLI Client to Connect to a Cluster

  1. Remotely log in to the Linux server where gsql is to be installed as user root, and run the following command in the Linux command window to download the gsql client:

    1
    wget https://obs.eu-west-101.myhuaweicloud.com/dws/download/dws_client_8.1.x_redhat_x64.zip --no-check-certificate
    

  2. Decompress the client.

    1
    cd <Path_for_storing_the_client> unzip dws_client_8.1.x_redhat_x64.zip
    

    Where,

    • <Path_for_storing_the_client>: Replace it with the actual path.
    • dws_client_8.1.x_redhat_x64.zip: This is the client tool package name of RedHat x64. Replace it with the actual name.

  3. Configure the GaussDB(DWS) client.

    1
    source gsql_env.sh
    

    If the following information is displayed, the gsql client is successfully configured:

    1
    All things done.
    

  4. Use the gsql client to connect to a GaussDB(DWS) database (using the password you defined when creating the cluster).

    1
    gsql -d gaussdb -p 8000 -h 192.168.0.86 -U dbadmin -W password -r
    

    If the following information is displayed, the connection succeeded:

    1
    gaussdb=>
    

Automatic partition management

The partition management function is bound to the table-level parameters period and ttl. Automatic partition creation is enabled with the enabling of period, and automatic partition deletion is enabled with the enabling of ttl. 30 seconds after period or ttl is set, the automatic partition creation or deletion works for the first time.

You can enable the partition management function in either of the following ways:

  • Specify period and ttl when creating a table.

    This way is applicable when you create a partition management table. There are two syntaxes for creating a partition management table. One specifies partitions, and the other does not.

    If partitions are specified when a partition management table is created, the syntax rules are the same as those for creating an ordinary partition table. The only difference is that the syntax specifies the table-level parameters period and ttl.

    The following example shows how to create a partition management table CPU1 and specify partitions.

     1
     2
     3
     4
     5
     6
     7
     8
     9
    10
    CREATE TABLE CPU1(
        id integer,
        IP text,
        time timestamp
    ) with (TTL='7 days',PERIOD='1 day')
    partition by range(time)
    (
        PARTITION P1 VALUES LESS THAN('2023-02-13 16:32:45'),
        PARTITION P2 VALUES LESS THAN('2023-02-15 16:48:12')
    );
    

    When creating a partition management table, you can specify only the partition key but not partitions. In this case, two default partitions will be created with period as the partition time range. The boundary time of the first default partition is the first hour, day, week, month, or year past the current time. The time unit is selected based on the maximum unit of PERIOD. The boundary time of the second default partition is the boundary time of the first partition plus PERIOD. Assume that the current time is 2023-02-17 16:32:45, and the boundary of the first default partition is described in the following table.

    Table 2 Description of the period parameter

    period

    Maximum PERIOD Unit

    Boundary of First Default Partition

    1hour

    Hour

    2023-02-17 17:00:00

    1day

    Day

    2023-02-18 00:00:00

    1month

    Month

    2023-03-01 00:00:00

    13months

    Year

    2024-01-01 00:00:00

    Run the following command to create the partition management table CPU2 with no partitions specified:

    1
    2
    3
    4
    5
    6
    CREATE TABLE CPU2(
        id integer,
        IP text,
        time timestamp
    ) with (TTL='7 days',PERIOD='1 day')
    partition by range(time);
    
  • Run the ALTER TABLE RESET command to set period and ttl.

    This method is used to add the partition management function to an ordinary partitioned table that meets the partition management constraints.

    • Run the following command to create an ordinary partition table CPU3:
       1
       2
       3
       4
       5
       6
       7
       8
       9
      10
      CREATE TABLE CPU3(
          id integer,
          IP text,
          time timestamp
      ) 
      partition by range(time)
      (
          PARTITION P1 VALUES LESS THAN('2023-02-14 16:32:45'),
          PARTITION P2 VALUES LESS THAN('2023-02-15 16:56:12')
      );
      
    • To enable the automatic partition creation and deletion functions, run the following command:
      1
      ALTER TABLE CPU3 SET (PERIOD='1 day',TTL='7 days');
      
    • To enable only the automatic partition creation function, run the following command:
      1
      ALTER TABLE CPU3 SET (PERIOD='1 day');
      
    • To enable only the automatic partition deletion function, run the following command (If automatic partition creation is not enabled in advance, the operation will fail):
      1
      ALTER TABLE CPU3 SET (TTL='7 days');
      
    • Modify the period and ttl parameters to modify the partition management function.
      1
      ALTER TABLE CPU3 SET (TTL='10 days',PERIOD='2 days');
      
  • Disabling the partition management function

    You can run the ALTER TABLE RESET command to delete the table-level parameters period and ttl to disable the partition management function.

    • The period cannot be deleted separately with TTL.
    • The time series table does not support ALTER TABLE RESET. ​
    • Run the following command to disable the automatic partition creation and deletion functions:
      1
      ALTER TABLE CPU1 RESET (PERIOD,TTL);
      
    • To disable only the automatic partition deletion, run the following command:
      1
      ALTER TABLE CPU3 RESET (TTL);
      
    • To disable only the automatic partition creation function, run the following command (If the table contains the ttl parameter, the operation will fail):
      1
      ALTER TABLE CPU3 RESET (PERIOD);