Partitioned Tables

Overview

Database table partitioning is a database optimization technology. It can physically divide a large table into multiple smaller parts to improve query performance. Each part is called a partition, and these partitions can be stored on different storage devices.

Advantages:
- Improved query performance: Reducing the scanned data size significantly enhances query performance.
- Optimized storage: Distributing partitions across various storage media helps balance performance and costs.
- Improved maintainability: Maintenance operations such as data cleanup and index rebuild for partitioned tables can be carried out at the partition level, minimizing the impact on the overall system.
- Improved concurrency: Partitioned tables enable parallel processing of multiple partitions, resulting in improved concurrency. For instance, multiple queries can access different partitions simultaneously without causing interference.
Disadvantages:
- Memory usage: A partitioned table typically consumes around (Number of partitions × 3/1,024) MB of memory. If there are too many partitions causing memory shortages, performance may decline significantly.
- Complexity of partitioning policies: Technical knowledge and experience are required to develop and implement appropriate partitioning policies. Selecting an inappropriate partitioning policy can lead to uneven data distribution, thereby impacting performance.
- Complexity of backup and restoration: Although partitions can be backed up and restored separately, more detailed backup policies and management are required.

Precautions

Usage scenarios
- High query performance: When a table contains a significant amount of data and certain data features are frequently accessed in a particular scenario, you can reduce the scanned data size during queries to enhance query performance. This is particularly useful for tables that are regularly analyzed on a monthly, quarterly, or yearly basis.
- Balance between performance and costs: When a table contains a significant amount of data, it is advisable to store cold data (infrequently accessed data) on low-cost storage, while keeping hot data (frequently accessed data) on high-performance storage.
- Large table management: Tables containing a significant amount of data may need to be stored across multiple storage media.
Partitioning key selection:
Selecting partition keys for partitioned tables is a critical design decision as it directly affects the performance, maintainability, and data management efficiency of the database.
- Query optimization: Specify frequently queried columns as partition keys. For instance, if a table is often queried by date, it is advisable to select the date column as the partition key.
- Data distribution: Consider the distribution of data when selecting partition keys. This helps prevent situations where some partitions store an excessive amount of data while others store only a small amount.
- Partition quantity and management: Limit the number of partitions. Creating excessive partitions can lead to increased management complexity and performance decline.

Partitioned Table Types

GaussDB supports range partitioned tables, list partitioned tables, and hash partitioned tables.

Range partitioned table: Data in different ranges is mapped to different partitions. The range is determined by the partition key specified during the partitioned table creation. It is often used for time series data, such as partitioning by date, month, or year.
List partitioned table: Data is allocated to different partitions based on the partition key values. The key values contained in a partition are specified when the partitioned table is created. This applies to well-classified data, such as order status, device type, or region code.
Hash partitioned table: Data is mapped to different partitions based on the hash algorithm. The number of partitions is specified when the partitioned table is created. It applies to scenarios where data needs to be evenly distributed to balance loads.

Examples

-- Example 1 of a range partitioned table.
gaussdb=# CREATE TABLE test_range1(
    id INT, 
    info VARCHAR(20)
) PARTITION BY RANGE (id) (
    PARTITION p1 VALUES LESS THAN (100), 
    PARTITION p2 VALUES LESS THAN (200),
    PARTITION p3 VALUES LESS THAN (300),
    PARTITION pmax VALUES LESS THAN (MAXVALUE)
);
NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'id' as the distribution column by default.
HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
CREATE TABLE

-- Example 2 of a range partitioned table.
gaussdb=# CREATE TABLE test_range2(
    id INT, 
    info VARCHAR(20)
) PARTITION BY RANGE (id) (
    PARTITION p1 START(1) END(600) EVERY(200),    
    PARTITION p2 START(600) END(800),
    PARTITION pmax START(800) END(MAXVALUE)
);
NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'id' as the distribution column by default.
HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
CREATE TABLE

-- List partitioned table
gaussdb=# CREATE TABLE test_list ( NAME VARCHAR ( 50 ), area VARCHAR ( 50 ) ) 
PARTITION BY LIST (area) (
    PARTITION p1 VALUES ('Beijing'),
    PARTITION p2 VALUES ('Shanghai'),
    PARTITION p3 VALUES ('Guangzhou'),
    PARTITION p4 VALUES ('Shenzhen'),
    PARTITION pdefault VALUES (DEFAULT)
);
NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'name' as the distribution column by default.
HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
CREATE TABLE

-- Example 1 of a hash partitioned table.
gaussdb=# CREATE TABLE test_hash1(c1 int) PARTITION BY HASH(c1) PARTITIONS 3;
NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c1' as the distribution column by default.
HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
CREATE TABLE

-- Example 2 of a hash partitioned table.
gaussdb=# CREATE TABLE test_hash2(c1 int) PARTITION BY HASH(C1)(
    PARTITION pa,
    PARTITION pb,
    PARTITION pc
);
NOTICE:  The 'DISTRIBUTE BY' clause is not specified. Using 'c1' as the distribution column by default.
HINT:  Please use 'DISTRIBUTE BY' clause to specify suitable data distribution column.
CREATE TABLE

-- Drop tables.
gaussdb=# DROP TABLE test_range1;
DROP TABLE
gaussdb=# DROP TABLE test_range2;
DROP TABLE
gaussdb=# DROP TABLE test_list;
DROP TABLE
gaussdb=# DROP TABLE test_hash1;
DROP TABLE
gaussdb=# DROP TABLE test_hash2;
DROP TABLE