Updated on 2022-12-16 GMT+08:00

Data Partitioning

Description

Most database products support data partitioning. In the GaussDB(DWS) distributed system, data partitioning horizontally splits the data on a node according to a specified policy. A table is divided into multiple partitions based on a specified range, and the data in different partitions does not overlap.

GaussDB(DWS) supports range partitioning and list partitioning. In range partitioning, each record is inserted into one of the table's partitions according to its partitioning key value. Each partition stores the data of a specific range, and the ranges of different partitions do not overlap. If you specify the PARTITION parameter in the CREATE TABLE statement, the data in the table is partitioned.
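The routing rule described above can be modeled in a few lines. The following is a minimal conceptual sketch in Python, not GaussDB(DWS) syntax or internals: the partition names, the bounds, and the choice of exclusive upper bounds are all assumptions made for illustration.

```python
from bisect import bisect_right

# Hypothetical layout: four quarterly partitions keyed by a YYYYMMDD integer.
# Each upper bound is exclusive, so the ranges cannot overlap.
PARTITION_NAMES = ["p2022_q1", "p2022_q2", "p2022_q3", "p2022_q4"]
UPPER_BOUNDS = [20220401, 20220701, 20221001, 20230101]


def route(key: int) -> str:
    """Return the name of the single partition whose range contains key."""
    idx = bisect_right(UPPER_BOUNDS, key)  # first bound strictly above key
    if idx == len(UPPER_BOUNDS):
        raise ValueError(f"key {key} is above every partition bound")
    return PARTITION_NAMES[idx]


def insert_rows(rows):
    """Place each (key, payload) row into the partition its key maps to."""
    partitions = {name: [] for name in PARTITION_NAMES}
    for key, payload in rows:
        partitions[route(key)].append((key, payload))
    return partitions
```

Because every key maps to exactly one partition, a row never appears in two partitions, which is the non-overlap property the description relies on.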

Benefits

Table 1 uses an xDR scenario to describe the benefits provided after data is partitioned based on time slices.

Table 1 Partitioning benefits

| Scenario | Benefit |
|---|---|
| The rows frequently accessed in a table are located in one or a few partitions. | Significantly reduces search space and improves access performance. |
| Most records in a partition need to be queried or updated. | Significantly improves performance because only one partition rather than the whole table needs to be scanned. |
| Records that need to be loaded or deleted in batches are located in one or a few partitions. | Improves processing performance because related partitions can be directly read or deleted. Reduces defragmentation workloads because records can be deleted in batches. |
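The scenarios in Table 1 can be illustrated with a small conceptual model. This is a sketch in Python rather than GaussDB(DWS) code: the daily partitions, the record names, and the helper functions are hypothetical, and a dictionary stands in for a table partitioned by time slice.

```python
# Hypothetical xDR table partitioned by day; each value holds one partition's rows.
table = {
    "2022-12-14": [("2022-12-14", "call-1"), ("2022-12-14", "call-2")],
    "2022-12-15": [("2022-12-15", "call-3")],
    "2022-12-16": [("2022-12-16", "call-4"), ("2022-12-16", "call-5")],
}


def scan_day(table, day):
    """Read one day's records by touching only that day's partition."""
    rows = table.get(day, [])
    return rows, len(rows)  # (records, number of rows scanned)


def scan_day_unpartitioned(table, day):
    """Baseline: filter every row, as a scan of an unpartitioned table would."""
    all_rows = [row for part in table.values() for row in part]
    return [row for row in all_rows if row[0] == day], len(all_rows)


def drop_day(table, day):
    """Batch-delete a day's records by discarding its whole partition."""
    return len(table.pop(day, []))
```

Both scan functions return the same records, but the partitioned version examines only the rows of one partition, and `drop_day` removes a batch in one step instead of deleting row by row.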

Data partitioning provides the following benefits:

  • Improves manageability: Tables and indexes are divided into smaller, more manageable units, so data can be managed partition by partition. Database administrators can perform maintenance on a designated area of the table.
  • Improves deletion performance: You can delete an entire partition rather than deleting data row by row.

    The syntax for deleting a partitioned table and a common table is the same: DROP TABLE.

  • Improves query performance: You can restrict the volume of data to be checked or manipulated to make queries quicker.

    With partition pruning, also known as partition elimination, the CN filters out the partitions that a query does not need and scans only the remaining partitions. Partition pruning greatly improves query performance.

    Partition-wise joins: Partitioning can also improve the performance of multi-table joins through a technique known as partition-wise joins. Partition-wise joins can be applied when two tables are joined and at least one of them is partitioned on the join key. A partition-wise join breaks a large join into smaller joins between "identical" data sets. "Identical" here means covering the same set of partitioning key values on both sides of the join, so only a join between these "identical" data sets can produce a result, and other data sets do not have to be considered.
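Partition pruning and partition-wise joins can be sketched conceptually as follows. This Python model is an illustration under stated assumptions, not the GaussDB(DWS) implementation: the bounds, table contents, and function names are hypothetical, and the join covers only the case where both tables are partitioned on the join key with the same ranges.

```python
from bisect import bisect_right

# Hypothetical layout shared by both tables: partition i holds keys in
# [LOWER[i], UPPER[i]), so partitions on each side line up one to one.
LOWER = [0, 100, 200]
UPPER = [100, 200, 300]


def prune(lo, hi):
    """Return indexes of partitions that can hold keys in the closed range [lo, hi]."""
    first = bisect_right(UPPER, lo)  # first partition whose upper bound is above lo
    last = bisect_right(LOWER, hi)   # one past the last partition starting at or below hi
    return list(range(first, last))


def partition_wise_join(left_parts, right_parts):
    """Join two co-partitioned tables pair by pair on the key in column 0."""
    result = []
    for left, right in zip(left_parts, right_parts):
        # Index only the matching right-hand partition: rows with the same key
        # cannot sit in any other partition because the ranges are shared.
        lookup = {}
        for key, value in right:
            lookup.setdefault(key, []).append(value)
        for key, value in left:
            for other in lookup.get(key, []):
                result.append((key, value, other))
    return result


# Hypothetical data: one inner list per partition, in partition order.
orders = [[(10, "o1"), (50, "o2")], [(120, "o3")], [(250, "o4")]]
payments = [[(10, "p1")], [(120, "p2"), (120, "p3")], [(299, "p4")]]
```

In this model, `prune(150, 160)` selects only the middle partition, and `partition_wise_join(orders, payments)` performs three small joins instead of comparing every order with every payment.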