Help Center/ GaussDB/ Centralized_8.x/ Best Practices/ Best Practices of Table Design
Updated on 2024-06-03 GMT+08:00

Best Practices of Table Design

Using Partitioned Tables

Partitioning refers to splitting what is logically one large table into smaller physical pieces based on specific schemes. The table based on the logic is called a partitioned table, and a physical piece is called a partition. Data is stored on these physical partitions, instead of the logical partitioned table. A partitioned table has the following advantages over an ordinary table:

  1. High query performance: You can specify partitions when querying partitioned tables, improving query efficiency.
  2. High availability: If a certain partition in a partitioned table is faulty, data in the other partitions is still available.
  3. Easy maintenance: To fix a partitioned table having a faulty partition, you only need to fix the partition itself.
    GaussDB supports level-1 and level-2 partitioned tables. There are four types of level-1 partitioned tables: range partitioned tables, interval partitioned tables, list partitioned tables, and hash partitioned tables. Level-2 partitioned tables are composed of nine combinations of any two of the following: range partitioned tables, list partitioned tables, and hash partitioned tables.
    • Range partitioned table: Data within a certain range is mapped to each partition. The range is determined by the partition key specified when the partitioned table is created. This partitioning method is most commonly used. The partition key is usually a date. For example, sales data is partitioned by month.
    • Interval partitioned table: a special type of range partitioned tables. Compared with range partitioned tables, interval value definition is added. When no matching partition can be found for an inserted record, a partition can be automatically created based on the interval value.
    • List partitioned table: Key values contained in the data are stored in different partitions, and the data is mapped to each partition in sequence. The key values contained in the partitions are specified when the partitioned table is created.
    • Hash partitioned table: Data is mapped to each partition based on the internal hash algorithm. The number of partitions is specified when the partitioned table is created.
    • Level-2 partitioned table: a partitioned table obtained by randomly combining range partitioning, list partitioning, and hash partitioning. Both level-1 and level-2 partitions can be defined in the preceding three ways.

Table Compression Level

When creating a table, you can customize the compression level and compression ratio of fields. Compression affects not only data loading but also data query. The COMPRESSION parameter specifies the table compression level.

Parameter description:

COMPRESSION specifies the compression level of table data. It determines the compression ratio and time. Generally, the higher the level of compression, the higher the ratio, the longer the time; and the lower the level of compression, the lower the ratio, the shorter the time. The actual compression ratio depends on the distribution mode of table data loaded.

Value range:

  • Valid values for row-store tables are YES and NO, and the default is NO.

You can select different compression levels based on Table 1 in different scenarios.

Table 1 Application scenarios of compression levels

Compression Level

Application Scenario

Storage Model

YES

Enabling table compression: You are advised not to enable this function because the compression ratio of row-store tables is low.

Row store

NO

Disabling table compression.

Row store

Selecting a Data Type

Use the following principles to select efficient data types:

  1. Select data types that facilitate data calculation.

    Generally, the calculation of integers (including common comparison calculations, for example, =, >, <, >=, <=, and !=, as well as GROUP BY) is more efficient than that of strings and floating point numbers.

  2. Select data types with a short length.

    Data types with short length reduce both the data file size and the memory used for computing, improving the I/O and computing performance. For example, use SMALLINT instead of INT, and INT instead of BIGINT.

  3. Use the same data type for a join.

    You are advised to use the same data type for a join. To join columns with different data types, the database needs to convert them to the same type, which leads to additional performance overheads.