Updated on 2025-03-13 GMT+08:00

Column Design Specifications

  • [Rule] Use recommended data types for column design.

    Recommended data types must be used for column design. Some data types are not recommended because they apply to limited service scenarios and are not used on a large scale for commercial purposes.

    Table 1 Best practices of database data types

    | Data Type | Description | Recommended or Not |
    |---|---|---|
    | UUID | Different clusters may generate the same UUID. | Prohibited |
    | Serial integer | Auto-increment column, including SMALLSERIAL, SERIAL, and BIGSERIAL. | Prohibited |
    | Integer | TINYINT, SMALLINT, INTEGER, and BIGINT | Recommended |
    | Arbitrary-precision | NUMERIC/DECIMAL | Recommended |
    | Floating-point | REAL/FLOAT4, DOUBLE PRECISION/FLOAT8, and FLOAT | Recommended |
    | Boolean | Boolean | Recommended |
    | Fixed-length character | CHAR(n) | Recommended |
    | Variable-length character | VARCHAR(n) and NVARCHAR2(n) | Recommended |
    | Variable-length character | TEXT and CLOB (character large object) | Not recommended |
    | Time | DATE, TIME, TIMESTAMP, SMALLDATETIME, INTERVAL, and REALTIME | Recommended |
    | Time | TIMETZ and TIMESTAMPTZ | Not recommended |
    | Binary | BYTEA (variable-length binary) | Recommended |
    | Binary | BLOB (binary large object) and RAW (variable-length hexadecimal string) | Not recommended |
    | Bit string | BIT(n) and VARBIT(n) | Recommended |
    | Special character | NAME and "CHAR" are usually used within the database system. | Not recommended |
    | JSON | JSON data does not support operators. | Not recommended |
    | HLL | You are advised to use the HLL functions to reduce the impact on performance. | Not recommended |
    | Currency | The MONEY type stores a currency amount with fixed fractional precision. | Not recommended |
    | Geometric | POINT, LSEG, BOX, PATH, POLYGON, and CIRCLE | Not recommended |
    | Network address | Stores IPv4 and MAC addresses. | Not recommended |
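As an illustration of how Table 1 might be applied during schema review, the following Python sketch looks up the status of a column type. It is not a GaussDB tool: the `TYPE_STATUS` dictionary and the `type_status` function are hypothetical names, and the dictionary covers only the type names that Table 1 spells out (geometric and network address types are omitted).

```python
import re

# Status of each type name that Table 1 lists explicitly (hypothetical helper data).
TYPE_STATUS = {
    "UUID": "Prohibited",
    "SMALLSERIAL": "Prohibited", "SERIAL": "Prohibited", "BIGSERIAL": "Prohibited",
    "TINYINT": "Recommended", "SMALLINT": "Recommended",
    "INTEGER": "Recommended", "BIGINT": "Recommended",
    "NUMERIC": "Recommended", "DECIMAL": "Recommended",
    "REAL": "Recommended", "FLOAT4": "Recommended", "FLOAT": "Recommended",
    "DOUBLE PRECISION": "Recommended", "FLOAT8": "Recommended",
    "BOOLEAN": "Recommended",
    "CHAR": "Recommended", "VARCHAR": "Recommended", "NVARCHAR2": "Recommended",
    "TEXT": "Not recommended", "CLOB": "Not recommended",
    "DATE": "Recommended", "TIME": "Recommended", "TIMESTAMP": "Recommended",
    "SMALLDATETIME": "Recommended", "INTERVAL": "Recommended",
    "TIMETZ": "Not recommended", "TIMESTAMPTZ": "Not recommended",
    "BYTEA": "Recommended", "BLOB": "Not recommended", "RAW": "Not recommended",
    "BIT": "Recommended", "VARBIT": "Recommended",
    "NAME": "Not recommended", "JSON": "Not recommended",
    "HLL": "Not recommended", "MONEY": "Not recommended",
}


def type_status(column_type: str) -> str:
    """Return the Table 1 status for a type such as 'varchar(20)'."""
    # Drop a length or precision suffix such as "(20)" or "(10, 2)".
    base = re.sub(r"\(.*\)", "", column_type)
    # Normalize case and collapse repeated whitespace.
    base = " ".join(base.upper().split())
    return TYPE_STATUS.get(base, "Unknown")
```

For example, `type_status("serial")` returns "Prohibited", and a type that the table does not name returns "Unknown" rather than a guess.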

  • [Rule] Use the most specific numeric data types. If integer, floating-point, and NUMERIC types all provide the precision that the service requires, prefer them in that order: integer first, then floating point, then NUMERIC.
  • [Rule] Set the data type of a numeric column based on its value range, and use the NUMERIC or DECIMAL type as little as possible.

    NUMERIC and DECIMAL are equivalent. Operations on NUMERIC or DECIMAL data consume significant CPU resources.

    Table 2 Storage space and value range of numeric data types

    | Type | Storage Size (Bytes) | Minimum Value | Maximum Value |
    |---|---|---|---|
    | TINYINT | 1 | 0 | 255 |
    | SMALLINT | 2 | -32,768 | 32,767 |
    | INTEGER | 4 | -2,147,483,648 | 2,147,483,647 |
    | BIGINT | 8 | -9,223,372,036,854,775,808 | 9,223,372,036,854,775,807 |
    | REAL/FLOAT4 | 4 | Precision: 6 decimal digits | |
    | DOUBLE PRECISION/FLOAT8 | 8 | Precision: 15 decimal digits | |
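The range-based choice described by this rule can be sketched in Python as follows. This is a minimal illustration using the integer ranges from Table 2; the function name `smallest_integer_type` is hypothetical, and falling back to NUMERIC for values beyond BIGINT is an assumption drawn from the priority order stated above.

```python
# (type name, minimum, maximum), ordered from smallest to largest storage size.
INTEGER_TYPES = [
    ("TINYINT", 0, 255),
    ("SMALLINT", -32768, 32767),
    ("INTEGER", -2147483648, 2147483647),
    ("BIGINT", -9223372036854775808, 9223372036854775807),
]


def smallest_integer_type(low: int, high: int) -> str:
    """Return the smallest integer type whose range covers [low, high]."""
    if low > high:
        raise ValueError("low must not be greater than high")
    for name, type_min, type_max in INTEGER_TYPES:
        if type_min <= low and high <= type_max:
            return name
    # Assumption: values outside the BIGINT range need NUMERIC.
    return "NUMERIC"
```

Note that a range such as -1 to 200 does not fit TINYINT, because TINYINT starts at 0 in Table 2, so the sketch selects SMALLINT for it.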

  • [Rule] Select a proper string type. If the values of a column always have a fixed length, use a fixed-length character type, which pads shorter input with spaces automatically. Otherwise, use the variable-length character type VARCHAR.

    A typical fixed-length column is gender, which accepts only f or m and occupies one byte. You are advised to use a fixed-length data type (for example, CHAR(n)) for such columns.

    If no such requirement exists, or longer values may be needed for future expansion, prefer variable-length character types (such as VARCHAR and TEXT). You are advised not to specify a length for variable-length character columns.

    The reasons are as follows:

    • For fixed-length columns, the input data that is shorter than the fixed length will be padded with space characters and then be saved to the database. This wastes the storage space in the database.
    • For fixed-length character types, the entire table needs to be scanned and rewritten if the length needs to be extended later. This causes high performance overhead and affects online services.
    • For a variable-length column with a specified length, the system checks whether the length exceeds the limit each time data is inserted. This causes performance overhead.
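The padding cost in the first reason can be modeled with a short Python sketch. It only imitates the padding behavior described above and is not how the database stores data: `store_as_char` and `wasted_characters` are hypothetical names, length is counted in characters rather than bytes, and raising an error for over-length input is an assumption.

```python
def store_as_char(value: str, n: int) -> str:
    """Model CHAR(n): pad input shorter than n with trailing spaces."""
    if len(value) > n:
        # Assumption: input longer than the declared length is rejected.
        raise ValueError(f"value is longer than {n} characters")
    return value.ljust(n)


def wasted_characters(values: list[str], n: int) -> int:
    """Count the padding characters added when values are stored as CHAR(n)."""
    return sum(len(store_as_char(v, n)) - len(v) for v in values)
```

With a one-character value in a one-character column, as in the gender example, no padding is added. With values of mixed length, every missing character is stored as a space.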
  • [Rule] Do not store data of the numeric type in columns of the character type.

    If numeric calculation or comparison (for example, in a filter condition) is performed on data stored in columns of the character type, data type conversion causes unnecessary overhead, and the column indexes may become invalid, affecting query performance.
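Besides the conversion overhead, text comparison follows character order rather than numeric order. The Python sketch below illustrates the difference with made-up sample values. Python string comparison stands in for character-type comparison here, which is an assumption: actual database collation rules may differ in detail.

```python
# The same three amounts, stored once as text and once as integers.
amounts_as_text = ["9", "10", "100"]
amounts_as_numbers = [int(a) for a in amounts_as_text]

# Sorting text compares character by character, so "10" sorts before "9".
text_order = sorted(amounts_as_text)
number_order = sorted(amounts_as_numbers)

# A filter "greater than 50" gives different rows for text and for numbers.
text_filter = [a for a in amounts_as_text if a > "50"]
number_filter = [n for n in amounts_as_numbers if n > 50]

print(text_order, number_order)
print(text_filter, number_filter)
```

To get the numeric result from text values, each value has to be converted first, which is the conversion this rule warns about.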

  • [Rule] Do not store data of the time or date type in columns of the character type.

    If time or date calculation or comparison (for example, in a filter condition) is performed on data stored in columns of the character type, data type conversion causes unnecessary overhead, and the column indexes may become invalid, affecting query performance.
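The same problem can be shown for dates with a small Python sketch. The DD/MM/YYYY text format and the sample values are assumptions chosen for illustration, and `parse_day` is a hypothetical helper.

```python
from datetime import datetime


def parse_day(text: str):
    """Convert a DD/MM/YYYY string into a date value."""
    return datetime.strptime(text, "%d/%m/%Y").date()


earlier_text = "13/03/2025"
later_text = "01/10/2025"

# Compared as text, "13/..." sorts after "01/...", so the order is wrong.
text_says_earlier = earlier_text < later_text

# Compared as dates, 13 March 2025 comes before 1 October 2025.
dates_say_earlier = parse_day(earlier_text) < parse_day(later_text)

# Date arithmetic is only possible after conversion.
days_between = (parse_day(later_text) - parse_day(earlier_text)).days

print(text_says_earlier, dates_say_earlier, days_between)
```

Every comparison or calculation on the text values requires a parsing step first, whereas values stored in a date or time type can be compared directly.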

  • [Rule] Add NOT NULL constraints to columns that never have NULL values.

    In certain scenarios, the optimizer may specially optimize NOT NULL columns to improve query performance.

  • [Rule] Use the same data type for joined columns.

    If the column types are inconsistent during a join operation, overhead will be caused by data type conversion.

  • [Rule] The number of large columns (such as VARCHAR(1000) and VARCHAR(4000)) must not exceed eight.
  • [Recommendation] When defining a column, you are advised to create a comment for the column to facilitate subsequent maintenance.

    For details about the description, value range, and usage of different types of fields, see Data Type.

  • [Recommendation] In tables that are logically related, columns having the same meaning should use the same data type.
  • [Recommendation] For string data, you are advised to use variable-length strings and specify the maximum length. To avoid truncation, ensure that the specified maximum length is greater than the maximum number of characters to be stored. You are advised not to use CHAR(n), BPCHAR(n), NCHAR(n), or CHARACTER(n), unless you know that the string length is fixed.
  • [Recommendation] Add NOT NULL constraints to columns that are used for WHERE filtering and join operations.

    In certain scenarios, the optimizer may specially optimize NOT NULL columns to greatly improve query performance.

  • [Recommendation] Do not reserve columns for a table. In most cases, you can quickly add or delete table columns, or change the default values of columns.

    An added column must meet the following requirements. Otherwise, the entire table is updated, which causes additional overhead and affects online services.

    1. The data type is BOOLEAN, BYTEA, SMALLINT, BIGINT, INTEGER, NUMERIC, FLOAT, DOUBLE PRECISION, CHAR, VARCHAR, TEXT, TIMESTAMPTZ, TIMESTAMP, DATE, TIME, TIMETZ, or INTERVAL.
    2. The length of the default value cannot exceed 128 bytes.
    3. The default value of the added column does not contain a volatile function.
    4. The default value is required and cannot be NULL.

    If you are not sure whether the third condition is met, contact GaussDB technical support for evaluation.
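The four requirements can be expressed as a simple pre-check, sketched below in Python. This is not a GaussDB utility: `violated_requirements` and `ALLOWED_TYPES` are hypothetical names, the default value is passed as the text of its expression, measuring its length as UTF-8 bytes is an assumption, and whether the default contains a volatile function must be supplied by the caller because the sketch cannot determine it.

```python
# Data types named in requirement 1.
ALLOWED_TYPES = {
    "BOOLEAN", "BYTEA", "SMALLINT", "BIGINT", "INTEGER", "NUMERIC", "FLOAT",
    "DOUBLE PRECISION", "CHAR", "VARCHAR", "TEXT", "TIMESTAMPTZ", "TIMESTAMP",
    "DATE", "TIME", "TIMETZ", "INTERVAL",
}


def violated_requirements(data_type, default, default_is_volatile):
    """Return the numbers of the requirements (1 to 4) that a new column violates.

    default is the text of the default expression, or None if there is none.
    """
    problems = []
    if data_type.upper() not in ALLOWED_TYPES:
        problems.append(1)
    if default is None:
        problems.append(4)
    elif len(default.encode("utf-8")) > 128:
        # Assumption: the 128-byte limit is measured on the UTF-8 encoding.
        problems.append(2)
    if default_is_volatile:
        problems.append(3)
    return sorted(problems)
```

An empty list means that none of the four requirements is violated according to this sketch. It does not replace the evaluation by technical support mentioned above.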