Updated on 2024-09-06 GMT+08:00

Column Compression

To reduce the storage occupied by data pages and costs, GaussDB(for MySQL) provides two algorithms ZLIB and ZSTD for fine-grained column compression. You can select either of them to compress large columns that are not frequently accessed based on compression ratio and performance. Automatic column compression is also supported.

Application scenario: There are large columns that are not frequently accessed in tables and users want to compress these columns to reduce costs.

Constraints

  • The kernel version of your GaussDB(for MySQL) instance must be 2.0.54.240600 or later.
  • Partitioned tables, temporary tables, and non-InnoDB engine tables are not supported.
  • A compressed column cannot contain an index (primary key, unique index, secondary index, foreign key, and full-text index).
  • Only the following data types are supported: BLOB (including TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB) and TEXT (including TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT). VARCHAR, and VARBINARY.
  • This feature cannot be used on generated columns.
  • EXCHANGE PARTITION cannot be executed between a partitioned table and a table with compressed columns.
  • IMPORT TABLESPACE is not supported.
  • This feature can be used only in CREATE TABLE, ALTER TABLE ADD, ALTER TABLE CHANGE, and ALTER TABLE MODIFY statements.
  • ALTER TABLE ADD COLUMN does not support the INSTANT algorithm. The INSTANT algorithm cannot be used when the ALTER TABLE {CHANGE|MODIFY} syntax involves data changes.
  • In automatic compression scenarios (rds_column_compression = 2), the compression attributes can be added only when the maximum length of a column is at least the compression threshold (rds_column_compression_threshold). In explicit compression scenarios (rds_column_compression = 1), if the maximum length of a column is less than the compression threshold, the compression attributes can be added but a warning message is received.
  • If a table contains compressed columns, NDP is not supported for compute pushdown.
  • When you manually perform binlog synchronization, ALTER statements are incompatible. You are advised to use HINT.
  • When you use DRS to migrate data from one instance to another that does not support column compression, the compression attribute is eliminated. The full migration task can be performed. During incremental migration, if ALTER statements contain compression columns, the migration task fails.
  • When physical backups are used to restore data, the related versions must support column compression.
  • If column compression has been used after the version upgrade, the version cannot be rolled back to a version without this feature.

Syntax

The column_definition definition is extended to support compression when column attributes are defined in CREATE TABLE, ALTER TABLE ADD, ALTER TABLE CHANGE, and ALTER TABLE MODIFY statements.

create_definition: {
    col_name column_definition
  | {INDEX | KEY} [index_name] [index_type] (key_part,...)
      [index_option] ...
  | {FULLTEXT | SPATIAL} [INDEX | KEY] [index_name] (key_part,...)
      [index_option] ...
  | [CONSTRAINT [symbol]] PRIMARY KEY
      [index_type] (key_part,...)
      [index_option] ...
  | [CONSTRAINT [symbol]] UNIQUE [INDEX | KEY]
      [index_name] [index_type] (key_part,...)
      [index_option] ...
  | [CONSTRAINT [symbol]] FOREIGN KEY
      [index_name] (col_name,...)
      reference_definition
  | check_constraint_definition
}

alter_option: {
    table_options
  | ADD [COLUMN] col_name column_definition
        [FIRST | AFTER col_name]
  | ADD [COLUMN] (col_name column_definition,...)
  | CHANGE [COLUMN] old_col_name new_col_name column_definition
        [FIRST | AFTER col_name]
  | MODIFY [COLUMN] col_name column_definition
        [FIRST | AFTER col_name]
  ...

column_definition is as follows:

column_definition: {
    data_type [NOT NULL | NULL] [DEFAULT {literal | (expr)} ]
      [VISIBLE | INVISIBLE]
      [AUTO_INCREMENT] [UNIQUE [KEY]] [[PRIMARY] KEY]
      [COMMENT 'string']
      [COLLATE collation_name]
      [COLUMN_FORMAT {FIXED | DYNAMIC | DEFAULT}]
      [COLUMN_FORMAT {FIXED|DYNAMIC|DEFAULT}|COMPRESSED[={ZLIB|ZSTD}**]]
      [ENGINE_ATTRIBUTE [=] 'string']
      [SECONDARY_ENGINE_ATTRIBUTE [=] 'string']
      [STORAGE {DISK | MEMORY}]
      [reference_definition]
      [check_constraint_definition]
  | data_type
      [COLLATE collation_name]
      [GENERATED ALWAYS] AS (expr)
      [VIRTUAL | STORED] [NOT NULL | NULL]
      [VISIBLE | INVISIBLE]
      [UNIQUE [KEY]] [[PRIMARY] KEY]
      [COMMENT 'string']
      [reference_definition]
      [check_constraint_definition]
}

Feature Parameter Description

Table 1 Parameter description

Parameter

Description

Value Range

Default Value

Level

Dynamic Validation

rds_column_compression

  • If this parameter is set to 0, column compression is disabled. Compressed columns cannot be created explicitly or automatically.
  • If this parameter is set to 1, compressed columns can only be created explicitly.
  • If this parameter is set to 2, compressed columns can be created explicitly or automatically.

[0,2]

0

GLOBAL

Yes

rds_default_column_compression_algorithm

Controls default compression algorithm for column compression. The algorithm is used when

  • You explicitly create a compressed column without specifying a specific compression algorithm.
  • A compressed column is created automatically.

ZLIB or ZSTD

ZLIB

GLOBAL

Yes

rds_column_compression_threshold

Controls the threshold for triggering column compression.

  • When the maximum length of a column is less than this threshold, a compressed column can be explicitly created, but a message is displayed indicating that the compressed column cannot be automatically created.
  • When the maximum length of a column is at least to the threshold, a compressed column can be created explicitly or automatically.

[20-4294967295]

100

GLOBAL

Yes

rds_zlib_column_compression_level

Specifies the compression level of the ZLIB column compression algorithm.

  • If this parameter is set to 0, columns are not compressed.
  • Setting this parameter to a value other than 0 will affect the compression speed and effectiveness. A smaller value indicates faster compression but a poorer effect, while a larger value indicates slower compression but a better effect.

[0,9]

6

GLOBAL

Yes

rds_zstd_column_compression_level

Specifies the compression level of the ZSTD column compression algorithm.

A smaller value indicates faster compression but a poorer effect, while a larger value indicates slower compression but a better effect.

[1,22]

3

GLOBAL

Yes

Example

  1. Explicitly create a compressed column.
    mysql> show variables like 'rds_column_compression';
    +------------------------+-------+
    | Variable_name          | Value |
    +------------------------+-------+
    | rds_column_compression | 1     |
    +------------------------+-------+
    1 row in set (0.00 sec)
    
    mysql> show variables like 'rds_default_column_compression_algorithm';
    +------------------------------------------+-------+
    | Variable_name                            | Value |
    +------------------------------------------+-------+
    | rds_default_column_compression_algorithm | ZLIB  |
    +------------------------------------------+-------+
    1 row in set (0.00 sec)
    
    mysql> create table t1(c1 varchar(100) compressed, c2 varchar(100) compressed=zlib, c3 varchar(100) compressed=zstd) default charset=latin1;
    Query OK, 0 rows affected (0.06 sec)
    
    mysql> show create table t1\G
    *************************** 1. row ***************************
    Table: t1
    Create Table: CREATE TABLE `t1` (
    `c1` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL,
    `c2` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL,
    `c3` varchar(100) /*!99990 800220201 COMPRESSED=ZSTD */ DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)
  2. Automatically create a compressed column.
    mysql> set global rds_column_compression = 2;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> show variables like 'rds_column_compression';
    +------------------------+-------+
    | Variable_name          | Value |
    +------------------------+-------+
    | rds_column_compression | 2     |
    +------------------------+-------+
    1 row in set (0.01 sec)
    
    mysql> show variables like 'rds_column_compression_threshold';
    +----------------------------------+-------+
    | Variable_name                    | Value |
    +----------------------------------+-------+
    | rds_column_compression_threshold | 100   |
    +----------------------------------+-------+
    1 row in set (0.01 sec)
    
    mysql> show variables like 'rds_default_column_compression_algorithm';
    +------------------------------------------+-------+
    | Variable_name                            | Value |
    +------------------------------------------+-------+
    | rds_default_column_compression_algorithm | ZLIB  |
    +------------------------------------------+-------+
    1 row in set (0.01 sec)
    
    mysql> create table t2(c1 varchar(99), c2 varchar(100)) default charset=latin1;
    Query OK, 0 rows affected (0.05 sec)
    
    mysql> show create table t2\G
    *************************** 1. row ***************************
    Table: t2
    Create Table: CREATE TABLE `t2` (
    `c1` varchar(99) DEFAULT NULL,
    `c2` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.01 sec)
  3. Disable column compression.
    mysql> set global rds_column_compression = 0;
    Query OK, 0 rows affected (0.00 sec)
    
    mysql> show variables like 'rds_column_compression';
    +------------------------+-------+
    | Variable_name          | Value |
    +------------------------+-------+
    | rds_column_compression | 0     |
    +------------------------+-------+
    1 row in set (0.01 sec)
    
    mysql> show variables like 'rds_column_compression_threshold';
    +----------------------------------+-------+
    | Variable_name                    | Value |
    +----------------------------------+-------+
    | rds_column_compression_threshold | 100   |
    +----------------------------------+-------+
    1 row in set (0.01 sec)
    
    mysql> create table t3(c1 varchar(100) compressed, c2 varchar(100) compressed=zlib, c3 varchar(100) compressed=zstd) default charset=latin1;
    Query OK, 0 rows affected, 3 warnings (0.04 sec)
    
    mysql> show create table t3\G
    *************************** 1. row ***************************
    Table: t3
    Create Table: CREATE TABLE `t3` (
    `c1` varchar(100) DEFAULT NULL,
    `c2` varchar(100) DEFAULT NULL,
    `c3` varchar(100) DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.01 sec)

Result Verification

  1. Run the show create table statement to display the table structure information. It is found that the information contains "/*! If the content in "99990 800220201 COMPRESSED=xxxx */", column compression is used.

    Example

    mysql> show create table t1\G
    *************************** 1. row ***************************
           Table: t1
    Create Table: CREATE TABLE `t1` (
      `c1` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL,
      `c2` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL,
      `c3` varchar(100) /*!99990 800220201 COMPRESSED=ZSTD */ DEFAULT NULL
    ) ENGINE=InnoDB DEFAULT CHARSET=latin1
    1 row in set (0.00 sec)
  2. Use the system view information_schema.columns to query compressed columns.

    Example

    mysql> select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, EXTRA from information_schema.columns where extra like '%compressed%';
    +--------------+------------+-------------+-----------------+
    | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | EXTRA           |
    +--------------+------------+-------------+-----------------+
    | test         | t1         | c1          | COMPRESSED=ZLIB |
    | test         | t1         | c2          | COMPRESSED=ZLIB |
    | test         | t1         | c3          | COMPRESSED=ZSTD |
    | test         | t2         | c2          | COMPRESSED=ZLIB |
    +--------------+------------+-------------+-----------------+
    4 rows in set (0.50 sec)
  3. Query the status information to determine the actual number of times that the column compression or decompression API is called.
    mysql> show global status like '%column%compress%';
    +--------------------------------+-------+
    | Variable_name                  | Value |
    +--------------------------------+-------+
    | Innodb_column_compress_count   | 243   |
    | Innodb_column_uncompress_count | 34    |
    +--------------------------------+-------+
  4. Run the following statement or view the information on the monitoring page to compare the table usage space before and after compression and check the compression effect.
    SELECT table_name AS Table, round(((data_length + index_length) / 1024 / 1024), 2) AS Size in MB FROM information_schema.TABLES WHERE table_schema = "***" and table_name='***'

Compression Ratio and Performance Impact Verification

  1. Insert 10,000 rows of data in a table randomly. Each row consists of 32 character strings returned by 400 MD5 functions.
    CREATE TABLE `random_data` (
    `id` int(11) NOT NULL AUTO_INCREMENT,
    `data` longtext,
    PRIMARY KEY (`id`)
    ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
     
    DELIMITER $$
    CREATE PROCEDURE `generate_random_data`()
    BEGIN
      DECLARE i INT DEFAULT 1;
      DECLARE j INT DEFAULT 1;
      DECLARE str longtext;
      WHILE i <= 10000 DO
        SET j = 1;
        SET str = '';
        WHILE j <= 400 DO
          SET str = CONCAT(str, MD5(RAND()));
          SET j = j + 1;
        END WHILE;
        INSERT INTO `random_data` (`data`) VALUES (str);
        SET i = i + 1;
      END WHILE;
    END$$
    DELIMITER ;

    Set rds_column_compression to 0 first and then set it to 2. Retain the default values for other parameters. Import the preceding table structure and invoke a stored procedure to insert data. Use the ZLIB or ZTSD algorithm to compress data. The ratio of the data file size before and after compression is 1.8.

  2. Use sysbench to import 64 tables. Each table contains 10 million rows of data. The types of the c and pad columns are changed to varchar. The modified table structure is as follows:
    CREATE TABLE `sbtest1` (
      `id` int NOT NULL AUTO_INCREMENT,
      `k` int NOT NULL DEFAULT '0',
      `c` varchar(120) COLLATE utf8mb4_0900_bin NOT NULL DEFAULT '',
      `pad` varchar(60) COLLATE utf8mb4_0900_bin NOT NULL DEFAULT '',
      PRIMARY KEY (`id`),
      KEY `k_1` (`k`)
    ) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin
    • Set rds_column_compression to 0 first and then set it to 2, and retain the default values for other parameters. Import the table structure and data. After the calculation, only column c is compressed using ZLIB or ZSTD, the ratio of the data file size before and after compression is 1.2.
    • Theoretically, a higher compression level has a greater impact on performance. After compression, the performance loss is about 10%.