Column Compression
To reduce the storage occupied by data pages and costs, GaussDB(for MySQL) provides two algorithms ZLIB and ZSTD for fine-grained column compression. You can select either of them to compress large columns that are not frequently accessed based on compression ratio and performance. Automatic column compression is also supported.
Application scenario: There are large columns that are not frequently accessed in tables and users want to compress these columns to reduce costs.
Constraints
- The kernel version of your GaussDB(for MySQL) instance must be 2.0.54.240600 or later.
- Partitioned tables, temporary tables, and non-InnoDB engine tables are not supported.
- A compressed column cannot contain an index (primary key, unique index, secondary index, foreign key, and full-text index).
- Only the following data types are supported: BLOB (including TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB) and TEXT (including TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT). VARCHAR, and VARBINARY.
- This feature cannot be used on generated columns.
- EXCHANGE PARTITION cannot be executed between a partitioned table and a table with compressed columns.
- IMPORT TABLESPACE is not supported.
- This feature can be used only in CREATE TABLE, ALTER TABLE ADD, ALTER TABLE CHANGE, and ALTER TABLE MODIFY statements.
- ALTER TABLE ADD COLUMN does not support the INSTANT algorithm. The INSTANT algorithm cannot be used when the ALTER TABLE {CHANGE|MODIFY} syntax involves data changes.
- In automatic compression scenarios (rds_column_compression = 2), the compression attributes can be added only when the maximum length of a column is at least the compression threshold (rds_column_compression_threshold). In explicit compression scenarios (rds_column_compression = 1), if the maximum length of a column is less than the compression threshold, the compression attributes can be added but a warning message is received.
- If a table contains compressed columns, NDP is not supported for compute pushdown.
- When you manually perform binlog synchronization, ALTER statements are incompatible. You are advised to use HINT.
- When you use DRS to migrate data from one instance to another that does not support column compression, the compression attribute is eliminated. The full migration task can be performed. During incremental migration, if ALTER statements contain compression columns, the migration task fails.
- When physical backups are used to restore data, the related versions must support column compression.
- If column compression has been used after the version upgrade, the version cannot be rolled back to a version without this feature.
Syntax
The column_definition definition is extended to support compression when column attributes are defined in CREATE TABLE, ALTER TABLE ADD, ALTER TABLE CHANGE, and ALTER TABLE MODIFY statements.
create_definition: { col_name column_definition | {INDEX | KEY} [index_name] [index_type] (key_part,...) [index_option] ... | {FULLTEXT | SPATIAL} [INDEX | KEY] [index_name] (key_part,...) [index_option] ... | [CONSTRAINT [symbol]] PRIMARY KEY [index_type] (key_part,...) [index_option] ... | [CONSTRAINT [symbol]] UNIQUE [INDEX | KEY] [index_name] [index_type] (key_part,...) [index_option] ... | [CONSTRAINT [symbol]] FOREIGN KEY [index_name] (col_name,...) reference_definition | check_constraint_definition } alter_option: { table_options | ADD [COLUMN] col_name column_definition [FIRST | AFTER col_name] | ADD [COLUMN] (col_name column_definition,...) | CHANGE [COLUMN] old_col_name new_col_name column_definition [FIRST | AFTER col_name] | MODIFY [COLUMN] col_name column_definition [FIRST | AFTER col_name] ...
column_definition is as follows:
column_definition: { data_type [NOT NULL | NULL] [DEFAULT {literal | (expr)} ] [VISIBLE | INVISIBLE] [AUTO_INCREMENT] [UNIQUE [KEY]] [[PRIMARY] KEY] [COMMENT 'string'] [COLLATE collation_name] [COLUMN_FORMAT {FIXED | DYNAMIC | DEFAULT}] [COLUMN_FORMAT {FIXED|DYNAMIC|DEFAULT}|COMPRESSED[={ZLIB|ZSTD}**]] [ENGINE_ATTRIBUTE [=] 'string'] [SECONDARY_ENGINE_ATTRIBUTE [=] 'string'] [STORAGE {DISK | MEMORY}] [reference_definition] [check_constraint_definition] | data_type [COLLATE collation_name] [GENERATED ALWAYS] AS (expr) [VIRTUAL | STORED] [NOT NULL | NULL] [VISIBLE | INVISIBLE] [UNIQUE [KEY]] [[PRIMARY] KEY] [COMMENT 'string'] [reference_definition] [check_constraint_definition] }
Feature Parameter Description
Parameter |
Description |
Value Range |
Default Value |
Level |
Dynamic Validation |
---|---|---|---|---|---|
rds_column_compression |
|
[0,2] |
0 |
GLOBAL |
Yes |
rds_default_column_compression_algorithm |
Controls default compression algorithm for column compression. The algorithm is used when
|
ZLIB or ZSTD |
ZLIB |
GLOBAL |
Yes |
rds_column_compression_threshold |
Controls the threshold for triggering column compression.
|
[20-4294967295] |
100 |
GLOBAL |
Yes |
rds_zlib_column_compression_level |
Specifies the compression level of the ZLIB column compression algorithm.
|
[0,9] |
6 |
GLOBAL |
Yes |
rds_zstd_column_compression_level |
Specifies the compression level of the ZSTD column compression algorithm. A smaller value indicates faster compression but a poorer effect, while a larger value indicates slower compression but a better effect. |
[1,22] |
3 |
GLOBAL |
Yes |
Example
- Explicitly create a compressed column.
mysql> show variables like 'rds_column_compression'; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | rds_column_compression | 1 | +------------------------+-------+ 1 row in set (0.00 sec) mysql> show variables like 'rds_default_column_compression_algorithm'; +------------------------------------------+-------+ | Variable_name | Value | +------------------------------------------+-------+ | rds_default_column_compression_algorithm | ZLIB | +------------------------------------------+-------+ 1 row in set (0.00 sec) mysql> create table t1(c1 varchar(100) compressed, c2 varchar(100) compressed=zlib, c3 varchar(100) compressed=zstd) default charset=latin1; Query OK, 0 rows affected (0.06 sec) mysql> show create table t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `c1` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL, `c2` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL, `c3` varchar(100) /*!99990 800220201 COMPRESSED=ZSTD */ DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec)
- Automatically create a compressed column.
mysql> set global rds_column_compression = 2; Query OK, 0 rows affected (0.00 sec) mysql> show variables like 'rds_column_compression'; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | rds_column_compression | 2 | +------------------------+-------+ 1 row in set (0.01 sec) mysql> show variables like 'rds_column_compression_threshold'; +----------------------------------+-------+ | Variable_name | Value | +----------------------------------+-------+ | rds_column_compression_threshold | 100 | +----------------------------------+-------+ 1 row in set (0.01 sec) mysql> show variables like 'rds_default_column_compression_algorithm'; +------------------------------------------+-------+ | Variable_name | Value | +------------------------------------------+-------+ | rds_default_column_compression_algorithm | ZLIB | +------------------------------------------+-------+ 1 row in set (0.01 sec) mysql> create table t2(c1 varchar(99), c2 varchar(100)) default charset=latin1; Query OK, 0 rows affected (0.05 sec) mysql> show create table t2\G *************************** 1. row *************************** Table: t2 Create Table: CREATE TABLE `t2` ( `c1` varchar(99) DEFAULT NULL, `c2` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.01 sec)
- Disable column compression.
mysql> set global rds_column_compression = 0; Query OK, 0 rows affected (0.00 sec) mysql> show variables like 'rds_column_compression'; +------------------------+-------+ | Variable_name | Value | +------------------------+-------+ | rds_column_compression | 0 | +------------------------+-------+ 1 row in set (0.01 sec) mysql> show variables like 'rds_column_compression_threshold'; +----------------------------------+-------+ | Variable_name | Value | +----------------------------------+-------+ | rds_column_compression_threshold | 100 | +----------------------------------+-------+ 1 row in set (0.01 sec) mysql> create table t3(c1 varchar(100) compressed, c2 varchar(100) compressed=zlib, c3 varchar(100) compressed=zstd) default charset=latin1; Query OK, 0 rows affected, 3 warnings (0.04 sec) mysql> show create table t3\G *************************** 1. row *************************** Table: t3 Create Table: CREATE TABLE `t3` ( `c1` varchar(100) DEFAULT NULL, `c2` varchar(100) DEFAULT NULL, `c3` varchar(100) DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.01 sec)
Result Verification
- Run the show create table statement to display the table structure information. It is found that the information contains "/*! If the content in "99990 800220201 COMPRESSED=xxxx */", column compression is used.
Example
mysql> show create table t1\G *************************** 1. row *************************** Table: t1 Create Table: CREATE TABLE `t1` ( `c1` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL, `c2` varchar(100) /*!99990 800220201 COMPRESSED=ZLIB */ DEFAULT NULL, `c3` varchar(100) /*!99990 800220201 COMPRESSED=ZSTD */ DEFAULT NULL ) ENGINE=InnoDB DEFAULT CHARSET=latin1 1 row in set (0.00 sec)
- Use the system view information_schema.columns to query compressed columns.
Example
mysql> select TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, EXTRA from information_schema.columns where extra like '%compressed%'; +--------------+------------+-------------+-----------------+ | TABLE_SCHEMA | TABLE_NAME | COLUMN_NAME | EXTRA | +--------------+------------+-------------+-----------------+ | test | t1 | c1 | COMPRESSED=ZLIB | | test | t1 | c2 | COMPRESSED=ZLIB | | test | t1 | c3 | COMPRESSED=ZSTD | | test | t2 | c2 | COMPRESSED=ZLIB | +--------------+------------+-------------+-----------------+ 4 rows in set (0.50 sec)
- Query the status information to determine the actual number of times that the column compression or decompression API is called.
mysql> show global status like '%column%compress%'; +--------------------------------+-------+ | Variable_name | Value | +--------------------------------+-------+ | Innodb_column_compress_count | 243 | | Innodb_column_uncompress_count | 34 | +--------------------------------+-------+
- Run the following statement or view the information on the monitoring page to compare the table usage space before and after compression and check the compression effect.
SELECT table_name AS Table, round(((data_length + index_length) / 1024 / 1024), 2) AS Size in MB FROM information_schema.TABLES WHERE table_schema = "***" and table_name='***'
Compression Ratio and Performance Impact Verification
- Insert 10,000 rows of data in a table randomly. Each row consists of 32 character strings returned by 400 MD5 functions.
CREATE TABLE `random_data` ( `id` int(11) NOT NULL AUTO_INCREMENT, `data` longtext, PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4; DELIMITER $$ CREATE PROCEDURE `generate_random_data`() BEGIN DECLARE i INT DEFAULT 1; DECLARE j INT DEFAULT 1; DECLARE str longtext; WHILE i <= 10000 DO SET j = 1; SET str = ''; WHILE j <= 400 DO SET str = CONCAT(str, MD5(RAND())); SET j = j + 1; END WHILE; INSERT INTO `random_data` (`data`) VALUES (str); SET i = i + 1; END WHILE; END$$ DELIMITER ;
Set rds_column_compression to 0 first and then set it to 2. Retain the default values for other parameters. Import the preceding table structure and invoke a stored procedure to insert data. Use the ZLIB or ZTSD algorithm to compress data. The ratio of the data file size before and after compression is 1.8.
- Use sysbench to import 64 tables. Each table contains 10 million rows of data. The types of the c and pad columns are changed to varchar. The modified table structure is as follows:
CREATE TABLE `sbtest1` ( `id` int NOT NULL AUTO_INCREMENT, `k` int NOT NULL DEFAULT '0', `c` varchar(120) COLLATE utf8mb4_0900_bin NOT NULL DEFAULT '', `pad` varchar(60) COLLATE utf8mb4_0900_bin NOT NULL DEFAULT '', PRIMARY KEY (`id`), KEY `k_1` (`k`) ) ENGINE=InnoDB AUTO_INCREMENT=10000001 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_bin
- Set rds_column_compression to 0 first and then set it to 2, and retain the default values for other parameters. Import the table structure and data. After the calculation, only column c is compressed using ZLIB or ZSTD, the ratio of the data file size before and after compression is 1.2.
- Theoretically, a higher compression level has a greater impact on performance. After compression, the performance loss is about 10%.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot