Help Center> GaussDB(DWS)> Troubleshooting> Cluster Performance> Table Bloat Occurs After Data Is Inserted into a Column-Store Table for Multiple Times
Updated on 2024-03-08 GMT+08:00

Table Bloat Occurs After Data Is Inserted into a Column-Store Table for Multiple Times

Symptom

After INSERT is executed for multiple times in a column-store table, table bloat occurs.

Possible Causes

In a column-store table, data is stored by column. Every 60,000 rows in a column are stored as a CU. CUs in the same column are continuously stored in a file. When the file is larger than 1 GB, more CUs will be stored in a new file. Data in a CU file cannot be modified and can only be appended. After VACUUM is performed on a column-store table that is frequently deleted and updated, even the space marked as available cannot be reused because the data in a CU file cannot be changed. To reuse the space, you need to change the CUs. Therefore, you are not advised to frequently delete and update column-store tables in GaussDB(DWS).

Handling Procedure

You are advised to enable the delta table function for column-store tables.

1
ALTER TABLE table_name SET (ENABLE_DELTA = ON);
  • Enabling the delta table function of a column-store table can prevent small CUs from being generated when a single piece of data or a small amount of data is imported to the table, hence improving performance. For example, if 100 pieces of data are imported each time in a cluster with 3 CNs and 6 DNs, the import time can be reduced by 25%, the storage space usage can be reduced by 97%. Therefore, you need to enable the delta table before inserting a small batch of data for multiple times and disable the delta table after confirming that no small batch of data needs to be imported.
  • A delta table is a row-store table attached to a column-store table. After data is inserted into a delta table, the high compression ratio of the column-store table is lost. In normal cases, column-store tables are used to import a large amount of data. Therefore, the delta table is disabled by default, if the delta table is enabled when a large amount of data is imported, more time and space are consumed. If the delta table is enabled when 10,000 data records each time are imported in a cluster with 3 DNs and 6 DNs, the import speed is four times slower and more than 10 times of the space is consumed than that when the delta table is disabled. Therefore, exercise caution when enabling the delta table.