Slow Performance Caused by Too Many Small CUs in Column Storage

In actual service scenarios, a large number of column-store tables are used. However, improper use of column-store tables may cause serious performance problems. The most common problem is slow performance caused by too many small CUs.

Symptom

The system I/O surges for a long time, and the query becomes slow occasionally.
After checking the execution plan information when the service is occasionally slow, it is found that the cstore scan is slow. The reason is that although the amount of data to be scanned is small, the number of CUs to be scanned is large.

As shown in the following figure, a CU can store 60,000 records, but more than 2000 CUs need to be scanned for 70,000 records. There are too many small CUs.

Troubleshooting

Check the data distribution in the table CUs. Perform the following operations on DNs:

Check the cudesc table corresponding to the column-store table.

For non-partitioned tables:

      
           SELECT 'cstore.'||relname FROM pg_class where oid = (SELECT relcudescrelid FROM pg_class c inner join pg_namespace n on c.relnamespace = n.oid where relname = 'table name' and nspname = 'schema name');

For partitioned tables:

      
           SELECT 'cstore.'||relname FROM pg_class where oid in (SELECT p.relcudescrelid FROM pg_partition p,pg_class c,pg_namespace n where c.relnamespace = n.oid and p.parentid = c.oid and c.relname = 'table name' and n.nspname = 'schema name' and p.relcudescrelid != 0);

Check the rowcount of each CU in the cudesc table.
Query the cudesc table information returned in step 1. The query result is similar to the following. Pay attention to the number of CUs whose row_count is too small (far less than 60,000). If the number is large, there are many small CUs and the CU expansion problem is serious, affecting the storage efficiency and query access efficiency.

Trigger Conditions

Column-store data is frequently imported in small batches. In scenarios where partitions are involved and the number of partitions is large, the small CU problem is more serious.

Solutions

Service Side

Import column-store tables in batches. The amount of data to be imported at a time (if there are partitions, the amount of data to be imported to a single partition at time) is close to or greater than 60,000 x Number of primary DNs.
If the data volume is small, you are advised to use row-store tables for data import.

Maintenance Portal

If the amount of data to be imported to the database cannot be adjusted on the service side, periodically perform VACUUM FULL on column-store tables to integrate small CUs. This will relieve the problem to some extent.

Parent topic: Cluster Performance

Previous topic: Performance Deterioration Due to Data Bloat

Next topic: Reducing I/O Usage