Updated on 2024-11-05 GMT+08:00

Case: Adjusting the Partial Clustering Key

Partial Cluster Key (PCK) is an index technology that uses min/max indexes to quickly scan base tables in column storage. Partial cluster key can specify multiple columns, but you are advised to specify no more than two columns. It can be used to accelerated queries on large column-store tables.

Before Optimization

Create a column-store table orders_no_pck without partial clustering (PCK). The table is defined as follows:

Run the following SQL statement to query the execution plan of a point query:

1
2
3
4
EXPLAIN PERFORMANCE
SELECT * FROM orders_no_pck
WHERE o_orderkey = '13095143'
ORDER BY o_orderdate;

As shown in the following figure, the execution time is 48 ms. Check Datanode Information. It is found that the filter time is 19 ms and the CUNone ratio is 0.

After Optimization

The created column-store table orders_pck is defined as follows:

Use ALTER TABLE to set the o_orderkey field to PCK:

Run the following SQL statement to query the execution plan of the same point query SQL statement again:

1
2
3
4
EXPLAIN PERFORMANCE
SELECT * FROM orders_pck
WHERE o_orderkey = '13095143'
ORDER BY o_orderdate;

As shown in the following figure, the execution time is 5 ms. Check Datanode Information. It is found that the filter time is 0.5 ms and the CUNone ratio is 82. The higher the CUNone ratio, the higher performance that the PCK will bring.