Merge Append
场景描述
当对分区表进行全局排序时,通常SQL引擎的实现方式是先通过Partition Iterator + PartitionScan对分区表做全量扫描然后进行Sort排序操作,这样难以利用数据分区分治的算法思想进行全局排序,假如ORDER BY排序列包含索引,本身局部有序的前提条件则无法利用。针对这类问题,目前分区表支持了分区归并排序执行策略,利用Merge Append的执行机制改进分区表的排序机制。
示例
分区表MergeAppend的执行机制示例如下:
gaussdb=# CREATE TABLE test_range_pt (a INT, b INT, c INT) PARTITION BY RANGE(a) ( PARTITION p1 VALUES LESS THAN (2000), PARTITION p2 VALUES LESS THAN (3000), PARTITION p3 VALUES LESS THAN (4000), PARTITION p4 VALUES LESS THAN (5000), PARTITION p5 VALUES LESS THAN (MAXVALUE) )ENABLE ROW MOVEMENT; gaussdb=# INSERT INTO test_range_pt VALUES (generate_series(1,10000),generate_series(1,10000),generate_series(1,10000)); gaussdb=# CREATE INDEX idx_range_b ON test_range_pt(b) LOCAL; gaussdb=# ANALYZE test_range_pt; gaussdb=# EXPLAIN ANALYZE SELECT * FROM test_range_pt WHERE b >10 AND b < 5000 ORDER BY b LIMIT 10; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------- Limit (cost=0.06..1.02 rows=10 width=12) (actual time=0.990..1.041 rows=10 loops=1) -> Result (cost=0.06..480.32 rows=10 width=12) (actual time=0.988..1.036 rows=10 loops=1) -> Merge Append (cost=0.06..480.32 rows=10 width=12) (actual time=0.985..1.026 rows=10 loops=1) Sort Key: b -> Partitioned Index Scan using idx_range_b on test_range_pt (cost=0.00..44.61 rows=998 width=12) (actual time=0.256..0.284 rows=10 loops=1) Index Cond: ((b > 10) AND (b < 5000)) Selected Partitions: 1 -> Partitioned Index Scan using idx_range_b on test_range_pt (cost=0.00..44.61 rows=998 width=12) (actual time=0.208..0.208 rows=1 loops=1) Index Cond: ((b > 10) AND (b < 5000)) Selected Partitions: 2 -> Partitioned Index Scan using idx_range_b on test_range_pt (cost=0.00..44.61 rows=998 width=12) (actual time=0.205..0.205 rows=1 loops=1) Index Cond: ((b > 10) AND (b < 5000)) Selected Partitions: 3 -> Partitioned Index Scan using idx_range_b on test_range_pt (cost=0.00..44.61 rows=998 width=12) (actual time=0.212..0.212 rows=1 loops=1) Index Cond: ((b > 10) AND (b < 5000)) Selected Partitions: 4 -> Partitioned Index Scan using idx_range_b on test_range_pt (cost=0.00..44.61 rows=998 width=12) (actual time=0.092..0.092 rows=0 loops=1) Index Cond: ((b > 10) AND (b < 5000)) Selected Partitions: 5 Total runtime: 1.656 ms (20 rows) --关闭分区表MergeAppend算子 gaussdb=# SET sql_beta_feature = 'disable_merge_append_partition'; SET gaussdb=# EXPLAIN ANALYZE SELECT * FROM test_range_pt WHERE b >10 AND b < 5000 ORDER BY b LIMIT 10; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------ Limit (cost=296.85..296.88 rows=10 width=12) (actual time=33.559..33.565 rows=10 loops=1) -> Sort (cost=296.85..309.33 rows=10 width=12) (actual time=33.555..33.557 rows=10 loops=1) Sort Key: b Sort Method: top-N heapsort Memory: 26kB -> Partition Iterator (cost=0.00..189.00 rows=4991 width=12) (actual time=0.352..27.176 rows=4989 loops=1) Iterations: 5 -> Partitioned Seq Scan on test_range_pt (cost=0.00..189.00 rows=4991 width=12) (actual time=16.874..25.637 rows=4989 loops=5) Filter: ((b > 10) AND (b < 5000)) Rows Removed by Filter: 5011 Selected Partitions: 1..5 Total runtime: 33.877 ms (11 rows) --清理示例 gaussdb=# DROP TABLE test_range_pt;
MergeAppend执行方式消耗远小于普通执行方式。
注意事项及约束条件
- 当分区扫描路径为Index/Index Only时,才支持MergeAppend执行机制。
- 分区剪枝结果大于1时,才支持MergeAppend执行机制。
- 当分区索引全部有效且为btree索引时,才支持MergeAppend执行机制。
- 当SQL含有Limit子句时,才支持MergeAppend执行机制。
- 当分区扫描时如果存在Filter,不支持MergeAppend执行机制。
- 当GUC参数sql_beta_feature = 'disable_merge_append_partition'时,不再生成MergeAppend路径。