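Case: Pushing Down Sort Operations
In a scenario-based performance test, one scenario spent most of its time on the CN performing the window agg, which accounted for more than 95% of the total execution time, so system resources were not fully used. Analysis showed that the query computes the sum of each of two columns, adds the two sums, applies trunc to the result, and then sorts by it. Rewriting the statement to use a subquery allows the sort to be pushed down to the DNs.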
Before Optimization
The table structure is as follows:
CREATE TABLE public.test(imsi int,L4_DW_THROUGHPUT int,L4_UL_THROUGHPUT int) with (orientation = column)
DISTRIBUTE BY hash(imsi);
The query statement is as follows:
SELECT COUNT(1) over() AS DATACNT,
       IMSI AS IMSI_IMSI,
       CAST(TRUNC(((SUM(L4_UL_THROUGHPUT) + SUM(L4_DW_THROUGHPUT))), 0) AS DECIMAL(20)) AS TOTAL_VOLOME_KPIID
FROM public.test AS test
GROUP BY IMSI
ORDER BY TOTAL_VOLOME_KPIID DESC
LIMIT 10;
                                                                                QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 id | operation                                        | A-time           | A-rows  | E-rows  | E-distinct | Peak Memory  | E-memory     | A-width | E-width | E-costs
----+--------------------------------------------------+------------------+---------+---------+------------+--------------+--------------+---------+---------+----------
  1 | ->  Row Adapter                                  | 2862.008         | 10      | 10      |            | 31KB         |              |         | 28      | 48360.42
  2 |    ->  Vector Limit                              | 2861.969         | 10      | 10      |            | 8KB          |              |         | 28      | 48360.42
  3 |       ->  Vector Sort                            | 2861.946         | 10      | 1000000 |            | 479KB        |              |         | 28      | 50860.39
  4 |          ->  Vector WindowAgg                    | 2166.759         | 1000000 | 1000000 |            | 69987KB      |              |         | 28      | 26750.75
  5 |             ->  Vector Streaming (type: GATHER)  | 136.813          | 1000000 | 1000000 |            | 208KB        |              |         | 28      | 15500.75
  6 |                ->  Vector Sonic Hash Aggregate   | [71.374, 73.640] | 1000000 | 1000000 |            | [14MB, 14MB] | 96MB(2919MB) | [31,31] | 28      | 15032.00
  7 |                   ->  CStore Scan on public.test | [2.957, 2.994]   | 1000000 | 1000000 |            | [1MB, 1MB]   | 1MB          |         | 12      | 1282.00
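As the plan shows, both the window agg and the sort run entirely on the CN (they sit above the GATHER operator), and they account for most of the execution time.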
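The "more than 95%" figure can be checked against the plan above with a little arithmetic. The sketch below assumes that A-time is cumulative, that is, each operator's value includes the time of its child operators; the steadily increasing values up the plan are consistent with this, but it is an assumption rather than something the plan states. The variable names are illustrative only.

```python
# A-time values in milliseconds, copied from the plan above.
# Assumption: A-time includes the time spent in child operators.
total_ms = 2862.008    # id 1, Row Adapter (whole query)
gather_ms = 136.813    # id 5, Vector Streaming (type: GATHER)

# Everything above the GATHER operator (WindowAgg, Sort, Limit) runs on the CN.
cn_ms = total_ms - gather_ms
cn_share = cn_ms / total_ms

print(f"CN-side time: {cn_ms:.3f} ms ({cn_share:.1%} of total)")
```

Under that assumption, the operators above GATHER take a little over 95% of the total, which matches the observation that the CN is the bottleneck.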
After Optimization
Try rewriting the statement to use a subquery:
SELECT COUNT(1) over() AS DATACNT, IMSI_IMSI, TOTAL_VOLOME_KPIID
FROM (SELECT IMSI AS IMSI_IMSI,
             CAST(TRUNC(((SUM(L4_UL_THROUGHPUT) + SUM(L4_DW_THROUGHPUT))), 0) AS DECIMAL(20)) AS TOTAL_VOLOME_KPIID
      FROM public.test AS test
      GROUP BY IMSI
      ORDER BY TOTAL_VOLOME_KPIID DESC
      LIMIT 10);
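The truncated sum of the two columns, together with the sort and the limit, becomes a subquery, and the window agg is performed outside it. This allows the sort to be pushed down. The execution plan is as follows: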
                                                                                     QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 id | operation                                              | A-time             | A-rows  | E-rows  | E-distinct | Peak Memory    | E-memory     | A-width | E-width | E-costs
----+--------------------------------------------------------+--------------------+---------+---------+------------+----------------+--------------+---------+---------+----------
  1 | ->  Row Adapter                                        | 955.277            | 10      | 5       |            | 31KB           |              |         | 24      | 25843.13
  2 |    ->  Vector WindowAgg                                | 955.261            | 10      | 5       |            | 1572KB         |              |         | 24      | 25843.13
  3 |       ->  Vector Streaming (type: GATHER)              | 955.015            | 10      | 10      |            | 127KB          |              |         | 24      | 25843.07
  4 |          ->  Vector Limit                              | [0.018, 0.018]     | 10      | 10      |            | [8KB, 8KB]     | 1MB          |         | 28      | 25836.97
  5 |             ->  Vector Streaming(type: BROADCAST)      | [0.014, 0.014]     | 20      | 20      |            | [719KB, 719KB] | 2MB          |         | 28      | 25837.12
  6 |                ->  Vector Limit                        | [927.730, 934.283] | 20      | 20      |            | [8KB, 8KB]     | 1MB          |         | 28      | 25836.85
  7 |                   ->  Vector Sort                      | [927.720, 934.269] | 20      | 1000000 |            | [463KB, 463KB] | 16MB         | [32,32] | 28      | 27086.82
  8 |                      ->  Vector Sonic Hash Aggregate   | [456.841, 461.077] | 1000000 | 1000000 |            | [15MB, 15MB]   | 96MB(2916MB) | [31,31] | 28      | 15032.00
  9 |                         ->  CStore Scan on public.test | [2.959, 3.014]     | 1000000 | 1000000 |            | [1MB, 1MB]     | 1MB          |         | 12      | 1282.00
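After the SQL rewrite, the execution time drops from 2.862s to 0.955s, a significant improvement. Note that the optimization result in this example is for reference only. The rewrite places WindowAgg outside the LIMIT, so it now processes only the 10 rows returned by the subquery instead of all 1000000 groups (compare the WindowAgg A-rows in the two plans). Whether the rewritten result set is acceptable therefore depends on the actual service requirements.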
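The effect of moving the window aggregate outside the LIMIT can be reproduced on a small scale. The following is a minimal sketch, not the GaussDB test itself: it uses Python's built-in sqlite3 module (assuming the bundled SQLite is 3.25 or later, which is needed for window functions), a tiny made-up data set of 50 IMSI values, and it omits TRUNC and the DECIMAL cast because they do not affect the comparison. Column aliases are shortened for readability.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test (imsi INTEGER, l4_dw_throughput INTEGER, l4_ul_throughput INTEGER)"
)
# Illustrative data only: 50 distinct imsi values, two rows each.
conn.executemany(
    "INSERT INTO test VALUES (?, ?, ?)",
    [(i % 50, i, 2 * i) for i in range(100)],
)

# Original shape: the window aggregate is evaluated before ORDER BY / LIMIT.
original = """
SELECT COUNT(1) OVER () AS datacnt, imsi,
       SUM(l4_ul_throughput) + SUM(l4_dw_throughput) AS total
FROM test
GROUP BY imsi
ORDER BY total DESC
LIMIT 10
"""

# Rewritten shape: the window aggregate only sees the rows the subquery returns.
rewritten = """
SELECT COUNT(1) OVER () AS datacnt, imsi, total
FROM (SELECT imsi,
             SUM(l4_ul_throughput) + SUM(l4_dw_throughput) AS total
      FROM test
      GROUP BY imsi
      ORDER BY total DESC
      LIMIT 10) AS t
"""

orig_rows = conn.execute(original).fetchall()
new_rows = conn.execute(rewritten).fetchall()

print("original datacnt: ", {row[0] for row in orig_rows})
print("rewritten datacnt:", {row[0] for row in new_rows})
print("same top-10 rows: ",
      sorted(row[1:] for row in orig_rows) == sorted(row[1:] for row in new_rows))
```

In this sketch both statements return the same ten IMSI and total pairs, but DATACNT reports the number of groups (50) in the original form and the number of returned rows (10) in the rewritten form. If the total group count is part of the business requirement, it would have to be obtained separately after the rewrite.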