更新时间:2024-11-05 GMT+08:00

案例:使排序下推

在做场景性能测试时,发现某场景大部分时间是CN端在做window agg,占到总执行时间95%以上,系统资源不能充分利用。研究发现该场景的特点是:将两列分别求sum作为一个子查询,外层对两列的和再求和后做trunc,然后排序。可以尝试将语句改写为子查询,使排序下推。

优化前

表结构如下所示:

1
2
CREATE TABLE public.test(imsi int,L4_DW_THROUGHPUT int,L4_UL_THROUGHPUT int)
with (orientation = column) DISTRIBUTE BY hash(imsi);

查询语句如下所示:

1
2
3
4
5
6
7
SELECT COUNT(1) over() AS DATACNT,
IMSI AS IMSI_IMSI,
CAST(TRUNC(((SUM(L4_UL_THROUGHPUT) + SUM(L4_DW_THROUGHPUT))), 0) AS
DECIMAL(20)) AS TOTAL_VOLOME_KPIID
FROM public.test AS test
GROUP BY IMSI
ORDER BY TOTAL_VOLOME_KPIID DESC LIMIT 10;
执行计划如下:
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
QUERY PLAN                                                                                
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  id |                    operation                     |      A-time      | A-rows  | E-rows  | E-distinct | Peak Memory  |   E-memory   | A-width | E-width | E-costs  
 ----+--------------------------------------------------+------------------+---------+---------+------------+--------------+--------------+---------+---------+----------
   1 | ->  Row Adapter                                  | 2862.008         |      10 |      10 |            | 31KB         |              |         |      28 | 48360.42 
   2 |    ->  Vector Limit                              | 2861.969         |      10 |      10 |            | 8KB          |              |         |      28 | 48360.42 
   3 |       ->  Vector Sort                            | 2861.946         |      10 | 1000000 |            | 479KB        |              |         |      28 | 50860.39 
   4 |          ->  Vector WindowAgg                    | 2166.759         | 1000000 | 1000000 |            | 69987KB      |              |         |      28 | 26750.75 
   5 |             ->  Vector Streaming (type: GATHER)  | 136.813          | 1000000 | 1000000 |            | 208KB        |              |         |      28 | 15500.75 
   6 |                ->  Vector Sonic Hash Aggregate   | [71.374, 73.640] | 1000000 | 1000000 |            | [14MB, 14MB] | 96MB(2919MB) | [31,31] |      28 | 15032.00 
   7 |                   ->  CStore Scan on public.test | [2.957, 2.994]   | 1000000 | 1000000 |            | [1MB, 1MB]   | 1MB          |         |      12 | 1282.00  

可以看到window agg和sort全部在CN端执行,耗时非常严重。

优化后

尝试将语句改写为子查询:

1
2
3
4
5
6
7
SELECT COUNT(1) over() AS DATACNT, IMSI_IMSI, TOTAL_VOLOME_KPIID
FROM (SELECT IMSI AS IMSI_IMSI,
CAST(TRUNC(((SUM(L4_UL_THROUGHPUT) + SUM(L4_DW_THROUGHPUT))),
0) AS DECIMAL(20)) AS TOTAL_VOLOME_KPIID
FROM public.test AS test
GROUP BY IMSI
ORDER BY TOTAL_VOLOME_KPIID DESC LIMIT 10);

将trunc两列的和作为一个子查询,然后在子查询的外面做window agg,这样排序就可以下推了,执行计划如下:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
 QUERY PLAN                                                                                    
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  id |                       operation                        |       A-time       | A-rows  | E-rows  | E-distinct |  Peak Memory   |   E-memory   | A-width | E-width | E-costs  
 ----+--------------------------------------------------------+--------------------+---------+---------+------------+----------------+--------------+---------+---------+----------
   1 | ->  Row Adapter                                        | 955.277            |      10 |       5 |            | 31KB           |              |         |      24 | 25843.13 
   2 |    ->  Vector WindowAgg                                | 955.261            |      10 |       5 |            | 1572KB         |              |         |      24 | 25843.13 
   3 |       ->  Vector Streaming (type: GATHER)              | 955.015            |      10 |      10 |            | 127KB          |              |         |      24 | 25843.07 
   4 |          ->  Vector Limit                              | [0.018, 0.018]     |      10 |      10 |            | [8KB, 8KB]     | 1MB          |         |      28 | 25836.97 
   5 |             ->  Vector Streaming(type: BROADCAST)      | [0.014, 0.014]     |      20 |      20 |            | [719KB, 719KB] | 2MB          |         |      28 | 25837.12 
   6 |                ->  Vector Limit                        | [927.730, 934.283] |      20 |      20 |            | [8KB, 8KB]     | 1MB          |         |      28 | 25836.85 
   7 |                   ->  Vector Sort                      | [927.720, 934.269] |      20 | 1000000 |            | [463KB, 463KB] | 16MB         | [32,32] |      28 | 27086.82 
   8 |                      ->  Vector Sonic Hash Aggregate   | [456.841, 461.077] | 1000000 | 1000000 |            | [15MB, 15MB]   | 96MB(2916MB) | [31,31] |      28 | 15032.00 
   9 |                         ->  CStore Scan on public.test | [2.959, 3.014]     | 1000000 | 1000000 |            | [1MB, 1MB]     | 1MB          |         |      12 | 1282.00  

经过SQL改写,性能由2.862s提升0.955s,优化效果明显。需注意,本示例中优化结果仅供参考,由于WindowAgg的不确定性,优化后的结果集跟实际业务相关。