案例：调整GUC参数best_agg_plan

现象描述

t1的表定义为：

    
         create table t1(a int, b int, c int) store as orc;

假设agg下层算子所输出结果集的分布列为setA，agg操作的group by列为setB，则在Stream框架下，Agg操作可以分为两个场景。

场景一：setA是setB的一个子集。

对于这种场景，直接对下层结果集进行汇聚的结果就是正确的汇聚结果，上层算子直接使用即可。

场景二：setA不是setB的一个子集。

对于这种场景，Stream执行框架分为如下三种计划形态：

hashagg+gather(redistribute)+hashagg

redistribute+hashagg(+gather)

hashagg+redistribute+hashagg(+gather)

DataArtsFabric SQL提供了guc参数best_agg_plan来干预执行计划，强制其生成上述对应的执行计划，此参数取值范围为0，1，2，3

取值为1时，强制生成第一种计划。
取值为2时，如果group by列可以重分布，强制生成第二种计划，否则生成第一种计划。
取值为3时，如果group by列可以重分布，强制生成第三种计划，否则生成第一种计划。
取值为0时，优化器会根据以上三种计划的估算代价选择最优的一种计划生成。

具体影响如下：

postgres=> set best_agg_plan=1;
SET
postgres=> explain select a, count(1) from t1 group by a;
QUERY PLAN
------------------------------------------------------------------------------------------
id |                 operation                 | E-rows | E-memory | E-width | E-costs
----+-------------------------------------------+--------+----------+---------+---------
1 | ->  Row Adapter                           |    200 |          |      12 | 85.16
2 |    ->  Vector Sonic Hash Aggregate        |    200 |          |      12 | 85.16
3 |       ->  Vector Streaming (type: GATHER) |    522 |          |      12 | 85.16
4 |          ->  Vector Sonic Hash Aggregate  |    522 | 16MB     |      12 | 63.41
5 |             ->  Vector Foreign Scan on t1 |   1000 | 1MB      |       4 | 60.00

Predicate Information (identified by plan id)
---------------------------------------------
5 --Vector Foreign Scan on t1
Server Type: lf
Total files left: 0

====== Query Summary =====
------------------------------
System available mem: 262144KB
Query Max mem: 262144KB
Query estimated mem: 1872KB
(19 rows)

postgres=> set best_agg_plan=2;
SET
postgres=> explain select a, count(1) from t1 group by a;
QUERY PLAN
--------------------------------------------------------------------------------------------------
id |                     operation                     | E-rows | E-memory | E-width | E-costs
----+---------------------------------------------------+--------+----------+---------+---------
1 | ->  Row Adapter                                   |    201 |          |      12 | 78.84
2 |    ->  Vector Streaming (type: GATHER)            |    201 |          |      12 | 78.84
3 |       ->  Vector Sonic Hash Aggregate             |    201 | 16MB     |      12 | 70.47
4 |          ->  Vector Streaming(type: REDISTRIBUTE) |   1000 | 2MB      |       4 | 67.39
5 |             ->  Vector Foreign Scan on t1         |   1000 | 1MB      |       4 | 60.00

Predicate Information (identified by plan id)
---------------------------------------------
5 --Vector Foreign Scan on t1
Server Type: lf
Total files left: 0

====== Query Summary =====
------------------------------
System available mem: 262144KB
Query Max mem: 262144KB
Query estimated mem: 3965KB
(19 rows)

postgres=> set best_agg_plan=3;
SET
postgres=> explain select a, count(1) from t1 group by a;
QUERY PLAN
--------------------------------------------------------------------------------------------------
id |                     operation                     | E-rows | E-memory | E-width | E-costs
----+---------------------------------------------------+--------+----------+---------+---------
1 | ->  Row Adapter                                   |    200 |          |      12 | 82.25
2 |    ->  Vector Streaming (type: GATHER)            |    200 |          |      12 | 82.25
3 |       ->  Vector Sonic Hash Aggregate             |    200 | 16MB     |      12 | 70.25
4 |          ->  Vector Streaming(type: REDISTRIBUTE) |    522 | 2MB      |      12 | 68.03
5 |             ->  Vector Sonic Hash Aggregate       |    522 | 16MB     |      12 | 63.41
6 |                ->  Vector Foreign Scan on t1      |   1000 | 1MB      |       4 | 60.00

Predicate Information (identified by plan id)
---------------------------------------------
6 --Vector Foreign Scan on t1
Server Type: lf
Total files left: 0

====== Query Summary =====
------------------------------
System available mem: 262144KB
Query Max mem: 262144KB
Query estimated mem: 3968KB
(20 rows)

总结

通常优化器总会选择最优的执行计划，但是众所周知代价估算，尤其是中间结果集的代价估算一般会有比较大的偏差，这种比较大的偏差就可能会导致agg的计算方式出现比较大的偏差，这时候就需要通过best_agg_plan进行agg计算模型的干预。

一般来说，当agg汇聚的收敛度很小时，即结果集的个数在agg之后并没有明显变少时（经验上以5倍为临界点），选择redistribute+hashagg执行方式，否则选择hashagg+redistribute+hashagg执行方式。

父主题： SQL调优案例

上一篇：SQL调优案例

下一篇：Python UDF性能调优

意见反馈

文档内容是否对您有帮助？

有帮助没帮助

提供反馈

提交成功！非常感谢您的反馈，我们会继续努力做到更好！您可在我的云声建议查看反馈及问题处理状态。

系统繁忙，请稍后重试

在使用文档中是否遇到以下问题

内容与产品页面不一致

内容不易理解

缺失示例代码

步骤不可操作

搜不到想要的内容

缺少最佳实践

意见反馈（选填）

0/500

请至少选择一项反馈信息并填写问题反馈

字符长度不能超过500

直接提交取消

如您有其它疑问，您也可以通过华为云社区问答频道来与我们联系探讨

盘古Doer提问云社区提问

案例：调整GUC参数best_agg_plan

现象描述

总结

相关文档

意见反馈

文档内容是否对您有帮助？