案例：调整GUC参数best_agg_plan

现象描述

agg_t1的表定义为：

     create table agg_t1(a int, b int, c int) distribute by hash(a);

假设agg下层算子所输出结果集的分布列为setA，agg操作的group by列为setB，则在Stream框架下，Agg操作可以分为两个场景。

setA是setB的一个子集。

对于这种场景，直接对下层结果集进行汇聚的结果就是正确的汇聚结果，上层算子直接使用即可。如下图所示：

     gaussdb=# explain select a, count(1) from agg_t1 group by a;
 id |          operation           | E-rows | E-width | E-costs
----+------------------------------+--------+---------+---------
  1 | ->  Streaming (type: GATHER) |     20 |      12 | 14.23
  2 |    ->  HashAggregate         |     20 |      12 | 13.30
  3 |       ->  Seq Scan on agg_t1 |     20 |       4 | 13.13
(3 rows)
 
 
  

setA不是setB的一个子集。

对于这种场景，Stream执行框架分为如下三种计划形态：

hashagg + gather(redistribute) + hashagg；
redistribute + hashagg(+ gather)；
hashagg + redistribute + hashagg(+ gather)。

GaussDB提供了GUC参数best_agg_plan来干预执行计划，强制其生成上述对应的执行计划，此参数取值范围为0，1，2，3。

取值为1时，强制生成第一种计划。
取值为2时，如果group by列可以重分布，强制生成第二种计划，否则生成第一种计划。
取值为3时，如果group by列可以重分布，强制生成第三种计划，否则生成第一种计划。
取值为0时，优化器会根据以上三种计划的估算代价选择最优的一种计划生成。

具体影响如下所示：

     gaussdb=# set best_agg_plan to 1;
SET
gaussdb=# explain select b,count(1) from agg_t1 group by b;
 id |            operation            | E-rows | E-width | E-costs
----+---------------------------------+--------+---------+---------
  1 | ->  HashAggregate               |     10 |      12 | 14.23
  2 |    ->  Streaming (type: GATHER) |     20 |      12 | 14.23
  3 |       ->  HashAggregate         |     20 |      12 | 13.30
  4 |          ->  Seq Scan on agg_t1 |     20 |       4 | 13.13
(4 rows)
gaussdb=# set best_agg_plan to 2;
SET
gaussdb=# explain select b,count(1) from agg_t1 group by b;
 id |                operation                | E-rows | E-width | E-costs
----+-----------------------------------------+--------+---------+---------
  1 | ->  Streaming (type: GATHER)            |     20 |      12 | 14.52
  2 |    ->  HashAggregate                    |     20 |      12 | 13.58
  3 |       ->  Streaming(type: REDISTRIBUTE) |     20 |       4 | 13.42
  4 |          ->  Seq Scan on agg_t1         |     20 |       4 | 13.13
(4 rows)
gaussdb=# set best_agg_plan to 3;
SET
gaussdb=# explain select b,count(1) from agg_t1 group by b;
 id |                operation                | E-rows | E-width | E-costs
----+-----------------------------------------+--------+---------+---------
  1 | ->  Streaming (type: GATHER)            |     20 |      12 | 14.97
  2 |    ->  HashAggregate                    |     20 |      12 | 13.76
  3 |       ->  Streaming(type: REDISTRIBUTE) |     20 |      12 | 13.56
  4 |          ->  HashAggregate              |     20 |      12 | 13.30
  5 |             ->  Seq Scan on agg_t1      |     20 |       4 | 13.13
(5 rows)