Agg

Description

The agg operator is used to perform aggregation calculation. It supports three policies: common aggregation (without grouping), sort aggregation, and hash aggregation. Sort aggregation and hash aggregation must be used together with GROUP BY because they involve grouping. The difference between sort aggregation and hash aggregation is that the input of sort aggregation must be ordered, while hash aggregation does not depend on the input sequence. Even if the input is sorted, sort aggregation may not be selected because hash aggregation may have better execution performance.

Typical Scenarios

Common aggregation: Only aggregation calculation is involved. It is used to collect statistics on the total number of tables, or the number of rows or features in a column. The keyword Aggregate is used in the execution plan.
Sort aggregation: If the input tuples are sorted or the group keys are ordered, the sort aggregation may be selected. The GroupAggregate keyword is used in the execution plan.
Hash aggregation: It is the aggregation calculation in most service scenarios and in the case of disordered data. The HashAggregate keyword is used in the execution plan.

Examples

Example 1: common aggregation.

-- Prepare data.
gaussdb=#DROP TABLE IF EXISTS student;
gaussdb=#CREATE TABLE student(id integer, class_id integer, grade number); 
CREATE TABLE 
gaussdb=#INSERT INTO student VALUES(generate_series(1,50), 1, floor(100 * random())); 
INSERT 0 50 
gaussdb=#INSERT INTO student VALUES(generate_series(51,100), 2, floor(100 * random())); 
INSERT 0 50 
gaussdb=#INSERT INTO student VALUES(generate_series(101,150), 3, floor(100 * random())); 
INSERT 0 50 
gaussdb=#INSERT INTO student VALUES(generate_series(151,200), 3, floor(100 * random())); 
INSERT 0 50

-- Execution result.
gaussdb=#EXPLAIN SELECT count(*) FROM student; 
                                QUERY PLAN
---------------------------------------------------------------------------
 Aggregate  (cost=13.23..13.27 rows=1 width=8)
   ->  Streaming (type: GATHER)  (cost=13.23..13.27 rows=2 width=8)
         Node/s: All datanodes
         ->  Aggregate  (cost=13.16..13.17 rows=2 width=8)
               ->  Seq Scan on student  (cost=0.00..13.13 rows=20 width=0)
(5 rows)

Example 2: sort aggregation.

gaussdb=#SET enable_hashagg = off;
SET
gaussdb=#EXPLAIN SELECT id, count(class_id) FROM student GROUP BY id ORDER BY id; 
                                QUERY PLAN
---------------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=14.00..14.54 rows=21 width=16)
   Merge Sort Key: id
   Node/s: All datanodes
   ->  GroupAggregate  (cost=13.37..13.60 rows=21 width=16)
         Group By Key: id
         ->  Sort  (cost=13.37..13.40 rows=21 width=8)
               Sort Key: id
               ->  Seq Scan on student  (cost=0.00..13.13 rows=20 width=8)
(8 rows)

Example 3: hash aggregation.

gaussdb=#SET enable_hashagg = on;
gaussdb=#EXPLAIN SELECT id, avg(grade) FROM student GROUP BY id;  
                              QUERY PLAN
----------------------------------------------------------------------
 Streaming (type: GATHER)  (cost=14.84..15.59 rows=30 width=68)
   Node/s: All datanodes
   ->  HashAggregate  (cost=14.21..14.34 rows=30 width=68)
         Group By Key: id
         ->  Seq Scan on student  (cost=0.00..14.14 rows=30 width=36)
(5 rows)

-- Drop.
gaussdb=#DROP TABLE student;

In the preceding example, the output of the agg operator is as follows.

Item	Description
Aggregate	Name of an operator for common aggregation, indicating that aggregation is performed without grouping.
GroupAggregate	Name of an operator for sort aggregation, indicating that the input tuples are sorted or the grouping keys are ordered.
HashAggregate	Name of an operator for hash aggregation, which is used for aggregation calculation in most scenarios and in the case of disordered data.