Agg

Description

The Agg operator is used to perform aggregation calculation. It supports three policies: common aggregation (aggregation without grouping), sort aggregation, and hash aggregation. Sort aggregation and hash aggregation must be used together with GROUP BY because they involve grouping. The difference between sort aggregation and hash aggregation is that the input of sort aggregation must be ordered, while hash aggregation does not depend on the input sequence. Even if the input is sorted, sort aggregation may not be selected because hash aggregation may have better execution performance.

Typical Scenarios

Common aggregation: Only aggregation calculation is involved. It is used to collect statistics on the total number of tables, or the number of rows or features in a column. The keyword Aggregate is used in the execution plan.
Sort aggregation: If the input tuples are sorted or the group keys are ordered, the sort aggregation may be selected. The GroupAggregate keyword is used in the execution plan.
Hash aggregation: It is the aggregation calculation in most service scenarios and in the case of disordered data. The HashAggregate keyword is used in the execution plan.

Examples

Example 1: common aggregation.

-- Prepare data.
gaussdb=# CREATE TABLE student(id integer, class_id integer, grade number); 
CREATE TABLE 
gaussdb=# INSERT INTO student VALUES(generate_series(1,50), 1, floor(100 * random())); 
INSERT 0 50 
gaussdb=# INSERT INTO student VALUES(generate_series(51,100), 2, floor(100 * random())); 
INSERT 0 50 
gaussdb=# INSERT INTO student VALUES(generate_series(101,150), 3, floor(100 * random())); 
INSERT 0 50 
gaussdb=# INSERT INTO student VALUES(generate_series(151,200), 3, floor(100 * random())); 
INSERT 0 50

-- Execution result.
gaussdb=# EXPLAIN SELECT count(*) FROM student; 
                           QUERY PLAN                             
----------------------------------------------------------------- 
 Aggregate  (cost=24.59..24.60 rows=1 width=8) 
   ->  Seq Scan on student  (cost=0.00..21.67 rows=1167 width=0) 
(2 rows)

Example 2: sort aggregation.

gaussdb=# SET enable_hashagg = off;
SET
gaussdb=# EXPLAIN SELECT id, count(class_id) FROM student GROUP BY id ORDER BY id; 
                             QUERY PLAN                               
--------------------------------------------------------------------- 
 GroupAggregate  (cost=10.64..14.14 rows=200 width=16) 
   Group By Key: id 
   ->  Sort  (cost=10.64..11.14 rows=200 width=8) 
         Sort Key: id 
         ->  Seq Scan on student  (cost=0.00..3.00 rows=200 width=8) 
(5 rows)

Example 3: hash aggregation.

-- Execution result.
gaussdb=# EXPLAIN SELECT id, avg(grade) FROM student GROUP BY id; 
                            QUERY PLAN                             
------------------------------------------------------------------ 
 HashAggregate  (cost=27.51..30.01 rows=200 width=36) 
   Group By Key: id 
   ->  Seq Scan on student  (cost=0.00..21.67 rows=1167 width=36) 
(3 rows)

-- Drop.
gaussdb=# DROP TABLE student;

In the preceding example, the output of the aggregate operator is as follows.

Item	Description
Aggregate	Name of a common aggregation operator, indicating that only aggregation is performed without grouping.
GroupAggregate	Name of a sort aggregation operator, indicating that the input tuple is sorted or the grouping key is ordered.
HashAggregate	Name of a hash aggregation operator, which is used for aggregation calculation in most scenarios and in the case of disordered data.