Agg
Description
The Agg operator is used to perform aggregation calculation. It supports three policies: common aggregation (aggregation without grouping), sort aggregation, and hash aggregation. Sort aggregation and hash aggregation must be used together with GROUP BY because they involve grouping. The difference between sort aggregation and hash aggregation is that the input of sort aggregation must be ordered, while hash aggregation does not depend on the input sequence. Even if the input is sorted, sort aggregation may not be selected because hash aggregation may have better execution performance.
Typical Scenarios
- Common aggregation: Only aggregation calculation is involved. It is used to collect statistics on the total number of tables, or the number of rows or features in a column. The keyword Aggregate is used in the execution plan.
- Sort aggregation: If the input tuples are sorted or the group keys are ordered, the sort aggregation may be selected. The GroupAggregate keyword is used in the execution plan.
- Hash aggregation: It is the aggregation calculation in most service scenarios and in the case of disordered data. The HashAggregate keyword is used in the execution plan.
Examples
Example 1: common aggregation.
-- Prepare data. gaussdb=# DROP TABLE IF EXISTS student; gaussdb=# CREATE TABLE student(id integer, class_id integer, grade number); CREATE TABLE gaussdb=# INSERT INTO student VALUES(generate_series(1,50), 1, floor(100 * random())); INSERT 0 50 gaussdb=# INSERT INTO student VALUES(generate_series(51,100), 2, floor(100 * random())); INSERT 0 50 gaussdb=# INSERT INTO student VALUES(generate_series(101,150), 3, floor(100 * random())); INSERT 0 50 gaussdb=# INSERT INTO student VALUES(generate_series(151,200), 3, floor(100 * random())); INSERT 0 50 -- Execution result. gaussdb=# EXPLAIN SELECT count(*) FROM student; QUERY PLAN --------------------------------------------------------------------------- Aggregate (cost=13.23..13.27 rows=1 width=8) -> Streaming (type: GATHER) (cost=13.23..13.27 rows=2 width=8) Node/s: All datanodes -> Aggregate (cost=13.16..13.17 rows=2 width=8) -> Seq Scan on student (cost=0.00..13.13 rows=20 width=0) (5 rows)
Example 2: sort aggregation.
gaussdb=# SET enable_hashagg = off; SET gaussdb=# EXPLAIN SELECT id, count(class_id) FROM student GROUP BY id ORDER BY id; QUERY PLAN --------------------------------------------------------------------------- Streaming (type: GATHER) (cost=14.00..14.54 rows=21 width=16) Merge Sort Key: id Node/s: All datanodes -> GroupAggregate (cost=13.37..13.60 rows=21 width=16) Group By Key: id -> Sort (cost=13.37..13.40 rows=21 width=8) Sort Key: id -> Seq Scan on student (cost=0.00..13.13 rows=20 width=8) (8 rows)
Example 3: hash aggregation.
gaussdb=# SET enable_hashagg = on; gaussdb=# SET enable_enable_fast_query_shipping = on; gaussdb=# EXPLAIN SELECT id, avg(grade) FROM student GROUP BY id; QUERY PLAN ---------------------------------------------------------------------------- Streaming (type: GATHER) (cost=14.00..14.57 rows=21 width=68) Node/s: All datanodes -> GroupAggregate (cost=13.37..13.63 rows=21 width=68) Group By Key: id -> Sort (cost=13.37..13.40 rows=21 width=36) Sort Key: id -> Seq Scan on student (cost=0.00..13.13 rows=20 width=36) (7 rows) -- Drop. gaussdb=# DROP TABLE student;
In the preceding example, the output of the aggregate operator is as follows.
| Item | Description | 
|---|---|
| Aggregate | Name of a common aggregation operator, indicating that only aggregation is performed without grouping. | 
| GroupAggregate | Name of a sort aggregation operator, indicating that the input tuple is sorted or the grouping key is ordered. | 
| HashAggregate | Name of a hash aggregation operator, which is used for aggregation calculation in most scenarios and in the case of disordered data. | 
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.
 
    