Updated on 2024-10-25 GMT+08:00

GROUP BY

GROUP BY

GROUP BY groups the output rows of a SELECT statement into groups that contain matching values. A simple GROUP BY can contain any expression consisting of input columns, or select the sequence number of the output column by position.

The following queries are equivalent:

SELECT count(*), nationkey FROM customer GROUP BY 2;
SELECT count(*), nationkey FROM customer GROUP BY nationkey;

GROUP BY can group the output by the input column names that do not appear in the output of the SELECT statement.

Example:

SELECT count(*) FROM customer GROUP BY mktsegment;
GROUPING SETS

You can specify multiple columns for grouping. The result column that does not belong to the grouping column is set to NULL. Queries with complex grouping syntax (GROUPING SETS, CUBE, or ROLLUP) read the underlying data source only once, while queries using UNION ALL read the underlying data three times. This is why queries that use UNION ALL can produce inconsistent results when the data source is not deterministic.

-- Create a shipping table:
create table shipping(origin_state varchar(25),origin_zip integer,destination_state varchar(25) ,destination_zip integer,package_weight integer);

--Insert data.
insert into shipping values ('California',94131,'New Jersey',8648,13),
('California',94131,'New Jersey',8540,42),
('California',90210,'Connecticut',6927,1337),
('California',94131,'Colorado',80302,5),
('New York',10002,'New Jersey',8540,3),
('New Jersey',7081,'Connecticut',6708,225);

-- Query the grouping sets.
SELECT
	origin_state,
	origin_zip,
	destination_state,
	sum( package_weight ) 
FROM shipping 
GROUP BY GROUPING SETS (
		( origin_state ),
	( origin_state, origin_zip ),
	( destination_state ));
--Logically, this query is equivalent to the union all of multiple group queries.
SELECT origin_state, NULL,NULL,sum( package_weight ) FROM shipping GROUP BY origin_state UNION ALL  SELECT origin_state,origin_zip,NULL,sum( package_weight ) FROM shipping GROUP BY origin_state,origin_zip UNION ALL  SELECT NULL,NULL,destination_state,sum( package_weight ) FROM  shipping GROUP BY  destination_state;
--Result
 origin_state | origin_zip | destination_state | _col3 
--------------|------------|-------------------|-------
 New Jersey   |       NULL | NULL              |   225 
 California   |      94131 | NULL              |    60 
 California   |       NULL | NULL              |  1397 
 New York     |      10002 | NULL              |     3 
 NULL         |       NULL | New Jersey        |    58 
 NULL         |       NULL | Connecticut       |  1562 
 California   |      90210 | NULL              |  1337 
 New York     |       NULL | NULL              |     3 
 NULL         |       NULL | Colorado          |     5 
 New Jersey   |       7081 | NULL              |   225 
(10 rows)

CUBE

Generate all possible groups for given columns. For example, the possible groups of (origin_state, destination_state) are (origin_state, destination_state), (origin_state), (destination_state), and ().

SELECT
	origin_state,
	destination_state,
	sum( package_weight ) 
FROM
	shipping 
GROUP BY
	CUBE ( origin_state, destination_state );
-- Equivalent to:
SELECT
origin_state,
destination_state,
sum( package_weight ) 
FROM
	shipping 
GROUP BY
	GROUPING SETS (
		( origin_state, destination_state ),
		( origin_state ),
	( destination_state ),
	());

ROLLUP

Generates partial possible subtotals for a given set of columns.

SELECT
	origin_state,
	origin_zip,
	sum( package_weight ) 
FROM
	shipping 
GROUP BY
	ROLLUP ( origin_state, origin_zip );
-- Equivalent to:
SELECT
origin_state,
origin_zip,
sum( package_weight ) 
FROM
	shipping 
GROUP BY
	GROUPING SETS ((origin_state,origin_zip ),( origin_state ),());

Currently, GROUP BY does not support column aliases. For example:

select count(userid) as num ,dept as aaa from salary group by aaa having sum(sal)>2000;

The following error is reported:

Query 20210630_084610_00018_wc8n9@default@HetuEngine failed: line 1:63: Column 'aaa' cannot be resolved