窗口函数
窗口函数
窗口函数与OVER语句一起使用。OVER语句用于对数据进行分组,并对组内元素进行排序。窗口函数用于给组内的值生成序号。
窗口函数中的order by后面必须跟字段名,若order by后面跟数字,该数字会被按照常量处理,因此对目标列没有起到排序的作用。
当父查询中包含子查询中的窗口函数的过滤条件时,支持将父查询中的过滤条件下推到子查询中。
- 只支持将父查询中关于窗口函数的<、<=、=过滤条件下推到子查询中。
- 限制条件范围的上边界可以是常量、常量表达式、参数、非VOLATILE类型的函数、非相关子链接。
- 只支持ROW_NUMBROW_NUMBER()、 RANK()、DENSE_RANK()三个窗口函数。
示例:
--创建表。 gaussdb=# CREATE TABLE t2 (c1 INT, c2 INT); --导入数据。 gaussdb=# INSERT INTO t2 SELECT generate_series, generate_series FROM generate_series(1, 1000000); --执行查询,查询正常无报错,且执行结果正确。 gaussdb=# EXPLAIN ANALYZE SELECT nc1 FROM ( SELECT row_number() over() rid, t2.c1 nc1 FROM t2 ) WHERE rid BETWEEN 1 AND (1 + 10 - 1); QUERY PLAN -------------------------------------------------------------------------------------------------------------------- Streaming (type: GATHER) (cost=0.12..10.64 rows=5 width=4) (actual time=10.201..12.766 rows=10 loops=1) Node/s: All datanodes -> Subquery Scan on __unnamed_subquery__ (cost=0.00..10.45 rows=5 width=4) (actual time=[5.310,6.092]..[5.310,6.092], rows=10) Filter: (__unnamed_subquery__.rid >= 1) -> WindowAgg (cost=0.00..10.25 rows=16 width=4) (actual time=[5.262,5.873]..[5.262,5.873], rows=10) row_number_filter: (row_number() OVER () <= 10) -> Streaming(type: BROADCAST) (cost=0.00..10.29 rows=32 width=4) (actual time=[5.162,5.682]..[5.162,5.682], rows=11) Spawn on: All datanodes -> WindowAgg (cost=0.00..10.25 rows=16 width=4) (actual time=[1.759,1.815]..[1.786,1.844], rows=20) row_number_filter: (row_number() OVER () <= 10) -> Seq Scan on t2 (cost=0.00..13.13 rows=20 width=4) (actual time=[1.711,1.716]..[1.743,1.747], rows=22) Total runtime: 13.428 ms (12 rows) --环境清理,避免数据泄露。 gaussdb=# DROP TABLE t2;
- RANK()
描述:RANK函数为各组内值生成跳跃排序序号,其中,相同的值具有相同序号。
返回值类型:BIGINT
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE rank_t1(a int, b int); gaussdb=# INSERT INTO rank_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,RANK() OVER(PARTITION BY a ORDER BY b) FROM rank_t1; a | b | rank ---+---+------ 1 | 1 | 1 1 | 1 | 1 1 | 2 | 3 1 | 3 | 4 2 | 4 | 1 2 | 5 | 2 3 | 6 | 1 (7 rows) gaussdb=# DROP TABLE rank_t1;
- ROW_NUMBER()
描述:ROW_NUMBER函数为各组内值生成连续排序序号,其中,相同的值其序号也不相同。
返回值类型:BIGINT
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE row_number_t1(a int, b int); gaussdb=# INSERT INTO row_number_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,ROW_NUMBER() OVER(PARTITION BY a ORDER BY b) FROM row_number_t1; a | b | row_number ---+---+------------ 1 | 1 | 1 1 | 1 | 2 1 | 2 | 3 1 | 3 | 4 2 | 4 | 1 2 | 5 | 2 3 | 6 | 1 (7 rows) gaussdb=# DROP TABLE row_number_t1;
- DENSE_RANK()
描述:DENSE_RANK函数为各组内值生成连续排序序号,其中,相同的值具有相同序号。
返回值类型:BIGINT
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE dense_rank_t1(a int, b int); gaussdb=# INSERT INTO dense_rank_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,DENSE_RANK() OVER(PARTITION BY a ORDER BY b) FROM dense_rank_t1; a | b | dense_rank ---+---+------------ 1 | 1 | 1 1 | 1 | 1 1 | 2 | 2 1 | 3 | 3 2 | 4 | 1 2 | 5 | 2 3 | 6 | 1 (7 rows) gaussdb=# DROP TABLE dense_rank_t1;
- PERCENT_RANK()
描述:PERCENT_RANK函数为各组内对应值生成相对序号,即根据公式 (rank - 1) / (totalrows - 1)计算所得的值。其中rank为该值依据RANK函数所生成的对应序号,totalrows为该分组内的总元素个数。
返回值类型:DOUBLE PRECISION
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE percent_rank_t1(a int, b int); gaussdb=# INSERT INTO percent_rank_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,PERCENT_RANK() OVER(PARTITION BY a ORDER BY b) FROM percent_rank_t1; a | b | percent_rank ---+---+------------------ 1 | 1 | 0 1 | 1 | 0 1 | 2 | .666666666666667 1 | 3 | 1 2 | 4 | 0 2 | 5 | 1 3 | 6 | 0 (7 rows) gaussdb=# DROP TABLE percent_rank_t1;
- CUME_DIST()
描述:CUME_DIST函数为各组内对应值生成累积分布序号。即根据公式(小于等于当前值的数据行数)/(该分组总行数totalrows)计算所得的相对序号。
返回值类型:DOUBLE PRECISION
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE cume_dist_t1(a int, b int); gaussdb=# INSERT INTO cume_dist_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,CUME_DIST() OVER(PARTITION BY a ORDER BY b) FROM cume_dist_t1; a | b | cume_dist ---+---+----------- 1 | 1 | .5 1 | 1 | .5 1 | 2 | .75 1 | 3 | 1 2 | 4 | .5 2 | 5 | 1 3 | 6 | 1 (7 rows) gaussdb=# DROP TABLE cume_dist_t1;
- NTILE(num_buckets integer)
描述:NTILE函数根据num_buckets integer将有序的数据集合平均分配到num_buckets所指定数量的桶中,并将桶号分配给每一行。分配时应尽量做到平均分配。
返回值类型:INTEGER
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE ntile_t1(a int, b int); gaussdb=# INSERT INTO ntile_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,NTILE(2) OVER(PARTITION BY a ORDER BY b) FROM ntile_t1; a | b | ntile ---+---+------- 1 | 1 | 1 1 | 1 | 1 1 | 2 | 2 1 | 3 | 2 2 | 4 | 1 2 | 5 | 2 3 | 6 | 1 (7 rows) gaussdb=# DROP TABLE ntile_t1;
- LAG
描述:LAG函数为各组内对应值生成滞后值。即将当前值对应的行数向前偏移offset位后所得行的value值作为序号。若经过向前偏移后行数不存在,则对应结果取为default值。若无指定,在默认情况下,offset取值为1,default值取值为NULL。default值的类型需要与value值的类型保持一致。
语法:
LAG(value any [, offset integer [, default any ]]) LAG(value any ignore|respect nulls [, offset integer [, default any ]]) LAG(value any [, offset integer [, default any ]]) ignore|respect nulls
ignore|respect nulls表示是否将NULL包含在向前偏移值中。若无指定,则默认是respect nulls。若设置为ignore nulls,当value为NULL时,则将其不包含在向前偏移值中。如果开启了ignore nulls功能,则该函数会存在性能劣化。
返回值类型:与参数数据类型相同。
示例1:不开启ignore nulls功能,设置offset=3, defualt=null
-- 建表并插入数据 gaussdb=# CREATE TABLE ta1 (hire_date date, last_name varchar(20), department_id int); CREATE TABLE gaussdb=# INSERT INTO ta1 values('07-DEC-02', 'Raphaely', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('24-JUL-05', 'Tobias', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('24-DEC-05', 'Baida', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('18-MAY-03', 'Khoo', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('15-NOV-06', 'Himuro', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('10-AUG-07', 'Colmenares', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('10-MAY-07', 'yq', 11); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('10-MAY-08', 'zi', 11); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('', 'yq1', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values(null, 'yq2', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('10-DEC-07', 'yq3', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values(null, null, 11); INSERT 0 1 gaussdb=# INSERT INTO ta1 values(null, null, 11); INSERT 0 1 -- 调用LAG,指定offset=3, default=null gaussdb=# SELECT hire_date, last_name, department_id, lag(hire_date, 3, null) OVER (PARTITION BY department_id ORDER BY last_name) AS "NextHired" FROM ta1 ORDER BY department_id; hire_date | last_name | department_id | NextHired ---------------------+------------+---------------+--------------------- 2007-05-10 00:00:00 | yq | 11 | 2008-05-10 00:00:00 | zi | 11 | | | 11 | | | 11 | 2007-05-10 00:00:00 2005-12-24 00:00:00 | Baida | 30 | 2007-08-10 00:00:00 | Colmenares | 30 | 2006-11-15 00:00:00 | Himuro | 30 | 2003-05-18 00:00:00 | Khoo | 30 | 2005-12-24 00:00:00 2002-12-07 00:00:00 | Raphaely | 30 | 2007-08-10 00:00:00 2005-07-24 00:00:00 | Tobias | 30 | 2006-11-15 00:00:00 | yq1 | 30 | 2003-05-18 00:00:00 | yq2 | 30 | 2002-12-07 00:00:00 2007-12-10 00:00:00 | yq3 | 30 | 2005-07-24 00:00:00 (13 rows)
示例2:开启ignore nulls功能,设置offset=3, defualt='01-JAN-00'
gaussdb=# SELECT hire_date, last_name, department_id, lag(hire_date, 3, '01-JAN-00') ignore nulls OVER (PARTITION BY department_id ORDER BY last_name) AS "NextHired" FROM ta1 ORDER BY department_id; hire_date | last_name | department_id | NextHired ---------------------+------------+---------------+--------------------- 2007-05-10 00:00:00 | yq | 11 | 2000-01-01 00:00:00 2008-05-10 00:00:00 | zi | 11 | 2000-01-01 00:00:00 | | 11 | 2000-01-01 00:00:00 | | 11 | 2000-01-01 00:00:00 2005-12-24 00:00:00 | Baida | 30 | 2000-01-01 00:00:00 2007-08-10 00:00:00 | Colmenares | 30 | 2000-01-01 00:00:00 2006-11-15 00:00:00 | Himuro | 30 | 2000-01-01 00:00:00 2003-05-18 00:00:00 | Khoo | 30 | 2005-12-24 00:00:00 2002-12-07 00:00:00 | Raphaely | 30 | 2007-08-10 00:00:00 2005-07-24 00:00:00 | Tobias | 30 | 2006-11-15 00:00:00 | yq1 | 30 | 2003-05-18 00:00:00 | yq2 | 30 | 2003-05-18 00:00:00 2007-12-10 00:00:00 | yq3 | 30 | 2003-05-18 00:00:00 (13 rows) -- 删除表 gaussdb=# DROP TABLE ta1; DROP TABLE
- LEAD
描述:LEAD函数为各组内对应值生成提前值。即当前值对应的行数向后偏移offset位后所得行的value值作为序号。若经过向后偏移后行数超过当前组内的总行数,则对应结果取为default值。若无指定,在默认情况下,offset取为1,default值取为NULL。default值的类型需要与value值的类型保持一致。
语法:
LEAD(value any [, offset integer [, default any ]]) LEAD(value any ignore|respect nulls [, offset integer [, default any ]]) LEAD(value any [, offset integer [, default any ]]) ignore|respect nulls
ignore|respect nulls表示是否将NULL包含在向后偏移值中。若无指定,则默认是respect nulls。若设置为ignore nulls,当value为NULL时,则将其不包含在向后偏移值中。如果开启了ignore nulls功能,则该函数会存在性能劣化。
返回值类型:与参数数据类型相同。
示例1:不开启ignore nulls功能,设置offset=2,不指定default
-- 建表并插入数据 gaussdb=# CREATE TABLE ta1 (hire_date date, last_name varchar(20), department_id int); CREATE TABLE gaussdb=# INSERT INTO ta1 values('07-DEC-02', 'Raphaely', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('24-JUL-05', 'Tobias', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('24-DEC-05', 'Baida', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('18-MAY-03', 'Khoo', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('15-NOV-06', 'Himuro', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('10-AUG-07', 'Colmenares', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('10-MAY-07', 'yq', 11); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('10-MAY-08', 'zi', 11); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('', 'yq1', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values(null, 'yq2', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values('10-DEC-07', 'yq3', 30); INSERT 0 1 gaussdb=# INSERT INTO ta1 values(null, null, 11); INSERT 0 1 gaussdb=# INSERT INTO ta1 values(null, null, 11); INSERT 0 1 -- 调用LEAD,指定offset=2 gaussdb=# SELECT hire_date, last_name, department_id, lead(hire_date, 2) OVER (PARTITION BY department_id ORDER BY last_name) AS "NextHired" FROM ta1 ORDER BY department_id; hire_date | last_name | department_id | NextHired ---------------------+------------+---------------+--------------------- 2007-05-10 00:00:00 | yq | 11 | 2008-05-10 00:00:00 | zi | 11 | | | 11 | | | 11 | 2005-12-24 00:00:00 | Baida | 30 | 2006-11-15 00:00:00 2007-08-10 00:00:00 | Colmenares | 30 | 2003-05-18 00:00:00 2006-11-15 00:00:00 | Himuro | 30 | 2002-12-07 00:00:00 2003-05-18 00:00:00 | Khoo | 30 | 2005-07-24 00:00:00 2002-12-07 00:00:00 | Raphaely | 30 | 2005-07-24 00:00:00 | Tobias | 30 | | yq1 | 30 | 2007-12-10 00:00:00 | yq2 | 30 | 2007-12-10 00:00:00 | yq3 | 30 | (13 rows)
示例2:开启ignore nulls功能,设置offset=2,default='01-JAN-00'
gaussdb=# SELECT hire_date, last_name, department_id, lead(hire_date, 2, '01-JAN-00') ignore nulls OVER (PARTITION BY department_id ORDER BY last_name) AS "NextHired" FROM ta1 ORDER BY department_id; hire_date | last_name | department_id | NextHired ---------------------+------------+---------------+--------------------- 2007-05-10 00:00:00 | yq | 11 | 2000-01-01 00:00:00 2008-05-10 00:00:00 | zi | 11 | 2000-01-01 00:00:00 | | 11 | 2000-01-01 00:00:00 | | 11 | 2000-01-01 00:00:00 2005-12-24 00:00:00 | Baida | 30 | 2006-11-15 00:00:00 2007-08-10 00:00:00 | Colmenares | 30 | 2003-05-18 00:00:00 2006-11-15 00:00:00 | Himuro | 30 | 2002-12-07 00:00:00 2003-05-18 00:00:00 | Khoo | 30 | 2005-07-24 00:00:00 2002-12-07 00:00:00 | Raphaely | 30 | 2007-12-10 00:00:00 2005-07-24 00:00:00 | Tobias | 30 | 2000-01-01 00:00:00 | yq1 | 30 | 2000-01-01 00:00:00 | yq2 | 30 | 2000-01-01 00:00:00 2007-12-10 00:00:00 | yq3 | 30 | 2000-01-01 00:00:00 (13 rows) -- 删除表 gaussdb=# DROP TABLE ta1; DROP TABLE
- FIRST_VALUE(value any)
描述:FIRST_VALUE函数取各组内的第一个值作为返回结果。
返回值类型:与参数数据类型相同。
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE first_value_t1(a int, b int); gaussdb=# INSERT INTO first_value_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,FIRST_VALUE(b) OVER(PARTITION BY a ORDER BY b) FROM first_value_t1; a | b | first_value ---+---+------------- 1 | 1 | 1 1 | 1 | 1 1 | 2 | 1 1 | 3 | 1 2 | 4 | 4 2 | 5 | 4 3 | 6 | 6 (7 rows) gaussdb=# DROP TABLE first_value_t1;
- LAST_VALUE(value any)
描述:LAST_VALUE函数取各组内的最后一个值作为返回结果。
返回值类型:与参数数据类型相同。
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE last_value_t1(a int, b int); gaussdb=# INSERT INTO last_value_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,LAST_VALUE(b) OVER(PARTITION BY a ORDER BY b) FROM last_value_t1; a | b | last_value ---+---+------------ 1 | 1 | 1 1 | 1 | 1 1 | 2 | 2 1 | 3 | 3 2 | 4 | 4 2 | 5 | 5 3 | 6 | 6 (7 rows) gaussdb=# DROP TABLE last_value_t1;
- NTH_VALUE(value any, nth integer)
描述:NTH_VALUE函数返回该组内的第nth行作为结果。若该行不存在,则默认返回NULL。
返回值类型:与参数数据类型相同。
示例:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE nth_value_t1(a int, b int); gaussdb=# INSERT INTO nth_value_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,NTH_VALUE(b, 2) OVER(PARTITION BY a order by b) FROM nth_value_t1; a | b | nth_value ---+---+----------- 1 | 1 | 1 1 | 1 | 1 1 | 2 | 1 1 | 3 | 1 2 | 4 | 2 | 5 | 5 3 | 6 | (7 rows) gaussdb=# DROP TABLE nth_value_t1;
- delta
参数:numeric
返回值类型:numeric
- spread
参数:real
返回值类型:real
- RATIO_TO_REPORT(column_name)
参数:数值类型,或任意可以隐式转换成数值类型的类型。
返回值类型:入参为float4和float8,返回值类型与入参一致;其余入参类型返回numeric类型。
RATIO_TO_REPORT(column_name)与OVER()一起使用时,OVER()中入参只支持PRTITION BY和NULL。
示例1:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE ratio_to_report_t1(a int, b int); gaussdb=# INSERT INTO ratio_to_report_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,RATIO_TO_REPORT(b) OVER(PARTITION BY a) FROM ratio_to_report_t1; a | b | ratio_to_report ---+---+------------------------ 1 | 1 | .14285714285714285714 1 | 1 | .14285714285714285714 1 | 2 | .28571428571428571429 1 | 3 | .42857142857142857143 2 | 4 | .44444444444444444444 2 | 5 | .55555555555555555556 3 | 6 | 1.00000000000000000000 (7 rows) gaussdb=# DROP TABLE ratio_to_report_t1;
示例2:与其它函数嵌套使用。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
gaussdb=# CREATE TABLE ratio_to_report_t1(a int, b int); gaussdb=# INSERT INTO ratio_to_report_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# SELECT a,b,TO_CHAR(RATIO_TO_REPORT(b) OVER(PARTITION BY a), '$999eeee') FROM ratio_to_report_t1; a | b | to_char ---+---+--------- 1 | 1 | 1e-01 1 | 1 | 1e-01 1 | 2 | 3e-01 1 | 3 | 4e-01 2 | 4 | 4e-01 2 | 5 | 6e-01 3 | 6 | 1e+00 (7 rows) gaussdb=# DROP TABLE ratio_to_report_t1;
示例3:存储过程调用。
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
gaussdb=# CREATE TABLE ratio_to_report_t1(a int, b int); gaussdb=# INSERT INTO ratio_to_report_t1 VALUES(1,1),(1,1),(1, 2),(1, 3),(2, 4),(2, 5),(3,6); gaussdb=# CREATE OR REPLACE PROCEDURE ratio_to_report_proc IS CURSOR cur_1 IS SELECT a,b,RATIO_TO_REPORT(b) OVER(PARTITION BY a) FROM ratio_to_report_t1; BEGIN FOR cur IN cur_1 LOOP RAISE INFO '%', cur.ratio_to_report; END LOOP; END; / gaussdb=# CALL RATIO_TO_REPORT_PROC(); INFO: .14285714285714285714 INFO: .14285714285714285714 INFO: .28571428571428571429 INFO: .42857142857142857143 INFO: .44444444444444444444 INFO: .55555555555555555556 INFO: 1.00000000000000000000 ratio_to_report_proc ---------------------- (1 row) gaussdb=# DROP PROCEDURE ratio_to_report_proc; gaussdb=# DROP TABLE ratio_to_report_t1;