Updated on 2024-12-31 GMT+08:00

Window Functions

Window functions perform calculations across rows of query results. They are run after the HAVING clause but before the ORDER BY clause. To call a window function, you need to use the OVER clause to specify the special syntax of the window. The window consists of three parts:

  • Partition specification, which divides the input rows into different partitions. This is similar to how the GROUP BY clause divides rows into different groups in an aggregate function.
  • Sorting specification, which determines the order in which the window function will process the input rows
  • Window frame, which specifies the sliding window of the row to be processed by the specified function. If no frame is specified, the default value is RANGE UNBOUNDED PRECEDING, which is the same as the value of UNBOUNDEEN PREBODING AND CURRENT ROWGE. The frame contains all rows from the beginning of the partition to the last peer of the current row. In the absence of ORDER BY, all rows are treated as equivalent, so the range between the unbound preamble and the current row is equal to the range between the unbound preamble and the unbound subsequent rows.

    For example, in the following query, the information in the salary table is sorted according to the salary of employees in each department.

    -- Create a data table and insert data.
    create table salary (dept varchar, userid varchar, sal double);
     insert into salary values ('d1','user1',1000),('d1','user2',2000),('d1','user3',3000),('d2','user4',4000),('d2','user5',5000); 
    
    --Query data.
    select dept,userid,sal,rank() over (partition by dept order by sal desc) as rnk from salary order by dept,rnk;
    dept | userid |  sal   | rnk 
    ------|--------|--------|-----
     d1   | user3  | 3000.0 |   1 
     d1   | user2  | 2000.0 |   2 
     d1   | user1  | 1000.0 |   3 
     d2   | user5  | 5000.0 |   1 
     d2   | user4  | 4000.0 |   2 

Aggregate Functions

All aggregate functions can be used as window functions by adding over clauses. The aggregate function operates on each row of records in the current window framework.

The following query generates the rolling sum of the order price calculated by each employee by day:

select dept,userid,sal,sum(sal) over (partition by dept order by sal desc) as rolling_sum from salary order by dept,userid,sal;
dept | userid |  sal   | rolling_sum 
------|--------|--------|-------------
 d1   | user1  | 1000.0 |      6000.0 
 d1   | user2  | 2000.0 |      5000.0 
 d1   | user3  | 3000.0 |      3000.0 
 d2   | user4  | 4000.0 |      9000.0 
 d2   | user5  | 5000.0 |      5000.0 
(5 rows)

Ranking Functions

  • cume_dist()→ bigint

    Description: Number of lines less than or equal to the current value/Total number of lines in the group – For example, calculate the proportion of the number of employees whose salary is less than or equal to the current salary to the total number of employees.

     --Query Example
    SELECT dept, userid, sal, CUME_DIST() OVER(ORDER BY sal) AS rn1, CUME_DIST() OVER(PARTITION BY dept ORDER BY sal) AS rn2 FROM salary;
    dept | userid |  sal   | rn1 |        rn2         
    ------|--------|--------|-----|--------------------
     d2   | user4  | 4000.0 | 0.8 |                0.5 
     d2   | user5  | 5000.0 | 1.0 |                1.0 
     d1   | user1  | 1000.0 | 0.2 | 0.3333333333333333 
     d1   | user2  | 2000.0 | 0.4 | 0.6666666666666666 
     d1   | user3  | 3000.0 | 0.6 |                1.0 
    (5 rows)
  • dense_rank()→ bigint

    Description: Ranking of the returned value in a group of values. This is similar to rank (). The difference is that the tie value does not generate gaps in the sequence.

  • ntile(n)→ bigint

    Description: Divides packet data into n fragments in sequence and returns the current fragment value. NTILE does not support ROWS BETWEEN. For example, if NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) fragments are not evenly distributed, the distribution of the first fragment is added by default.

    --Create a table and insert data into the table.
    create table cookies_log (cookieid varchar,createtime  date,pv int);
    insert into cookies_log values 
    	('cookie1',date '2020-07-10',1),
    	('cookie1',date '2020-07-11',5), 
    	('cookie1',date '2020-07-12',7), 
    	('cookie1',date '2020-07-13',3),
    	('cookie1',date '2020-07-14',2), 
    	('cookie1',date '2020-07-15',4), 
    	('cookie1',date '2020-07-16',4), 
    	('cookie2',date '2020-07-10',2),
    	('cookie2',date '2020-07-11',3), 
    	('cookie2',date '2020-07-12',5),
            ('cookie2',date '2020-07-13',6), 
            ('cookie2',date '2020-07-14',3), 
            ('cookie2',date '2020-07-15',9),
    	('cookie2',date '2020-07-16',7);
    -- Query results.
    SELECT cookieid,createtime,pv,
    NTILE(2) OVER(PARTITION BY cookieid ORDER BY createtime) AS rn1,-- divides data into two fragments in a group.
    NTILE(3) OVER(PARTITION BY cookieid ORDER BY createtime) AS rn2,  -- divides data into three fragments in a group.
    NTILE(4) OVER(ORDER BY createtime) AS rn3   --divides all data into four fragments.
    FROM cookies_log
    ORDER BY cookieid,createtime;
    cookieid  | createtime | pv | rn1 | rn2 | rn3 
    ----------|------------|----|-----|-----|-----
     cookie1  | 2020-07-10 |  1 |   1 |   1 |   1 
     cookie1  | 2020-07-11 |  5 |   1 |   1 |   1 
     cookie1  | 2020-07-12 |  7 |   1 |   1 |   2 
     cookie1  | 2020-07-13 |  3 |   1 |   2 |   2 
     cookie1  | 2020-07-14 |  2 |   2 |   2 |   3 
     cookie1  | 2020-07-15 |  4 |   2 |   3 |   4 
     cookie1  | 2020-07-16 |  4 |   2 |   3 |   4 
     cookie2  | 2020-07-10 |  2 |   1 |   1 |   1 
     cookie2  | 2020-07-11 |  3 |   1 |   1 |   1 
     cookie2  | 2020-07-12 |  5 |   1 |   1 |   2 
     cookie2  | 2020-07-13 |  6 |   1 |   2 |   2 
     cookie2  | 2020-07-14 |  3 |   2 |   2 |   3 
     cookie2  | 2020-07-15 |  9 |   2 |   3 |   3 
     cookie2  | 2020-07-16 |  7 |   2 |   3 |   4 
    (14 rows)
  • percent_rank()→ double

    Description: Rankings of return values in percentage within a set of values. The result is (r-1)/(n-1), where r is the rank () of the row and n is the total number of rows in the window partition.

    SELECT dept,userid,sal,
    PERCENT_RANK() OVER(ORDER BY sal) AS rn1, -- in a group
    RANK() OVER(ORDER BY sal) AS rn11,          -- RANK value in a group
    SUM(1) OVER(PARTITION BY NULL) AS rn12,     --Total number of lines in the group
    PERCENT_RANK() OVER(PARTITION BY dept ORDER BY sal) AS rn2
    from salary;
     dept | userid |  sal   | rn1  | rn11 | rn12 | rn2 
    ------|--------|--------|------|------|------|-----
     d2   | user4  | 4000.0 | 0.75 |    4 |    5 | 0.0 
     d2   | user5  | 5000.0 |  1.0 |    5 |    5 | 1.0 
     d1   | user1  | 1000.0 |  0.0 |    1 |    5 | 0.0 
     d1   | user2  | 2000.0 | 0.25 |    2 |    5 | 0.5 
     d1   | user3  | 3000.0 |  0.5 |    3 |    5 | 1.0 
    (5 rows)
  • rank()→ bigint

    Description: Ranking of the returned value in a group of values. The level is 1 plus the number of rows that are not equal to the current row. Therefore, the average value in the sort will create a gap in the sequence. Ranks each window partition.

    SELECT 
    cookieid,
    createtime,
    pv,
    RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn1,
    DENSE_RANK() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn2,
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv DESC) AS rn3 
    FROM cookies_log
    WHERE cookieid = 'cookie1';
     cookieid | createtime | pv | rn1 | rn2 | rn3 
    ----------|------------|----|-----|-----|-----
     cookie1  | 2020-07-12 |  7 |   1 |   1 |   1 
     cookie1  | 2020-07-11 |  5 |   2 |   2 |   2 
     cookie1  | 2020-07-15 |  4 |   3 |   3 |   3 
     cookie1  | 2020-07-16 |  4 |   3 |   3 |   4 
     cookie1  | 2020-07-13 |  3 |   5 |   4 |   5 
     cookie1  | 2020-07-14 |  2 |   6 |   5 |   6 
     cookie1  | 2020-07-10 |  1 |   7 |   6 |   7 
    (7 rows)
  • row_number()→ bigint

    Description: Starting from 1, the sequence of records in a group is generated in sequence. For example, generate the daily pv ranking in descending order. There are many application scenarios for ROW_NUMBER(). For another example, obtain the record that ranks first in a group, or obtain the first refer in a session.

    SELECT cookieid, createtime, pv, ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY pv desc) AS rn from cookies_log;
      cookieid | createtime | pv | rn 
     ----------|------------- |----|----
     cookie2  | 2020-07-15 |  9 |  1 
     cookie2  | 2020-07-16 |  7 |  2 
     cookie2  | 2020-07-13 |  6 |  3 
     cookie2  | 2020-07-12 |  5 |  4 
     cookie2  | 2020-07-14 |  3 |  5 
     cookie2  | 2020-07-11 |  3 |  6 
     cookie2  | 2020-07-10 |  2 |  7 
     cookie1  | 2020-07-12 |  7 |  1 
     cookie1  | 2020-07-11 |  5 |  2 
     cookie1  | 2020-07-15 |  4 |  3 
     cookie1  | 2020-07-16 |  4 |  4 
     cookie1  | 2020-07-13 |  3 |  5 
     cookie1  | 2020-07-14 |  2 |  6 
     cookie1  | 2020-07-10 |  1 |  7 
    (14 rows)

Value Functions

Generally, the null value must be considered. If IGNORE NULLS is specified, all rows containing x whose value is null will be excluded. If the value of x in all rows is null, the default value is returned. Otherwise, null is returned.

Data Preparation
create table cookie_views( cookieid varchar,createtime timestamp,url varchar);
insert into cookie_views values
('cookie1',timestamp '2020-07-10 10:00:02','url20'),
('cookie1',timestamp '2020-07-10 10:00:00','url10'),
('cookie1',timestamp '2020-07-10 10:03:04','urll3'),
('cookie1',timestamp '2020-07-10 10:50:05','url60'),
('cookie1',timestamp '2020-07-10 11:00:00','url70'),
('cookie1',timestamp '2020-07-10 10:10:00','url40'),
('cookie1',timestamp '2020-07-10 10:50:01','url50'),
('cookie2',timestamp '2020-07-10 10:00:02','url23'),
('cookie2',timestamp '2020-07-10 10:00:00','url11'),
('cookie2',timestamp '2020-07-10 10:03:04','url33'),
('cookie2',timestamp '2020-07-10 10:50:05','url66'),
('cookie2',timestamp '2020-07-10 11:00:00','url77'),
('cookie2',timestamp '2020-07-10 10:10:00','url47'),
('cookie2',timestamp '2020-07-10 10:50:01','url55');
  • first_value(x)→ [same as input]

    Description: Returns the first value of the window.

    SELECT cookieid,
    createtime,
    url,
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
    FIRST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS first1 
    FROM cookie_views;
    cookieid |       createtime        |  url  | rn | first1 
    ----------|-------------------------|-------|----|--------
     cookie1  | 2020-07-10 10:00:00.000 | url10 |  1 | url10  
     cookie1  | 2020-07-10 10:00:02.000 | url20 |  2 | url10  
     cookie1  | 2020-07-10 10:03:04.000 | urll3 |  3 | url10  
     cookie1  | 2020-07-10 10:10:00.000 | url40 |  4 | url10  
     cookie1  | 2020-07-10 10:50:01.000 | url50 |  5 | url10  
     cookie1  | 2020-07-10 10:50:05.000 | url60 |  6 | url10  
     cookie1  | 2020-07-10 11:00:00.000 | url70 |  7 | url10  
     cookie2  | 2020-07-10 10:00:00.000 | url11 |  1 | url11  
     cookie2  | 2020-07-10 10:00:02.000 | url23 |  2 | url11  
     cookie2  | 2020-07-10 10:03:04.000 | url33 |  3 | url11  
     cookie2  | 2020-07-10 10:10:00.000 | url47 |  4 | url11  
     cookie2  | 2020-07-10 10:50:01.000 | url55 |  5 | url11  
     cookie2  | 2020-07-10 10:50:05.000 | url66 |  6 | url11  
     cookie2  | 2020-07-10 11:00:00.000 | url77 |  7 | url11  
    (14 rows)
  • last_value(x)→ [same as input]

    Description: Returns the last value of the window.

    SELECT cookieid,createtime,url,
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
    LAST_VALUE(url) OVER(PARTITION BY cookieid ORDER BY createtime) AS last1 
    FROM cookie_views;
     cookieid |       createtime        |  url  | rn | last1 
    ----------|-------------------------|-------|----|-------
     cookie2  | 2020-07-10 10:00:00.000 | url11 |  1 | url11 
     cookie2  | 2020-07-10 10:00:02.000 | url23 |  2 | url23 
     cookie2  | 2020-07-10 10:03:04.000 | url33 |  3 | url33 
     cookie2  | 2020-07-10 10:10:00.000 | url47 |  4 | url47 
     cookie2  | 2020-07-10 10:50:01.000 | url55 |  5 | url55 
     cookie2  | 2020-07-10 10:50:05.000 | url66 |  6 | url66 
     cookie2  | 2020-07-10 11:00:00.000 | url77 |  7 | url77 
     cookie1  | 2020-07-10 10:00:00.000 | url10 |  1 | url10 
     cookie1  | 2020-07-10 10:00:02.000 | url20 |  2 | url20 
     cookie1  | 2020-07-10 10:03:04.000 | urll3 |  3 | urll3 
     cookie1  | 2020-07-10 10:10:00.000 | url40 |  4 | url40 
     cookie1  | 2020-07-10 10:50:01.000 | url50 |  5 | url50 
     cookie1  | 2020-07-10 10:50:05.000 | url60 |  6 | url60 
     cookie1  | 2020-07-10 11:00:00.000 | url70 |  7 | url70 
    (14 rows)
  • nth_value(x, offset)→ [same as input]

    Description: Returns the value of the specified offset from the beginning of the window. The offset starts from 1. The offset can be any scalar expression. If the offset is null or greater than the number of values in the window, null is returned. The offset cannot be 0 or a negative number.

    SELECT cookieid,createtime,url,
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
    NTH_VALUE(url,3) OVER(PARTITION BY cookieid ORDER BY createtime) AS last1 
    FROM cookie_views;
    cookieid |       createtime        |  url  | rn | last1 
    ----------|-------------------------|-------|----|-------
     cookie1  | 2020-07-10 10:00:00.000 | url10 |  1 | NULL  
     cookie1  | 2020-07-10 10:00:02.000 | url20 |  2 | NULL  
     cookie1  | 2020-07-10 10:03:04.000 | urll3 |  3 | urll3 
     cookie1  | 2020-07-10 10:10:00.000 | url40 |  4 | urll3 
     cookie1  | 2020-07-10 10:50:01.000 | url50 |  5 | urll3 
     cookie1  | 2020-07-10 10:50:05.000 | url60 |  6 | urll3 
     cookie1  | 2020-07-10 11:00:00.000 | url70 |  7 | urll3 
     cookie2  | 2020-07-10 10:00:00.000 | url11 |  1 | NULL  
     cookie2  | 2020-07-10 10:00:02.000 | url23 |  2 | NULL  
     cookie2  | 2020-07-10 10:03:04.000 | url33 |  3 | url33 
     cookie2  | 2020-07-10 10:10:00.000 | url47 |  4 | url33 
     cookie2  | 2020-07-10 10:50:01.000 | url55 |  5 | url33 
     cookie2  | 2020-07-10 10:50:05.000 | url66 |  6 | url33 
     cookie2  | 2020-07-10 11:00:00.000 | url77 |  7 | url33 
    (14 rows)
  • lead(x[, offset[, default_value]])→ [same as input]

    Description: Returns the value of the offset row after the current row in the window partition. The offset starts from 0, that is, the current line. The offset can be any scalar expression. The default offset is 1. If the offset is null, null is returned. If the offset points to a row that is not in the partition, default_value is returned. If it is not specified, null is returned. The lead() function requires that the window sequence be specified. Do not specify a window frame.

    SELECT cookieid,createtime,url,
    ROW_NUMBER() OVER(PARTITION BY cookieid ORDER BY createtime) AS rn,
    LEAD(createtime,1,timestamp '2020-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime) AS next_1_time,
    LEAD(createtime,2) OVER(PARTITION BY cookieid ORDER BY createtime) AS next_2_time 
    FROM cookie_views;
     cookieid |       createtime        |  url  | rn |       next_1_time       |       next_2_time       
    ----------|-------------------------|-------|----|-------------------------|-------------------------
     cookie2  | 2020-07-10 10:00:00.000 | url11 |  1 | 2020-07-10 10:00:02.000 | 2020-07-10 10:03:04.000 
     cookie2  | 2020-07-10 10:00:02.000 | url23 |  2 | 2020-07-10 10:03:04.000 | 2020-07-10 10:10:00.000 
     cookie2  | 2020-07-10 10:03:04.000 | url33 |  3 | 2020-07-10 10:10:00.000 | 2020-07-10 10:50:01.000 
     cookie2  | 2020-07-10 10:10:00.000 | url47 |  4 | 2020-07-10 10:50:01.000 | 2020-07-10 10:50:05.000 
     cookie2  | 2020-07-10 10:50:01.000 | url55 |  5 | 2020-07-10 10:50:05.000 | 2020-07-10 11:00:00.000 
     cookie2  | 2020-07-10 10:50:05.000 | url66 |  6 | 2020-07-10 11:00:00.000 | NULL                    
     cookie2  | 2020-07-10 11:00:00.000 | url77 |  7 | 2020-01-01 00:00:00.000 | NULL                    
     cookie1  | 2020-07-10 10:00:00.000 | url10 |  1 | 2020-07-10 10:00:02.000 | 2020-07-10 10:03:04.000 
     cookie1  | 2020-07-10 10:00:02.000 | url20 |  2 | 2020-07-10 10:03:04.000 | 2020-07-10 10:10:00.000 
     cookie1  | 2020-07-10 10:03:04.000 | urll3 |  3 | 2020-07-10 10:10:00.000 | 2020-07-10 10:50:01.000 
     cookie1  | 2020-07-10 10:10:00.000 | url40 |  4 | 2020-07-10 10:50:01.000 | 2020-07-10 10:50:05.000 
     cookie1  | 2020-07-10 10:50:01.000 | url50 |  5 | 2020-07-10 10:50:05.000 | 2020-07-10 11:00:00.000 
     cookie1  | 2020-07-10 10:50:05.000 | url60 |  6 | 2020-07-10 11:00:00.000 | NULL                    
     cookie1  | 2020-07-10 11:00:00.000 | url70 |  7 | 2020-01-01 00:00:00.000 | NULL                    
    (14 rows)
  • lag(x[, offset[, default_value]])→ [same as input]
    Description: Returns the value of the offset row before the current row in the window partition. The offset starts from 0, that is, the current row. The offset can be any scalar expression. The default offset is 1. If the offset is null, null is returned. If the offset points to a row that is not in the partition, default_value is returned. If this parameter is not specified, null is returned. The lag() function requires that the window sequence be specified and the window frame cannot be specified.
     SELECT cookieid, createtime, url, ROW_NUMBER() OVER(PARTITION BY cookieid  ORDER BY createtime) AS rn, 
       LAG(createtime,1, timestamp '2020-01-01 00:00:00') OVER(PARTITION BY cookieid ORDER BY createtime)  AS last_1_time, 
       LAG(createtime,2) OVER(PARTITION BY cookieid ORDER BY createtime)  AS last_2_time 
        FROM cookie_views;
    
    cookieid |       createtime         |  url  | rn |       last_1_time       |       last_2_time       
    ----------|-------------------------|-------|----|-------------------------|-----------------------
     cookie2  | 2020-07-10 10:00:00.000 | url11 |  1 | 2020-01-01 00:00:00.000 | NULL                    
     cookie2  | 2020-07-10 10:00:02.000 | url23 |  2 | 2020-07-10 10:00:00.000 | NULL                    
     cookie2  | 2020-07-10 10:03:04.000 | url33 |  3 | 2020-07-10 10:00:02.000 | 2020-07-10 10:00:00.000 
     cookie2  | 2020-07-10 10:10:00.000 | url47 |  4 | 2020-07-10 10:03:04.000 | 2020-07-10 10:00:02.000 
     cookie2  | 2020-07-10 10:50:01.000 | url55 |  5 | 2020-07-10 10:10:00.000 | 2020-07-10 10:03:04.000 
     cookie2  | 2020-07-10 10:50:05.000 | url66 |  6 | 2020-07-10 10:50:01.000 | 2020-07-10 10:10:00.000 
     cookie2  | 2020-07-10 11:00:00.000 | url77 |  7 | 2020-07-10 10:50:05.000 | 2020-07-10 10:50:01.000 
     cookie1  | 2020-07-10 10:00:00.000 | url10 |  1 | 2020-01-01 00:00:00.000 | NULL                    
     cookie1  | 2020-07-10 10:00:02.000 | url20 |  2 | 2020-07-10 10:00:00.000 | NULL                    
     cookie1  | 2020-07-10 10:03:04.000 | urll3 |  3 | 2020-07-10 10:00:02.000 | 2020-07-10 10:00:00.000 
     cookie1  | 2020-07-10 10:10:00.000 | url40 |  4 | 2020-07-10 10:03:04.000 | 2020-07-10 10:00:02.000 
     cookie1  | 2020-07-10 10:50:01.000 | url50 |  5 | 2020-07-10 10:10:00.000 | 2020-07-10 10:03:04.000 
     cookie1  | 2020-07-10 10:50:05.000 | url60 |  6 | 2020-07-10 10:50:01.000 | 2020-07-10 10:10:00.000 
     cookie1  | 2020-07-10 11:00:00.000 | url70 |  7 | 2020-07-10 10:50:05.000 | 2020-07-10 10:50:01.000 
    (14 rows)