Updated on 2025-08-11 GMT+08:00

Top-N

Function

Top-N queries ask for the N smallest or largest values ordered by columns. Both smallest and largest values sets are considered Top-N queries. Top-N queries are useful when you need to display only the N highest or lowest records from a batch/streaming table based on a specific condition.

Syntax Format

1
2
3
4
5
6
7
SELECT [column_list]
FROM (
   SELECT [column_list],
     ROW_NUMBER() OVER ([PARTITION BY col1[, col2...]]
       ORDER BY col1 [asc|desc][, col2 [asc|desc]...]) AS rownum
   FROM table_name)
WHERE rownum <= N [AND conditions]

Syntax Description

  • ROW_NUMBER(): Allocates a unique and consecutive number to each line starting from the first line in the current partition. Currently, ROW_NUMBER can be used in the over window function only. RANK() and DENSE_RANK() will be supported soon.
  • PARTITION BY col1[, col2...]: Specifies the partition columns. Each partition will have a Top-N result.
  • ORDER BY col1 [asc|desc][, col2 [asc|desc]...]: Specifies the ordering columns. The ordering directions can be different for different columns.
  • WHERE rownum <= N: Specifies a Top-N query in Flink. Where N represents the N smallest or largest records that need to be returned.
  • [AND conditions]: You can add other query conditions in the where clause. Note that the other conditions can only be combined with rownum <= N using the AND logic.

Constraints

  • Top N queries return updated results.
  • Flink SQL will sort the input data stream according to the order key.
  • So if the top N records have been changed, the changed ones will be sent as retraction/update records to downstream.
  • If the top N records need to be stored in external storage, the result table should have the same unique key with the Top-N query.

Example

This is an example to get the top five products per category that have the maximum sales in real time.

1
2
3
4
5
6
SELECT * 
  FROM ( 
     SELECT *,
         ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales DESC) as row_num
     FROM ShopSales)
  WHERE row_num <= 5;