Help Center/
    
      
      Data Lake Insight/
      
      
        
        
        SQL Syntax Reference (Paris Region)/
        
        
        Flink Opensource SQL 1.10 Syntax Reference/
        
        
        Data Manipulation Language (DML)/
        
      
      Deduplication
    
  
  
    
        Updated on 2025-01-22 GMT+08:00
        
          
          
        
      
      
      
      
      
      
      
      
  
      
      
      
        
Deduplication
Function
Deduplication removes rows that duplicate over a set of columns, keeping only the first one or the last one.
Syntax
SELECT [column_list]
FROM (
   SELECT [column_list],
     ROW_NUMBER() OVER ([PARTITION BY col1[, col2...]]
       ORDER BY time_attr [asc|desc]) AS rownum
   FROM table_name)
WHERE rownum = 1
 Description
- ROW_NUMBER(): Assigns a unique, sequential number to each row, starting with one.
- PARTITION BY col1[, col2...]: Specifies the partition columns, for example, the deduplicate key.
- ORDER BY time_attr [asc|desc]: Specifies the ordering column, it must be a time attribute. Currently Flink supports proctime only. Ordering by ASC means to keep the first row, ordering by DESC means to keep the last row.
- WHERE rownum = 1: The rownum = 1 is required for Flink to recognize this query is deduplication.
Precautions
None
Example
The following examples show how to remove duplicate rows on order_id. The proctime is an event time attribute.
SELECT order_id, user, product, number
  FROM (
     SELECT *,
         ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY proctime ASC) as row_num
     FROM Orders)
  WHERE row_num = 1;
 
   Parent topic: Data Manipulation Language (DML)
  
 Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
                The system is busy. Please try again later.
                
            
        For any further questions, feel free to contact us through the chatbot.
Chatbot 
    