Help Center/
SecMaster/
User Guide/
Threat Operations/
Query and Analysis Syntax - V2/
SecMaster SQL Syntax Reference/
DML Snytax/
Deduplication
Updated on 2024-11-11 GMT+08:00
Deduplication
Function
Deduplication removes rows that duplicate over a set of columns, keeping only the first one or the last one.
Syntax Format
SELECT [column_list] FROM ( SELECT [column_list], ROW_NUMBER() OVER ([PARTITION BY col1[, col2...]] ORDER BY time_attr [asc|desc]) AS rownum FROM table_name) WHERE rownum = 1
Syntax Description
- ROW_NUMBER(): Assigns a unique, sequential number to each row, starting with one.
- PARTITION BY col1[, col2...]: specifies the column of the partition, for example, the key for deduplication.
- ORDER BY time_attr [asc|desc]: Specifies the ordering column, it must be a time attribute. Currently Flink supports proctime only. Ordering by ASC means to keep the first row, ordering by DESC means to keep the last row.
- WHERE rownum = 1: The rownum = 1 is required for Flink to recognize this query is deduplication.
Precautions
None
Example
The following examples show how to remove duplicate rows on order_id. The proctime is an event time attribute.
SELECT order_id, user, product, number FROM ( SELECT *, ROW_NUMBER() OVER (PARTITION BY order_id ORDER BY proctime ASC) as row_num FROM Orders) WHERE row_num = 1;
Parent topic: DML Snytax
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
The system is busy. Please try again later.
For any further questions, feel free to contact us through the chatbot.
Chatbot