Supporting Late Data in Flink SQL Window Functions
Window functions are added to Flink SQL to support late data processing. Currently, late data is supported in the TUMBLE, HOP, OVER, and CUMULATE window functions. An example is as follows:
CREATE TABLE T1 ( `int` INT, `double` DOUBLE, `float` FLOAT, `bigdec` DECIMAL(10, 2), `string` STRING, `name` STRING, `rowtime` TIMESTAMP(3), WATERMARK for `rowtime` AS `rowtime` - INTERVAL '1' SECOND ) WITH ( 'connector' = 'values', ); -- The fields of the sink must be the same as the input data of the window, but the sequence can be different. CREATE TABLE LD_SINK( `float` FLOAT, `string` STRING, `name` STRING, `rowtime` TIMESTAMP(3) ) WITH ( 'connector' = 'print', ); SELECT /*+ LATE_DATA_SINK('sink.name'='LD_SINK') */ `name`, MIN(`float`), COUNT(DISTINCT `string`) FROM TABLE( TUMBLE(TABLE T1, DESCRIPTOR(rowtime), INTERVAL '5' SECOND)) GROUP BY `name`, window_start, window_end
This feature also supports the output of the start time and end time of the current window when the window receives late data. The time can be output by adding window.start.field and window.end.field to the hint. The field type must be timestamp. An example is as follows:
CREATE TABLE LD_SINK( `float` FLOAT, `string` STRING, `name` STRING, `rowtime` TIMESTAMP(3), `windowStart` TIMESTAMP(3), `windowEnd` TIMESTAMP(3) ) WITH ( 'connector' = 'print', ); SELECT /*+ LATE_DATA_SINK('sink.name'='LD_SINK', 'window.start.field'='windowStart', 'window.end.field'='windowEnd') */ `name`, MIN(`float`), COUNT(DISTINCT `string`) FROM TABLE( TUMBLE(TABLE T1, DESCRIPTOR(rowtime), INTERVAL '5' SECOND)) GROUP BY `name`, window_start, window_end
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot