Java
To avoid API compatibility or reliability issues after updates to the open-source Flink, recommended that APIs of the matching version are recommended.
Common APIs of Flink
Flink mainly uses the following APIs:
- StreamExecutionEnvironment: provides the execution environment, which is the basis of Flink stream processing.
- DataStream: represents streaming data. DataStream can be regarded as unmodifiable collection that contains duplicate data. The number of elements in DataStream are unlimited.
- KeyedStream: generates the stream by performing keyBy grouping operations on DataStream. Data is grouped based on the specified key value.
- WindowedStream: generates the stream by performing the window function on KeyedStream, sets the window type, and defines window triggering conditions.
- AllWindowedStream: generates streams by performing the window function on DataStream, sets the window type, defines window triggering conditions, and performs operations on the window data.
- ConnectedStreams: generates streams by connecting the two DataStreams and retaining the original data type, and performs the map or flatMap operation.
- JoinedStreams: performs equijoin (which is performed when two values are equal, for example, a.id = b.id) operation to data in the window. The join operation is a special scenario of coGroup operation.
- CoGroupedStreams: performs the coGroup operation on data in the window. CoGroupedStreams can perform various types of join operations.
Data Stream Source
API |
Description |
---|---|
public final <OUT> DataStreamSource<OUT> fromElements(OUT... data) |
Obtain user-defined data of multiple elements as the data stream source.
|
public final <OUT> DataStreamSource<OUT> fromElements(Class<OUT> type, OUT... data) |
|
public <OUT> DataStreamSource<OUT> fromCollection(Collection<OUT> data) |
Obtain the user-defined data collection as the data stream source.
|
public <OUT> DataStreamSource<OUT> fromCollection(Collection<OUT> data, TypeInformation<OUT> typeInfo) |
|
public <OUT> DataStreamSource<OUT> fromCollection(Iterator<OUT> data, Class<OUT> type) |
|
public <OUT> DataStreamSource<OUT> fromCollection(Iterator<OUT> data, TypeInformation<OUT> typeInfo) |
|
public <OUT> DataStreamSource<OUT> fromParallelCollection(SplittableIterator<OUT> iterator, Class<OUT> type) |
Obtain the user-defined data collection as parallel data stream source.
|
public <OUT> DataStreamSource<OUT> fromParallelCollection(SplittableIterator<OUT> iterator, TypeInformation<OUT> typeInfo) |
|
private <OUT> DataStreamSource<OUT> fromParallelCollection(SplittableIterator<OUT> iterator, TypeInformation<OUT> typeInfo, String operatorName) |
|
public DataStreamSource<Long> generate Sequence(long from, long to) |
Obtain a sequence of user-defined data as the data stream source.
|
public DataStreamSource<String> readTextFile(String filePath) |
Obtain the user-defined text file from a specific path as the data stream source.
|
public DataStreamSource<String> readTextFile(String filePath, String charsetName) |
|
public <OUT> DataStreamSource<OUT> readFile(FileInputformat<OUT> inputformat, String filePath) |
Obtain the user-defined file from a specific path as the data stream source.
|
public <OUT> DataStreamSource<OUT> readFile(FileInputformat<OUT> inputformat, String filePath, FileProcessingMode watchType, long interval) |
|
public <OUT> DataStreamSource<OUT> readFile(FileInputformat<OUT> inputformat, String filePath, FileProcessingMode watchType, long interval, TypeInformation<OUT> typeInformation) |
|
public DataStreamSource<String> socketTextStream(String hostname, int port, String delimiter, long maxRetry) |
Obtain user-defined socket data as the data stream source.
|
public DataStreamSource<String> socketTextStream(String hostname, int port, String delimiter) |
|
public DataStreamSource<String> socketTextStream(String hostname, int port) |
|
public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function) |
Customize the SourceFunction and addSource methods to add data sources such as Kafka. The implementation method is the run method of SourceFunction.
|
public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function, String sourceName) |
|
public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function, TypeInformation<OUT> typeInfo) |
|
public <OUT> DataStreamSource<OUT> addSource(SourceFunction<OUT> function, String sourceName, TypeInformation<OUT> typeInfo) |
Data Output
API |
Description |
---|---|
public DataStreamSink<T> print() |
Print data as the standard output stream. |
public DataStreamSink<T> printToErr() |
Print data as the standard error output stream. |
public DataStreamSink<T> writeAsText(String path) |
Write data to a specific text file.
|
public DataStreamSink<T> writeAsText(String path, WriteMode writeMode) |
|
public DataStreamSink<T> writeAsCsv(String path) |
Writhe data to a specific .csv file.
|
public DataStreamSink<T> writeAsCsv(String path, WriteMode writeMode) |
|
public <X extends Tuple> DataStreamSink<T> writeAsCsv(String path, WriteMode writeMode, String rowDelimiter, String fieldDelimiter) |
|
public DataStreamSink<T> writeToSocket(String hostName, int port, SerializationSchema<T> schema) |
Write data to the socket connection.
|
public DataStreamSink<T> writeUsingOutputformat(Outputformat<T> format) |
Write data to a file, for example, a binary file. |
public DataStreamSink<T> addSink(SinkFunction<T> sinkFunction) |
Export data in a user-defined manner. The flink-connectors allows the addSink method to support exporting data to Kafka. The implementation method is the invoke method of SinkFunction. |
Filtering and Mapping
API |
Description |
---|---|
public <R> SingleOutputStreamOperator<R> map(MapFunction<T, R> mapper) |
Transform an element into another element of the same type. |
public <R> SingleOutputStreamOperator<R> flatMap(FlatMapFunction<T, R> flatMapper) |
Transform an element into zero, one, or multiple elements. |
public SingleOutputStreamOperator<T> filter(FilterFunction<T> filter) |
Run a Boolean function on each element and retain elements that return true. |
Aggregation
API |
Description |
---|---|
public KeyedStream<T, Tuple> keyBy(int... fields) |
Logically divide a stream into multiple partitions, each containing elements with the same key. The partitioning is internally implemented using hash partition. A KeyedStream is returned. Return the KeyedStream after the KeyBy operation, and call the KeyedStream function, such as reduce, fold, min, minby, max, maxby, sum, and sumby.
|
public KeyedStream<T, Tuple> keyBy(String... fields) |
|
public <K> KeyedStream<T, K> keyBy(KeySelector<T, K> key) |
|
public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> reducer) |
Perform reduce on KeyedStream in a rolling manner. Aggregate the current element and the last reduced value into a new value. The types of the three values are the same. |
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> folder) |
Perform fold operation on KeyedStream based on an initial value in a rolling manner. Aggregate the current element and the last folded value into a new value. The input and returned values of Fold can be different. |
public SingleOutputStreamOperator<T> sum(int positionToSum) |
Calculate the sum in a KeyedStream in a rolling manner. positionToSum and field indicate calculating the sum of a specific column. |
public SingleOutputStreamOperator<T> sum(String field) |
|
public SingleOutputStreamOperator<T> min(int positionToMin) |
Calculate the minimum value in a KeyedStream in a rolling manner. min returns the minimum value, without guarantee of the correctness of columns of non-minimum values. positionToMin and field indicate calculating the minimum value of a specific column. |
public SingleOutputStreamOperator<T> min(String field) |
|
public SingleOutputStreamOperator<T> max(int positionToMax) |
Calculate the maximum value in KeyedStream in a rolling manner. max returns the maximum value, without guarantee of the correctness of columns of non-maximum values. positionToMax and field indicate calculating the maximum value of a specific column. |
public SingleOutputStreamOperator<T> max(String field) |
|
public SingleOutputStreamOperator<T> minBy(int positionToMinBy) |
Obtain the row where the minimum value of a column locates in a KeyedStream. minBy returns all elements of that row.
|
public SingleOutputStreamOperator<T> minBy(String positionToMinBy) |
|
public SingleOutputStreamOperator<T> minBy(int positionToMinBy, boolean first) |
|
public SingleOutputStreamOperator<T> minBy(String field, boolean first) |
|
public SingleOutputStreamOperator<T> maxBy(int positionToMaxBy) |
Obtain the row where the maximum value of a column locates in a KeyedStream. maxBy returns all elements of that row.
|
public SingleOutputStreamOperator<T> maxBy(String positionToMaxBy) |
|
public SingleOutputStreamOperator<T> maxBy(int positionToMaxBy, boolean first) |
|
public SingleOutputStreamOperator<T> maxBy(String field, boolean first) |
DataStream Distribution
API |
Description |
---|---|
public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, int field) |
Use a user-defined partitioner to select target task for each element.
|
public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, String field) |
|
public <K> DataStream<T> partitionCustom(Partitioner<K> partitioner, KeySelector<T, K> keySelector) |
|
public DataStream<T> shuffle() |
Randomly and evenly partition elements. |
public DataStream<T> rebalance() |
Partition elements in round-robin manner, ensuring load balance of each partition. This partitioning is helpful to data with skewness. |
public DataStream<T> rescale() |
Distribute elements into downstream subset in round-robin manner. |
public DataStream<T> broadcast() |
Broadcast each element to all partitions. |
Project Capabilities
API |
Description |
---|---|
public <R extends Tuple> SingleOutputStreamOperator<R> project(int... fieldIndexes) |
Select some field subset from the tuple. fieldIndexes indicates some sequences of the tuple.
NOTE:
Only tuple data type is supported by the project API. |
Configuring the eventtime Attribute
API |
Description |
---|---|
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(AssignerWithPeriodicWatermarks<T> timestampAndWatermarkAssigner) |
Extract timestamp from records, enabling event time window to trigger computing. |
public SingleOutputStreamOperator<T> assignTimestampsAndWatermarks(AssignerWithPunctuatedWatermarks<T> timestampAndWatermarkAssigner) |
Table 8 lists differences of AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks APIs.
Parameter |
Description |
---|---|
AssignerWithPeriodicWatermarks |
Generate Watermark based on the getConfig().setAutoWatermarkInterval(200L) timestamp of StreamExecutionEnvironment class. |
AssignerWithPunctuatedWatermarks |
Generate a Watermark each time an element is received. Watermarks can be different based on received elements. |
Iteration
API |
Description |
---|---|
public IterativeStream<T> iterate() |
In the flow, create in a feedback loop to redirect the output of an operator to a preceding operator.
NOTE:
|
public IterativeStream<T> iterate(long maxWaitTimeMillis) |
Stream Splitting
API |
Description |
---|---|
public SplitStream<T> split(OutputSelector<T> outputSelector) |
Use OutputSelector to rewrite select method, specify stream splitting basis (by tagging), and construct SplitStream streams. That is, a string is tagged for each element, so that a stream of a specific tag can be selected or created. |
public DataStream<OUT> select(String... outputNames) |
Select one or multiple streams from a SplitStream. outputNames indicates the sequence of string tags created for each element using the split method. |
Window
Windows can be classified as tumbling windows and sliding windows.
- APIs such as Window, TimeWindow, CountWindow, WindowAll, TimeWindowAll, and CountWindowAll can be used to generate windows.
- APIs such as Window Apply, Window Reduce, Window Fold, and Aggregations on windows can be used to operate windows.
- Multiple Window Assigners are supported, including TumblingEventTimeWindows, TumblingProcessingTimeWindows, SlidingEventTimeWindows, SlidingProcessingTimeWindows, EventTimeSessionWindows, ProcessingTimeSessionWindows, and GlobalWindows.
- ProcessingTime, EventTime, and IngestionTime are supported.
- Timestamp modes EventTime: AssignerWithPeriodicWatermarks and AssignerWithPunctuatedWatermarks are supported.
Table 11 lists APIs for generating windows.
API |
Description |
---|---|
public <W extends Window> WindowedStream<T, KEY, W> window(WindowAssigner<? super T, W> assigner) |
Define windows in partitioned KeyedStreams. A window groups each key according to some characteristics (for example, data received within the latest 5 seconds. |
public <W extends Window> AllWindowedStream<T, W> windowAll(WindowAssigner<? super T, W> assigner) |
Define windows in DataStreams. |
public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size) |
Define time windows in partitioned KeyedStreams, select ProcessingTime or EventTime based on the environment.getStreamTimeCharacteristic() parameter, and determine whether the window is TumblingWindow or SlidingWindow depends on the number of parameters.
NOTE:
|
public WindowedStream<T, KEY, TimeWindow> timeWindow(Time size, Time slide) |
|
public AllWindowedStream<T, TimeWindow> timeWindowAll(Time size) |
Define time windows in partitioned DataStream, select ProcessingTime or EventTime based on the environment.getStreamTimeCharacteristic() parameter, and determine whether the window is TumblingWindow or SlidingWindow depends on the number of parameters.
|
public AllWindowedStream<T, TimeWindow> timeWindowAll(Time size, Time slide) |
|
public WindowedStream<T, KEY, GlobalWindow> countWindow(long size) |
Divide windows according to the number of elements and define windows in partitioned KeyedStreams.
NOTE:
|
public WindowedStream<T, KEY, GlobalWindow> countWindow(long size, long slide) |
|
public AllWindowedStream<T, GlobalWindow> countWindowAll(Time size) |
Divide windows according to the number of elements and define windows in DataStreams.
|
public AllWindowedStream<T, GlobalWindow> countWindowAll(Time size, Time slide) |
Table 12 lists APIs for operating windows.
Method |
API |
Description |
---|---|---|
Window |
public <R> SingleOutputStreamOperator<R> apply(WindowFunction<T, R, K, W> function) |
Apply a general function to a window. The data in the window is calculated as a whole.
|
public <R> SingleOutputStreamOperator<R> apply(WindowFunction<T, R, K, W> function, TypeInformation<R> resultType) |
||
public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> function) |
Apply a reduce function to the window and return the result.
|
|
public <R> SingleOutputStreamOperator<R> reduce(ReduceFunction<T> reduceFunction, WindowFunction<T, R, K, W> function) |
||
public <R> SingleOutputStreamOperator<R> reduce( ReduceFunction<T> reduceFunction, WindowFunction<T, R, K, W> function, TypeInformation<R> resultType) |
||
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> function) |
Apply a fold function to the window and return the result.
|
|
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> function, TypeInformation<R> resultType) |
||
public <ACC, R> SingleOutputStreamOperator<R> fold(ACC initialValue, FoldFunction<T, ACC> foldFunction, WindowFunction<ACC, R, K, W> function) |
||
public <ACC, R> SingleOutputStreamOperator<R> fold(ACC initialValue, FoldFunction<T, ACC> foldFunction, WindowFunction<ACC, R, K, W> function, TypeInformation<ACC> foldAccumulatorType, TypeInformation<R> resultType) |
||
WindowAll |
public <R> SingleOutputStreamOperator<R> apply(AllWindowFunction<T, R, W> function) |
Apply a general function to a window. The data in the window is calculated as a whole. |
public <R> SingleOutputStreamOperator<R> apply(AllWindowFunction<T, R, W> function, TypeInformation<R> resultType) |
||
public SingleOutputStreamOperator<T> reduce(ReduceFunction<T> function) |
Apply a reduce function to the window and return the result.
|
|
public <R> SingleOutputStreamOperator<R> reduce(ReduceFunction<T> reduceFunction, AllWindowFunction<T, R, W> function) |
||
public <R> SingleOutputStreamOperator<R> reduce(ReduceFunction<T> reduceFunction, AllWindowFunction<T, R, W> function, TypeInformation<R> resultType) |
||
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> function) |
Apply a fold function to the window and return the result.
|
|
public <R> SingleOutputStreamOperator<R> fold(R initialValue, FoldFunction<T, R> function, TypeInformation<R> resultType) |
||
public <ACC, R> SingleOutputStreamOperator<R> fold(ACC initialValue, FoldFunction<T, ACC> foldFunction, AllWindowFunction<ACC, R, W> function) |
||
public <ACC, R> SingleOutputStreamOperator<R> fold(ACC initialValue, FoldFunction<T, ACC> foldFunction, AllWindowFunction<ACC, R, W> function, TypeInformation<ACC> foldAccumulatorType, TypeInformation<R> resultType) |
||
Window and WindowAll |
public SingleOutputStreamOperator<T> sum(int positionToSum) |
Sum a specified column of the window data. field and positionToSum indicate a specific column of the data. |
public SingleOutputStreamOperator<T> sum(String field) |
||
public SingleOutputStreamOperator<T> min(int positionToMin) |
Calculate the minimum value of a specified column of the window data. min returns the minimum value, without guarantee of the correctness of columns of non-minimum values. positionToMin and field indicate calculating the minimum value of a specific column. |
|
public SingleOutputStreamOperator<T> min(String field) |
||
public SingleOutputStreamOperator<T> minBy(int positionToMinBy) |
Obtain the row where the minimum value of a column locates in the window data. minBy returns all elements of that row.
|
|
public SingleOutputStreamOperator<T> minBy(String positionToMinBy) |
||
public SingleOutputStreamOperator<T> minBy(int positionToMinBy, boolean first) |
||
public SingleOutputStreamOperator<T> minBy(String field, boolean first) |
||
public SingleOutputStreamOperator<T> max(int positionToMax) |
Calculate the maximum value of a specified column of the window data. max returns the maximum value, without guarantee of the correctness of columns of non-maximum values. positionToMax and field indicate calculating the maximum value of a specific column. |
|
public SingleOutputStreamOperator<T> max(String field) |
||
The default public SingleOutputStreamOperator<T> maxBy(int positionToMaxBy)//true |
Obtain the row where the maximum value of a column locates in the window data. maxBy returns all elements of that row.
|
|
The default public SingleOutputStreamOperator<T> maxBy(String positionToMaxBy)//true |
||
public SingleOutputStreamOperator<T> maxBy(int positionToMaxBy, boolean first) |
||
public SingleOutputStreamOperator<T> maxBy(String field, boolean first) |
Combining Multiple DataStreams
API |
Description |
---|---|
public final DataStream<T> union(DataStream<T>... streams) |
Perform union operation on multiple DataStreams to generate a new data stream containing all data from original DataStreams.
NOTE:
If you perform union operation on a piece of data with itself, there are two copies of the same data. |
public <R> ConnectedStreams<T, R> connect(DataStream<R> dataStream) |
Connect two DataStreams and retain the original type. The connect API method allows two DataStreams to share statuses. After ConnectedStreams is generated, map or flatMap operation can be called. |
public <R> SingleOutputStreamOperator<R> map(CoMapFunction<IN1, IN2, R> coMapper) |
Perform mapping operation, which is similar to map and flatMap operation in a ConnectedStreams, on elements. After the operation, the type of the new DataStreams is string. |
public <R> SingleOutputStreamOperator<R> flatMap(CoFlatMapFunction<IN1, IN2, R> coFlatMapper) |
Perform mapping operation, which is similar to flatMap operation in DataStream, on elements. |
Join Operation
API |
Description |
---|---|
public <T2> JoinedStreams<T, T2> join(DataStream<T2> otherStream) |
Join two DataStreams using a given key in a specified window. |
public <T2> CoGroupedStreams<T, T2> coGroup(DataStream<T2> otherStream) |
Co-group two DataStreams using a given key in a specified window. |
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot