Flink Application Development Overview
Overview
Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine with support for data distribution and parallel computing. Flink excels at stream processing and is one of the leading open-source stream processing engines in the industry.
Flink provides high-concurrency pipeline data processing, millisecond-level latency, and high reliability, making it suitable for low-latency data processing.
Figure 1 shows the technology stack of Flink.
The following lists the key features of Flink in the current version:
- DataStream
- Checkpoint
- Window
- Job Pipeline
- Configuration Table
Architecture
Figure 2 shows the architecture of Flink.
As shown in Figure 2, the entire Flink system consists of three parts:
- Client
The Flink client is used to submit jobs (streaming jobs) to Flink.
- TaskManager
TaskManager (also called worker) is a service execution node of Flink that executes specific tasks. A Flink system can have multiple TaskManagers, all of which are equivalent to each other.
- JobManager
JobManager (also called master) is a management node of Flink. It manages all TaskManagers and schedules tasks submitted by users to specific TaskManagers. In high-availability (HA) mode, multiple JobManagers are deployed; one is elected as the leader and the others remain on standby.
Flink Development APIs
The Flink DataStream API can be developed using Scala or Java, as shown in Table 1.
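For reference, the following is a minimal Java sketch of a DataStream program; it is not one of the MRS sample projects, and the socket host and port are placeholders. It reads lines from a socket, splits them into words, and maintains a running count per word.

```java
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Source: a text stream from a local socket (host and port are placeholders).
        DataStream<String> text = env.socketTextStream("localhost", 9999);

        DataStream<Tuple2<String, Integer>> counts = text
            .flatMap(new FlatMapFunction<String, Tuple2<String, Integer>>() {
                @Override
                public void flatMap(String line, Collector<Tuple2<String, Integer>> out) {
                    // Split each line into lowercase words and emit (word, 1) pairs.
                    for (String word : line.toLowerCase().split("\\W+")) {
                        if (!word.isEmpty()) {
                            out.collect(Tuple2.of(word, 1));
                        }
                    }
                }
            })
            .keyBy(value -> value.f0)  // partition the stream by word
            .sum(1);                   // maintain a running count per word

        counts.print();                // sink: print results to stdout
        env.execute("Socket Word Count");
    }
}
```

To try it, start `nc -lk 9999` on the host first, then launch the job and type lines into the netcat session to see counts printed.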
Flink Basic Concepts
- DataStream
DataStream is the minimum processing unit of Flink and one of its core concepts. DataStreams are initially imported from external systems, for example, as sockets, Kafka, or files. After being processed by Flink, they are exported to external systems, again as sockets, Kafka, or files.
- Data Transformation
A data transformation is a data processing unit that transforms one or multiple DataStreams into a new DataStream.
Data transformations can be classified as follows (see the first sketch after this list):
- One-to-one transformation, for example, map.
- One-to-zero, one-to-one, or one-to-multiple transformation, for example, flatMap.
- One-to-zero or one-to-one transformation, for example, filter.
- Multiple-to-one transformation, for example, union.
- Aggregating transformation over multiple records, for example, window and keyBy.
- Checkpoint
Checkpoint is the most important Flink mechanism for ensuring reliable data processing. Checkpoints ensure that all application state can be recovered from a checkpoint if a failure occurs and that data is processed exactly once (see the second sketch after this list).
- Savepoint
Savepoints are externally stored checkpoints that you can use to stop-and-resume or upgrade your Flink programs. After an upgrade, you can restore the task status from the savepoint and resume the job, ensuring data continuity.
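To make the transformation categories above concrete, here is a short Java sketch; the input values are made up for illustration, and a count window stands in for the more general window mechanism.

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class TransformationKinds {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<Integer> a = env.fromElements(1, 2, 3, 4, 5);
        DataStream<Integer> b = env.fromElements(10, 20, 30);

        // One-to-one (map): double every element.
        DataStream<Integer> doubled = a.map(x -> x * 2);

        // One-to-zero/one/multiple (flatMap): emit each element and its negation.
        DataStream<Integer> expanded = a.flatMap(
                (Integer x, Collector<Integer> out) -> { out.collect(x); out.collect(-x); })
            .returns(Types.INT);

        // One-to-zero-or-one (filter): keep only positive elements.
        DataStream<Integer> positives = expanded.filter(x -> x > 0);

        // Multiple-to-one (union): merge streams of the same type.
        DataStream<Integer> merged = doubled.union(b, positives);

        // Aggregation (keyBy + window): partition by parity, then sum each key
        // over count windows of 3 elements (a time window would work the same
        // way, but might not fire before this tiny bounded job finishes).
        merged.map(x -> Tuple2.of(x % 2, x))
              .returns(Types.TUPLE(Types.INT, Types.INT))
              .keyBy(t -> t.f0)
              .countWindow(3)
              .sum(1)
              .print();

        env.execute("Transformation Kinds");
    }
}
```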
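The second sketch shows how checkpointing might be enabled in code, assuming roughly Flink 1.15-era APIs (`setCheckpointStorage` and `setExternalizedCheckpointCleanup`); the HDFS path is a placeholder.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.CheckpointConfig;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CheckpointedJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Snapshot operator state every 10 seconds with exactly-once semantics.
        env.enableCheckpointing(10_000L, CheckpointingMode.EXACTLY_ONCE);

        CheckpointConfig config = env.getCheckpointConfig();
        // Where checkpoint data is written (placeholder path).
        config.setCheckpointStorage("hdfs:///flink/checkpoints");
        // Retain the last checkpoint on cancellation so the job can be
        // resumed from it, much like a savepoint.
        config.setExternalizedCheckpointCleanup(
            CheckpointConfig.ExternalizedCheckpointCleanup.RETAIN_ON_CANCELLATION);

        // A trivial pipeline so the job has something to checkpoint.
        env.fromSequence(1, Long.MAX_VALUE)
           .map(i -> i % 100)
           .print();

        env.execute("Checkpointed Job");
    }
}
```

Savepoints themselves are typically taken from the Flink CLI rather than in code, for example `flink savepoint <jobId> <targetDirectory>` to trigger one and `flink run -s <savepointPath> ...` to resume a job from it.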
Flink Project Description
To obtain an MRS sample project, visit https://github.com/huaweicloud/huaweicloud-mrs-example and switch to the branch that matches your MRS cluster version. Download the package to your local PC and decompress it to obtain the sample projects of each component.
| Example Projects | Description |
| --- | --- |
| FlinkCheckpointJavaExample<br>FlinkCheckpointScalaExample | An application development example of the asynchronous checkpoint mechanism. Assume that you want to collect the data volume in a 4-second time window every other second, and that operator status must be strictly consistent: if the application recovers from a failure, all operators must have the same status. For details about related service scenarios, see Sample Program for Starting Checkpoint on Flink. |
| FlinkHBaseJavaExample | An application development example of reading and writing HBase data using Flink API jobs. For details about related service scenarios, see Flink Sample Program for Reading HBase Tables. |
| FlinkHudiJavaExample | An application development example of reading and writing Hudi data using Flink API jobs. For details about related service scenarios, see Sample Program for Reading Hudi Tables on Flink. |
| FlinkKafkaJavaExample<br>FlinkKafkaScalaExample | An application development example of producing data to and consuming data from Kafka by calling the APIs of the flink-connector-kafka module. For details about related service scenarios, see Flink Kafka Sample Program. |
| FlinkPipelineJavaExample<br>FlinkPipelineScalaExample | An application development example of the Job Pipeline program. The publisher job generates 10,000 pieces of data per second and sends the data to downstream systems through its NettySink operator; the other two jobs act as subscribers that subscribe to and print the data. For details about related service scenarios, see Flink Job Pipeline Sample Program. |
| FlinkRESTAPIJavaExample | An application development example of calling the FlinkServer REST API to create a tenant. For details about related service scenarios, see FlinkServer REST API Sample Program. |
| FlinkStreamJavaExample<br>FlinkStreamScalaExample | An application development example of the DataStream program. Assume that a user has a log file of netizens' weekend online shopping durations and a CSV table of netizens' personal information. The Flink application collects, in real time, statistics on female netizens who spend more than 2 hours shopping online, including their detailed personal information. For details about related service scenarios, see Flink DataStream Sample Program. |
| FlinkStreamSqlJoinExample<br>FlinkStreamSqlJoinScalaExample | An application development example of the Stream SQL Join program. Assume that one Flink service receives a message every second recording a user's basic information (name, gender, and age), while another Flink service (service 2) irregularly receives messages recording the user's name and career information. The example joins the two data streams in real time, using the user name from the service 2 messages as the key. For details about related service scenarios, see Flink Join Sample Program. |
| flink-sql | An example of submitting SQL jobs through Jar jobs on the client. For details about related service scenarios, see Flink Jar Job Submission SQL Sample Program. |
| pyflink-example | Examples of reading data from and writing data to Kafka using Python jobs, and of submitting SQL jobs using Python. For details about related service scenarios, see PyFlink Sample Program. |