Updated on 2024-08-16 GMT+08:00

Introduction to Flink Application Development

Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing.

Flink provides high-concurrency pipeline data processing, millisecond-level latency, and high reliability, making it extremely suitable for low-latency data processing.

The entire Flink system consists of three parts:

  • Client

The Flink client is used to submit jobs (streaming jobs) to Flink.

  • TaskManager

A TaskManager is a service execution node of Flink that runs the actual tasks. A Flink system can have multiple TaskManagers, all of which are equivalent to each other.

  • JobManager

The JobManager is the management node of Flink. It manages all TaskManagers and schedules user-submitted tasks to specific TaskManagers. In high-availability (HA) mode, multiple JobManagers are deployed; one is elected as the active JobManager, and the others remain standby.

Flink provides the following features:

  • Low latency

    Millisecond-level processing capability

  • Exactly Once

An asynchronous snapshot mechanism ensures that all data is processed exactly once.

  • HA

    Active/standby JobManagers, preventing single point of failure (SPOF)

  • Scale-out

Manual scale-out of TaskManagers is supported.
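
The exactly-once guarantee above comes from Flink's checkpointing (asynchronous snapshot) mechanism, which an application enables on its execution environment. The following is a minimal Scala sketch; the 10-second interval is illustrative, and it assumes the Flink streaming dependencies are on the classpath:

```scala
import org.apache.flink.streaming.api.CheckpointingMode
import org.apache.flink.streaming.api.scala._

object CheckpointingSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    // Take an asynchronous snapshot of operator state every 10 seconds.
    // EXACTLY_ONCE mode ensures each record is reflected in state exactly once.
    env.enableCheckpointing(10000L, CheckpointingMode.EXACTLY_ONCE)

    // ... define sources, transformations, and sinks here ...

    env.execute("Checkpointing sketch")
  }
}
```

On failure, Flink restores all operators from the latest completed snapshot and replays the input from that point, which is how the exactly-once guarantee is preserved without pausing the pipeline.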

Flink DataStream applications can be developed in Scala or Java, as shown in Table 1.

Table 1 Flink DataStream APIs

  • Scala API

    API in Scala, which can be used for data processing, such as filtering, joining, windowing, and aggregation. Because Scala code is concise and easy to read, you are advised to use the Scala API to develop applications.

  • Java API

    API in Java, which can be used for data processing, such as filtering, joining, windowing, and aggregation.
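
As a concrete illustration of the Scala DataStream API, the following is a minimal word-count sketch using the filtering, windowing, and aggregation operations mentioned in Table 1. The socket host and port are hypothetical, and the sketch assumes the Flink streaming dependencies are on the classpath:

```scala
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env.socketTextStream("localhost", 9999)     // hypothetical text source
      .flatMap(_.toLowerCase.split("\\W+"))     // split lines into words
      .filter(_.nonEmpty)                       // filtering
      .map((_, 1))                              // pair each word with a count of 1
      .keyBy(_._1)                              // partition by word
      .timeWindow(Time.seconds(5))              // 5-second tumbling windows
      .sum(1)                                   // aggregate counts per window
      .print()

    env.execute("WordCount sketch")
  }
}
```

To try it, start a text source first (for example, `nc -lk 9999`), then submit the job with the Flink client; each 5-second window prints the per-word counts observed in that window.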

For details about Flink, visit https://flink.apache.org/.