Updated on 2022-06-01 GMT+08:00

Basic Concepts

  • DataStream

    A DataStream is the minimum data unit processed by Flink. DataStreams are initially read from external systems through sources such as sockets, Kafka, and files. After being processed by Flink, they are written to external systems through sinks of the same kinds: sockets, Kafka, and files.

  • Data Transformation

    A data transformation is a data processing unit that transforms one or multiple DataStreams into a new DataStream.

    Data transformations can be classified as follows:

    • One-to-one transformation, for example, map.
    • One-to-zero, one-to-one, or one-to-many transformation, for example, flatMap.
    • One-to-zero or one-to-one transformation, for example, filter.
    • Many-to-one transformation, for example, union.
    • Grouping and aggregation across multiple records, for example, window and keyBy.
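
    The categories above can be illustrated with a minimal sketch. It models the record-count behavior of each transformation on ordinary Python lists and is not the Flink API; the input records and variable names are hypothetical and chosen only for illustration.

```python
# Plain-Python model of the transformation categories; NOT the Flink API.
from itertools import chain

stream = ["a b", "", "c"]  # hypothetical input records

# map: exactly one output record per input record.
mapped = [s.upper() for s in stream]

# flatMap: zero, one, or many output records per input record.
flat_mapped = [word for s in stream for word in s.split()]

# filter: zero or one output record per input record.
filtered = [s for s in stream if s]

# union: several streams merged into a single stream.
unioned = list(chain(stream, ["d"]))

# keyBy followed by an aggregation: records are grouped by key,
# then reduced per key.
events = [("k1", 1), ("k2", 5), ("k1", 2)]
totals = {}
for key, value in events:
    totals[key] = totals.get(key, 0) + value

# window: records are grouped into bounded chunks, here a count
# window of size 2, and each chunk is aggregated.
values = [1, 2, 3, 4]
window_sums = [sum(values[i:i + 2]) for i in range(0, len(values), 2)]
```

    In a real job the streams are unbounded, so windows and keyed aggregations emit results incrementally instead of once at the end.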

  • Topology

    A topology represents a job submitted by a user. It is composed of inputs (for example, a Kafka source), outputs (for example, a Kafka sink), and the data transformations that connect them.
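
    As a rough illustration of how the three parts fit together, the following sketch wires a source, a chain of transformations, and a sink using Python generators. The function names and sample records are hypothetical stand-ins, not Flink connectors.

```python
def source():
    # Stand-in for an input such as a Kafka source.
    yield from ["error: disk", "info: ok", "error: net"]

def transformations(records):
    # A filter followed by a map, applied record by record.
    return (r.split(": ")[1] for r in records if r.startswith("error"))

def run_topology():
    sink = []  # stand-in for an output such as a Kafka sink
    for record in transformations(source()):
        sink.append(record)
    return sink

result = run_topology()
```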

  • Checkpoint

    Checkpointing is the most important Flink mechanism for ensuring reliable data processing. A checkpoint is a snapshot of the application state. If a failure occurs, the application state is recovered from the latest checkpoint, so that data is processed exactly once.
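
    The recovery idea can be sketched on a single operator, assuming a replayable input and a snapshot that stores the read position together with the operator state. This is a simplified model with hypothetical names (run, fail_at, checkpoint_every); Flink itself coordinates snapshots across distributed operators.

```python
import copy

records = [3, 1, 4, 1, 5, 9]  # hypothetical replayable input

def run(records, checkpoint=None, fail_at=None, checkpoint_every=2):
    # Restore the read offset and running sum from the checkpoint,
    # or start from an empty state.
    if checkpoint is not None:
        state = copy.deepcopy(checkpoint)
    else:
        state = {"offset": 0, "sum": 0}
    last_checkpoint = copy.deepcopy(state)
    while state["offset"] < len(records):
        if fail_at is not None and state["offset"] == fail_at:
            # Simulated crash: work done since the last checkpoint is lost.
            return None, last_checkpoint
        state["sum"] += records[state["offset"]]
        state["offset"] += 1
        if state["offset"] % checkpoint_every == 0:
            # Snapshot the offset and the state together.
            last_checkpoint = copy.deepcopy(state)
    return state["sum"], last_checkpoint

# First attempt crashes after three records; the last snapshot covers two.
result, ckpt = run(records, fail_at=3)
# The second attempt resumes from the snapshot and replays from offset 2.
recovered, _ = run(records, checkpoint=ckpt)
```

    Because the offset and the sum are saved together, the third record is replayed after recovery but counted only once in the final state.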

  • Savepoint

    Savepoints are externally stored checkpoints that you can use to stop and resume or to update your Flink programs. After an upgrade, you can restore the job state from the savepoint and resume processing from that point, ensuring data continuity.