El contenido no se encuentra disponible en el idioma seleccionado. Estamos trabajando continuamente para agregar más idiomas. Gracias por su apoyo.

Compute
Elastic Cloud Server
Huawei Cloud Flexus
Bare Metal Server
Auto Scaling
Image Management Service
Dedicated Host
FunctionGraph
Cloud Phone Host
Huawei Cloud EulerOS
Networking
Virtual Private Cloud
Elastic IP
Elastic Load Balance
NAT Gateway
Direct Connect
Virtual Private Network
VPC Endpoint
Cloud Connect
Enterprise Router
Enterprise Switch
Global Accelerator
Management & Governance
Cloud Eye
Identity and Access Management
Cloud Trace Service
Resource Formation Service
Tag Management Service
Log Tank Service
Config
OneAccess
Resource Access Manager
Simple Message Notification
Application Performance Management
Application Operations Management
Organizations
Optimization Advisor
IAM Identity Center
Cloud Operations Center
Resource Governance Center
Migration
Server Migration Service
Object Storage Migration Service
Cloud Data Migration
Migration Center
Cloud Ecosystem
KooGallery
Partner Center
User Support
My Account
Billing Center
Cost Center
Resource Center
Enterprise Management
Service Tickets
HUAWEI CLOUD (International) FAQs
ICP Filing
Support Plans
My Credentials
Customer Operation Capabilities
Partner Support Plans
Professional Services
Analytics
MapReduce Service
Data Lake Insight
CloudTable Service
Cloud Search Service
Data Lake Visualization
Data Ingestion Service
GaussDB(DWS)
DataArts Studio
Data Lake Factory
DataArts Lake Formation
IoT
IoT Device Access
Others
Product Pricing Details
System Permissions
Console Quick Start
Common FAQs
Instructions for Associating with a HUAWEI CLOUD Partner
Message Center
Security & Compliance
Security Technologies and Applications
Web Application Firewall
Host Security Service
Cloud Firewall
SecMaster
Anti-DDoS Service
Data Encryption Workshop
Database Security Service
Cloud Bastion Host
Data Security Center
Cloud Certificate Manager
Edge Security
Managed Threat Detection
Blockchain
Blockchain Service
Web3 Node Engine Service
Media Services
Media Processing Center
Video On Demand
Live
SparkRTC
MetaStudio
Storage
Object Storage Service
Elastic Volume Service
Cloud Backup and Recovery
Storage Disaster Recovery Service
Scalable File Service Turbo
Scalable File Service
Volume Backup Service
Cloud Server Backup Service
Data Express Service
Dedicated Distributed Storage Service
Containers
Cloud Container Engine
SoftWare Repository for Container
Application Service Mesh
Ubiquitous Cloud Native Service
Cloud Container Instance
Databases
Relational Database Service
Document Database Service
Data Admin Service
Data Replication Service
GeminiDB
GaussDB
Distributed Database Middleware
Database and Application Migration UGO
TaurusDB
Middleware
Distributed Cache Service
API Gateway
Distributed Message Service for Kafka
Distributed Message Service for RabbitMQ
Distributed Message Service for RocketMQ
Cloud Service Engine
Multi-Site High Availability Service
EventGrid
Dedicated Cloud
Dedicated Computing Cluster
Business Applications
Workspace
ROMA Connect
Message & SMS
Domain Name Service
Edge Data Center Management
Meeting
AI
Face Recognition Service
Graph Engine Service
Content Moderation
Image Recognition
Optical Character Recognition
ModelArts
ImageSearch
Conversational Bot Service
Speech Interaction Service
Huawei HiLens
Video Intelligent Analysis Service
Developer Tools
SDK Developer Guide
API Request Signing Guide
Terraform
Koo Command Line Interface
Content Delivery & Edge Computing
Content Delivery Network
Intelligent EdgeFabric
CloudPond
Intelligent EdgeCloud
Solutions
SAP Cloud
High Performance Computing
Developer Services
ServiceStage
CodeArts
CodeArts PerfTest
CodeArts Req
CodeArts Pipeline
CodeArts Build
CodeArts Deploy
CodeArts Artifact
CodeArts TestPlan
CodeArts Check
CodeArts Repo
Cloud Application Engine
MacroVerse aPaaS
KooMessage
KooPhone
KooDrive

Flink Application Development Overview

Updated on 2024-08-10 GMT+08:00

Overview

Flink is a unified computing framework that supports both batch processing and stream processing. It provides a stream data processing engine that supports data distribution and parallel computing. Flink features stream processing and is a top open-source stream processing engine in the industry.

Flink provides high-concurrency pipeline data processing, millisecond-level latency, and high reliability, making it suitable for low-latency data processing.

Figure 1 shows the technology stack of Flink.

Figure 1 Technology stack of Flink

The following lists the key features of Flink in the current version:

  • DataStream
  • Checkpoint
  • Window
  • Job Pipeline
  • Configuration Table

Architecture

Figure 2 shows the architecture of Flink.

Figure 2 Flink architecture

As shown in Figure 2, the entire Flink system consists of three parts:

  • Client

    Flink client is used to submit jobs (streaming jobs) to Flink.

  • TaskManager

    TaskManager (also called worker) is a service execution node of Flink. It executes specific tasks. A Flink system could have multiple TaskManagers. These TaskManagers are equivalent to each other.

  • JobManager

    JobManager (also called master) is a management node of Flink. It manages all TaskManagers and schedules tasks submitted by users to specific TaskManagers. In high-availability (HA) mode, multiple JobManagers are deployed. Among these JobManagers, one of which is selected as the leader, and the others are standby.

Flink provides the following features:

  • Low latency

    Millisecond-level processing capability.

  • Exactly once

    Asynchronous snapshot mechanism, ensuring that all data is processed only once.

  • High availability

    Leader/Standby JobManagers, preventing single point of failure (SPOF).

  • Scale out

    Manual scale out supported by TaskManagers.

Flink Development APIs

Flink DataStream API can be developed using Scala and Java languages, as shown in Table 1.

Table 1 Flink DataStream API

Function

Description

Scala API

API in Scala, which can be used for data processing, such as filtering, joining, windowing, and aggregation.

Java API

API in Java, which can be used for data processing, such as filtering, joining, windowing, and aggregation.

Flink Basic Concepts

  • DataStream

    DataStream is the minimum unit of Flink processing and is one of core concepts of Flink. DataStreams are initially imported from external systems in formats of socket, Kafka, and files. After being processed by Flink, DataStreams are exported to external systems in formats of socket, Kafka, and files.

  • Data Transformation

    A data transformation is a data processing unit that transforms one or multiple DataStreams into a new DataStream.

    Data transformation can be classified as follows:

    • One-to-one transformation, for example, map.
    • One-to-zero, one-to-one, or one-to-multiple transformation, for example, flatMap.
    • One-to-zero or one-to-one transformation, for example, filter.
    • Multiple-to-one transformation, for example, union.
    • Transformation of multiple aggregations, for example, window and keyby.
  • Checkpoint

    Checkpoint is the most important Flink mechanism to ensure reliable data processing. Checkpoints ensure that all application statuses can be recovered from a checkpoint in case of failure occurs and data is processed exactly once.

  • Savepoint

    Savepoints are externally stored checkpoints that you can use to stop-and-resume or update your Flink programs. After the upgrade, you can set the task status to the savepoint storage status and start the restoration, ensuring data continuity.

Flink Project Description

To obtain an MRS sample project, visit https://github.com/huaweicloud/huaweicloud-mrs-example and switch to the branch that matches the MRS cluster version. Download the package to the local PC and decompress it to obtain the sample project of each component.

Currently, MRS provides the following Flink sample projects. The path in security mode is flink-examples/flink-examples-security, and the path in common mode is flink-examples/flink-examples-normal.
Table 2 Flink-related sample projects

Example Projects

Description

FlinkCheckpointJavaExample

An application development example of the asynchronous checkpoint mechanism program.

Assume that you want to collect data volume in a 4-second time window every other second and the status of operators must be strictly consistent. That is, if an application recovers from a failure, the status of all operators must the same.

For details about related service scenarios, see Sample Program for Starting Checkpoint on Flink.

FlinkCheckpointScalaExample

FlinkHBaseJavaExample

This section provides an application development example of reading and writing HBase data by using Flink API jobs.

For details about related service scenarios, see Flink Sample Program for Reading HBase Tables.

FlinkHudiJavaExample

This section provides an application development example of reading and writing Hudi data using Flink API jobs.

For details about related service scenarios, see Sample Program for Reading Hudi Tables on Flink.

FlinkKafkaJavaExample

This section provides an application development example of producing and consuming data to Kafka.

Generates and consumes data by calling APIs of the flink-connector-kafka module.

For details about related service scenarios, see Flink Kafka Sample Program.

FlinkKafkaScalaExample

FlinkPipelineJavaExample

Application development example of the Job Pipeline program.

For details about related service scenarios, see Flink Job Pipeline Sample Program.

The publisher job generates 10,000 pieces of data per second and sends the data to downstream systems through the NettySink operator of the job. The other two jobs function as subscribers to subscribe to and print data.

FlinkPipelineScalaExample

FlinkRESTAPIJavaExample

Call the RestAPI of FlinkServer to create an application development example of a tenant.

For details about related service scenarios, see FlinkServer REST API Sample Program.

FlinkStreamJavaExample

Application development example of the DataStream program.

For details about related service scenarios, see Flink DataStream Sample Program.

Assume that a user has a log text of netizens' online shopping duration on weekends and a CSV table of netizens' personal information. The Flink application can be used to collect statistics on female netizens who spend more than 2 hours online shopping in real time, including detailed personal information.

FlinkStreamScalaExample

FlinkStreamSqlJoinExample

Application development example of the Stream SQL Join program.

For details about related service scenarios, see Flink Join Sample Program.

Assume that a Flink service receives a message every second. The message records the basic information about a user, including the name, gender, and age. Another Flink service (service 2) receives a message irregularly, and the message records the name and career information about the user. The function of performing joint query on the two service data in real time by using the user name recorded in the message of service 2 as the keyword is implemented.

FlinkStreamSqlJoinScalaExample

flink-sql

SQL job submission through Jar jobs on the client

For details about related service scenarios, see Flink Jar Job Submission SQL Sample Program.

pyflink-example

Provides examples of reading and writing Kafka jobs using Python and submitting SQL jobs using Python.

For details about related service scenarios, see PyFlink Sample Program.

Utilizamos cookies para mejorar nuestro sitio y tu experiencia. Al continuar navegando en nuestro sitio, tú aceptas nuestra política de cookies. Descubre más

Feedback

Feedback

Feedback

0/500

Selected Content

Submit selected content with the feedback