Updated on 2024-12-10 GMT+08:00

Configuration Rules

Flink Job Parameter Configuration Specifications

The following table describes the rules for configuring Flink job parameters.

Table 1 Parameter configuration specifications

| Parameter | Mandatory | Description | Recommended Value |
| --- | --- | --- | --- |
| -c | Yes | Main class name. | Set this parameter as needed. |
| -ynm | Yes | Flink YARN job name. | Set this parameter as needed. |
| execution.checkpointing.interval | Yes | Interval for triggering a checkpoint, in ms. It can be added using -yD. | 60000 |
| execution.checkpointing.timeout | Yes | Checkpoint timeout. It can be added using -yD. The default value is 30 minutes. | 30min |
| parallelism.default | No | Job parallelism, for example, the parallelism of the join operator. It can be added using -yD. The default value is 1. | Set this parameter based on site requirements. |
| table.exec.state.ttl | Yes | TTL of Flink state (for example, join state). It can be added using -yD. The default value is 0, which means that state never expires. | Set this parameter based on site requirements. |
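
As a rough sketch, the parameters in Table 1 might be combined into one submission command as follows. The main class, job name, JAR path, parallelism of 4, state TTL of 1 day, and the `-m yarn-cluster` submission mode are placeholder assumptions rather than values from this document. The script only assembles and prints the command so that it can be reviewed before it is run.

```shell
#!/usr/bin/env bash
# Hypothetical placeholders: replace them with your own values.
MAIN_CLASS="com.example.MyFlinkJob"    # value for -c (assumed)
JOB_NAME="my-flink-job"                # value for -ynm (assumed)
JOB_JAR="/opt/jobs/my-flink-job.jar"   # path to the job JAR (assumed)

# Build the command as an array so that every argument stays one word.
# Interval and timeout use the recommended values from Table 1;
# parallelism and state TTL are illustrative only.
cmd=(
  flink run -m yarn-cluster
  -c "$MAIN_CLASS"
  -ynm "$JOB_NAME"
  -yD execution.checkpointing.interval=60000
  -yD execution.checkpointing.timeout=30min
  -yD parallelism.default=4
  -yD table.exec.state.ttl=1d
  "$JOB_JAR"
)

# Print the command for review instead of executing it.
printf '%s ' "${cmd[@]}"
printf '\n'
```

To submit the job, run `"${cmd[@]}"` in place of the two `printf` lines.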

Checkpoint Interval Should Be Longer Than the Checkpoint Execution Duration

The checkpoint execution duration depends on the checkpoint data volume: the larger the data volume, the longer a checkpoint takes to complete.

Checkpoint Timeout Duration Should Be Longer Than the Checkpoint Interval

The checkpoint interval is the time between two consecutive checkpoint triggers. The checkpoint timeout is the maximum time that a checkpoint is allowed to run. If the execution duration exceeds the checkpoint timeout, the job fails.
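
The two rules above amount to the ordering: execution duration < checkpoint interval < checkpoint timeout. The sketch below checks that ordering for a set of values in milliseconds. The function name `check_checkpoint_settings` is hypothetical, and the 20000 ms duration is a made-up observation; in practice the duration would be taken from the checkpoint history of a running job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 when
#   observed duration < interval < timeout   (all values in ms)
# and prints a message for each rule that is violated.
check_checkpoint_settings() {
  local interval_ms=$1 timeout_ms=$2 observed_duration_ms=$3
  local status=0

  if (( interval_ms <= observed_duration_ms )); then
    echo "interval (${interval_ms} ms) should be longer than the checkpoint duration (${observed_duration_ms} ms)"
    status=1
  fi
  if (( timeout_ms <= interval_ms )); then
    echo "timeout (${timeout_ms} ms) should be longer than the interval (${interval_ms} ms)"
    status=1
  fi
  return "$status"
}

# Recommended values from Table 1: 60000 ms interval, 30 min timeout.
# The 20000 ms duration is an assumed example value.
if check_checkpoint_settings 60000 $((30 * 60 * 1000)) 20000; then
  echo "checkpoint settings are consistent"
fi
```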

Changelog Must Be Enabled for Hudi Table Reads and Writes When CDC Is Used

To keep Flink calculations accurate when CDC is used, Hudi tables must retain all four change types: +I (insert), -U (update before), +U (update after), and -D (delete). Enable changelog both when data is written to a Hudi table and when data is read from that same table.
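
A minimal sketch of a table definition with changelog enabled is shown below. It assumes the `changelog.enabled` option of the open-source Hudi Flink connector and a MERGE_ON_READ table; the table name, columns, storage path, and output file location are hypothetical. The script only writes the DDL to a file, which could then be used for both the job that writes the table and the job that reads it.

```shell
#!/usr/bin/env bash
# Hypothetical output file for the DDL.
SQL_FILE="${TMPDIR:-/tmp}/hudi_cdc_table.sql"

# Write a Flink SQL DDL for a Hudi table that keeps +I, -U, +U, and -D records.
# Table name, columns, and path are placeholders.
cat > "$SQL_FILE" <<'EOF'
CREATE TABLE hudi_orders (
  order_id STRING,
  amount DECIMAL(10, 2),
  ts TIMESTAMP(3),
  PRIMARY KEY (order_id) NOT ENFORCED
) WITH (
  'connector' = 'hudi',
  'path' = 'hdfs:///tmp/hudi/hudi_orders',
  'table.type' = 'MERGE_ON_READ',
  'changelog.enabled' = 'true'
);
EOF

echo "DDL written to $SQL_FILE"
```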