Updated on 2025-07-10 GMT+08:00

How Can I Check if a Flink Job Can Be Restored From a Checkpoint After Restarting It?

What Is Restoration from a Checkpoint?

Checkpointing is Flink's fault tolerance and recovery mechanism. Flink periodically saves a snapshot of a job's state, so a real-time job can recover on its own from the latest snapshot after an exception or machine failure at runtime.

Principles for Restoration from Checkpoints

  • When a job fails or a resource restarts because of an exception, rather than a manual operation, the job's data can be restored from a checkpoint.
  • However, if the calculation logic of a job is modified, the job cannot be restored from a checkpoint.

Application Scenarios

Table 1 lists some common scenarios of restoring data from a checkpoint for your reference.

For more scenarios, refer to Principles for Restoration from Checkpoints and assess whether data can be restored from a checkpoint based on the actual situation.

Table 1 Common scenarios of restoring data from a checkpoint

| Scenario | Restoration from a Checkpoint | Description |
| --- | --- | --- |
| Adjust or increase the number of concurrent tasks. | Not supported | This operation alters the parallelism of the job, thereby changing its execution logic. |
| Modify Flink SQL statements and Flink Jar jobs. | Not supported | This operation modifies the algorithmic logic of the job with respect to resources. For example, if the original algorithm involves addition and subtraction, but the desired state requires multiplication, division, and modulo operations, it cannot be restored directly from the checkpoint. |
| Modify the static stream graph. | Not supported | This operation modifies the algorithmic logic of the job with respect to resources. |
| Modify the CU(s) per TM parameter. | Supported | The modification of compute resources does not affect the operational logic of the job's algorithm or operators. |
| A job runs abnormally or there is a physical power outage. | Supported | The job parameters and algorithm logic are not modified. |
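The verdicts in Table 1 can be captured in a small lookup, for example as a check that runs before a job is restarted. The following Python sketch is illustrative only: the scenario keys and the can_restore_from_checkpoint helper are made-up names, not part of DLI or Flink, and scenarios that Table 1 does not list still need to be assessed against the principles above.

```python
# Illustrative sketch only: the keys and helper name are hypothetical,
# not a DLI or Flink API. Each entry mirrors one row of Table 1 as
# (restorable from a checkpoint, reason).
TABLE_1 = {
    "change_parallelism": (
        False, "Alters the job's parallelism and therefore its execution logic."),
    "modify_sql_or_jar": (
        False, "Modifies the algorithmic logic of the job."),
    "modify_static_stream_graph": (
        False, "Modifies the algorithmic logic of the job."),
    "modify_cu_per_tm": (
        True, "Changes compute resources only, not the operator logic."),
    "abnormal_run_or_power_outage": (
        True, "Job parameters and algorithm logic are not modified."),
}


def can_restore_from_checkpoint(scenario: str) -> bool:
    """Return the Table 1 verdict for a listed scenario.

    Raises KeyError for scenarios that Table 1 does not cover, so that
    unlisted changes are reviewed manually instead of silently allowed.
    """
    restorable, _reason = TABLE_1[scenario]
    return restorable


if __name__ == "__main__":
    for name, (restorable, reason) in TABLE_1.items():
        verdict = "Supported" if restorable else "Not supported"
        print(f"{name}: {verdict} ({reason})")
```

Raising an error for unlisted scenarios is a deliberate choice in this sketch: a change that Table 1 does not mention should be judged case by case rather than assumed to be safe.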

Related Operation: How Do I Restore a Job from a Checkpoint?

Flink generates checkpoints and savepoints with the same mechanism and in the same format, so you can restore a Flink job from its latest successful checkpoint in OBS by importing that checkpoint as a savepoint. In the Flink job list, locate the desired Flink job, click More in the Operation column, and select Import Savepoint. The detailed steps are as follows:

  1. Log in to the DLI console. In the navigation pane on the left, choose Job Management > Flink Jobs.
  2. Locate the row that contains the target Flink job and choose More > Import Savepoint in the Operation column.
  3. In the displayed dialog box, select the OBS bucket path where the checkpoint is stored, and click OK. Checkpoints are saved under Bucket name/jobs/checkpoint/, in a directory whose name starts with the job ID.
  4. Restart the Flink job. The job is restored from the selected checkpoint path.
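Step 3 requires the directory under jobs/checkpoint whose name starts with the job ID. As a rough sketch, assuming you already have the directory names found under that path (for example, copied from the OBS console), the hypothetical Python helper below filters them and builds the full paths. The bucket name, job ID, and directory names are made-up sample values, and the helper works on plain strings only; it does not call OBS.

```python
from typing import List


def find_checkpoint_paths(bucket: str, job_id: int, directory_names: List[str]) -> List[str]:
    """Return candidate checkpoint paths for a job.

    Hypothetical helper: it keeps the directories whose names start with
    the job ID and prefixes them with "<bucket>/jobs/checkpoint/".
    A plain prefix match can be ambiguous (job ID 12 also matches a
    directory that starts with 123), so verify the result manually.
    """
    prefix = str(job_id)
    return [
        f"{bucket}/jobs/checkpoint/{name}"
        for name in directory_names
        if name.startswith(prefix)
    ]


if __name__ == "__main__":
    # Sample values invented for illustration.
    names = ["1234-example", "5678-example", "notes"]
    for path in find_checkpoint_paths("my-bucket", 1234, names):
        print(path)
```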