Updated on 2024-11-29 GMT+08:00

Restoring a Job

If the checkpoint function is enabled for a Flink job that has run before, you can restore the job from a specified checkpoint, and the job resumes execution from that checkpoint. You can also restore a Flink job from a specified savepoint. To create a savepoint after a job is submitted, stop the job and keep a savepoint. You can then re-execute the job from that savepoint.

You can delete specified checkpoints and savepoints of jobs in the Failed, Running succeeded, Submission failed, Stopped, Draft, or Saved state.

Restoring a Job from a Checkpoint

You can use checkpoints to recover jobs in the Failed, Running succeeded, or Stopped state.

  1. Check that the checkpoint function has been enabled for the job.

    You can check whether the checkpoint function is enabled on the job management page described in Creating a Job. If the function is disabled, no checkpoint can be specified to restore the job.

  2. (Optional) Set the number of checkpoints.

    Log in to FusionInsight Manager, choose Cluster > Services > Flink, and click Configurations > All Configurations. Search for state.checkpoints.num-retained and set it to the number of checkpoints to retain. The default value is 5.

  3. Specify a checkpoint to restore the job.

    1. Access the Flink web UI by referring to Accessing the Flink Web UI.
    2. Click Job Management. The job management page is displayed.
    3. In the Operation column of the job you want to restore, click More to expand options.
      • Restore from a Checkpoint: The checkpoint list of the job is displayed. The list contains at most the number of checkpoints specified by state.checkpoints.num-retained in step 2. Select a checkpoint to restore the job.
      • Restore from Latest Checkpoint: The job will be restored from the latest checkpoint.
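
The effect of state.checkpoints.num-retained on the checkpoint list can be illustrated with a small Python sketch. It is only a model of the retention behavior (keep the newest N completed checkpoints), not Flink code. The function name retained_checkpoints and the chk-N labels are assumptions made for illustration.

```python
from collections import deque


def retained_checkpoints(completed_ids, num_retained=5):
    """Return the checkpoints that remain selectable after retention.

    Hypothetical model: only the newest `num_retained` completed
    checkpoints are kept, and older ones are discarded.
    """
    if num_retained < 1:
        raise ValueError("num_retained must be at least 1")
    # A bounded deque drops its oldest entry when a new one is appended.
    window = deque(maxlen=num_retained)
    for checkpoint_id in completed_ids:
        window.append(checkpoint_id)
    return list(window)


# Eight checkpoints have completed and retention is left at 5.
completed = [f"chk-{i}" for i in range(1, 9)]
print(retained_checkpoints(completed))
```

Under this model, the checkpoint list would offer the five newest entries, and Restore from Latest Checkpoint would correspond to the last one.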

Restoring a Job from a Savepoint

  • A job in the Running state can be stopped, and a savepoint can be created for the job.
  • A savepoint can be used to restore jobs in the Failed, Running succeeded, or Stopped state.
  1. (Optional) Set the savepoint directory used by Flink to restore and update jobs.

    Log in to FusionInsight Manager, choose Cluster > Services > Flink, click Configurations > All Configurations, and search for state.backend.fs.savepointdir. Under Flink > FlinkServer, set this parameter to the savepoint directory. The default value is hdfs://hacluster/flink/savepoint.

  2. Specify a savepoint to restore the job.

    1. Access the Flink web UI by referring to Accessing the Flink Web UI.
    2. Click Job Management. The job management page is displayed.
    3. In the Operation column of the target job, choose More > Stop and Keep Savepoint. Stop the job as prompted and save the savepoint of the job.
      • If the job already has a historical savepoint, you can skip this step and select that savepoint when you restore the job.
      • After you click Stop and Keep Savepoint, the system deletes the latest checkpoint. In this case, you cannot restore the job from the latest checkpoint. Restore it from a savepoint instead.
    4. Choose More > Restore from a Savepoint in the Operation column and restore the job as prompted.
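
For comparison, a similar stop-and-restore sequence can be expressed with the open-source Flink command-line client. The Python sketch below only builds the command lines and does not run them. It assumes that the flink client is available and that your cluster permits command-line submission, which may not apply to jobs managed on the Flink web UI. The helper names, the job ID, the JAR path, and the savepoint subdirectory are hypothetical placeholders.

```python
import shlex


def stop_with_savepoint_cmd(job_id, savepoint_dir):
    """Build `flink stop --savepointPath <dir> <jobId>`.

    Stops the job and writes a savepoint under savepoint_dir.
    """
    return ["flink", "stop", "--savepointPath", savepoint_dir, job_id]


def restore_from_savepoint_cmd(savepoint_path, job_jar):
    """Build `flink run -s <savepointPath> <jar>`.

    Resubmits the job so that it starts from the given savepoint.
    """
    return ["flink", "run", "-s", savepoint_path, job_jar]


# Placeholder values for illustration only.
savepoint_dir = "hdfs://hacluster/flink/savepoint"
stop_cmd = stop_with_savepoint_cmd("example-job-id", savepoint_dir)
run_cmd = restore_from_savepoint_cmd(
    savepoint_dir + "/savepoint-example", "/tmp/example-job.jar"
)

print(shlex.join(stop_cmd))
print(shlex.join(run_cmd))
```

Building the commands as argument lists keeps paths with special characters intact if the lists are later passed to a process launcher.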