Updated on 2022-08-16 GMT+08:00

Savepoint-related Problems

  1. Should I assign IDs to all operators in the job?

    Strictly speaking, it is enough to assign IDs (by calling uid()) only to stateful operators, because a savepoint contains state only for stateful operators; stateless operators are not part of the savepoint.

    In practice, however, you are advised to assign IDs to all operators, because some of Flink's built-in operators, such as window operators, are also stateful, and it is not always obvious which operators hold state. If you are certain that an operator is stateless, you can skip calling uid() on it.
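
    In the DataStream API, an ID is assigned by calling uid(String) on an operator, for example .uid("window-sum") after a window aggregation. The Java sketch below is a simplified model, not Flink code: the Operator record and the statefulWithoutUid() helper are hypothetical, and the sketch assumes you already know which operators are stateful, which in practice is exactly the hard part. It lists the stateful operators that would fall back to auto-generated IDs.

    ```java
    import java.util.ArrayList;
    import java.util.List;

    // Simplified model, not Flink code: each operator is described by its name,
    // an optional explicit uid (null when uid() was not called) and whether it
    // keeps state. All names here are hypothetical.
    final class UidCheck {
        record Operator(String name, String uid, boolean stateful) {}

        // Returns the names of stateful operators without an explicit uid,
        // i.e. the operators whose state may not be restorable after a job change.
        static List<String> statefulWithoutUid(List<Operator> job) {
            List<String> missing = new ArrayList<>();
            for (Operator op : job) {
                if (op.stateful() && op.uid() == null) {
                    missing.add(op.name());
                }
            }
            return missing;
        }

        public static void main(String[] args) {
            List<Operator> job = List.of(
                    new Operator("source", "my-source", true), // e.g. a source that stores read offsets
                    new Operator("map", null, false),
                    new Operator("window", null, true),        // window operators hold state
                    new Operator("sink", "my-sink", false));
            System.out.println(statefulWithoutUid(job)); // prints [window]
        }
    }
    ```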

  2. What happens if I add a stateful operator when upgrading the job?

    A newly added stateful operator has no state in the savepoint, so nothing can be restored for it. It is initialized without any state and, in this respect, behaves like a stateless operator.

  3. What happens if I delete a stateful operator when upgrading the job?

    By default, a savepoint restore tries to map all saved state back to operators in the restored job. If the savepoint contains state for an operator that you have deleted, the restore fails.

    To skip the state of the deleted operator, run the following command with the --allowNonRestoredState parameter (short form -n, as used below):

    $ bin/flink run -s savepointPath -n [runArgs]
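
    The Java sketch below models the two upgrade cases above, a deleted stateful operator and a newly added one, under simplifying assumptions: saved state is keyed by operator ID, and the allowNonRestoredState flag of the hypothetical restore() method plays the role of the -n option. It is not Flink's implementation, and the error message is invented for illustration.

    ```java
    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    // Simplified model, not Flink code: how savepoint state (operator ID -> state)
    // is matched against the operator IDs of a new job version.
    final class RestoreModel {
        static Map<String, String> restore(Map<String, String> savepoint,
                                           Set<String> newJobIds,
                                           boolean allowNonRestoredState) {
            // State whose operator no longer exists cannot be mapped anywhere.
            for (String savedId : savepoint.keySet()) {
                if (!newJobIds.contains(savedId) && !allowNonRestoredState) {
                    throw new IllegalStateException("Savepoint contains state for operator "
                            + savedId + " that is not in the new job");
                }
            }
            Map<String, String> restored = new HashMap<>();
            for (String id : newJobIds) {
                // A newly added operator finds no saved state and starts empty.
                restored.put(id, savepoint.getOrDefault(id, "<empty>"));
            }
            return restored;
        }

        public static void main(String[] args) {
            Map<String, String> savepoint = Map.of("counter", "42", "dedup", "{a,b}");
            Set<String> newJob = Set.of("counter", "new-agg"); // "dedup" deleted, "new-agg" added
            try {
                restore(savepoint, newJob, false);
            } catch (IllegalStateException e) {
                System.out.println("restore failed: " + e.getMessage());
            }
            System.out.println(restore(savepoint, newJob, true).get("new-agg")); // prints <empty>
        }
    }
    ```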

  4. What happens if I reorder stateful operators?
    • If you have assigned IDs to the operators, their state is restored normally.
    • If you have not assigned IDs, Flink generates IDs automatically, and these will most likely change after the reordering. Restoring state from the earlier savepoint then fails.

  5. What happens if I add, delete, or reorder stateless operators?
    • If you have assigned IDs to the stateful operators, changes to stateless operators do not affect state restoration from savepoints.
    • If you have not assigned IDs, the auto-generated IDs of the stateful operators may change as a result of the change, and state restoration fails.
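
    The Java sketch below illustrates why explicit IDs matter when the topology changes. It is a simplified model, not Flink's hashing algorithm: an operator without a uid gets a hypothetical position-based ID ("auto-" plus its index), whereas Flink actually derives auto-generated IDs from the job graph structure. Both share the property shown here: inserting a stateless operator can change the auto-generated ID of an unrelated stateful operator.

    ```java
    import java.util.List;

    // Simplified model, not Flink code: explicit uids are stable, while
    // position-based fallback IDs change when operators are added or moved.
    final class OperatorIds {
        record Operator(String name, String uid) {}

        // Returns the ID of the named operator within the pipeline.
        static String idOf(List<Operator> pipeline, String name) {
            for (int i = 0; i < pipeline.size(); i++) {
                Operator op = pipeline.get(i);
                if (op.name().equals(name)) {
                    // An explicit uid wins; otherwise fall back to a position-based ID.
                    return op.uid() != null ? op.uid() : "auto-" + i;
                }
            }
            throw new IllegalArgumentException("no operator named " + name);
        }

        public static void main(String[] args) {
            // Version 1: the stateful window operator has no uid.
            List<Operator> v1 = List.of(new Operator("source", null),
                    new Operator("window", null), new Operator("sink", null));
            // Version 2: a stateless filter is inserted before the window operator.
            List<Operator> v2 = List.of(new Operator("source", null),
                    new Operator("filter", null), new Operator("window", null),
                    new Operator("sink", null));
            System.out.println(idOf(v1, "window") + " -> " + idOf(v2, "window"));

            // The same change when the window operator carries an explicit uid.
            List<Operator> u1 = List.of(new Operator("source", null),
                    new Operator("window", "window-agg"), new Operator("sink", null));
            List<Operator> u2 = List.of(new Operator("source", null),
                    new Operator("filter", null), new Operator("window", "window-agg"),
                    new Operator("sink", null));
            System.out.println(idOf(u1, "window") + " -> " + idOf(u2, "window"));
        }
    }
    ```

    Running main() prints auto-1 -> auto-2 for the operator without a uid (its saved state would no longer match) and window-agg -> window-agg for the operator with a uid.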

  6. What happens if I change the operator parallelism when restoring state?

    If the savepoint was triggered with Flink 1.2.0 or later and the job does not use deprecated state APIs such as Checkpointed, you can restore the state from the savepoint with the new parallelism. Otherwise, the state cannot be restored.
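
    One reason rescaling works for keyed state is that Flink splits it into key groups, whose number equals the job's maximum parallelism. The Java sketch below is a simplified model of that mapping: Flink additionally applies a murmur hash to key.hashCode(), which is omitted here, and the class and method names are hypothetical. The key group of a key does not depend on the parallelism, so its state can be handed to whichever subtask owns that key group after the change.

    ```java
    // Simplified model of keyed-state rescaling, not Flink code: keys map to a
    // fixed number of key groups, and contiguous ranges of key groups are
    // assigned to the parallel subtasks.
    final class KeyGroups {
        // Key group of a key: fixed for a given max parallelism,
        // independent of the actual parallelism.
        static int keyGroupFor(Object key, int maxParallelism) {
            return Math.floorMod(key.hashCode(), maxParallelism);
        }

        // Index of the subtask that owns a key group at the given parallelism.
        static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
            return keyGroup * parallelism / maxParallelism;
        }

        public static void main(String[] args) {
            int maxParallelism = 128; // arbitrary example value
            int keyGroup = keyGroupFor("user-42", maxParallelism);
            // The key group (and its state) is the same before and after rescaling;
            // only the owning subtask changes.
            System.out.println("key group " + keyGroup + ": subtask "
                    + subtaskFor(keyGroup, maxParallelism, 2) + " at parallelism 2, subtask "
                    + subtaskFor(keyGroup, maxParallelism, 4) + " at parallelism 4");
        }
    }
    ```

    This only holds if the maximum parallelism stays the same between the savepoint and the restored job.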