Updated on 2024-08-10 GMT+08:00

Restrictions on Restoring the Spark Application from the Checkpoint

Question

A Spark application can be restored from a checkpoint and resume processing from the point where the previous run stopped, which ensures that no data is lost. In some cases, however, the Spark application fails to be restored from the checkpoint.

Answer

The checkpoint contains the serialized objects, task execution status, and configuration of the Spark application. Therefore, the Spark application cannot be restored from the checkpoint in either of the following cases:

  1. The service code has changed and the changed class does not declare a serialVersionUID.
  2. An internal Spark class has changed and the changed class does not declare a serialVersionUID.
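
For the first case, a minimal Java sketch of declaring the ID explicitly is shown here. The class WordCountState and its fields are hypothetical and only stand in for a service class whose instances are serialized into the checkpoint. The sketch uses only the JDK (no Spark dependency) and prints the ID that Java serialization associates with the class.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

// Hypothetical service class whose instances may be serialized into a checkpoint.
class WordCountState implements Serializable {
    // An explicit ID replaces the one the JVM would otherwise compute from the
    // class structure, so compatible changes (such as adding a field) do not
    // by themselves make previously serialized instances unreadable.
    private static final long serialVersionUID = 1L;

    private final String word;
    private final long count;

    WordCountState(String word, long count) {
        this.word = word;
        this.count = count;
    }

    String getWord() { return word; }
    long getCount() { return count; }
}

class SerialVersionUidDemo {
    // Returns the ID that Java serialization associates with WordCountState.
    static long declaredUid() {
        return ObjectStreamClass.lookup(WordCountState.class).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("serialVersionUID = " + declaredUid()); // prints "serialVersionUID = 1"
    }
}
```

An explicit ID only helps with compatible changes. An incompatible change, such as changing the type of an existing field, can still prevent deserialization even when the ID is unchanged.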

In addition, configuration items are stored in the checkpoint. If you modify a configuration item of the service, the change may not take effect when the service is restored from the checkpoint, because the value stored in the checkpoint is used instead. Currently, only the following configuration items are reloaded when the service is restored from the checkpoint:

"spark.yarn.app.id",
"spark.yarn.app.attemptId",
"spark.driver.host",
"spark.driver.bindAddress",
"spark.driver.port",
"spark.master",
"spark.yarn.jars",
"spark.yarn.keytab",
"spark.yarn.principal",
"spark.yarn.credentials.file",
"spark.yarn.credentials.renewalTime",
"spark.yarn.credentials.updateTime",
"spark.ui.filters",
"spark.mesos.driver.frameworkId"

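The reload behavior described above can be pictured with a simplified model. The following Java sketch is not Spark's implementation. It uses plain maps in place of the Spark configuration object, the class and method names are hypothetical, and only a few of the reloaded keys are included for brevity. It shows that a reloaded key takes its value from the current environment, while any other key keeps the value stored in the checkpoint.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of restoring configuration from a checkpoint (assumption:
// plain maps stand in for the real configuration object).
class CheckpointConfSketch {
    // A few of the keys from the list above, for brevity.
    static final List<String> RELOADED_KEYS = Arrays.asList(
            "spark.master", "spark.driver.host", "spark.driver.port");

    // Starts from the checkpointed values, then overrides only the reloaded
    // keys with whatever the current environment provides.
    static Map<String, String> restore(Map<String, String> checkpointed,
                                       Map<String, String> current) {
        Map<String, String> restored = new HashMap<>(checkpointed);
        for (String key : RELOADED_KEYS) {
            String value = current.get(key);
            if (value != null) {
                restored.put(key, value);
            }
        }
        return restored;
    }

    public static void main(String[] args) {
        Map<String, String> checkpointed = new HashMap<>();
        checkpointed.put("spark.driver.host", "node-1");
        checkpointed.put("spark.executor.memory", "2g");

        Map<String, String> current = new HashMap<>();
        current.put("spark.driver.host", "node-2");
        current.put("spark.executor.memory", "4g"); // modified after the checkpoint was written

        Map<String, String> restored = restore(checkpointed, current);
        System.out.println(restored.get("spark.driver.host"));     // prints "node-2"
        System.out.println(restored.get("spark.executor.memory")); // prints "2g"
    }
}
```

In this model the modified spark.executor.memory value is ignored after the restore because that key is not in the reloaded list, which matches the behavior described above.
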
Solution

Manually delete the checkpoint directory and restart the service program.

Deleting a file or folder is a high-risk operation. Ensure that the file or folder is no longer required before performing this operation.
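
If the checkpoint directory is on a local file system, the deletion in the Solution could be scripted roughly as follows. This is a sketch under that assumption only, and the class and method names are hypothetical. A checkpoint directory stored on HDFS would instead be removed with the HDFS shell (for example, hdfs dfs -rm -r followed by the checkpoint path) or the Hadoop FileSystem API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper for removing a checkpoint directory on a local file system.
class CheckpointCleanup {
    // Deletes dir and everything under it; returns the number of entries removed.
    static int deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return 0; // nothing to clean up
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse order lists children before their parent directories.
            List<Path> entries = walk.sorted(Comparator.reverseOrder())
                                     .collect(Collectors.toList());
            for (Path entry : entries) {
                Files.delete(entry);
            }
            return entries.size();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java CheckpointCleanup <checkpoint-dir>");
            return;
        }
        int removed = deleteRecursively(Paths.get(args[0]));
        System.out.println("Removed " + removed + " entries");
    }
}
```

Stop the service program before running the cleanup, and verify the path argument carefully, because the deletion cannot be undone.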