Updated on 2025-07-11 GMT+08:00

Recovery from Failures

  • System-Level

    DLI uses a decoupled storage and compute architecture. If a system fault occurs, Kubernetes resource scheduling and failover mechanisms automatically recover the compute cluster.

  • Job-Level

    You can enable automatic restart and recovery for Flink and Spark jobs. Once this function is enabled, a job that encounters an exception is automatically restarted and recovered.
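
The job-level behavior can be pictured with a small, self-contained Python sketch. It is only an illustration of the general restart-and-recover pattern, not DLI's implementation or API: the names `JobInterrupted`, `run_with_auto_restart`, `make_flaky_sum_job`, and the `max_restarts` and `checkpoint_every` parameters are all hypothetical. A supervisor reruns a failed job from its last saved checkpoint until the job succeeds or a restart limit is reached.

```python
class JobInterrupted(Exception):
    """Hypothetical error raised when a job attempt hits an exception."""

    def __init__(self, checkpoint):
        super().__init__(f"job failed; last checkpoint at record {checkpoint}")
        self.checkpoint = checkpoint


def run_with_auto_restart(job, max_restarts=3):
    """Run job(start_index); on failure, restart it from the last checkpoint.

    Returns (result, number_of_restarts). Re-raises the failure once
    max_restarts (an assumed limit) has been used up.
    """
    checkpoint = 0
    restarts = 0
    while True:
        try:
            return job(checkpoint), restarts
        except JobInterrupted as exc:
            if restarts >= max_restarts:
                raise
            checkpoint = exc.checkpoint  # resume from the saved position
            restarts += 1


def make_flaky_sum_job(records, fail_at, checkpoint_every=2):
    """Build a toy job that sums records and fails once at index fail_at."""
    store = {"index": 0, "total": 0}  # stands in for durable checkpoint storage
    failed = {"done": False}

    def job(start_index):
        total = store["total"]  # restore state from the last checkpoint
        for i in range(start_index, len(records)):
            if i == fail_at and not failed["done"]:
                failed["done"] = True
                raise JobInterrupted(store["index"])
            total += records[i]
            if (i + 1) % checkpoint_every == 0:
                store["index"], store["total"] = i + 1, total
        return total

    return job


if __name__ == "__main__":
    job = make_flaky_sum_job([1, 2, 3, 4, 5], fail_at=3)
    result, restarts = run_with_auto_restart(job)
    print(result, restarts)
```

In this sketch the restart resumes from the checkpointed position and running total rather than from the beginning, so records processed before the checkpoint are not counted twice. With automatic restart effectively disabled (`max_restarts=0`), the same failure is raised to the caller.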