Updated on 2023-05-19 GMT+08:00

Recovery from Failures

  • System-Level

    DLI uses an architecture that separates storage from compute. If a system fault occurs, a compute cluster is recovered automatically through the Kubernetes resource scheduling and failover mechanism.

  • Job-Level

    You can enable automatic restart and recovery for Flink and Spark jobs. Once this function is enabled, a job that encounters an exception is automatically restarted and recovered.
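
This page does not describe how DLI implements job-level restart internally. The following is only a minimal, generic sketch of the restart-and-recover idea, assuming a job that commits its progress to a checkpoint after each record. When the job raises an exception, it is restarted from the last committed position instead of from the beginning. The names `Checkpoint`, `run_with_auto_restart`, and `max_restarts` are hypothetical and are not part of any DLI, Flink, or Spark API.

```python
from typing import Callable


class Checkpoint:
    """Hypothetical stand-in for durable job state kept outside the job itself."""

    def __init__(self) -> None:
        self.position = 0  # index of the next record the job should process


def run_with_auto_restart(
    job: Callable[[Checkpoint], None],
    checkpoint: Checkpoint,
    max_restarts: int = 3,
) -> int:
    """Run `job`; if it raises, restart it from the last checkpoint.

    Returns the number of restarts that were needed. After `max_restarts`
    restarts have been used up, the next failure is re-raised to the caller.
    """
    restarts = 0
    while True:
        try:
            job(checkpoint)
            return restarts
        except Exception:
            if restarts >= max_restarts:
                raise  # give up: the fault is not transient
            restarts += 1  # restart; the job resumes from checkpoint.position


def make_flaky_job(records: list, fail_at: int):
    """Build a demo job that fails once at `fail_at`, then succeeds on restart."""
    state = {"failed": False}
    processed: list = []

    def job(cp: Checkpoint) -> None:
        while cp.position < len(records):
            if cp.position == fail_at and not state["failed"]:
                state["failed"] = True
                raise RuntimeError("simulated transient fault")
            processed.append(records[cp.position])
            cp.position += 1  # commit progress only after the record is handled

    return job, processed


job, processed = make_flaky_job(list(range(10)), fail_at=5)
restarts = run_with_auto_restart(job, Checkpoint())
print(restarts, processed)
```

In this sketch the job fails once at record 5, is restarted, and resumes at record 5. Every record is processed exactly once. Because progress is committed only after a record is handled, a restart never skips work, which is the property that makes the recovery meaningful.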