Updated on 2022-11-18 GMT+08:00

Application Fails When ApplicationManager Is Terminated During Data Processing in the Cluster Mode of Structured Streaming

Question

If ApplicationManager is terminated during data processing in the cluster mode of Structured Streaming, the application fails and the following error information is displayed:

2017-05-09 20:46:02,393 | INFO  | main | 
  client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
  diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: This query does not support recovering from checkpoint location. Delete hdfs://hacluster/structuredtest/checkpoint/offsets to start over.;
  ApplicationMaster host: 10.96.101.170
  ApplicationMaster RPC port: 0
  queue: default
  start time: 1494333891969
  final status: FAILED
  tracking URL: https://9-96-101-191:8090/proxy/application_1493689105146_0052/
  user: spark2x | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
Exception in thread "main" org.apache.spark.SparkException: Application application_1493689105146_0052 finished with failed status

Answer

Possible cause: The error occurs because a checkpoint directory is configured while recoverFromCheckpointLocation evaluates to false, so the query refuses to restart from the existing checkpoint.

The value of recoverFromCheckpointLocation is the result of evaluating the expression outputMode == OutputMode.Complete() in the code. (The default outputMode is append, so recoverFromCheckpointLocation is false by default.)

Solution: When developing the application, change the data output mode based on the actual conditions.

When the output mode is changed to complete, recoverFromCheckpointLocation evaluates to true. In that case, no error is reported even if the checkpoint directory is configured, and the query can recover from the checkpoint after a restart.
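The fix can be sketched as follows. This is a minimal illustration, not the original application: the socket source, query name, and checkpoint path are assumptions chosen for the example (the checkpoint path reuses the one shown in the error log), and the key point is only the outputMode setting on writeStream.

```scala
// Minimal sketch: an aggregation query whose output mode is set to complete,
// so recoverFromCheckpointLocation evaluates to true and the query can be
// restarted from the configured checkpoint location.
// The source, sink, query name, and paths below are illustrative assumptions.
import org.apache.spark.sql.SparkSession

object StructuredStreamingCheckpointExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("StructuredStreamingCheckpointExample")
      .getOrCreate()
    import spark.implicits._

    // Placeholder streaming source; replace with the actual source in use.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // An aggregation query (word counts), which supports complete mode.
    val wordCounts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    val query = wordCounts.writeStream
      // complete mode makes recoverFromCheckpointLocation true,
      // so restarting with an existing checkpoint no longer fails.
      .outputMode("complete")
      .format("memory")
      .queryName("wordCounts")
      .option("checkpointLocation",
        "hdfs://hacluster/structuredtest/checkpoint")
      .start()

    query.awaitTermination()
  }
}
```

Note that complete mode requires an aggregation in the query; a plain pass-through query cannot use it, so whether this change is applicable depends on the query's logic.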