Updated on 2022-09-14 GMT+08:00

Why Does the ApplicationManager Fail to Be Terminated When Data Is Being Processed in the Structured Streaming Cluster Mode?

Question

In the Structured Streaming cluster mode, when the ApplicationManager is stopped during data processing, the following error information is displayed:

2017-05-09 20:46:02,393 | INFO  | main |
client token: Token { kind: YARN_CLIENT_TOKEN, service:  }
diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: This query does not support recovering from checkpoint location. Delete hdfs://hacluster/structuredtest/checkpoint/offsets to start over.;
ApplicationMaster host: 9.96.101.170
ApplicationMaster RPC port: 0
queue: default
start time: 1494333891969
final status: FAILED
tracking URL: https://9-96-101-191:26001/proxy/application_1493689105146_0052/
user: spark2x | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
Exception in thread "main" org.apache.spark.SparkException: Application application_1493689105146_0052 finished with failed status

Answer

Possible cause: The value of recoverFromCheckpointLocation is false, but the checkpoint directory is configured.

The value of recoverFromCheckpointLocation is determined by the expression outputMode == OutputMode.Complete() in the code. The default value of outputMode is append, so recoverFromCheckpointLocation is false unless the output mode is changed.

Solution: When developing an application, set the data output mode according to your actual needs. For details about how to call the outputMode method to change the output mode, see the DataSight Spark V100R002CXX Spark2.1 API Reference.

If you change the output mode to complete, the value of recoverFromCheckpointLocation becomes true. In this case, no exception is thrown even if the checkpoint directory is configured.
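
The following is a minimal sketch of setting the output mode to complete together with a checkpoint directory, not the application that produced the log above. The checkpoint path is taken from the error message. Everything else is an assumption made for illustration: the object name, the socket source on localhost:9999, the word-count aggregation, the memory sink, and the query name word_counts. The source does not state which source or sink the failing application used. Note that Spark accepts the complete output mode only for queries that contain a streaming aggregation, which is why the sketch aggregates before writing.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; the name is not from the original application.
object CompleteModeCheckpointSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("CompleteModeCheckpointSketch")
      .getOrCreate()
    import spark.implicits._

    // Assumed input: a text socket source. Replace with the real source.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    // Complete mode requires an aggregation, so count occurrences of each word.
    val wordCounts = lines.as[String]
      .flatMap(_.split(" "))
      .groupBy("value")
      .count()

    val query = wordCounts.writeStream
      // Explicitly select complete mode instead of the default append mode.
      .outputMode("complete")
      // Assumed sink: the in-memory table sink, queried by the name below.
      .format("memory")
      .queryName("word_counts")
      // Checkpoint directory as it appears in the error message.
      .option("checkpointLocation", "hdfs://hacluster/structuredtest/checkpoint")
      .start()

    query.awaitTermination()
  }
}
```

If the application genuinely needs append mode, the error message itself points to the other option: delete the offsets directory under the checkpoint location so that the query starts over instead of attempting recovery.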