Why Does the ApplicationManager Fail to Be Terminated When Data Is Being Processed in the Structured Streaming Cluster Mode?
Question
In Structured Streaming cluster mode, stopping the ApplicationManager while data is being processed produces the following error:
2017-05-09 20:46:02,393 | INFO | main |
    client token: Token { kind: YARN_CLIENT_TOKEN, service: }
    diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: This query does not support recovering from checkpoint location. Delete hdfs://hacluster/structuredtest/checkpoint/offsets to start over.;
    ApplicationMaster host: 10.96.101.170
    ApplicationMaster RPC port: 0
    queue: default
    start time: 1494333891969
    final status: FAILED
    tracking URL: https://9-96-101-191:8090/proxy/application_1493689105146_0052/
    user: spark2x | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54)
Exception in thread "main" org.apache.spark.SparkException: Application application_1493689105146_0052 finished with failed status
Answer
Possible cause: A checkpoint directory is configured, but the value of recoverFromCheckpointLocation is false, so the query refuses to restart from the existing checkpoint.
The value of recoverFromCheckpointLocation is the result of evaluating the expression outputMode == OutputMode.Complete() in the code. Because the default output mode is Append, the expression evaluates to false by default.
Solution: When developing the application, set the data output mode based on the actual requirements.
If the output mode is changed to Complete, recoverFromCheckpointLocation evaluates to true. In that case, configuring a checkpoint directory no longer triggers the exception.
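The following is a minimal sketch of such a query. The input path and the query name are hypothetical placeholders, not taken from the original scenario; the checkpoint path reuses the directory from the error log. In Spark 2.x, the memory sink is one case where recoverFromCheckpointLocation is true only when the output mode is Complete, so this aggregation query can be restarted from an existing checkpoint directory:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.OutputMode

object CheckpointRecoveryExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("CheckpointRecoveryExample")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical input directory; the file source supports recovery.
    val words = spark.readStream
      .textFile("hdfs://hacluster/structuredtest/input")
      .flatMap(_.split(" "))

    // Running word count; an aggregation is required for Complete mode.
    val wordCounts = words.groupBy("value").count()

    // Complete mode makes recoverFromCheckpointLocation evaluate to true,
    // so an existing checkpoint directory no longer causes the
    // AnalysisException shown in the error log above.
    val query = wordCounts.writeStream
      .outputMode(OutputMode.Complete())
      .format("memory")
      .queryName("wordCounts")
      .option("checkpointLocation", "hdfs://hacluster/structuredtest/checkpoint")
      .start()

    query.awaitTermination()
  }
}

Submitted in YARN cluster mode as usual, a restart of this query resumes from hdfs://hacluster/structuredtest/checkpoint instead of failing with the AnalysisException.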