Structured Streaming的cluster模式,在数据处理过程中终止ApplicationManager,应用失败
问题
Structured Streaming的cluster模式,在数据处理过程中终止ApplicationManager,执行应用时显示如下异常。
2017-05-09 20:46:02,393 | INFO | main | client token: Token { kind: YARN_CLIENT_TOKEN, service: } diagnostics: User class threw exception: org.apache.spark.sql.AnalysisException: This query does not support recovering from checkpoint location. Delete hdfs://hacluster/structuredtest/checkpoint/offsets to start over.; ApplicationMaster host: 10.96.101.170 ApplicationMaster RPC port: 0 queue: default start time: 1494333891969 final status: FAILED tracking URL: https://9-96-101-191:8090/proxy/application_1493689105146_0052/ user: spark2x | org.apache.spark.internal.Logging$class.logInfo(Logging.scala:54) Exception in thread "main" org.apache.spark.SparkException: Application application_1493689105146_0052 finished with failed status
回答
原因分析:显示该异常是因为“recoverFromCheckpointLocation”的值判定为false,但却配置了checkpoint目录。
参数“recoverFromCheckpointLocation”的值为代码中“outputMode == OutputMode.Complete()”语句的判断结果(outputMode的默认输出方式为“append”)。
处理方法:编写应用时,用户可以根据具体情况修改数据的输出方式。
将输出方式修改为“complete”,“recoverFromCheckpointLocation”的值会判定为true。此时配置了checkpoint目录时就不会显示异常。