Updated: 2024-06-14 GMT+08:00

Spark Streaming Application Fails to Run Due to Kafka Configuration Restrictions

Question

When a running Spark Streaming task writes data back to Kafka, the written-back data is not received by Kafka, and the Kafka log reports the following error:

2016-03-02 17:46:19,017 | INFO | [kafka-network-thread-21005-1] | Closing socket connection to /10.91.8.208 due to invalid request: Request of length
122371301 is not valid, it is larger than the maximum size of 104857600 bytes. | kafka.network.Processor (Logging.scala:68)
2016-03-02 17:46:19,155 | INFO | [kafka-network-thread-21005-2] | Closing socket connection to /10.91.8.208. | kafka.network.Processor (Logging.scala:68)
2016-03-02 17:46:19,270 | INFO | [kafka-network-thread-21005-0] | Closing socket connection to /10.91.8.208 due to invalid request:
Request of length 122371301 is not valid, it is larger than the maximum size of 104857600 bytes. | kafka.network.Processor (Logging.scala:68)
2016-03-02 17:46:19,513 | INFO | [kafka-network-thread-21005-1] | Closing socket connection to /10.91.8.208 due to invalid request:
Request of length 122371301 is not valid, it is larger than the maximum size of 104857600 bytes. | kafka.network.Processor (Logging.scala:68)
2016-03-02 17:46:19,763 | INFO | [kafka-network-thread-21005-2] | Closing socket connection to /10.91.8.208 due to invalid request:
Request of length 122371301 is not valid, it is larger than the maximum size of 104857600 bytes. | kafka.network.Processor (Logging.scala:68)

Answer

As shown in Figure 1, the logic defined in the Spark Streaming application is to read data from Kafka, process it accordingly, and then write the result data back to Kafka.
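The read-process-write-back logic might look roughly like the Scala sketch below. This is a minimal illustration, not code from this case: it assumes the Kafka 0-10 direct stream API, and the broker address broker1:21005, the topics input-topic/output-topic, the consumer group, and the toUpperCase step are placeholder values.

import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.{StringDeserializer, StringSerializer}
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent

object KafkaWriteBackSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("KafkaWriteBackSketch")
    // 60-second batch interval, as in the example below
    val ssc = new StreamingContext(conf, Seconds(60))

    // Consumer settings; broker address and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "broker1:21005",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest")

    // 1. Read data from Kafka
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc, PreferConsistent, Subscribe[String, String](Set("input-topic"), kafkaParams))

    // 2. Process each record (placeholder transformation)
    val results = stream.map(record => record.value().toUpperCase)

    // 3. Write the results of each batch back to Kafka
    results.foreachRDD { rdd =>
      rdd.foreachPartition { partition =>
        val props = new java.util.Properties()
        props.put("bootstrap.servers", "broker1:21005")
        props.put("key.serializer", classOf[StringSerializer].getName)
        props.put("value.serializer", classOf[StringSerializer].getName)
        // One producer per partition, created here only for brevity
        val producer = new KafkaProducer[String, String](props)
        partition.foreach(v => producer.send(new ProducerRecord[String, String]("output-topic", v)))
        producer.close()
      }
    }

    ssc.start()
    ssc.awaitTermination()
  }
}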

For example, assume data flows into Kafka at 10 MB/s and the batch interval defined in Spark Streaming is 60 seconds; then a total of 600 MB has to be written back per batch. If the threshold Kafka defines for received data is 500 MB, the written-back data exceeds the threshold and the preceding error occurs.
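The arithmetic in this example, with the figures above treated as assumptions:

val ingestRateMBps   = 10    // assumed rate at which data flows into Kafka
val batchIntervalSec = 60    // assumed Spark Streaming batch interval
val batchVolumeMB    = ingestRateMBps * batchIntervalSec   // 600 MB to write back per batch
val kafkaLimitMB     = 500   // assumed Kafka threshold in this example
val exceedsLimit     = batchVolumeMB > kafkaLimitMB        // true, so the request is rejected
// In the error log above, the actual limit is 104857600 bytes (100 MB),
// which is the default value of socket.request.max.bytes.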

Figure 1 Application scenario

Solutions:

Method 1 (recommended): Optimize the batch interval defined in the Spark Streaming application. Reducing the batch interval reduces the amount of data written back per batch, which avoids exceeding the threshold defined by Kafka. A batch interval of 5 to 10 seconds is generally recommended.
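For instance, under the same assumed 10 MB/s ingest rate, changing the batch interval from 60 seconds to 5 seconds cuts the data written back per batch from about 600 MB to about 50 MB:

// Sketch of Method 1: shorten the batch interval so each batch writes back less data.
// 10 MB/s * 5 s = 50 MB per batch, well below the 104857600-byte (100 MB) limit in the log.
val ssc = new StreamingContext(conf, Seconds(5))   // was Seconds(60) in the failing scenario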

Method 2: Increase the Kafka threshold. You are advised to configure this for the Kafka service on MRS Manager, adjusting the socket.request.max.bytes parameter to a value appropriate for the application scenario.
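For reference, socket.request.max.bytes is a broker-side parameter whose default value, 104857600 bytes (100 MB), matches the limit shown in the error log. The snippet below only illustrates the parameter with an example value of 200 MB; on MRS the change should be made on the Kafka service configuration page in MRS Manager rather than by editing the broker's server.properties directly.

# Example only: raise the maximum allowed request size to 200 MB.
# The appropriate value depends on the actual volume written back per batch.
socket.request.max.bytes=209715200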
