Updated: 2025-11-03 GMT+08:00

What should I do if a real-time data integration job fails with an empty precombine key and the error message contains the keyword "The value of col_double can not be null"?

Symptom

A real-time data integration job fails at runtime with an empty precombine key, and the error message contains the keyword "The value of col_double can not be null".

Error details:

com.huawei.clouds.dataarts.shaded.org.apache.hudi.exception.HoodieException: The value of col_double can not be null
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.avro.HoodieAvroUtils.getNestedFieldVal(HoodieAvroUtils.java:603) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-2d3d3e59daade3326e00ec7c6d260618:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.sink.utils.PayloadCreation.createPayload(PayloadCreation.java:85) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-2d3d3e59daade3326e00ec7c6d260618:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.StreamWriteFunction.generateRecord(StreamWriteFunction.java:330) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-2d3d3e59daade3326e00ec7c6d260618:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.bucket.BucketStreamWriteFunction.processElement(BucketStreamWriteFunction.java:110) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-2d3d3e59daade3326e00ec7c6d260618:?]
    at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:40) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.streaming.runtime.tasks.OneInputStreamTask$StreamTaskNetworkOutput.emitRecord(OneInputStreamTask.java:233) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.processElement(AbstractStreamTaskNetworkInput.java:134) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.streaming.runtime.io.AbstractStreamTaskNetworkInput.emitNext(AbstractStreamTaskNetworkInput.java:105) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.streaming.runtime.io.StreamOneInputProcessor.processInput(StreamOneInputProcessor.java:65) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.processInput(StreamTask.java:527) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:216) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:817) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:766) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:937) [flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751) [flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:573) [flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]

Possible Causes

The value of the Hudi table's precombine key (preCombineField) cannot be obtained, or the obtained value is null. Possible causes include:

  • The source contains dirty data in which the precombine key field is null.
  • A time-type field holds an all-zero value, for example timestamp = '0000-00-00 00:00:00', datetime = '0000-00-00 00:00:00', or date = '0000-00-00'. In CDC scenarios, such values may be parsed as null.
  • The field mapping between the source table and the Hudi table is incorrect:
    • The Hudi table uses all-lowercase field names, and the source table has no field matching the precombine key even when case is ignored.
    • The Hudi table contains uppercase field names, and the source table has no field matching the precombine key under case-sensitive comparison.
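The two data-related causes above can be sketched in a minimal Python example (the function names are hypothetical, for illustration only): all-zero time values cannot be represented as real timestamps and so end up as None after CDC parsing, and Hudi's sink then rejects any record whose precombine key is None, producing the error shown in the stack trace.

```python
from datetime import datetime

def parse_cdc_timestamp(raw):
    """Mimic CDC parsing of a time field: an all-zero value such as
    '0000-00-00 00:00:00' is not a valid datetime and is emitted as None."""
    if raw is None or raw.startswith("0000-00-00"):
        return None
    return datetime.strptime(raw, "%Y-%m-%d %H:%M:%S")

def validate_precombine(record, field):
    """Mimic the Hudi sink check that raises
    'The value of <field> can not be null' for a null precombine key."""
    value = record.get(field)
    if value is None:
        raise ValueError(f"The value of {field} can not be null")
    return value
```

For example, a source row with `update_time = '0000-00-00 00:00:00'` parses to None, and validating it as the precombine key raises the same kind of error as the job.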

Solution

  • If dirty data is confirmed, enable dirty data archiving under Task Configuration on the job editing page so that the dirty records are skipped instead of failing the job.
  • If a field mapping problem is confirmed, modify the Hudi table definition promptly and change the precombine key to a field that is reliably non-null.
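Before changing the table definition, it can help to verify which source column is actually safe to use as the precombine key. The following is a minimal sketch (the helper name and sample rows are hypothetical) of that check: pick the first candidate column that is non-null in every sampled row.

```python
def choose_precombine_field(rows, candidates):
    """Return the first candidate column that is non-null in every row,
    i.e. a column safe to use as the Hudi preCombineField."""
    for col in candidates:
        if all(row.get(col) is not None for row in rows):
            return col
    raise ValueError("no candidate column is non-null in all sampled rows")

# Sampled source rows: 'col_double' contains nulls, 'update_time' does not.
sample = [
    {"id": 1, "col_double": None, "update_time": "2024-01-01 10:00:00"},
    {"id": 2, "col_double": 3.14, "update_time": "2024-01-02 11:30:00"},
]
```

Running `choose_precombine_field(sample, ["col_double", "update_time"])` would skip `col_double` (it has a null) and select `update_time`.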
