Updated on 2025-11-03 GMT+08:00

In a real-time integration job, what should I do if the job fails at runtime because the primary key is empty and the error message contains the keyword "HoodieKeyException: recordKey values: 'id:__null__' for fields: [xxx] cannot be entirely null or empty"?

Symptom

In a real-time integration job, an exception occurs during job running: the primary key is empty, and the error message contains the keyword "HoodieKeyException: recordKey values: 'id:__null__' for fields: [xxx] cannot be entirely null or empty".

Error details:

2025-07-28 11:41:41,876 ERROR com.huawei.clouds.dataarts.migration.connector.hudi.sink.transform.RowDataToHoodieFunction [] - serialize error, table identifier: llch96.rds_source_upper_tbl_0726, raw data: {"before":{},"after":{"ID":6,"COL_TEXT_NEW":"2023-07-05 22:11:01","COL_INT_NEW":2,"COL_FLOAT":2.34,"COL_DOUBLE":1.326548},"source":{"name":"mysql_binlog_source","db":"TCDB","table":"RDS_SOURCE_UPPER_TBL","pkNames":["ID"],"sqlType":{"ID":4,"COL_TEXT_NEW":12,"COL_INT_NEW":4,"COL_FLOAT":6,"COL_DOUBLE":8},"mysqlType":{"ID":"INT(11)","COL_TEXT_NEW":"VARCHAR(50)","COL_INT_NEW":"INT(11)","COL_FLOAT":"FLOAT(-1)","COL_DOUBLE":"DOUBLE(-1)"}},"tableChange":{"type":"CREATE","previousId":null,"id":{},"table":{}},"tsMs":1753674056000,"op":"c","dataSourceName":"mysql_binlog_source","dbType":"mysql"}
com.huawei.clouds.dataarts.shaded.org.apache.hudi.exception.HoodieKeyException: recordKey values: "id:__null__" for fields: [id] cannot be entirely null or empty.
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.keygen.KeyGenUtils.getRecordKey(KeyGenUtils.java:120) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-69257d4b8f6a214251e95d06920fd0e2:?]
    at org.apache.hudi.keygen.ComplexAvroKeyGenerator.getRecordKey(ComplexAvroKeyGenerator.java:53) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-69257d4b8f6a214251e95d06920fd0e2:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.keygen.BaseKeyGenerator.getKey(BaseKeyGenerator.java:71) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-69257d4b8f6a214251e95d06920fd0e2:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.transform.RowDataToHoodieFunction.toHoodieRecord(RowDataToHoodieFunction.java:268) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-69257d4b8f6a214251e95d06920fd0e2:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.transform.RowDataToHoodieFunction.lambda$toHoodieRecord$0(RowDataToHoodieFunction.java:260) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-69257d4b8f6a214251e95d06920fd0e2:?]
    at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_362]
    at java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1384) ~[?:1.8.0_362]
    at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_362]
    at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:472) ~[?:1.8.0_362]
    at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) ~[?:1.8.0_362]
    at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) ~[?:1.8.0_362]
    at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:566) ~[?:1.8.0_362]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.transform.RowDataToHoodieFunction.toHoodieRecord(RowDataToHoodieFunction.java:261) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-69257d4b8f6a214251e95d06920fd0e2:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.transform.RowDataToHoodieFunction.processElement(RowDataToHoodieFunction.java:233) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-69257d4b8f6a214251e95d06920fd0e2:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.transform.RowDataToHoodieFunction.processElement(RowDataToHoodieFunction.java:86) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-69257d4b8f6a214251e95d06920fd0e2:?]
    at org.apache.flink.streaming.api.operators.ProcessOperator.processElement(ProcessOperator.java:40) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]

Possible Causes

The value of the Hudi table's primary key cannot be obtained, or the obtained value is empty. Possible causes are as follows:

  • The source data contains dirty rows whose primary key field is null.
  • A time-type field holds an all-zero value, for example '0000-00-00 00:00:00' for the timestamp or datetime type, or '0000-00-00' for the date type. In CDC scenarios, such values may be parsed as null.
  • The field mapping between the source table and the Hudi table is incorrect (a quick check is sketched after this list):
    • The Hudi table uses all-lowercase field names, and even when field case is ignored, the source has no field with the same name as the primary key.
    • The Hudi table contains uppercase field names, and under case-sensitive matching, the source has no field with the same name as the primary key.
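
The mapping causes can be checked locally. The following is a minimal sketch (the KeyMappingCheck class and its checkRecordKeys helper are hypothetical illustrations, not DataArts Studio APIs): it matches each Hudi record-key field against the source column names, first case-sensitively and then ignoring case, and reports which kind of mismatch applies. The column names are taken from the error log above, where the Hudi key field is the lowercase "id" while the source column is the uppercase "ID".

import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class KeyMappingCheck {
    // Hypothetical helper: for each Hudi record-key field, report whether the
    // source has a column with the same name, exactly or only when case is ignored.
    static void checkRecordKeys(List<String> sourceColumns, List<String> hudiKeyFields) {
        Set<String> exact = new HashSet<>(sourceColumns);
        Set<String> lower = new HashSet<>();
        for (String c : sourceColumns) {
            lower.add(c.toLowerCase(Locale.ROOT));
        }
        for (String key : hudiKeyFields) {
            if (exact.contains(key)) {
                System.out.println(key + ": exact match in source");
            } else if (lower.contains(key.toLowerCase(Locale.ROOT))) {
                System.out.println(key + ": matches only when case is ignored");
            } else {
                System.out.println(key + ": no matching source column");
            }
        }
    }

    public static void main(String[] args) {
        // Column names from the error log above: the source uses uppercase "ID",
        // while the Hudi table's record-key field is lowercase "id".
        checkRecordKeys(
                Arrays.asList("ID", "COL_TEXT_NEW", "COL_INT_NEW", "COL_FLOAT", "COL_DOUBLE"),
                Arrays.asList("id"));
    }
}

For the sample names, this prints "id: matches only when case is ignored", which shows that the record key can be resolved only if field case is ignored during mapping.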

Solution

  • If dirty data is confirmed, enable dirty data archiving in the Task Configuration area of the job editor so that the dirty rows are skipped. A query sketch for confirming such rows follows this list.
  • If a field mapping problem is confirmed, modify the Hudi table creation statement promptly and change the primary key to a usable field.
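
To confirm the dirty-data case before enabling archiving, the source table can be scanned for rows whose primary-key column is NULL. Below is a minimal sketch using plain JDBC; the connection details are placeholders, and the database, table, and column names are taken from the error log above. The zeroDateTimeBehavior=CONVERT_TO_NULL parameter of MySQL Connector/J makes all-zero date values come back as NULL instead of raising an error, mirroring how such values may be parsed as null in CDC scenarios.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class DirtyKeyScan {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details. zeroDateTimeBehavior=CONVERT_TO_NULL makes
        // MySQL Connector/J return all-zero dates as NULL instead of throwing.
        String url = "jdbc:mysql://<host>:3306/TCDB?zeroDateTimeBehavior=CONVERT_TO_NULL";
        try (Connection conn = DriverManager.getConnection(url, "<user>", "<password>");
             Statement stmt = conn.createStatement();
             // Count rows whose primary-key column (ID, from the log above) is NULL.
             ResultSet rs = stmt.executeQuery(
                     "SELECT COUNT(*) FROM RDS_SOURCE_UPPER_TBL WHERE ID IS NULL")) {
            if (rs.next()) {
                System.out.println("Rows with NULL primary key: " + rs.getLong(1));
            }
        }
    }
}

If the count is greater than zero, the source contains rows that cannot produce a valid Hudi record key, and dirty data archiving lets the job skip them instead of failing.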
