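What should I do if a job with Hudi as the destination fails to start and the error message contains "Custom defined extra column is missing in schema which is predefined in config"?
Symptom
After additional fields are configured for a job, the job fails to start. A configured additional field name cannot be found in the Hudi table, and the error message contains the keyword "Custom defined extra column is missing in schema which is predefined in config".
Error details: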
java.lang.IllegalArgumentException: Custom defined extra column is missing in schema which is predefined in config.
at com.huawei.clouds.dataarts.migration.connector.hudi.sink.schema.SchemaParser.parseExtraField(SchemaParser.java:120) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-5d1a8c677d8652bd7d54ecbb6d3ae814:?]
at com.huawei.clouds.dataarts.migration.connector.hudi.sink.schema.SchemaParser.<init>(SchemaParser.java:70) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-5d1a8c677d8652bd7d54ecbb6d3ae814:?]
at com.huawei.clouds.dataarts.migration.connector.hudi.sink.schema.RowDataExtendTool.<init>(RowDataExtendTool.java:130) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-5d1a8c677d8652bd7d54ecbb6d3ae814:?]
at com.huawei.clouds.dataarts.migration.connector.hudi.sink.transform.RowDataToHoodieFunction.open(RowDataToHoodieFunction.java:177) ~[blob_p-b7cce999877870001e778eb40e09ff21b660374d-5d1a8c677d8652bd7d54ecbb6d3ae814:?]
at org.apache.flink.api.common.functions.util.FunctionUtils.openFunction(FunctionUtils.java:34) ~[flink-core-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.open(AbstractUdfStreamOperator.java:101) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.streaming.api.operators.AbstractProcessOperator.open(AbstractProcessOperator.java:68) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.streaming.runtime.tasks.RegularOperatorChain.initializeStateAndOpenOperators(RegularOperatorChain.java:107) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreGates(StreamTask.java:713) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.call(StreamTaskActionExecutor.java:55) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.streaming.runtime.tasks.StreamTask.restoreInternal(StreamTask.java:688) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.streaming.runtime.tasks.StreamTask.restore(StreamTask.java:654) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.runtime.taskmanager.Task.runWithSystemExitMonitoring(Task.java:958) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.runtime.taskmanager.Task.restoreAndInvoke(Task.java:927) [flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:751) [flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:573) [flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362]
2025-07-28 12:00:28,736 WARN org.apache.flink.runtime.taskmanager.Task [] - Call stack:
at java.lang.Thread.getStackTrace(Thread.java:1564)
at org.apache.flink.runtime.taskmanager.Task.transitionState(Task.java:1139)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:801)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:573)
at java.lang.Thread.run(Thread.java:750)
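Cause Analysis
Every additional field configured for the job must already exist as a field in the Hudi table. The job validates this at startup: if an additional field does not exist in the Hudi table, the validation fails and the job terminates abnormally.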
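Solution
Check whether each additional field name in the job's edit configuration matches an existing field in the Hudi table. If a name does not match, either correct the additional field name or re-create the Hudi table so that it contains the field.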
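The check described above can be rehearsed offline before restarting the job. The following is a minimal sketch, not the connector's actual validation logic: the function name `find_missing_extra_columns` and the sample column names are hypothetical, and it assumes an exact, case-sensitive name comparison, which may differ from what the job does. Replace the two lists with the additional field names from your job configuration and the column names of your Hudi table (for example, as listed in the table's DDL).

```python
def find_missing_extra_columns(extra_fields, table_columns):
    """Return the configured additional field names that are absent from the table.

    Assumption: names are compared exactly (case-sensitive). The real job
    may apply different matching rules.
    """
    existing = set(table_columns)
    return [name for name in extra_fields if name not in existing]


# Hypothetical sample values, for illustration only.
table_columns = ["id", "name", "ts", "src_db"]
extra_fields = ["src_db", "src_table"]

missing = find_missing_extra_columns(extra_fields, table_columns)
if missing:
    print("Missing in Hudi table:", ", ".join(missing))
else:
    print("All additional fields exist in the Hudi table.")
```

Any name reported as missing must either be corrected in the job configuration or added to the table by re-creating it.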