Updated: 2025-11-03 GMT+08:00

What should I do if a real-time integration job fails to start and the error message contains "Found commits after time :20250728163331006, please rollback greater commits first"?

Symptom

A real-time integration job fails to start, and the error message contains the keywords "Found commits after time :20250728163331006, please rollback greater commits first".

Error details:

com.huawei.clouds.dataarts.shaded.org.apache.hudi.exception.HoodieRollbackException: Failed to rollback obs://akc-ma-test/tangchuan/mysql2hudi/aaa/llch96.db/rds_source_tbl_961_0726 commits 20250728163331006
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:962) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1376) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1346) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1334) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.lambda$startCommit$afea71c0$1(BaseHoodieWriteClient.java:1115) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:157) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.startCommit(BaseHoodieWriteClient.java:1114) ~[?:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.StreamWriteOperatorCoordinator.startInstant(StreamWriteOperatorCoordinator.java:631) ~[?:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.StreamWriteOperatorCoordinator.lambda$initInstant$9(StreamWriteOperatorCoordinator.java:666) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.sink.utils.NonThrownExecutor.lambda$execute$0(NonThrownExecutor.java:99) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_362]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_362]
    at java.lang.Thread.run(Thread.java:750) [?:1.8.0_362]
Caused by: com.huawei.clouds.dataarts.shaded.org.apache.hudi.exception.HoodieRollbackException: Found commits after time :20250728163331006, please rollback greater commits first
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.validateRollbackCommitSequence(BaseRollbackActionExecutor.java:189) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.doRollbackAndGetStats(BaseRollbackActionExecutor.java:228) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:124) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:151) ~[?:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.table.HoodieFlinkMergeOnReadTable.rollback(HoodieFlinkMergeOnReadTable.java:143) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:945) ~[?:?]
    ... 12 more
2025-07-28 16:49:52,495 INFO  org.apache.flink.runtime.jobmaster.JobMaster                 [] - Trying to recover from a global failure.
org.apache.flink.util.FlinkException: Global failure triggered by OperatorCoordinator for 'Sink: bucket_write_llch96.rds_source_tbl_961_0726' (operator 2460c47fb2039644fa5ad82006f0efce).
    at org.apache.flink.runtime.operators.coordination.OperatorCoordinatorHolder$LazyInitializedCoordinatorContext.failJob(OperatorCoordinatorHolder.java:556) ~[flink-dist-1.15.0-h0.cbu.dli.321.r2.jar:1.15.0-h0.cbu.dli.321.r2]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.StreamWriteOperatorCoordinator.lambda$null$1(StreamWriteOperatorCoordinator.java:256) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.sink.utils.NonThrownExecutor.lambda$execute$0(NonThrownExecutor.java:109) ~[?:?]
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362]
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362]
    at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
Caused by: com.huawei.clouds.dataarts.shaded.org.apache.hudi.exception.HoodieException: Executor executes action [initialize instant ] error
    ... 5 more
Caused by: com.huawei.clouds.dataarts.shaded.org.apache.hudi.exception.HoodieRollbackException: Failed to rollback obs://akc-ma-test/tangchuan/mysql2hudi/aaa/llch96.db/rds_source_tbl_961_0726 commits 20250728163331006
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:962) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1376) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1346) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1334) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.lambda$startCommit$afea71c0$1(BaseHoodieWriteClient.java:1115) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:157) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.startCommit(BaseHoodieWriteClient.java:1114) ~[?:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.StreamWriteOperatorCoordinator.startInstant(StreamWriteOperatorCoordinator.java:631) ~[?:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.StreamWriteOperatorCoordinator.lambda$initInstant$9(StreamWriteOperatorCoordinator.java:666) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.sink.utils.NonThrownExecutor.lambda$execute$0(NonThrownExecutor.java:99) ~[?:?]
    ... 3 more
Caused by: com.huawei.clouds.dataarts.shaded.org.apache.hudi.exception.HoodieRollbackException: Found commits after time :20250728163331006, please rollback greater commits first
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.validateRollbackCommitSequence(BaseRollbackActionExecutor.java:189) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.doRollbackAndGetStats(BaseRollbackActionExecutor.java:228) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.runRollback(BaseRollbackActionExecutor.java:124) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.table.action.rollback.BaseRollbackActionExecutor.execute(BaseRollbackActionExecutor.java:151) ~[?:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.table.HoodieFlinkMergeOnReadTable.rollback(HoodieFlinkMergeOnReadTable.java:143) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollback(BaseHoodieWriteClient.java:945) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1376) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1346) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.rollbackFailedWrites(BaseHoodieWriteClient.java:1334) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.lambda$startCommit$afea71c0$1(BaseHoodieWriteClient.java:1115) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.common.util.CleanerUtils.rollbackFailedWrites(CleanerUtils.java:157) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.client.BaseHoodieWriteClient.startCommit(BaseHoodieWriteClient.java:1114) ~[?:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.StreamWriteOperatorCoordinator.startInstant(StreamWriteOperatorCoordinator.java:631) ~[?:?]
    at com.huawei.clouds.dataarts.migration.connector.hudi.sink.StreamWriteOperatorCoordinator.lambda$initInstant$9(StreamWriteOperatorCoordinator.java:666) ~[?:?]
    at com.huawei.clouds.dataarts.shaded.org.apache.hudi.sink.utils.NonThrownExecutor.lambda$execute$0(NonThrownExecutor.java:99) ~[?:?]
    ... 3 more

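Cause Analysis

Multiple jobs may have written to the same Hudi table concurrently, which disordered the table's timeline: a completed commit now appears after an incomplete commit. The disordered timeline can be seen in the .hoodie directory of the affected Hudi table.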

Figure 1 Viewing the path
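To make the disordered timeline easier to spot, the check can be sketched in Python. This is a minimal sketch, not the validation that Hudi itself runs. It assumes the pre-1.0 Hudi file naming in the .hoodie directory (`<instant>.<action>` for a completed instant, `<instant>.<action>.requested` and `<instant>.<action>.inflight` for a pending one, and `<instant>.inflight` for a pending `commit`) and assumes all instant times have the same width. The names `parse_instant`, `find_blocked_instants`, and `SAMPLE_LISTING` are hypothetical, and every timestamp in the sample other than 20250728163331006 is made up.

```python
from typing import List, NamedTuple, Optional

# Assumption: only these actions count as write commits on the timeline.
WRITE_ACTIONS = {"commit", "deltacommit", "replacecommit"}
# Assumption: these trailing name parts mark an instant that has not completed.
PENDING_STATES = {"requested", "inflight"}


class Instant(NamedTuple):
    time: str
    action: str
    completed: bool


def parse_instant(file_name: str) -> Optional[Instant]:
    """Parse one .hoodie file name; return None for anything that is not a write instant."""
    parts = file_name.split(".")
    if not parts[0].isdigit():
        return None  # e.g. hoodie.properties or hidden directories
    if parts[-1] in PENDING_STATES:
        if len(parts) == 3:
            action = parts[1]
        elif len(parts) == 2:
            action = "commit"  # assumed legacy name: <instant>.inflight
        else:
            return None
        completed = False
    elif len(parts) == 2:
        action, completed = parts[1], True
    else:
        return None
    if action not in WRITE_ACTIONS:
        return None
    return Instant(parts[0], action, completed)


def find_blocked_instants(file_names: List[str]) -> List[str]:
    """Return pending instant times that are older than the latest completed instant."""
    instants = [i for i in map(parse_instant, file_names) if i is not None]
    completed = {i.time for i in instants if i.completed}
    # A completed instant keeps its .requested/.inflight files, so exclude it.
    pending = {i.time for i in instants if not i.completed} - completed
    latest_completed = max(completed, default="")
    # String comparison is valid only because equal-width timestamps are assumed.
    return sorted(t for t in pending if t < latest_completed)


# Hypothetical listing of a .hoodie directory.
SAMPLE_LISTING = [
    "hoodie.properties",
    "20250728163331006.deltacommit.requested",
    "20250728163331006.deltacommit.inflight",
    "20250728164000000.deltacommit.requested",
    "20250728164000000.deltacommit.inflight",
    "20250728164000000.deltacommit",
    "20250728165000000.deltacommit.requested",
]

if __name__ == "__main__":
    print(find_blocked_instants(SAMPLE_LISTING))
```

For the sample listing, only 20250728163331006 is reported: it is still pending while a later instant has completed. The pending instant at the end of the listing is not reported because nothing completed after it.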

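Solution

Locate the incomplete instant in the path shown in Figure 1, delete it to roll it back manually, and then restart the job.

Note: This issue usually occurs when multiple jobs write to the same Hudi table at the same time, and it may cause data loss. Verify the data promptly and determine whether any data needs to be backfilled.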
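Before deleting anything, it can help to list exactly which files belong to the incomplete instant. The following Python sketch builds such a list from a directory listing and does not touch the storage itself. It assumes that pending timeline files end in `.requested` or `.inflight` (pre-1.0 Hudi naming) and that the file names are obtained with whatever OBS tool you already use. The function name `pending_files_to_delete` and the sample listing are hypothetical.

```python
from typing import List

# Assumption: a timeline file with one of these suffixes belongs to a pending instant.
PENDING_SUFFIXES = (".requested", ".inflight")


def pending_files_to_delete(file_names: List[str], instant_time: str) -> List[str]:
    """Dry run: return the timeline files of an incomplete instant.

    Raises ValueError if the instant has a completed file, because a
    completed commit must not be removed this way.
    """
    prefix = instant_time + "."
    matches = [name for name in file_names if name.startswith(prefix)]
    completed = [name for name in matches if not name.endswith(PENDING_SUFFIXES)]
    if completed:
        raise ValueError(f"instant {instant_time} is already completed: {completed}")
    return sorted(matches)


if __name__ == "__main__":
    # Hypothetical listing of the table's .hoodie directory.
    listing = [
        "hoodie.properties",
        "20250728163331006.deltacommit.requested",
        "20250728163331006.deltacommit.inflight",
        "20250728164000000.deltacommit.requested",
        "20250728164000000.deltacommit.inflight",
        "20250728164000000.deltacommit",
    ]
    for name in pending_files_to_delete(listing, "20250728163331006"):
        print("would delete:", name)
```

The function only returns a plan, so the actual deletion stays a deliberate manual step. It refuses to proceed when the instant already has a completed file, which guards against passing the wrong timestamp.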
