Data in a MOR Table Re-created After Being Deleted Using Spark SQL Cannot Be Synchronized to Its ro and rt Tables
Question
After a MOR table is deleted using Spark SQL and then re-created, data in the MOR table cannot be synchronized to its ro and rt tables in real time, and the following warning is displayed:
WARN HiveSyncTool: Got runtime exception when hive syncing, but continuing as ignoreExceptions config is set
java.lang.IllegalArgumentException: Failed to get schema for table hudi_table2_ro does not exist
    at org.apache.hudi.hive.HoodieHiveClient.getTableSchema(HoodieHiveClient.java:183)
    at org.apache.hudi.hive.HiveSyncTool.syncHoodieTable(HiveSyncTool.java:286)
    at org.apache.hudi.hive.HiveSyncTool.doSync(HiveSyncTool.java:213)
Answer
Cause:
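To reduce the number of requests sent to Hive Metastore, a cache mechanism has been added for Hudi tables. By default, cached information is kept for 1 hour. As a result, after a MOR table is deleted using Spark SQL and then re-created, Hive sync may still rely on cached information about the old ro and rt tables, which no longer exist. The data in the MOR table therefore cannot be synchronized to its ro and rt tables in real time.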
Solution:
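Set hoodie.datasource.hive_sync.interval to 0 so that Hive sync does not reuse cached information, for example by running the following statement in the Spark SQL session: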
set hoodie.datasource.hive_sync.interval=0;
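The following Spark SQL sequence is a minimal sketch of applying the setting before deleting and re-creating a MOR table in the same session. The table name hudi_table2 is taken from the warning above; the column definitions, the sample row, and the table properties are hypothetical and assume the Hudi Spark SQL `create table ... using hudi` syntax. It also assumes that Hive sync is enabled and runs when data is written. Adjust all of these to match your actual table.

```sql
-- Stop Hive sync from reusing cached Hive Metastore information
set hoodie.datasource.hive_sync.interval=0;

-- Hypothetical example: delete and re-create the MOR table
drop table if exists hudi_table2;
create table hudi_table2 (
  id int,
  name string,
  price double,
  ts bigint
) using hudi
tblproperties (
  type = 'mor',           -- merge-on-read table
  primaryKey = 'id',      -- hypothetical record key
  preCombineField = 'ts'  -- hypothetical precombine field
);

-- Hypothetical write; assumes Hive sync runs on write and creates or updates the ro and rt tables
insert into hudi_table2 values (1, 'a1', 10.0, 1000);

-- Check that the synchronized read-optimized table can now be queried
select * from hudi_table2_ro;
```

Because the setting is issued with set, it is assumed here to apply only to the current session; whether it should also be configured globally depends on how often tables are deleted and re-created.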