Parquet/Avro schema Is Reported When Updated Data Is Written
Question
The following error is reported when data is written:
org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'col1' not found
Answer
You are advised to evolve schemas in backward compatible mode while using Hudi. This error usually occurs when you delete some columns, such as col1, in backward incompatible mode and then update col1 written with the old schema in the Parquet file. In this case, the Parquet file attempts to search for all the current fields in the input record, if col1 does not exist, the preceding exception is thrown.
To solve this problem, create an uber schema using all the schema versions evolved and use this uber schema as the target schema. You can obtain a schema from Hive MetaStore and merge it with the current schema.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot