Updated on 2022-12-14 GMT+08:00

Parquet/Avro schema Is Reported When Updated Data Is Written

Question

The following error is reported when data is written:

org.apache.parquet.io.InvalidRecordException: Parquet/Avro schema mismatch: Avro field 'col1' not found

Answer

You are advised to evolve schemas in backward compatible mode while using Hudi. This error usually occurs when you delete some columns, such as col1, in backward incompatible mode and then update col1 written with the old schema in the Parquet file. In this case, the Parquet file attempts to search for all the current fields in the input record, if col1 does not exist, the preceding exception is thrown.

To solve this problem, create an uber schema using all the schema versions evolved and use this uber schema as the target schema. You can obtain a schema from Hive MetaStore and merge it with the current schema.