Why Cannot Data Be Queried After the MapReduce Engine Is Switched After the Tez Engine Is Used to Execute Union-related Statements?

Question

Hive uses the Tez engine to execute union-related statements to write data. After Hive is switched to the MapReduce engine for query, no data is found.

Answer

When Hive uses the Tez engine to execute the union-related statement, the generated output file is stored in the HIVE_UNION_SUBDIR directory. After Hive is switched back to the MapReduce engine, files in the directory are not read by default. Therefore, data in the HIVE_UNION_SUBDIR directory is not read.

In this case, you can set mapreduce.input.fileinputformat.input.dir.recursive to true to enable union optimization and determine whether to read data in the directory.

Parent topic: Common Issues About Hive

Previous topic: Description of Hive Table Location (Either Be an OBS or HDFS Path)

Next topic: Why Does Hive Not Support Concurrent Data Writing to the Same Table or Partition?