Updated on 2024-12-13 GMT+08:00

What Should I Do If Statistics of Hudi or Hive Tables Created Using Spark SQLs Are Empty Before Data Is Inserted?

Question

When Spark SQL is used to create a Hudi or Hive table, the table statistics are empty until data is inserted.

Answer

You can use either of the following methods to collect the statistics:

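  1. Run the analyze command (for example, ANALYZE TABLE table_name COMPUTE STATISTICS) to trigger statistics collection. If no data has been inserted yet, then run the desc formatted table_name command and check that the value of totalsize is 0, which confirms that the statistics are now populated.
  2. Set spark.sql.statistics.size.autoUpdate.enabled to true, then insert data. With this parameter enabled, the table size statistic is updated automatically whenever the table data changes, so statistics collection is triggered in the background.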
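
The two methods above can be sketched as the following Spark SQL session. This is a minimal illustration, not a verified procedure for a specific cluster: the table name demo_stats_tbl and its columns are hypothetical, and setting the parameter with SET at session level (rather than in the Spark configuration files) is an assumption about how your environment is managed.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS demo_stats_tbl (id INT, name STRING);

-- Method 1: collect statistics explicitly, then inspect them.
ANALYZE TABLE demo_stats_tbl COMPUTE STATISTICS;
-- With no data inserted, the reported size should be 0 rather than empty.
DESC FORMATTED demo_stats_tbl;

-- Method 2: let Spark update the size statistic when data changes.
-- Assumption: the parameter is set for the current session only.
SET spark.sql.statistics.size.autoUpdate.enabled=true;
INSERT INTO demo_stats_tbl VALUES (1, 'a');
-- The statistics should now reflect the inserted data.
DESC FORMATTED demo_stats_tbl;
```

Method 1 is a one-off action per table, whereas Method 2 keeps the size statistic current on each data change at the cost of some extra work per write, so it may be less suitable for tables with very many files.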