Updated on 2024-11-15 GMT+08:00

How Do I Merge Small Files?

If SQL execution generates a large number of small files, subsequent jobs and queries on the table take longer, because every file adds scheduling and I/O overhead. In this case, merge the small files.

  1. Set the configuration item as follows:

    spark.sql.shuffle.partitions = Number of partitions (in this case, the number of files that the statement in step 2 generates)

  2. Execute the following SQL statement:

    INSERT OVERWRITE TABLE tablename
    SELECT * FROM tablename DISTRIBUTE BY rand()
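
As a rough illustration, the two steps could be combined as shown below. This is only a sketch under stated assumptions: that the session accepts SET statements for Spark parameters, that the table is a hypothetical one named sales_log, and that 10 files is the desired result. None of these values come from the procedure above.

```sql
-- Assumption: the session accepts SET for Spark parameters.
-- 10 is an illustrative target for the number of merged files.
SET spark.sql.shuffle.partitions=10;

-- sales_log is a hypothetical table name.
-- DISTRIBUTE BY rand() shuffles rows roughly evenly across the shuffle
-- partitions; each non-empty partition is typically written as one file.
INSERT OVERWRITE TABLE sales_log
SELECT * FROM sales_log DISTRIBUTE BY rand();
```

Because rand() spreads rows roughly evenly, the resulting files should be of similar size. A larger partition value yields more, smaller files, and a smaller value yields fewer, larger files.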