Updated on 2024-12-11 GMT+08:00

Storage Configuration

Parameter

Description

Default Value

hoodie.parquet.max.file.size

Specifies the target size for Parquet files generated in Hudi write phases. For DFS, this parameter needs to be aligned with the underlying file system block size for optimal performance.

120 x 1024 x 1024 bytes

hoodie.parquet.block.size

Specifies the Parquet row group (block) size. Within a block, data is stored in pages; a page is the unit of read in a Parquet file, and pages are compressed separately.

120 x 1024 x 1024 bytes

hoodie.parquet.compression.ratio

Specifies the expected compression ratio of Parquet data, which Hudi uses when sizing a new Parquet file. If bulk_insert produces files smaller than the expected size, increase this value.

0.1

hoodie.parquet.compression.codec

Specifies the Parquet compression codec. Possible options are [gzip | snappy | uncompressed | lzo].

snappy

hoodie.logfile.max.size

Specifies the maximum size a log file can reach before it is rolled over to the next version.

1 GB

hoodie.logfile.data.block.max.size

Specifies the maximum size of a log file data block, that is, the largest single data block that can be appended to a log file. Splitting appended data into blocks of bounded size helps prevent OOM errors, so the value should be smaller than the available JVM memory.

256 MB

hoodie.logfile.to.parquet.compression.ratio

Specifies the expected additional compression when records move from log files to Parquet files. It is used for merge-on-read (MOR) tables to send inserts into log files and to control the size of compacted Parquet files.

0.35
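
The parameters above can be collected into a single set of write options. The following is a minimal sketch, not a definitive implementation: the helper name hudi_storage_options and its argument names are hypothetical, the defaults are copied from the table, and sizes are assumed to be passed in bytes.

```python
MB = 1024 * 1024
GB = 1024 * MB

# Codec names listed in the table above.
ALLOWED_CODECS = {"gzip", "snappy", "uncompressed", "lzo"}


def hudi_storage_options(
    max_file_size=120 * MB,
    block_size=120 * MB,
    compression_ratio=0.1,
    codec="snappy",
    logfile_max_size=1 * GB,
    logfile_block_max_size=256 * MB,
    log_to_parquet_ratio=0.35,
):
    """Hypothetical helper: return the storage settings as string key/value pairs."""
    if codec not in ALLOWED_CODECS:
        raise ValueError(f"unsupported codec: {codec}")
    # Sanity check added for illustration: a single data block
    # should not exceed the log file that holds it.
    if logfile_block_max_size > logfile_max_size:
        raise ValueError("data block size exceeds log file size")
    return {
        "hoodie.parquet.max.file.size": str(max_file_size),
        "hoodie.parquet.block.size": str(block_size),
        "hoodie.parquet.compression.ratio": str(compression_ratio),
        "hoodie.parquet.compression.codec": codec,
        "hoodie.logfile.max.size": str(logfile_max_size),
        "hoodie.logfile.data.block.max.size": str(logfile_block_max_size),
        "hoodie.logfile.to.parquet.compression.ratio": str(log_to_parquet_ratio),
    }


# Override only the codec and keep the other defaults from the table.
opts = hudi_storage_options(codec="gzip")

# With PySpark and an existing DataFrame `df` (both assumed here), the options
# could be passed to a Hudi write, for example:
# df.write.format("hudi").options(**opts).mode("append").save(base_path)
```

Keeping the values in one dictionary makes it easy to review the sizing settings together, for example to check that the target file size and the row group size stay consistent.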