Storage Configuration
Parameter | Description | Default Value
---|---|---
hoodie.parquet.max.file.size | Specifies the target size for Parquet files generated during Hudi write phases. For DFS, align this value with the underlying file system block size for optimal performance. | 120 x 1024 x 1024 bytes
hoodie.parquet.block.size | Specifies the Parquet block (RowGroup) size, not the page size. Pages are the unit of read in a Parquet file, and within a block, pages are compressed separately. | 120 x 1024 x 1024 bytes
hoodie.parquet.compression.ratio | Specifies the expected compression ratio of Parquet data, used when Hudi estimates the size of a new Parquet file. If the files generated by bulk_insert are smaller than expected, increase this value. | 0.1
hoodie.parquet.compression.codec | Specifies the Parquet compression codec. Possible values are gzip, snappy, uncompressed, and lzo. | gzip
hoodie.logfile.max.size | Specifies the maximum size allowed for a log file before it is rolled over to the next version. | 1 GB
hoodie.logfile.data.block.max.size | Specifies the maximum size allowed for a single data block appended to a log file. Breaking appended data into blocks of bounded size helps prevent OOM errors, so this size should be smaller than the available JVM memory. | 256 MB
hoodie.logfile.to.parquet.compression.ratio | Specifies the expected additional compression when records move from log files to Parquet files. It is used for MOR tables to route inserted records into log files and to control the size of the compacted Parquet files. | 0.35