Configuring the Compression Format of a Parquet Table
Scenario
The compression format of a Parquet table can be configured as follows:
- If the Parquet table is a partitioned one, set the parquet.compression parameter of the Parquet table to specify the compression format. For example, set tblproperties in the table creation statement: "parquet.compression"="snappy".
- If the Parquet table is a non-partitioned one, set the spark.sql.parquet.compression.codec parameter to specify the compression format. Setting the parquet.compression parameter has no effect, because parquet.compression takes its value from spark.sql.parquet.compression.codec. If spark.sql.parquet.compression.codec is not configured, its default value, snappy, is used and is read by parquet.compression.
Therefore, the spark.sql.parquet.compression.codec parameter can only be used to set the compression format of a non-partitioned Parquet table.
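The two cases above can be sketched in Spark SQL as follows. This is a minimal illustration, not a definitive procedure: the table names (sales_partitioned, sales_flat), the column definitions, and the gzip codec are hypothetical choices made for the example, and it assumes a Spark SQL session with Hive support enabled.

```sql
-- Partitioned table (hypothetical name and columns):
-- the compression format is set on the table itself through TBLPROPERTIES.
CREATE TABLE sales_partitioned (id INT, amount DOUBLE)
PARTITIONED BY (dt STRING)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="snappy");

-- Non-partitioned table (hypothetical name and columns):
-- the compression format follows spark.sql.parquet.compression.codec,
-- set here for the current session; gzip is only an example value.
SET spark.sql.parquet.compression.codec=gzip;
CREATE TABLE sales_flat (id INT, amount DOUBLE)
STORED AS PARQUET;
INSERT INTO sales_flat VALUES (1, 9.99);
```

If the SET statement is omitted, data written to the non-partitioned table would use the default value, snappy.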
Configuration parameters
Navigation path for setting parameters:
On Manager, choose Cluster > Services > Spark, click Configurations and then All Configurations, and enter the parameter name in the search box.
Parameter | Description | Default Value
---|---|---
spark.sql.parquet.compression.codec | Used to set the compression format of a non-partitioned Parquet table. | snappy