Updated on 2024-11-29 GMT+08:00

Hive Supporting ZSTD Compression Formats

Zstandard (ZSTD) is an open-source lossless data compression algorithm. Compared with the other compression algorithms supported by Hadoop, it provides better compression performance and a higher compression ratio. With this feature, Hive supports ZSTD-compressed tables in the following storage formats: ORC, RCFile, TextFile, JsonFile, Parquet, SequenceFile, and CSV.

You can create a ZSTD-compressed table as follows:

  • To create a table in ORC format, specify TBLPROPERTIES("orc.compress"="zstd").

    create table tab_1(...) stored as orc TBLPROPERTIES("orc.compress"="zstd");

  • To create a table in Parquet format, specify TBLPROPERTIES("parquet.compression"="zstd").

    create table tab_2(...) stored as parquet TBLPROPERTIES("parquet.compression"="zstd");

  • To create a table in another format, such as TextFile, run the following commands to enable output compression and set the compression codec parameters to org.apache.hadoop.io.compress.ZStandardCodec.

    set hive.exec.compress.output=true;

    set mapreduce.map.output.compress=true;

    set mapreduce.map.output.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec;

    set mapreduce.output.fileoutputformat.compress=true;

    set mapreduce.output.fileoutputformat.compress.codec=org.apache.hadoop.io.compress.ZStandardCodec;

    set hive.exec.compress.intermediate=true;

    create table tab_3(...) stored as textfile;

  • SQL operations on a ZSTD-compressed table are the same as those on a table that uses any other compression format. Insertion, deletion, query, and aggregation are supported.
  • To make ZSTD the default compression format for Parquet tables, run the following command on the Hive Beeline client:

    set hive.parquet.default.compression.codec=zstd;

    This setting takes effect only in the current session.
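The ORC and Parquet commands above elide the column list. The following sketch shows complete statements under the assumption of hypothetical table names (zstd_orc_demo, zstd_parquet_demo) and hypothetical columns (id, name), and then uses SHOW TBLPROPERTIES to check that the compression property was recorded in the table metadata. It is an illustration, not a required procedure.

```sql
-- Hypothetical tables; the names and columns are illustrative only.
CREATE TABLE zstd_orc_demo (id INT, name STRING)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="zstd");

CREATE TABLE zstd_parquet_demo (id INT, name STRING)
STORED AS PARQUET
TBLPROPERTIES ("parquet.compression"="zstd");

-- Check that the compression property is stored with each table.
SHOW TBLPROPERTIES zstd_orc_demo("orc.compress");
SHOW TBLPROPERTIES zstd_parquet_demo("parquet.compression");
```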
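For TextFile and the other formats that rely on the set commands, the codec affects data files written by queries in the session; creating the table by itself writes no data. The following sketch, intended to run in the same session after the set commands for those formats, creates a hypothetical table (zstd_text_demo) and fills it from a hypothetical existing table (src_table) so that the written files go through the configured ZStandardCodec. Both table names and their columns are assumptions for illustration.

```sql
-- Run in the same session after the compression SET commands.
-- zstd_text_demo and src_table are hypothetical tables.
CREATE TABLE zstd_text_demo (id INT, name STRING)
STORED AS TEXTFILE;

-- The files written by this query are compressed with the
-- codec configured for the session.
INSERT OVERWRITE TABLE zstd_text_demo
SELECT id, name FROM src_table;
```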
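The session-level Parquet default can be combined with ordinary SQL, as sketched below. The sketch assumes a hypothetical table (zstd_parquet_default) created without a parquet.compression property, so the session default is expected to apply to it; the columns and sample rows are illustrative only. The insert and aggregate query show that standard statements are written the same way as for any other table.

```sql
-- Applies to the current Beeline session only.
SET hive.parquet.default.compression.codec=zstd;

-- Hypothetical table with no parquet.compression property,
-- so the session default (zstd) is expected to be used.
CREATE TABLE zstd_parquet_default (id INT, name STRING)
STORED AS PARQUET;

-- Standard insertion and aggregation work unchanged.
INSERT INTO TABLE zstd_parquet_default VALUES (1, 'a'), (2, 'b'), (3, 'a');
SELECT name, COUNT(*) AS cnt
FROM zstd_parquet_default
GROUP BY name;
```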