Updated on 2023-05-06 GMT+08:00

Configuring LZC Compression

Scenario

File compression reduces the space occupied by stored files and speeds up both disk reads and data transmission over the network. HDFS supports two compression formats by default: Gzip and Snappy. This section describes how to configure an additional format, Lempel-Ziv compression (LZC), which enhances the Hadoop compression capability. For more information about Snappy, see https://code.google.com/p/snappy/.

This section applies to MRS 3.x or later.

Configuration Description

For LZC compression to take effect, configure the following parameters in the core-site.xml file on the client (for example, in Client installation path/HDFS/hadoop/etc/hadoop/):

Table 1 Parameter Description

Parameter: io.compression.codecs

Description: To make LZC take effect, the following values are added to the existing compression format list:
com.huawei.hadoop.datasight.io.compress.lzc.ZCodec
NOTE: If more than one compression format is configured, use commas (,) to separate them.

Default Value: org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.Lz4Codec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.ZStandardCodec,com.huawei.hadoop.datasight.io.compress.lzc.ZCodec

Parameter: io.compression.codec.lzc.class

Description: To make the LZC compression format take effect, use the default value. If you configure this parameter, set it to com.huawei.hadoop.datasight.io.compress.lzc.ZCodec.

Default Value: com.huawei.hadoop.datasight.io.compress.lzc.ZCodec
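The change to io.compression.codecs can be illustrated with a minimal sketch. It assumes a standard Hadoop core-site.xml layout (a configuration root containing property elements with name and value children). The helper name add_codec and the shortened SAMPLE document are hypothetical and used only for illustration. In practice you may prefer to edit the file by hand.

```python
import xml.etree.ElementTree as ET

# Codec class name taken from Table 1.
LZC_CODEC = "com.huawei.hadoop.datasight.io.compress.lzc.ZCodec"

# Hypothetical, shortened core-site.xml content used only for this sketch.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>"""


def add_codec(core_site_xml, codec=LZC_CODEC):
    """Return the XML text with `codec` appended to io.compression.codecs.

    Existing codecs are kept, entries are separated by commas, and the
    codec is not added twice.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() != "io.compression.codecs":
            continue
        value = prop.find("value")
        if value is None:
            value = ET.SubElement(prop, "value")
        codecs = [c.strip() for c in (value.text or "").split(",") if c.strip()]
        if codec not in codecs:
            codecs.append(codec)
        value.text = ",".join(codecs)
        return ET.tostring(root, encoding="unicode")
    # Do not guess a codec list if the property is missing.
    raise ValueError("io.compression.codecs is not set in this file")


if __name__ == "__main__":
    print(add_codec(SAMPLE))
```

Running the helper a second time on its own output leaves the codec list unchanged, so it is safe to apply to a file that already lists the LZC codec.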

  1. LZC does not support FSImage and SequenceFile compression.
  2. HDFS provides multiple compression algorithms, including Gzip, LZ4, Snappy, and Bzip2. The compression ratio and decompression speed of these compression algorithms are as follows:

    Compression ratio in descending order: Bzip2 > Gzip > LZ4 > Snappy

    Decompression speed in descending order: LZ4 > Snappy > Gzip > Bzip2

  3. Application scenarios:
    • Where speed is the priority, for example when storing intermediate data of MapReduce tasks, LZ4 or Snappy is recommended. Of the two, Snappy is recommended in scenarios requiring high reliability.
    • Where compression ratio matters more than compression speed, for example in cold data storage, Bzip2 or Gzip is recommended.
  4. Except for LZC, the preceding compression algorithms can use native (C language) implementations, which compress and decompress efficiently. You are advised to choose an algorithm that supports native implementation and suits your service scenario.
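
The recommendations in the notes above can be summarized as a small lookup. This is a sketch only: the function name recommend and the scenario labels "speed" and "ratio" are hypothetical, and the codec class names are those listed in the default value of io.compression.codecs in Table 1.

```python
# Hadoop codec classes for the four algorithms compared in the notes.
CODEC_CLASS = {
    "Bzip2": "org.apache.hadoop.io.compress.BZip2Codec",
    "Gzip": "org.apache.hadoop.io.compress.GzipCodec",
    "LZ4": "org.apache.hadoop.io.compress.Lz4Codec",
    "Snappy": "org.apache.hadoop.io.compress.SnappyCodec",
}


def recommend(scenario, high_reliability=False):
    """Return the codec classes the notes recommend for a scenario.

    scenario: "speed" (for example, MapReduce intermediate data) or
              "ratio" (for example, cold data storage).
    high_reliability: for "speed", narrows the choice to Snappy.
    """
    if scenario == "speed":
        names = ["Snappy"] if high_reliability else ["LZ4", "Snappy"]
    elif scenario == "ratio":
        names = ["Bzip2", "Gzip"]
    else:
        raise ValueError("scenario must be 'speed' or 'ratio'")
    return [CODEC_CLASS[name] for name in names]


if __name__ == "__main__":
    for label in ("speed", "ratio"):
        print(label, recommend(label))
```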