Updated on 2025-10-11 GMT+08:00

Using the LZC Compression Algorithm to Store HDFS Files

Scenario

Compressing files reduces the storage space they occupy and speeds up both disk reads and network transmission. HDFS supports two compression formats by default: Gzip and Snappy. This section describes how to configure an additional format, Lempel-Ziv compression (LZC), which extends the Hadoop compression capability. For more information about Snappy, visit http://google.github.io/snappy/.

  • HDFS provides multiple compression algorithms, including Gzip, LZ4, Snappy, and Bzip2. The compression ratio and decompression speed of these compression algorithms are as follows:

    Compression ratio in descending order: Bzip2 > Gzip > LZ4 > Snappy

    Decompression speed in descending order: LZ4 > Snappy > Gzip > Bzip2

  • Application scenarios:
    • Where speed matters most, for example, when storing intermediate data of MapReduce tasks, LZ4 and Snappy are recommended. In high-reliability scenarios, Snappy is recommended.
    • Where a high compression ratio is required, for example, for cold data storage, Bzip2 or Gzip is recommended.
  • All of these compression algorithms except LZC have Native (C language) implementations, which compress and decompress more efficiently. You are advised to choose an algorithm based on your service scenario and use its Native implementation.
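
The recommendations above can be sketched as a small lookup. This is a minimal illustration, not part of the product: the helper name codec_for_scenario and the scenario labels (intermediate, reliable, cold) are hypothetical, while the codec class names are the ones listed in Table 1. The mapreduce.map.output.compress and mapreduce.map.output.compress.codec properties are standard Hadoop job settings for compressing intermediate map output.

```shell
# Hypothetical helper: map a workload type to a codec class, following the
# recommendations listed above. The scenario labels are illustrative only.
codec_for_scenario() {
  case "$1" in
    intermediate) echo "org.apache.hadoop.io.compress.Lz4Codec" ;;    # speed first
    reliable)     echo "org.apache.hadoop.io.compress.SnappyCodec" ;; # high reliability
    cold)         echo "org.apache.hadoop.io.compress.BZip2Codec" ;;  # ratio first (Gzip is the other option)
    *)            echo "unknown scenario: $1" >&2; return 1 ;;
  esac
}

# Example: build the -D options that compress intermediate map output.
codec="$(codec_for_scenario intermediate)"
echo "-Dmapreduce.map.output.compress=true -Dmapreduce.map.output.compress.codec=${codec}"
```

The printed options could be passed to a MapReduce job that accepts generic -D options; whether a given codec is available depends on the cluster configuration.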

Notes and Constraints

  • This section applies to MRS 3.x or later.
  • LZC cannot compress files in FSImage or SequenceFile format.

Prerequisites

A client that includes HDFS has been installed. For details, see Using an MRS Client.

Configuring the LZC Compression Algorithm

  1. Log in to the node where the client is installed as the root user.
  2. Run the following command to modify the client configuration file core-site.xml:

    vi Client installation path/HDFS/hadoop/etc/hadoop/core-site.xml

  3. Modify the following parameters as required and save the settings.

    Table 1 Parameters

    | Parameter | Description | Default Value |
    |---|---|---|
    | io.compression.codecs | To enable LZC, add com.huawei.hadoop.datasight.io.compress.lzc.ZCodec to the existing list of compression formats. Separate multiple compression formats with commas (,). | org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.Lz4Codec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.ZStandardCodec,com.huawei.hadoop.datasight.io.compress.lzc.ZCodec |
    | io.compression.codec.lzc.class | To enable the LZC compression format, keep the default value. If you configure this parameter, set it to com.huawei.hadoop.datasight.io.compress.lzc.ZCodec. | com.huawei.hadoop.datasight.io.compress.lzc.ZCodec |
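
As a rough sketch of what step 3 amounts to, the following shell snippet appends the LZC codec to a comma-separated codec list only if it is not already present, and prints the corresponding core-site.xml property elements. The helper names append_codec and property_xml and the sample value of existing are assumptions made for illustration; the parameter names and the ZCodec class name come from Table 1.

```shell
# Class name taken from Table 1.
LZC_CODEC="com.huawei.hadoop.datasight.io.compress.lzc.ZCodec"

# Hypothetical helper: append codec $2 to the comma-separated list $1
# unless the list already contains it.
append_codec() {
  case ",$1," in
    *",$2,"*) echo "$1" ;;      # already listed, leave the list unchanged
    ",,")     echo "$2" ;;      # empty list, the codec becomes the only entry
    *)        echo "$1,$2" ;;   # otherwise append after a comma
  esac
}

# Hypothetical helper: print a core-site.xml <property> element.
property_xml() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Sample starting list (illustrative; use the value from your own core-site.xml).
existing="org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec"

property_xml "io.compression.codecs" "$(append_codec "${existing}" "${LZC_CODEC}")"
property_xml "io.compression.codec.lzc.class" "${LZC_CODEC}"
```

The printed elements belong inside the configuration element of core-site.xml. After saving the file, running hdfs getconf -confKey io.compression.codecs from the client environment should show the value the client actually reads, assuming the client environment variables have been loaded.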