Using the LZC Compression Algorithm to Store HDFS Files
Scenario
File compression reduces the space occupied by stored files and speeds up both reading data from disks and transmitting it over the network. By default, HDFS supports two compression formats: Gzip and Snappy. This section describes how to configure an additional format, Lempel-Ziv compression (LZC), which extends the compression capability of Hadoop. For more information about Snappy, see https://code.google.com/p/snappy/.
This section applies to MRS 3.x or later.
Configuration Description
To make LZC compression take effect, configure the following parameters in the core-site.xml file of the client (for example, in Client installation path/HDFS/hadoop/etc/hadoop/):
Parameter | Description | Default Value |
---|---|---|
io.compression.codecs | To make LZC take effect, the following values are added to the existing compression format list: com.huawei.hadoop.datasight.io.compress.lzc.ZCodec NOTE: If more than one compression format is configured, use commas (,) to separate them. | org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.Lz4Codec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.ZStandardCodec,com.huawei.hadoop.datasight.io.compress.lzc.ZCodec |
io.compression.codec.lzc.class | To make the LZC compression format take effect, use the default value. If you configure this parameter, set it to com.huawei.hadoop.datasight.io.compress.lzc.ZCodec. | com.huawei.hadoop.datasight.io.compress.lzc.ZCodec |
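As a rough illustration of the two parameters, the following Python sketch appends the LZC codec to an existing comma-separated io.compression.codecs value and sets io.compression.codec.lzc.class. The sample XML, the shortened codec list in it, and the helper names (get_property, set_property, enable_lzc) are assumptions made for this example only. In practice, you would edit the actual core-site.xml file of the client and keep its full codec list.

```python
import xml.etree.ElementTree as ET

# Codec class name taken from the parameter table.
LZC_CODEC = "com.huawei.hadoop.datasight.io.compress.lzc.ZCodec"

# Hypothetical, shortened core-site.xml used only for this illustration.
SAMPLE_CORE_SITE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>"""


def get_property(root, name):
    """Return the value of the named property, or None if it is absent."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value") or ""
    return None


def set_property(root, name, value):
    """Set the value of the named property, creating the property if needed."""
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            return
    prop = ET.SubElement(root, "property")
    ET.SubElement(prop, "name").text = name
    ET.SubElement(prop, "value").text = value


def enable_lzc(core_site_xml):
    """Return the XML text with LZC added to the codec list exactly once."""
    root = ET.fromstring(core_site_xml)
    # Assumption: an absent property is treated as an empty list here.
    current = get_property(root, "io.compression.codecs") or ""
    codecs = [c.strip() for c in current.split(",") if c.strip()]
    if LZC_CODEC not in codecs:
        codecs.append(LZC_CODEC)
    # Multiple compression formats are separated by commas.
    set_property(root, "io.compression.codecs", ",".join(codecs))
    set_property(root, "io.compression.codec.lzc.class", LZC_CODEC)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(enable_lzc(SAMPLE_CORE_SITE))
```

Running the sketch prints the sample configuration with the LZC codec at the end of the io.compression.codecs list. Calling enable_lzc again on its own output leaves the list unchanged, so the codec is never listed twice.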
- LZC does not support FSImage and SequenceFile compression.
- HDFS provides multiple compression algorithms, including Gzip, LZ4, Snappy, and Bzip2. Their compression ratios and decompression speeds rank as follows:
  - Compression ratio in descending order: Bzip2 > Gzip > LZ4 > Snappy
  - Decompression speed in descending order: LZ4 > Snappy > Gzip > Bzip2
- Application scenarios:
  - Where speed matters most, for example, intermediate data storage of MapReduce tasks, LZ4 or Snappy is recommended. Of the two, Snappy is recommended where high reliability is required.
  - Where the compression ratio matters more than the compression speed, for example, cold data storage, Bzip2 or Gzip is recommended.
- Except for LZC, the preceding compression algorithms can be implemented using Native (C language), which makes compression and decompression efficient. You are advised to choose a compression algorithm that supports Native implementation and suits your service scenario.
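To check the ratio and speed trade-off on your own data before choosing a format, a small measurement can help. The following Python sketch is only an approximation under stated assumptions: it uses the gzip and bz2 modules of the Python standard library rather than the Hadoop codecs, it omits LZ4 and Snappy because they require third-party packages, and the sample data is hypothetical. Results depend heavily on the data and do not predict Hadoop codec performance.

```python
import bz2
import gzip
import time


def measure(name, compress, decompress, data):
    """Return (name, compression ratio, decompression seconds) for one codec."""
    packed = compress(data)
    start = time.perf_counter()
    restored = decompress(packed)
    elapsed = time.perf_counter() - start
    if restored != data:
        raise ValueError(f"{name} round trip failed")
    # Ratio is original size divided by compressed size (higher is better).
    return name, len(data) / len(packed), elapsed


def compare(data):
    """Measure Gzip and Bzip2 on the same input."""
    codecs = [
        ("Gzip", gzip.compress, gzip.decompress),
        ("Bzip2", bz2.compress, bz2.decompress),
    ]
    return [measure(name, c, d, data) for name, c, d in codecs]


if __name__ == "__main__":
    # Hypothetical sample: repetitive log-like text. Replace with your own data.
    sample = b"2024-01-01 INFO block replicated to datanode\n" * 20000
    for name, ratio, seconds in compare(sample):
        print(f"{name}: ratio {ratio:.1f}, decompression {seconds * 1000:.2f} ms")
```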