Using the LZC Compression Algorithm to Store HDFS Files
Scenario
File compression reduces the storage space that files occupy and speeds up both disk reads and network transmission. HDFS supports two default compression formats: Gzip and Snappy. This section describes how to configure an additional format, Lempel-Ziv compression (LZC), which extends the compression capability of Hadoop. For more information about Snappy, visit http://google.github.io/snappy/.
- HDFS provides multiple compression algorithms, including Gzip, LZ4, Snappy, and Bzip2. Their compression ratios and decompression speeds compare as follows:
  - Compression ratio in descending order: Bzip2 > Gzip > LZ4 > Snappy
  - Decompression speed in descending order: LZ4 > Snappy > Gzip > Bzip2
- Application scenarios:
  - Where speed matters most, for example, intermediate data storage of MapReduce tasks, LZ4 and Snappy are recommended. In high reliability scenarios, Snappy is recommended.
  - Where a high compression ratio matters most, for example, cold data storage, Bzip2 or Gzip is recommended.
- All of these compression algorithms except LZC have Native (C language) implementations, which compress and decompress more efficiently. You are advised to use a Native implementation that suits your service scenario.
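The recommendations above can be summarized as a small lookup. The following is only an illustrative sketch: the `recommend` function and the priority names `"speed"` and `"ratio"` are hypothetical, and the rankings are copied from the list above rather than measured.

```python
# Rankings copied from the comparison above, best first.
COMPRESSION_RATIO = ["Bzip2", "Gzip", "LZ4", "Snappy"]
DECOMPRESSION_SPEED = ["LZ4", "Snappy", "Gzip", "Bzip2"]


def recommend(priority):
    """Return the two top-ranked algorithms for a priority.

    `priority` is a hypothetical label: "speed" for workloads such as
    intermediate MapReduce data, "ratio" for workloads such as cold storage.
    """
    if priority == "speed":
        return DECOMPRESSION_SPEED[:2]
    if priority == "ratio":
        return COMPRESSION_RATIO[:2]
    raise ValueError("priority must be 'speed' or 'ratio'")


if __name__ == "__main__":
    for priority in ("speed", "ratio"):
        print(priority, "->", ", ".join(recommend(priority)))
```

The output matches the guidance above: LZ4 and Snappy when speed is the priority, Bzip2 and Gzip when compression ratio is the priority.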
Notes and Constraints
- This section applies to MRS 3.x or later.
- LZC cannot compress files in FSImage or SequenceFile format.
Prerequisites
A client that includes HDFS has been installed. For details, see Using an MRS Client.
Configuring the LZC Compression Algorithm
- Log in to the node where the client is installed as the root user.
- Run the following command to modify the client configuration file core-site.xml:
vi Client installation path/HDFS/hadoop/etc/hadoop/core-site.xml
- Modify the following parameters as required and save the settings.
Table 1 Parameters

| Parameter | Description | Default Value |
| --- | --- | --- |
| io.compression.codecs | To make LZC take effect, add the following value to the existing compression format list: com.huawei.hadoop.datasight.io.compress.lzc.ZCodec. If more than one compression format is configured, use commas (,) to separate them. | org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.DeflateCodec,org.apache.hadoop.io.compress.Lz4Codec,org.apache.hadoop.io.compress.SnappyCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.ZStandardCodec,com.huawei.hadoop.datasight.io.compress.lzc.ZCodec |
| io.compression.codec.lzc.class | To make the LZC compression format take effect, use the default value. If you configure this parameter, set it to com.huawei.hadoop.datasight.io.compress.lzc.ZCodec. | com.huawei.hadoop.datasight.io.compress.lzc.ZCodec |
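The change to io.compression.codecs in Table 1 can also be scripted. The following is a minimal sketch, not a supported tool: the `add_codec` function and the sample XML are hypothetical, and it assumes that core-site.xml uses the standard Hadoop `<configuration>/<property>/<name>/<value>` layout. It appends the LZC codec to the existing comma-separated list only if the codec is not already present.

```python
import xml.etree.ElementTree as ET

# Codec class name taken from Table 1.
LZC_CODEC = "com.huawei.hadoop.datasight.io.compress.lzc.ZCodec"


def add_codec(core_site_xml, codec=LZC_CODEC):
    """Return core-site.xml text with `codec` listed in io.compression.codecs.

    Raises LookupError if the property is missing, so that the existing
    codec list is never silently replaced by a single entry.
    """
    root = ET.fromstring(core_site_xml)
    for prop in root.findall("property"):
        if prop.findtext("name", "").strip() != "io.compression.codecs":
            continue
        value = prop.find("value")
        if value is None:
            value = ET.SubElement(prop, "value")
        # Split the comma-separated list and drop empty entries.
        codecs = [c.strip() for c in (value.text or "").split(",") if c.strip()]
        if codec not in codecs:
            codecs.append(codec)
        value.text = ",".join(codecs)
        return ET.tostring(root, encoding="unicode")
    raise LookupError("io.compression.codecs is not set in this file")


# Hypothetical, shortened sample; a real file lists more codecs.
SAMPLE = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec</value>
  </property>
</configuration>"""

if __name__ == "__main__":
    print(add_codec(SAMPLE))
```

Note that `xml.etree.ElementTree` discards XML comments when parsing, so back up the original file before writing any generated output over it.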