Configuring HBase Data Compression and Encoding Formats
Scenario
HBase can encode the data blocks in HFiles to cut down on key content that is repeated across key-value pairs, which reduces the storage space used. The supported data block encoding modes are NONE, PREFIX, DIFF, FAST_DIFF, and ROW_INDEX_V1, where NONE means that data blocks are not encoded. HBase can also compress HFiles. The compression algorithms supported by default are NONE, GZ, SNAPPY, and ZSTD, where NONE means that HFiles are not compressed.
Both encoding and compression are configured per column family. They can be used together or separately.
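To illustrate that the two settings are independent, the following is a minimal sketch of HBase shell statements that enable encoding only, compression only, and both on a column family. The table names demo_enc, demo_cmp, and demo_both and the column family name f1 are hypothetical and chosen only for illustration; the statements are meant to be run in the HBase shell after logging in as described in the procedure below.

```ruby
# Hypothetical table names; f1 is an illustrative column family name.

# Data block encoding only (HFiles are not compressed).
create 'demo_enc', {NAME => 'f1', DATA_BLOCK_ENCODING => 'PREFIX'}

# Compression only (data blocks are not encoded).
create 'demo_cmp', {NAME => 'f1', COMPRESSION => 'GZ'}

# Encoding and compression together.
create 'demo_both', {NAME => 'f1', COMPRESSION => 'GZ', DATA_BLOCK_ENCODING => 'PREFIX'}
```

Omitting either attribute leaves it at NONE for that column family.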
Prerequisites
- The HBase client has been installed in a directory, for example, /opt/client.
- If Kerberos authentication has been enabled for the cluster, you must have the corresponding operation permissions. For example, you must have the creation (C) or administration (A) permission on the corresponding namespace or higher-level items to create a table, and the creation (C) or administration (A) permission on the created table or higher-level items to modify a table. For details about how to grant permissions, see Creating HBase Roles.
Configuring HBase Data Compression and Encoding Formats
Setting the data block encoding mode and compression algorithm during table creation
- Method 1: Using hbase shell
- Log in to the node where the client is installed as the client installation user.
- Run the following command to go to the client directory (for example, cd /opt/client):
- Run the following command to configure environment variables:
- If Kerberos authentication is enabled for the cluster, run the following command to authenticate the user; otherwise, skip this step:
For example, kinit hbaseuser.
- Run the hbase shell command to log in to the HBase client.
- Create a table.
create 't1', {NAME => 'f1', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
- t1: indicates the table name.
- f1: indicates the column family name.
- SNAPPY: indicates that the column family uses the SNAPPY compression algorithm.
- FAST_DIFF: indicates that FAST_DIFF is used for data block encoding.
- Each pair of braces defines one column family. To define multiple column families, use multiple pairs of braces separated by commas (,). For details about the table creation syntax, run help 'create' in the HBase shell.
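As a hedged illustration of the multiple-braces form described above, the sketch below creates a table with two column families that use different settings and then prints the schema to check them. The table name t2 and the column family names f1 and f2 are hypothetical. ZSTD is assumed to be available in the cluster, as stated in the list of default algorithms.

```ruby
# Hypothetical table t2 with two column families, each in its own braces.
create 't2', {NAME => 'f1', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF'}, {NAME => 'f2', COMPRESSION => 'ZSTD', DATA_BLOCK_ENCODING => 'DIFF'}

# Print the table schema; the COMPRESSION and DATA_BLOCK_ENCODING
# attributes are listed for each column family.
describe 't2'
```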
Setting or modifying the data block encoding mode and compression algorithm for an existing table
- Method 1: Using hbase shell
- Log in to the node where the client is installed as the client installation user.
- Run the following command to go to the client directory (for example, cd /opt/client):
- Run the following command to configure environment variables:
- If Kerberos authentication is enabled for the cluster, run the following command to authenticate the user; otherwise, skip this step:
For example, kinit hbaseuser.
- Run the hbase shell command to log in to the HBase client.
- Run the following command to modify the HBase table:
alter 't1', {NAME => 'f1', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
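After an alter statement like the one above, the new settings are recorded in the table schema, but HFiles already on disk generally keep their previous format until a compaction rewrites them. The following is a minimal sketch, assuming the table t1 and column family f1 from the example, of checking the schema and then requesting a major compaction so that existing data is rewritten with the new settings.

```ruby
# Confirm that f1 now reports the new COMPRESSION and
# DATA_BLOCK_ENCODING values in the table schema.
describe 't1'

# Request a major compaction of t1 so that existing HFiles are
# rewritten using the new compression and encoding settings.
major_compact 't1'
```

A major compaction is I/O intensive and runs asynchronously, so it is usually best requested during off-peak hours.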