Updated on 2024-11-29 GMT+08:00

Configuring HBase Data Compression and Encoding

Scenario

HBase can encode the data blocks in HFiles to reduce the space taken up by duplicate key information in KeyValues. The following data block encoding modes are currently supported: NONE, PREFIX, DIFF, FAST_DIFF, and ROW_INDEX_V1, where NONE indicates that data blocks are not encoded. HBase can also compress HFiles. The following compression algorithms are supported by default: NONE, GZ, SNAPPY, and ZSTD, where NONE indicates that HFiles are not compressed.

Both data block encoding and compression are configured per column family. They can be used together or separately.
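To make the idea behind data block encoding concrete, the following Python sketch applies a simplified PREFIX-style scheme to a sorted list of keys. Each key is stored as the length of the prefix it shares with the previous key, plus the remaining suffix. The byte layout (one length byte per field), the helper names, and the sample keys are illustrative assumptions, not the actual HFile format. The sketch then compresses the encoded bytes with zlib, which implements DEFLATE (the algorithm gzip uses), to show how encoding and compression can be stacked.

```python
import zlib

def prefix_encode(keys):
    """Encode sorted keys as (shared_prefix_length, suffix) pairs."""
    encoded = []
    prev = b""
    for key in keys:
        # Count the leading bytes this key shares with the previous key.
        shared = 0
        limit = min(len(prev), len(key))
        while shared < limit and prev[shared] == key[shared]:
            shared += 1
        encoded.append((shared, key[shared:]))
        prev = key
    return encoded

def prefix_decode(encoded):
    """Rebuild the original keys from (shared_prefix_length, suffix) pairs."""
    keys = []
    prev = b""
    for shared, suffix in encoded:
        key = prev[:shared] + suffix
        keys.append(key)
        prev = key
    return keys

def serialize_raw(keys):
    # Illustrative layout: 1-byte key length + key bytes (keys must be < 256 bytes).
    return b"".join(bytes([len(k)]) + k for k in keys)

def serialize_encoded(encoded):
    # Illustrative layout: 1-byte shared length + 1-byte suffix length + suffix.
    return b"".join(bytes([shared, len(suffix)]) + suffix
                    for shared, suffix in encoded)

# Hypothetical sample: 100 sorted keys with long common prefixes,
# as adjacent keys in a data block often have.
keys = sorted(b"user%04d/info:name" % i for i in range(100))

raw = serialize_raw(keys)
encoded = serialize_encoded(prefix_encode(keys))
compressed = zlib.compress(encoded)  # encoding and compression combined

assert prefix_decode(prefix_encode(keys)) == keys  # encoding is lossless
print(len(raw), len(encoded), len(compressed))
```

With these sample keys, the raw layout takes 1900 bytes and the prefix-encoded layout 1316 bytes. Prefix encoding cannot remove the repeated "/info:name" tail because it only strips shared leading bytes, whereas a general-purpose compressor such as zlib can, which is why the compressed size is smaller again.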

Prerequisites

  • The HBase client has been installed, for example, in /opt/client.
  • If authentication is enabled for HBase, you must have the required operation permissions. To create a table, you need the creation (C) or administration (A) permission on the target namespace or a higher level. To modify a table, you need the creation (C) or administration (A) permission on the table or a higher level. For details about how to grant permissions, see Creating HBase Roles.

Procedure

Setting the data block encoding mode and compression algorithm when creating a table

  1. Log in to the node where the client is installed as the client installation user.
  2. Run the following command to go to the client directory:

    cd /opt/client

  3. Run the following command to configure environment variables:

    source bigdata_env

  4. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step:

    kinit Component service user

    For example, kinit hbaseuser.

  5. Run the following command to start the HBase shell:

    hbase shell

  6. Run the following command to create a table:

    create 't1', {NAME => 'f1', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF'}

    • t1: the table name.
    • f1: the column family name.
    • SNAPPY: the column family uses the SNAPPY compression algorithm.
    • FAST_DIFF: the column family uses the FAST_DIFF data block encoding mode.
    • Each pair of braces defines one column family. To create multiple column families, specify one pair of braces per column family and separate them with commas (,). For the full table creation syntax, run help 'create' in the HBase shell.
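When a table has several column families, writing the braces by hand is error-prone. The following Python sketch builds a create statement in the same form as the command in step 6 from a list of column families, and rejects values outside the default lists given in this section. The helper names family_spec and create_statement are hypothetical; the printed statement is meant to be pasted into the HBase shell. If your cluster supports additional codecs, extend the two sets accordingly.

```python
# Default values listed in this section; a cluster may support additional codecs.
SUPPORTED_COMPRESSION = {"NONE", "GZ", "SNAPPY", "ZSTD"}
SUPPORTED_ENCODING = {"NONE", "PREFIX", "DIFF", "FAST_DIFF", "ROW_INDEX_V1"}

def family_spec(name, compression="NONE", encoding="NONE"):
    """Return one column family spec, e.g. {NAME => 'f1', ...} (hypothetical helper)."""
    if compression not in SUPPORTED_COMPRESSION:
        raise ValueError(f"unsupported compression: {compression}")
    if encoding not in SUPPORTED_ENCODING:
        raise ValueError(f"unsupported data block encoding: {encoding}")
    # Doubled braces in the f-strings produce literal { and }.
    return (f"{{NAME => '{name}', COMPRESSION => '{compression}', "
            f"DATA_BLOCK_ENCODING => '{encoding}'}}")

def create_statement(table, families):
    """Join one brace spec per column family, separated by commas."""
    specs = ", ".join(family_spec(*family) for family in families)
    return f"create '{table}', {specs}"

print(create_statement("t1", [("f1", "SNAPPY", "FAST_DIFF"),
                              ("f2", "ZSTD", "PREFIX")]))
```

For a single family ("f1", "SNAPPY", "FAST_DIFF"), the output is identical to the command in step 6; each additional tuple appends another brace spec after a comma.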

Setting or modifying the data block encoding mode and compression algorithm for an existing table

  1. Log in to the node where the client is installed as the client installation user.
  2. Run the following command to go to the client directory:

    cd /opt/client

  3. Run the following command to configure environment variables:

    source bigdata_env

  4. If Kerberos authentication is enabled for the current cluster, run the following command to authenticate the user. If Kerberos authentication is disabled, skip this step:

    kinit Component service user

    For example, kinit hbaseuser.

  5. Run the following command to start the HBase shell:

    hbase shell

  6. Run the following command to modify the table:

    alter 't1', {NAME => 'f1', COMPRESSION => 'SNAPPY', DATA_BLOCK_ENCODING => 'FAST_DIFF'}
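An alter operation changes the column family settings, but HBase applies them to HFiles written afterwards; existing HFiles keep their previous encoding and compression until a compaction rewrites them. The following Python sketch writes a small HBase shell script that runs the alter command above, prints the table description for verification, and optionally triggers a major compaction with major_compact. The helper write_alter_script and the file name alter_t1.hbase are hypothetical. After the environment and Kerberos steps above, such a script could be run with hbase shell alter_t1.hbase; the trailing exit makes the shell quit when the script finishes.

```python
from pathlib import Path

def write_alter_script(path, table, family, compression, encoding, compact=False):
    """Write an HBase shell script that alters one column family (hypothetical helper)."""
    lines = [
        # Same statement form as step 6.
        f"alter '{table}', {{NAME => '{family}', COMPRESSION => '{compression}', "
        f"DATA_BLOCK_ENCODING => '{encoding}'}}",
        # Show the column family settings to confirm the change.
        f"describe '{table}'",
    ]
    if compact:
        # Rewrite existing HFiles so that they also use the new settings.
        lines.append(f"major_compact '{table}'")
    lines.append("exit")  # quit the shell when the script ends
    Path(path).write_text("\n".join(lines) + "\n")
    return lines

lines = write_alter_script("alter_t1.hbase", "t1", "f1", "SNAPPY", "FAST_DIFF",
                           compact=True)
print(Path("alter_t1.hbase").read_text())
```

Major compaction rewrites all HFiles of the table and can be I/O intensive, so on a large table it is usually scheduled for off-peak hours; that is why the sketch leaves it off by default.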