Updated on 2025-07-24 GMT+08:00

Configuring HBase Cold and Hot Data Separation Using HBase Shell

HBase supports cold and hot data separation. Cold and hot data can be stored in different media, improving data query efficiency and reducing data storage costs. This section describes how to configure HBase cold and hot data separation using HBase Shell.

Prerequisites

Step 1: Enabling HBase Cold and Hot Data Separation

  1. Log in to the CloudTable management console.
  2. Select a region in the upper left corner.
  3. Click Buy Cluster in the upper right corner.
  4. On the Buy Cluster page, set Database Engine to HBase and select Enable Hot/Cold in Advanced Feature. The cold and hot separation feature is enabled for the created cluster.

    Figure 1 Enabling cold and hot data separation

Step 2: Setting the Cold and Hot Data Separation Boundary

  1. Connect to the HBase cluster. For details, see Connecting to an HBase Normal Cluster Using HBase Shell.
  2. Set the time boundary for separating hot and cold data. The time boundary must be longer than the major compaction execution period. The default execution period of major compactions is seven days.

    • Create a table that separately stores cold and hot data.
      hbase(main):002:0> create 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'86400'}

      Parameter description:

      • NAME indicates the column family that requires cold and hot separation.
      • COLD_BOUNDARY specifies the time boundary for separating cold and hot data. The time boundary is measured in seconds. For example, if COLD_BOUNDARY is set to 86400, new data is archived as cold data after 86,400 seconds, which is equal to one day.
    • Disable cold and hot data separation.
      hbase(main):004:0> alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>""}
    • Enable cold and hot data separation for an existing table or change the time boundary. The time boundary is measured in seconds.
      hbase(main):005:0> alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'86400'}

  3. Check whether cold and hot data separation has been enabled or whether the time boundary has been modified successfully.

    hbase(main):002:0> desc 'hot_cold_table'
    Table hot_cold_table is ENABLED
    hot_cold_table
    COLUMN FAMILIES DESCRIPTION
    {NAME => 'f', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'COLD_BOUNDARY' => '86400'}}
    1 row(s)
    Quota is disabled
    Took 0.0339 seconds
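
Because COLD_BOUNDARY in Step 2 is given in seconds, it can help to derive the value from a number of days before running the command. The following is a minimal Ruby sketch, not part of HBase or CloudTable. The helper names are hypothetical, and the seven-day constant only mirrors the default major compaction period stated above.

```ruby
# Hypothetical helpers for illustration; not part of HBase or CloudTable.
SECONDS_PER_DAY = 24 * 60 * 60

# Default major compaction period stated in this section (seven days).
MAJOR_COMPACTION_PERIOD_DAYS = 7

# Convert a period in days to a COLD_BOUNDARY string in seconds.
def cold_boundary_seconds(days)
  raise ArgumentError, 'days must be positive' unless days.positive?
  (days * SECONDS_PER_DAY).to_i.to_s
end

# True if the boundary is longer than the major compaction period.
def longer_than_compaction_period?(days, period_days = MAJOR_COMPACTION_PERIOD_DAYS)
  days > period_days
end

# Build the text of the alter command shown in Step 2.
def alter_command(table, family, days)
  "alter '#{table}', {NAME=>'#{family}', COLD_BOUNDARY=>'#{cold_boundary_seconds(days)}'}"
end

puts alter_command('hot_cold_table', 'f', 30)
puts longer_than_compaction_period?(30)
```

With a 30-day boundary, the sketch prints an alter command with COLD_BOUNDARY set to 2592000. Under the rule stated in Step 2 and the default seven-day period, a one-day value such as 86400 would make longer_than_compaction_period? return false.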

Step 3: Inserting Data

You can write data to a table that stores cold and hot data separately in the same way that you write data to a standard table. New data is first stored in hot storage (EVS disks). Once the data has been stored for longer than the value of the COLD_BOUNDARY parameter, the system automatically moves it to cold storage (OBS) during major compaction.
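
The archiving rule above can be pictured with a short Ruby sketch. This is only a model under stated assumptions, not CloudTable's implementation. The function name cold_candidate? is hypothetical, and it assumes that the age of the data is measured from the cell timestamp in milliseconds.

```ruby
# Hypothetical model of the rule described above; not CloudTable's implementation.
# A cell becomes a candidate for cold storage once its age exceeds COLD_BOUNDARY.
# The actual move happens only when a major compaction runs.
def cold_candidate?(cell_timestamp_ms, now_ms, cold_boundary_s)
  age_ms = now_ms - cell_timestamp_ms
  age_ms > cold_boundary_s * 1000
end

now_ms = (Time.now.to_f * 1000).to_i
two_days_ago_ms = now_ms - 2 * 86_400 * 1000
one_hour_ago_ms = now_ms - 3_600 * 1000

puts cold_candidate?(two_days_ago_ms, now_ms, 86_400) # => true, older than one day
puts cold_candidate?(one_hour_ago_ms, now_ms, 86_400) # => false, newer than one day
```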

Run the put command to insert a record into the specified table. You need to specify the table name, row key, column, and value to insert.

hbase(main):004:0> put 'hot_cold_table','row1','f:a','value1'
0 row(s) in 0.2720 seconds

The following describes parameters in the command:

  • hot_cold_table: table name
  • row1: row key
  • f:a: column, in the format column family:qualifier (f is the column family created in Step 2)
  • value1: inserted value

Step 4: Querying Data

CloudTable HBase stores cold and hot data in the same table, so you only need to query one table. You can configure TimeRange to specify the time range of the data that you want to query. The system automatically determines whether the target data is hot or cold based on the time range that you specify and chooses the optimal query mode. If no time range is specified, cold data is also queried. The throughput of reading cold data is lower than that of reading hot data.
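
As a rough illustration of how a time range relates to the boundary, the following Ruby sketch builds a TIMERANGE for recent data and checks it against COLD_BOUNDARY. Both helper names are hypothetical. The rule in hot_only_range? is an assumption made for illustration, because the actual decision is made inside CloudTable. Timestamps are in milliseconds.

```ruby
# Hypothetical helpers for illustration; the real decision is made inside CloudTable.

# Build a [min, max] TIMERANGE in milliseconds covering the last `seconds` seconds.
def recent_time_range(seconds, now = Time.now)
  now_ms = (now.to_f * 1000).to_i
  [now_ms - seconds * 1000, now_ms]
end

# Assumed rule: a range can be served from hot storage alone when its
# start is no older than COLD_BOUNDARY seconds before now.
def hot_only_range?(range, cold_boundary_s, now = Time.now)
  now_ms = (now.to_f * 1000).to_i
  range.first >= now_ms - cold_boundary_s * 1000
end

now = Time.at(1_568_203_111)
range = recent_time_range(3_600, now)
puts "TIMERANGE => [#{range[0]}, #{range[1]}]"
puts hot_only_range?(range, 86_400, now)         # last hour, inside a one-day boundary
puts hot_only_range?([0, range[1]], 86_400, now) # starts at 0, so it reaches cold data
```

A range that starts at 0 always extends past the boundary, so under this assumption it would also touch cold data.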

Cold storage is intended only for archiving data that is rarely accessed. If your cluster receives a large number of queries that hit cold data, check whether the time boundary (COLD_BOUNDARY) is set to an appropriate value. Query performance deteriorates if frequently accessed data is stored in cold storage.

If you update a field in a row that is stored in cold storage, the field is moved to hot storage after the update. When this row is hit by a query that carries the HOT_ONLY hint or whose time range is configured to hit hot data, only the updated field in hot storage is returned. If you want the system to return the entire row, remove the HOT_ONLY hint from the query or make sure that the specified time range covers the period from when the row was inserted to when it was last updated. You are advised not to update data that is stored in cold storage.

  • Random queries with Get
    • Query data without specifying HOT_ONLY. In this case, data in cold storage is also queried.
      hbase(main):001:0> get 'hot_cold_table', 'row1'
    • Specify HOT_ONLY to query data. In this case, only data in hot storage is queried.
      hbase(main):002:0> get 'hot_cold_table', 'row1', {HOT_ONLY=>true}
    • Query data within a time range that is specified by the TIMERANGE parameter. The system determines whether the query hits cold or hot data based on the values of the TIMERANGE and COLD_BOUNDARY parameters.
      hbase(main):003:0> get 'hot_cold_table', 'row1', {TIMERANGE => [0, 1568203111265]}

      TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.

  • Range queries with Scan
    • Query data without specifying HOT_ONLY. In this case, data in cold storage is also queried.
      hbase(main):001:0> scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9'}
    • Specify HOT_ONLY to query data. In this case, only data in hot storage is queried.
      hbase(main):002:0> scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', HOT_ONLY=>true}
    • Query data within a time range that is specified by the TIMERANGE parameter. The system determines whether the query hits cold or hot data based on the values of the TIMERANGE and COLD_BOUNDARY parameters.
      hbase(main):003:0> scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', TIMERANGE => [0, 1568203111265]}

      TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.

  • Prioritizing hot data selection
    For some SCAN queries, such as a query for all records of a customer, CloudTable may need to look up both cold and hot data. The query results are paginated in descending order of data timestamps, so in most cases hot data appears before cold data. If such a SCAN query does not carry the HOT_ONLY hint, CloudTable must scan both cold and hot data, which increases the query response time. When hot data is prioritized, CloudTable preferentially retrieves data from hot storage and queries cold storage only if the number of rows found in hot storage falls below the specified minimum query threshold. This minimizes cold data access and reduces the response time.
    hbase(main):001:0> scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9',COLD_HOT_MERGE=>true}
  • Major compaction
    • Merge hot data areas of all partitions in a table.
      hbase(main):002:0> major_compact 'hot_cold_table', nil, 'NORMAL', 'HOT'
    • Merge cold data areas of all partitions in a table.
      hbase(main):002:0> major_compact 'hot_cold_table', nil, 'NORMAL', 'COLD'
    • Merge hot and cold data areas of all partitions in a table.
      hbase(main):002:0> major_compact 'hot_cold_table', nil, 'NORMAL', 'ALL'
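
The hot data prioritization described earlier in this list (the scan with COLD_HOT_MERGE) can be pictured with a small Ruby model. This is a sketch under stated assumptions, not CloudTable's implementation. The function hot_first_scan, the in-memory row arrays, and the min_rows parameter are all hypothetical stand-ins for the hot storage, the cold storage, and the minimum query threshold.

```ruby
# Hypothetical model of hot-first retrieval; not CloudTable's implementation.
# hot_rows and cold_rows are arrays of [row_key, timestamp_ms] pairs.
# min_rows stands in for the minimum number of rows a result page needs.
def hot_first_scan(hot_rows, cold_rows, min_rows)
  newest_first = ->(rows) { rows.sort_by { |_key, ts| -ts } }
  result = newest_first.call(hot_rows).first(min_rows)
  return result if result.size >= min_rows

  # Hot storage could not fill the page, so fall back to cold storage.
  result + newest_first.call(cold_rows).first(min_rows - result.size)
end

hot = [['row1', 300], ['row2', 200]]
cold = [['row3', 100], ['row4', 50]]

p hot_first_scan(hot, cold, 2) # served from hot storage only
p hot_first_scan(hot, cold, 3) # hot storage is short, so one cold row is added
```

In the first call the cold rows are never touched, which is the effect the prioritization aims for.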