Cold-Hot Separation Commands

This section describes how to use the commands for cold-hot separation, including shell commands and Java API commands.

Shell commands are executed on the HBase client.

Setting the Hot and Cold Data Boundary of an HBase Table

Shell
- Create a table with cold-hot separation enabled.
  create 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'86400'}
  
  Required parameters are as follows:
  - NAME indicates the column family that requires cold-hot separation.
  - COLD_BOUNDARY indicates the time boundary (in seconds) for separating cold and hot data. For example, if COLD_BOUNDARY is set to 86400, data that was written 86,400 seconds (one day) ago will be archived as cold data.
    
    The boundary time must be greater than the Major Compaction execution period. The default Major Compaction execution period is 7 days.
- Disable cold-hot separation.
  alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>""}
- Enable cold-hot separation for an existing table or change the time boundary. The time boundary is measured in seconds.
  alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'86400'}
- Check whether cold-hot separation is enabled or whether the time boundary is successfully modified.
  desc 'hot_cold_table'
```
Table hot_cold_table is ENABLED
hot_cold_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'f', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'COLD_BOUNDARY' => '1200'}}
1 row(s)
Quota is disabled
Took 0.0339 seconds
```

Data moves between cold storage and hot storage only during compaction. After you enable cold-hot separation or change the time boundary, you must perform a major compaction before existing data is moved.
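
The unit conversion and the boundary rule described above can be checked before running `create` or `alter`. The following is a minimal Python sketch, not part of HBase: the helper names `days_to_cold_boundary` and `boundary_exceeds_compaction_period` are hypothetical, and the 7-day default is taken from the text above.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Default Major Compaction execution period stated in this section: 7 days.
DEFAULT_MAJOR_COMPACTION_PERIOD_S = 7 * SECONDS_PER_DAY


def days_to_cold_boundary(days: int) -> str:
    """Return a COLD_BOUNDARY value (seconds, as a string) for a number of days."""
    if days <= 0:
        raise ValueError("days must be positive")
    return str(days * SECONDS_PER_DAY)


def boundary_exceeds_compaction_period(
    cold_boundary: str,
    compaction_period_s: int = DEFAULT_MAJOR_COMPACTION_PERIOD_S,
) -> bool:
    """Check that the boundary is greater than the Major Compaction period."""
    return int(cold_boundary) > compaction_period_s


print(days_to_cold_boundary(30))                      # 2592000
print(boundary_exceeds_compaction_period("2592000"))  # True
print(boundary_exceeds_compaction_period("86400"))    # False with the 7-day default
```

Under this check, the one-day value used in the examples (86400) satisfies the rule only if the Major Compaction period of the cluster has been set to less than one day.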

Writing Data

You can write data to a table with separated cold and hot storage in the same way that you write data to a regular table. Newly written data is stored in hot storage (HDFS). If the storage duration of a data record exceeds the value specified by the COLD_BOUNDARY parameter, the system automatically moves the data to cold storage (OBS) during the compaction process.

Insert a data record into a table.
Run the put command to insert a data record into the specified table. Specify the table name, row key, column, and value. The following is an example:

put 'hot_cold_table','row1','f:a','value1'

The following parameters are required in the command:
- hot_cold_table: table name
- row1: row key (primary key)
- f:a: column, where f is the column family of the table created above and a is a custom column name
- value1: value to insert
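
The hot-to-cold rule described at the start of this section can be stated as a simple age comparison. The following Python sketch is only an illustration of that rule, not HBase code: the function name `is_cold_eligible` is hypothetical, and it assumes that write timestamps are in milliseconds while COLD_BOUNDARY is in seconds.

```python
def is_cold_eligible(write_ts_ms: int, now_ms: int, cold_boundary_s: int) -> bool:
    """Return True if a record is older than the cold boundary.

    A record that is eligible still stays in hot storage until the next
    compaction moves it, as described above.
    """
    age_ms = now_ms - write_ts_ms
    return age_ms > cold_boundary_s * 1000


now_ms = 1_568_203_111_265          # example "current time" in milliseconds
one_hour_ms = 60 * 60 * 1000
two_days_ms = 2 * 24 * one_hour_ms

# With COLD_BOUNDARY=>'86400' (one day):
print(is_cold_eligible(now_ms - one_hour_ms, now_ms, 86400))  # False: still hot
print(is_cold_eligible(now_ms - two_days_ms, now_ms, 86400))  # True: moved at next compaction
```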

Querying Data

Cold data and hot data are stored in the same HBase table, so you only need to query one table. You can configure TimeRange to specify the time range of the data you want to query. The system automatically determines whether to search hot storage, cold storage, or both based on the specified time range. If no time range is specified, the query is not limited to hot storage and cold storage is searched as well. The throughput of reading cold data is lower than that of reading hot data.

Cold storage is used to archive data that is rarely accessed. If your cluster receives a large number of queries that hit cold data, check whether the time boundary (COLD_BOUNDARY) is set to an appropriate value. Query performance deteriorates if frequently accessed data is stored in cold storage.
If you update a field in a row that is stored in cold storage, the field is moved to hot storage after the update. When this row is hit by a query that carries the HOT_ONLY hint or has a time range that is configured to hit hot data, only the updated field in hot storage is returned. If you want the system to return the entire row, you must remove the HOT_ONLY hint from the query statement or make sure that the specified time range covers the period from when this row was inserted to when it was last updated. It is recommended that you do not update data that is stored in cold storage.
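
The way a time range relates to the boundary can be pictured with a small model. The following Python sketch is a simplified illustration, not the actual HBase routing logic: the function name `tiers_for_time_range` is hypothetical, and it assumes that the range is half-open (minimum inclusive, maximum exclusive) and that every record older than the boundary has already been moved by compaction.

```python
from typing import Set


def tiers_for_time_range(
    min_ts_ms: int, max_ts_ms: int, now_ms: int, cold_boundary_s: int
) -> Set[str]:
    """Return which storage tiers a [min_ts_ms, max_ts_ms) range can touch."""
    if min_ts_ms >= max_ts_ms:
        raise ValueError("min_ts_ms must be smaller than max_ts_ms")
    # Records written before this instant are older than COLD_BOUNDARY.
    cutoff_ms = now_ms - cold_boundary_s * 1000
    tiers = set()
    if min_ts_ms < cutoff_ms:
        tiers.add("cold")  # the range includes timestamps older than the boundary
    if max_ts_ms > cutoff_ms:
        tiers.add("hot")   # the range includes timestamps newer than the boundary
    return tiers


now_ms = 1_568_203_111_265
cutoff_ms = now_ms - 86400 * 1000

print(sorted(tiers_for_time_range(cutoff_ms, now_ms, now_ms, 86400)))  # ['hot']
print(sorted(tiers_for_time_range(0, cutoff_ms, now_ms, 86400)))       # ['cold']
print(sorted(tiers_for_time_range(0, now_ms, now_ms, 86400)))          # ['cold', 'hot']
```

In this model, a row that was inserted before the cutoff and updated after it is returned in full only by the third kind of range, which matches the recommendation above to cover the whole period from insertion to last update.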

Random queries with Get
- Shell
  - Query data without the HOT_ONLY hint. Cold storage is also searched.
    get 'hot_cold_table', 'row1'
  - Query only data in hot storage by adding the HOT_ONLY hint.
    get 'hot_cold_table', 'row1', {HOT_ONLY=>true}
  - Query data within a time range that is specified by TIMERANGE. The system determines whether the query hits cold or hot data based on the values of TIMERANGE and COLD_BOUNDARY.
    get 'hot_cold_table', 'row1', {TIMERANGE => [0, 1568203111265]}
    
    TIMERANGE specifies the query time range. Each bound is a Unix timestamp in milliseconds, that is, the number of milliseconds that have elapsed since the Unix epoch.

SCAN queries
- Shell
  - Query data without the HOT_ONLY hint. Cold storage is also searched.
    scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9'}
  - Query only data in hot storage by adding the HOT_ONLY hint.
    scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', HOT_ONLY=>true}
  - Query data within a time range that is specified by TIMERANGE. The system determines whether the query hits cold or hot data based on the values of TIMERANGE and COLD_BOUNDARY.
    scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', TIMERANGE => [0, 1568203111265]}
    
    TIMERANGE specifies the query time range. Each bound is a Unix timestamp in milliseconds, that is, the number of milliseconds that have elapsed since the Unix epoch.
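
Because the TIMERANGE bounds in the Get and SCAN examples are millisecond timestamps, it can help to compute them from calendar times. The following Python sketch shows one way to do so; the helper names `to_millis` and `timerange_literal` are hypothetical and only produce the text that is pasted into the shell command.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("use a timezone-aware datetime")
    # Integer division of timedeltas avoids floating-point rounding.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def timerange_literal(start: datetime, end: datetime) -> str:
    """Format two datetimes as the list used after TIMERANGE => in the shell."""
    return f"[{to_millis(start)}, {to_millis(end)}]"


end = datetime(2019, 9, 11, 11, 58, 31, 265000, tzinfo=timezone.utc)
print(timerange_literal(EPOCH, end))  # [0, 1568203111265]
```

The printed value matches the range used in the examples above, so the upper bound 1568203111265 corresponds to 2019-09-11 11:58:31.265 UTC.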

Prioritizing hot data query
A SCAN query, for example one that retrieves all records of a customer, may need to search both cold storage and hot storage. The query results are returned in descending order of the timestamps at which the data records were written, so in most cases hot data appears before cold data. If a SCAN query does not carry the HOT_ONLY hint, HBase must scan data in both cold and hot storage, and the query takes more time. If you prioritize hot data query, HBase queries hot data first and queries cold data only when the number of rows in hot storage is less than the minimum number of rows to be queried. This reduces access to cold storage and improves the response speed.
- Shell
  scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', COLD_HOT_MERGE=>true}
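
The hot-first behavior described above can be sketched as a fallback: read hot storage, and touch cold storage only for the rows that are still missing. The following Python model is an illustration of that idea only, not the HBase implementation: `scan_hot_first`, `scan_hot`, and `scan_cold` are hypothetical names, and the two scan callables stand in for the real storage tiers.

```python
from typing import Callable, List

ScanFn = Callable[[int], List[str]]  # takes a row limit, returns at most that many rows


def scan_hot_first(scan_hot: ScanFn, scan_cold: ScanFn, limit: int) -> List[str]:
    """Return up to `limit` rows, reading cold storage only if hot storage falls short."""
    rows = list(scan_hot(limit))[:limit]
    missing = limit - len(rows)
    if missing > 0:
        # Hot storage holds fewer rows than requested: fill the rest from cold storage.
        rows.extend(scan_cold(missing)[:missing])
    return rows


def scan_hot(n: int) -> List[str]:
    return ["hot-row1", "hot-row2", "hot-row3"][:n]


def scan_cold(n: int) -> List[str]:
    return ["cold-row1", "cold-row2"][:n]


print(scan_hot_first(scan_hot, scan_cold, 2))  # served from hot storage only
print(scan_hot_first(scan_hot, scan_cold, 4))  # three hot rows, then one cold row
```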

Major compaction
- Shell
  - Merge hot data areas of all partitions in a table.
    major_compact 'hot_cold_table', nil, 'NORMAL', 'HOT'
  - Merge cold data areas of all partitions in a table.
    major_compact 'hot_cold_table', nil, 'NORMAL', 'COLD'
  - Merge hot and cold data areas of all partitions in a table.
    major_compact 'hot_cold_table', nil, 'NORMAL', 'ALL'