Cold-Hot Separation Commands

This section describes the commands for cold-hot separation, including HBase shell commands and Java API calls.

Shell commands are executed on the HBase client. For details about how to install the client, see Installing a Client.

Setting the Hot and Cold Data Boundary of an HBase Table

Shell
- Create a table where data will be separately stored.
  create 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'86400'}
  
  Required parameters are as follows:
  - NAME indicates the column family that requires cold-hot separation.
  - COLD_BOUNDARY indicates the time boundary (in seconds) for separating cold and hot data. For example, if COLD_BOUNDARY is set to 86400, data that was written more than 86,400 seconds (one day) ago is archived as cold data.
    
    The boundary time must be greater than the Major Compaction execution period. The default Major Compaction execution period is 7 days.
- Disable cold-hot separation.
  alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>""}
- Enable cold-hot separation for an existing table, or modify the cold and hot data boundary (unit: second) to move data between hot storage and cold storage. The following are some examples:
  - Convert hot data to cold data.
    1. Archive data that was written to column family f of the hot_cold_table table more than one day (86400 seconds) ago to cold storage.
      alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'86400'}
    2. Perform Major Compaction during off-peak hours to avoid affecting service performance.
      major_compact 'hot_cold_table'
  - Convert cold data to hot data.
    1. Change the value of COLD_BOUNDARY to 172800 to move data that was written to column family f of the hot_cold_table table more than one day but less than two days ago from cold storage back to hot storage. Set this parameter as required.
      alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'172800'}
    2. Perform Major Compaction during off-peak hours to avoid affecting service performance.
      major_compact 'hot_cold_table'
- Check whether cold-hot separation is enabled or whether the time boundary is successfully modified.
  desc 'hot_cold_table'
```
Table hot_cold_table is ENABLED
hot_cold_table
COLUMN FAMILIES DESCRIPTION
{NAME => 'f', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'COLD_BOUNDARY' => '1200'}}
1 row(s)
Quota is disabled
Took 0.0339 seconds
```
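Because COLD_BOUNDARY is passed as a string of seconds, it is easy to mistype. The following Java sketch converts a duration into that string form and checks it against the Major Compaction execution period. It is an illustration only: the class and method names are hypothetical and not part of HBase, and the 7-day period is the default stated above, so substitute the value configured in your cluster.

```java
import java.time.Duration;

// Hypothetical helper (not part of HBase): builds and checks COLD_BOUNDARY values.
final class ColdBoundaryValue {
    // Default Major Compaction execution period stated in this document.
    static final Duration DEFAULT_MAJOR_COMPACTION_PERIOD = Duration.ofDays(7);

    // COLD_BOUNDARY is expressed in whole seconds and passed as a string.
    static String toColdBoundary(Duration boundary) {
        if (boundary.isZero() || boundary.isNegative()) {
            throw new IllegalArgumentException("boundary must be positive");
        }
        return Long.toString(boundary.getSeconds());
    }

    // True if the boundary is strictly greater than the major compaction period.
    static boolean exceedsCompactionPeriod(Duration boundary, Duration compactionPeriod) {
        return boundary.compareTo(compactionPeriod) > 0;
    }

    public static void main(String[] args) {
        System.out.println(toColdBoundary(Duration.ofDays(1))); // prints 86400
        System.out.println(toColdBoundary(Duration.ofDays(8))); // prints 691200
        System.out.println(exceedsCompactionPeriod(
                Duration.ofDays(8), DEFAULT_MAJOR_COMPACTION_PERIOD)); // prints true
    }
}
```

The returned string can be used as the COLD_BOUNDARY value in the create or alter command.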

Java API

Create a table where data will be separately stored.

COLD_BOUNDARY specifies the time boundary for separating cold and hot data. The time boundary is measured in seconds. In the following example, data that was written more than one day ago is archived as cold data.

Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("hot_cold_table");
HTableDescriptor descriptor = new HTableDescriptor(tableName);
HColumnDescriptor cf = new HColumnDescriptor("f");
cf.setValue(HColumnDescriptor.COLD_BOUNDARY, "86400");
descriptor.addFamily(cf);
admin.createTable(descriptor);

Disable cold-hot separation.

HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
cf.setValue(HColumnDescriptor.COLD_BOUNDARY, null);
admin.modifyTable(tableName, descriptor);

Enable cold-hot separation for an existing table or change the time boundary.
COLD_BOUNDARY specifies the time boundary for separating cold and hot data. The time boundary is measured in seconds. In the following example, data that was written more than one day ago is archived as cold data.
```
HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
cf.setValue(HColumnDescriptor.COLD_BOUNDARY, "86400");
admin.modifyTable(tableName, descriptor);
```

For data to move between cold storage and hot storage after the boundary changes, you must perform a major compaction.
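To reason about which data moves when the boundary changes, the following Java sketch classifies a record by its age under the old and new boundaries. It is a simplified model, not HBase code: the class, enum, and method names are hypothetical, and it assumes a major compaction has already run under both the old and the new boundary.

```java
// Hypothetical model (not an HBase API): where a record ends up after the
// boundary changes and a major compaction has run.
final class BoundaryChange {
    enum Move { STAYS_HOT, STAYS_COLD, HOT_TO_COLD, COLD_TO_HOT }

    // A record is treated as cold once its age exceeds the boundary (both in seconds).
    static boolean isCold(long ageSeconds, long coldBoundarySeconds) {
        return ageSeconds > coldBoundarySeconds;
    }

    static Move classify(long ageSeconds, long oldBoundarySeconds, long newBoundarySeconds) {
        boolean wasCold = isCold(ageSeconds, oldBoundarySeconds);
        boolean isColdNow = isCold(ageSeconds, newBoundarySeconds);
        if (wasCold == isColdNow) {
            return wasCold ? Move.STAYS_COLD : Move.STAYS_HOT;
        }
        return isColdNow ? Move.HOT_TO_COLD : Move.COLD_TO_HOT;
    }

    public static void main(String[] args) {
        long hour = 3600;
        // Boundary raised from one day (86400) to two days (172800).
        System.out.println(classify(36 * hour, 86400, 172800)); // prints COLD_TO_HOT
        System.out.println(classify(72 * hour, 86400, 172800)); // prints STAYS_COLD
        System.out.println(classify(12 * hour, 86400, 172800)); // prints STAYS_HOT
    }
}
```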

Writing Data

You can write data to a table with separated cold and hot storage in the same way that you write data to a regular table. Newly written data is stored in hot storage (HDFS). If the storage duration of a data record exceeds the value specified by the COLD_BOUNDARY parameter, the system automatically moves the data to cold storage (OBS) during the compaction process.

Insert a data record into a table.
Run the put command to insert a data record into the specified table. Specify the table name, row key, custom column, and value. The following is an example:

put 'hot_cold_table','row1','f:a','value1'

The following parameters are required in the command:
- hot_cold_table: table name
- row1: row key
- f:a: custom column (qualifier a in column family f)
- value1: value to insert
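The write path described above (new data lands in hot storage and moves to cold storage only during compaction) can be pictured with a toy in-memory model. The following Java sketch is an assumption-laden illustration, not HBase code: the class and its methods are hypothetical, and records are reduced to their write timestamps.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Toy in-memory model for illustration only (not HBase code).
final class ColdHotModel {
    final List<Long> hot = new ArrayList<>();   // write timestamps (ms) in hot storage
    final List<Long> cold = new ArrayList<>();  // write timestamps (ms) in cold storage
    private final long coldBoundaryMillis;

    ColdHotModel(long coldBoundarySeconds) {
        this.coldBoundaryMillis = coldBoundarySeconds * 1000L;
    }

    // New writes always land in hot storage.
    void put(long writeTimestampMillis) {
        hot.add(writeTimestampMillis);
    }

    // Compaction moves records older than the boundary from hot to cold storage.
    void compact(long nowMillis) {
        Iterator<Long> it = hot.iterator();
        while (it.hasNext()) {
            long ts = it.next();
            if (nowMillis - ts > coldBoundaryMillis) {
                it.remove();
                cold.add(ts);
            }
        }
    }

    public static void main(String[] args) {
        long dayMillis = 86_400_000L;
        ColdHotModel table = new ColdHotModel(86400);
        table.put(0L);                        // written at t = 0
        table.put(2 * dayMillis);             // written two days later
        table.compact(2 * dayMillis + 1000);  // compaction shortly afterwards
        System.out.println("hot=" + table.hot + " cold=" + table.cold);
        // prints hot=[172800000] cold=[0]
    }
}
```

Until compact is called, the old record stays in the hot list even though its age already exceeds the boundary, which mirrors the behavior described for the real table.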

Querying Data

Cold data and hot data are stored in the same HBase table, so you query both through that one table. You can configure TimeRange to specify the time range of the data you want to query. The system automatically determines whether to search hot storage, cold storage, or both based on the specified time range. If no time range is specified, the query searches cold storage as well. The throughput of reading cold data is lower than that of reading hot data.

Cold storage is used to archive data that is rarely accessed. If your cluster receives a large number of queries that hit cold data, check whether the time boundary (COLD_BOUNDARY) is set to an appropriate value. Query performance deteriorates if frequently accessed data is stored in cold storage.
If you update a field in a row that is stored in cold storage, the updated field is written to hot storage. When a query that carries the HOT_ONLY hint, or whose time range covers only hot data, hits this row, only the updated field in hot storage is returned. To have the entire row returned, remove the HOT_ONLY hint from the query or make sure that the specified time range covers the period from when the row was inserted to when it was last updated. You are advised not to update data that is stored in cold storage.
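The way a time range relates to COLD_BOUNDARY can be sketched as follows. This is a simplified model and not the actual HBase routing logic, which is internal to the system: the class and method names are hypothetical, and the model assumes all data older than the boundary has already been compacted into cold storage and that no cold rows have been updated. It follows the HBase TimeRange convention of an inclusive lower bound and an exclusive upper bound.

```java
// Simplified routing model for illustration; the real decision is internal to HBase.
final class TimeRangeRouting {
    enum Target { HOT, COLD, BOTH }

    // minTs is inclusive and maxTs is exclusive, both in epoch milliseconds.
    static Target route(long minTs, long maxTs, long nowMillis, long coldBoundarySeconds) {
        // Records written before the cutoff are older than the boundary.
        long cutoff = nowMillis - coldBoundarySeconds * 1000L;
        if (minTs >= cutoff) {
            return Target.HOT;   // whole range is newer than the boundary
        }
        if (maxTs <= cutoff) {
            return Target.COLD;  // whole range is older than the boundary
        }
        return Target.BOTH;      // range straddles the boundary
    }

    public static void main(String[] args) {
        long now = 1568203111265L;
        long boundary = 86400;   // one day, in seconds
        System.out.println(route(0, now, now, boundary));                    // prints BOTH
        System.out.println(route(now - 3_600_000L, now, now, boundary));     // prints HOT
        System.out.println(route(0, now - 2 * 86_400_000L, now, boundary));  // prints COLD
    }
}
```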

Random queries with Get
- Shell
  - Query data in cold storage without the HOT_ONLY hint.
    get 'hot_cold_table', 'row1'
  - Query data in hot storage with the HOT_ONLY hint.
    get 'hot_cold_table', 'row1', {HOT_ONLY=>true}
  - Query data within a time range that is specified by TimeRange. The system determines whether the query hits cold or hot data based on the values of TIMERANGE and COLD_BOUNDARY.
    get 'hot_cold_table', 'row1', {TIMERANGE => [0, 1568203111265]}
    
    TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.
- Java API
  - Query data in cold storage without the HOT_ONLY hint.
```
Get get = new Get("row1".getBytes());
```
  - Query data in hot storage with the HOT_ONLY hint.
```
Get get = new Get("row1".getBytes());
get.setAttribute(HBaseConstants.HOT_ONLY, Bytes.toBytes(true));
```
  - Query data within a time range that is specified by TimeRange. The system determines whether the query hits cold or hot data based on the values of TimeRange and COLD_BOUNDARY.
```
Get get = new Get("row1".getBytes());
get.setTimeRange(0, 1568203111265L);
```
    TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.
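Because TimeRange values are epoch milliseconds, it can help to compute them rather than type them by hand. The following Java sketch uses only java.time; the class and method names are hypothetical. It decodes the upper bound used in the examples above and builds a range covering the hour that ends at that instant.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for building TimeRange bounds in epoch milliseconds.
final class TimeRangeMillis {
    // Returns {minStamp, maxStamp} for the window of the given length that ends at 'end'.
    static long[] lastWindow(Instant end, Duration window) {
        long max = end.toEpochMilli();
        long min = end.minus(window).toEpochMilli();
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        Instant end = Instant.ofEpochMilli(1568203111265L);
        System.out.println(end); // prints 2019-09-11T11:58:31.265Z
        long[] range = lastWindow(end, Duration.ofHours(1));
        System.out.println(range[0] + " .. " + range[1]);
        // prints 1568199511265 .. 1568203111265
    }
}
```

The two values can then be passed to TIMERANGE in the shell or to setTimeRange in the Java API.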
SCAN queries
- Shell
  - Query data in cold storage without the HOT_ONLY hint.
    scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9'}
  - Query data in hot storage with the HOT_ONLY hint.
    scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', HOT_ONLY=>true}
  - Query data within a time range that is specified by TimeRange. The system determines whether the query hits cold or hot data based on the values of TIMERANGE and COLD_BOUNDARY.
    scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', TIMERANGE => [0, 1568203111265]}
    
    TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.
- Java API
  - Query data in cold storage without the HOT_ONLY hint.
```
TableName tableName = TableName.valueOf("hot_cold_table");
Table table = connection.getTable(tableName);
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
```
  - Query data in hot storage with the HOT_ONLY hint.
```
Scan scan = new Scan();
scan.setAttribute(HBaseConstants.HOT_ONLY, Bytes.toBytes(true));
```
  - Query data within a time range that is specified by TimeRange. The system determines whether the query hits cold or hot data based on the values of TimeRange and COLD_BOUNDARY.
```
Scan scan = new Scan();
scan.setTimeRange(0, 1568203111265L);
```
    TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.
Prioritizing hot data query
For some SCAN queries, for example a query for all records of a customer, HBase can search both cold storage and hot storage. Results are returned in descending order of the timestamps at which the records were written, so in most cases hot data appears before cold data. If a SCAN query does not carry the HOT_ONLY hint, HBase must scan data in both cold and hot storage, which makes the query take longer. If you prioritize hot data, HBase queries hot data first and queries cold data only when the number of rows found in hot storage is less than the minimum number of rows to be queried. This reduces access to cold storage and improves response time.
- Shell
  scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9',COLD_HOT_MERGE=>true}
- Java API
```
TableName tableName = TableName.valueOf("hot_cold_table");
Table table = connection.getTable(tableName);
Scan scan = new Scan();
scan.setAttribute(HBaseConstants.COLD_HOT_MERGE, Bytes.toBytes(true));
ResultScanner scanner = table.getScanner(scan);
```
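The hot-first behavior described above can be sketched in plain Java. This is a conceptual model, not how HBase implements COLD_HOT_MERGE: the class and method names are hypothetical, rows are plain strings, and the "minimum number of rows to be queried" is modeled as a simple limit parameter.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

// Conceptual model of a hot-first scan (not HBase code).
final class HotFirstScan {
    // Returns up to 'limit' rows, reading cold storage only if hot storage falls short.
    static List<String> scan(List<String> hotRows, Supplier<List<String>> coldReader, int limit) {
        List<String> result = new ArrayList<>();
        for (String row : hotRows) {
            if (result.size() == limit) {
                return result;
            }
            result.add(row);
        }
        if (result.size() < limit) {
            // Cold storage is accessed only on this path.
            for (String row : coldReader.get()) {
                if (result.size() == limit) {
                    break;
                }
                result.add(row);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> hot = Arrays.asList("row9", "row8");
        Supplier<List<String>> cold = () -> Arrays.asList("row3", "row2", "row1");
        System.out.println(scan(hot, cold, 2)); // prints [row9, row8]
        System.out.println(scan(hot, cold, 4)); // prints [row9, row8, row3, row2]
    }
}
```

In the first call the cold reader is never invoked, which is the saving that prioritizing hot data is meant to provide.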

Major compaction

Shell
- Merge hot data areas of all partitions in a table.
  major_compact 'hot_cold_table', nil, 'NORMAL', 'HOT'
- Merge cold data areas of all partitions in a table.
  major_compact 'hot_cold_table', nil, 'NORMAL', 'COLD'
- Merge hot and cold data areas of all partitions in a table.
  major_compact 'hot_cold_table', nil, 'NORMAL', 'ALL'

Java API

Merge hot data areas of all partitions in a table.

Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("hot_cold_table");
admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.HOT);

Merge cold data areas of all partitions in a table.

Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("hot_cold_table");
admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.COLD);

Merge hot and cold data areas of all partitions in a table.

Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("hot_cold_table");
admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.ALL);
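Earlier in this section, major compaction is recommended during off-peak hours. If you trigger it from your own scheduler, a guard like the following Java sketch may help. The class name and the window times are hypothetical; choose a window that matches your own traffic pattern.

```java
import java.time.LocalTime;

// Hypothetical guard for running major compaction only in an off-peak window.
final class OffPeakWindow {
    // True if 'now' falls in [start, end). The window may wrap past midnight.
    // If start equals end, the window is treated as covering the whole day.
    static boolean contains(LocalTime start, LocalTime end, LocalTime now) {
        if (start.isBefore(end)) {
            return !now.isBefore(start) && now.isBefore(end);
        }
        return !now.isBefore(start) || now.isBefore(end);
    }

    public static void main(String[] args) {
        LocalTime start = LocalTime.of(23, 0);  // assumed off-peak start
        LocalTime end = LocalTime.of(5, 0);     // assumed off-peak end
        System.out.println(contains(start, end, LocalTime.of(2, 0)));   // prints true
        System.out.println(contains(start, end, LocalTime.of(12, 0)));  // prints false
    }
}
```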
