Updated on 2024-12-24 GMT+08:00

Separating Cold and Hot Data

This section describes the cold and hot separation commands. For details about how to use basic HBase commands, see Introducing HBase Shell Commands.

Specifying a Time Boundary for a Table

  • Shell
    • Create a table that separately stores cold and hot data.
      hbase(main):002:0> create 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'86400'}

      Parameter description:

      • NAME indicates the column family that requires cold and hot separation.
      • COLD_BOUNDARY specifies the time boundary for separating cold and hot data. The time boundary is measured in seconds. For example, if COLD_BOUNDARY is set to 86400, new data is archived as cold data after 86,400 seconds, which is equal to one day.

        The time boundary must be longer than the major compaction execution period. The default execution period of major compactions is seven days.

    • Disable cold and hot data separation.
      hbase(main):004:0> alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>""}
    • Enable cold and hot data separation for an existing table or change the time boundary. The time boundary is measured in seconds.
      hbase(main):005:0> alter 'hot_cold_table', {NAME=>'f', COLD_BOUNDARY=>'86400'}

      Run the desc command to check whether cold and hot separation was enabled or the time boundary was changed successfully.

      hbase:002:0> desc 'hot_cold_table'
      Table hot_cold_table is ENABLED
      hot_cold_table
      COLUMN FAMILIES DESCRIPTION
      {NAME => 'f', VERSIONS => '1', KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', REPLICATION_SCOPE => '0', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', COMPRESSION => 'NONE', BLOCKCACHE => 'true', BLOCKSIZE => '65536', METADATA => {'COLD_BOUNDARY' => '1200'}}
      1 row(s)
      Quota is disabled
      Took 0.0339 seconds
  • Java API
    • Create a table that separately stores cold and hot data.
      • COLD_BOUNDARY specifies the time boundary for separating cold and hot data. The time boundary is measured in seconds. In this example, new data is archived as cold data after one day.
      Admin admin = connection.getAdmin();
      TableName tableName = TableName.valueOf("hot_cold_table");
      HTableDescriptor descriptor = new HTableDescriptor(tableName);
      HColumnDescriptor cf = new HColumnDescriptor("f");
      cf.setValue(HColumnDescriptor.COLD_BOUNDARY, "86400");
      descriptor.addFamily(cf);
      admin.createTable(descriptor);
    • Disable cold and hot data separation.
      HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
      HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
      cf.setValue(HColumnDescriptor.COLD_BOUNDARY, null);
      admin.modifyTable(tableName, descriptor);
    • Enable cold and hot data separation for an existing table or change the time boundary.

      COLD_BOUNDARY specifies the time boundary for separating cold and hot data. The time boundary is measured in seconds. In this example, new data is archived as cold data after one day.

      HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
      HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
      cf.setValue(HColumnDescriptor.COLD_BOUNDARY, "86400");
      admin.modifyTable(tableName, descriptor);

      Data is moved between the cold storage and the hot storage only when a major compaction runs, so perform a major compaction after you change the time boundary.
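
The following is a minimal sketch, in plain Java with no HBase dependency, of how a COLD_BOUNDARY value could be derived from a duration and checked against the rule that the boundary must be longer than the major compaction period. The class name ColdBoundarySketch and both helper methods are hypothetical and are not part of the CloudTable or HBase API. The seven-day constant is the default period stated above.

```java
import java.time.Duration;

public class ColdBoundarySketch {
    // Default major compaction period stated in this section: seven days.
    public static final Duration DEFAULT_COMPACTION_PERIOD = Duration.ofDays(7);

    // COLD_BOUNDARY is passed as a string that holds a whole number of seconds.
    public static String toColdBoundary(Duration age) {
        if (age.isZero() || age.isNegative()) {
            throw new IllegalArgumentException("boundary must be positive: " + age);
        }
        return Long.toString(age.getSeconds());
    }

    // True if the boundary is strictly longer than the compaction period.
    public static boolean isLongerThanCompactionPeriod(Duration age, Duration period) {
        return age.compareTo(period) > 0;
    }

    public static void main(String[] args) {
        Duration oneDay = Duration.ofDays(1);
        Duration thirtyDays = Duration.ofDays(30);
        System.out.println(toColdBoundary(oneDay));     // 86400
        System.out.println(toColdBoundary(thirtyDays)); // 2592000
        System.out.println(isLongerThanCompactionPeriod(oneDay, DEFAULT_COMPACTION_PERIOD));     // false
        System.out.println(isLongerThanCompactionPeriod(thirtyDays, DEFAULT_COMPACTION_PERIOD)); // true
    }
}
```

The string returned by toColdBoundary is what would be passed as the COLD_BOUNDARY value in the shell or Java API calls above. Note that the 86400 value used in the examples is shorter than the seven-day default period, so it presumably applies to a cluster whose major compaction period has been shortened.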

Performing Data Write

You can write data to a table that separately stores cold and hot data in the same way as you write data to a standard table. New data is first stored in the hot storage (ultra-high I/O). If the storage duration of the data exceeds the value specified by the COLD_BOUNDARY parameter, the system automatically moves the data to the cold storage (common I/O) during the major compaction process.

  • Insert a data record into a table.

    Run the put command to insert a data record into the specified table. You need to specify the table name, row key, column, and value.

    hbase(main):004:0> put 'hot_cold_table','row1','f:a','value1'
    0 row(s) in 0.2720 seconds

    The parameters in the command are as follows:

    • hot_cold_table: table name
    • row1: row key
    • f:a: column, in the format column family:qualifier. f is the column family of the table created in the preceding examples.
    • value1: inserted value
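
The placement rule described at the start of this section can be modeled in a few lines of plain Java. This is only a sketch of the rule as stated in the text (data becomes cold once its age exceeds COLD_BOUNDARY, and is moved when a major compaction runs). The class HotColdModel, the Tier enum, and the method name are hypothetical, and the server-side implementation may differ.

```java
public class HotColdModel {
    public enum Tier { HOT, COLD }

    // Tier in which a cell would be stored after a major compaction that runs at nowMillis.
    // Timestamps are epoch milliseconds; the boundary is in seconds, as for COLD_BOUNDARY.
    public static Tier tierAfterCompaction(long cellTimestampMillis, long nowMillis,
                                           long coldBoundarySeconds) {
        long ageMillis = nowMillis - cellTimestampMillis;
        // "Exceeds" is modeled as strictly greater than the boundary.
        return ageMillis > coldBoundarySeconds * 1000L ? Tier.COLD : Tier.HOT;
    }

    public static void main(String[] args) {
        long now = 1_700_000_000_000L; // arbitrary sample time
        long boundary = 86400L;        // one day, as in the examples above

        long oneHourAgo = now - 3_600_000L;
        long twoDaysAgo = now - 172_800_000L;
        System.out.println(tierAfterCompaction(oneHourAgo, now, boundary)); // HOT
        System.out.println(tierAfterCompaction(twoDaysAgo, now, boundary)); // COLD
    }
}
```

Until a major compaction runs, data that is already older than the boundary remains in the hot storage, which is why the model is phrased in terms of the state after a compaction.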

Performing Data Query

CloudTable HBase stores cold and hot data in the same table, so you query a single table regardless of where the data is stored. You can configure TimeRange to specify the time range of the data that you want to query. The system automatically determines whether the target data is hot or cold based on the specified time range and chooses the optimal query mode. If no time range is specified, the query also reads cold data. The throughput of reading cold data is lower than that of reading hot data.
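
As a rough illustration of how a time range relates to the boundary, the sketch below computes the start of the hot window in epoch milliseconds and checks whether a range reaches back into data older than that. It uses only the JDK. The class TimeRangeSketch and its methods are hypothetical helpers, and the exact comparison that the server applies is an assumption.

```java
import java.time.Instant;

public class TimeRangeSketch {
    // Oldest timestamp (epoch ms) that is still within COLD_BOUNDARY seconds of nowMillis.
    public static long hotWindowStartMillis(long nowMillis, long coldBoundarySeconds) {
        return nowMillis - coldBoundarySeconds * 1000L;
    }

    // True if a range that starts at minStampMillis reaches below the hot window,
    // meaning that the query may have to read cold data.
    public static boolean mayTouchColdData(long minStampMillis, long nowMillis,
                                           long coldBoundarySeconds) {
        return minStampMillis < hotWindowStartMillis(nowMillis, coldBoundarySeconds);
    }

    public static void main(String[] args) {
        long now = 1568203111265L; // sample timestamp used in the TIMERANGE examples of this document
        long boundary = 86400L;    // one day

        long hotStart = hotWindowStartMillis(now, boundary);
        System.out.println(Instant.ofEpochMilli(hotStart));            // start of the hot window as an ISO-8601 instant
        System.out.println(mayTouchColdData(hotStart, now, boundary)); // false
        System.out.println(mayTouchColdData(0L, now, boundary));       // true
    }
}
```

With these helpers, a range of [hotStart, now) would stay within the hot window, whereas a range that starts at 0, as in the TIMERANGE examples in this section, reaches into cold data.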

  • The cold storage is used only to archive data that is rarely accessed. If your cluster receives a large number of queries that hit cold data, check whether the time boundary (COLD_BOUNDARY) is set to an appropriate value. Query performance deteriorates if frequently accessed data is stored in the cold storage.
  • If you update a field in a row that is stored in the cold storage, the field is moved to the hot storage after the update. When this row is hit by a query that carries the HOT_ONLY hint or has a time range that is configured to hit hot data, only the updated field in the hot storage is returned. If you want the system to return the entire row, you must delete the HOT_ONLY hint from the query statement or make sure that the specified time range covers the time period from when this row is inserted to when this row is last updated. It is recommended that you do not update data that is stored in the cold storage.
  • Get
    • Shell
      • The query that does not contain the HOT_ONLY hint may hit cold data.
        hbase(main):001:0> get 'hot_cold_table', 'row1'
      • The query that contains the HOT_ONLY hint hits only hot data.
        hbase(main):002:0> get 'hot_cold_table', 'row1', {HOT_ONLY=>true}
      • Query data within a time range that is specified by the TIMERANGE parameter. The system determines whether the query hits cold or hot data based on the values of the TIMERANGE and COLD_BOUNDARY parameters.
        hbase(main):003:0> get 'hot_cold_table', 'row1', {TIMERANGE => [0, 1568203111265]}

      TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.

    • Java API
      • The query that does not contain the HOT_ONLY hint may hit cold data.
        Get get = new Get("row1".getBytes());
      • The query that contains the HOT_ONLY hint hits only hot data.
        Get get = new Get("row1".getBytes());
        get.setAttribute(HBaseConstants.HOT_ONLY, Bytes.toBytes(true));
      • Query data within a time range that is specified by the TimeRange parameter. The system determines whether the query hits cold or hot data based on the values of the TimeRange and COLD_BOUNDARY parameters.
        Get get = new Get("row1".getBytes());
        get.setTimeRange(0, 1568203111265L);

        TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.

  • SCAN queries
    • Shell
      • The query that does not contain the HOT_ONLY hint may hit cold data.
        hbase(main):001:0> scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9'}
      • The query that contains the HOT_ONLY hint hits only hot data.
        hbase(main):002:0> scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', HOT_ONLY=>true}
      • Query data within a time range that is specified by the TIMERANGE parameter. The system determines whether the query hits cold or hot data based on the values of the TIMERANGE and COLD_BOUNDARY parameters.
        hbase(main):003:0> scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9', TIMERANGE => [0, 1568203111265]}

      TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.

    • Java API
      • The query that does not contain the HOT_ONLY hint may hit cold data.
        TableName tableName = TableName.valueOf("hot_cold_table");
        Table table = connection.getTable(tableName);
        Scan scan = new Scan();
        ResultScanner scanner = table.getScanner(scan);
      • The query that contains the HOT_ONLY hint hits only hot data.
        Scan scan = new Scan();
        scan.setAttribute(HBaseConstants.HOT_ONLY, Bytes.toBytes(true));
      • Query data within a time range that is specified by the TimeRange parameter. The system determines whether the query hits cold or hot data based on the values of the TimeRange and COLD_BOUNDARY parameters.
        Scan scan = new Scan();
        scan.setTimeRange(0, 1568203111265L);

        TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.

  • Prioritizing hot data selection

    CloudTable may look up both cold and hot data for SCAN queries, for example, queries that search all records of a customer. The query results are paginated in descending order of data timestamps, so in most cases hot data appears before cold data. If a SCAN query does not carry the HOT_ONLY hint, CloudTable must scan both cold and hot data, which increases the query response time. If you prioritize hot data selection, CloudTable queries hot data first and queries cold data only if you request more results. This minimizes cold data access and reduces the response time.

    • Shell
      hbase(main):001:0> scan 'hot_cold_table', {STARTROW =>'row1', STOPROW=>'row9',COLD_HOT_MERGE=>true}
    • Java API
      TableName tableName = TableName.valueOf("hot_cold_table");
      Table table = connection.getTable(tableName);
      Scan scan = new Scan();
      scan.setAttribute(HBaseConstants.COLD_HOT_MERGE, Bytes.toBytes(true));
      ResultScanner scanner = table.getScanner(scan);
  • Major compaction
    • Shell
      • Merge hot data areas of all partitions in a table.
        hbase(main):002:0> major_compact 'hot_cold_table', nil, 'NORMAL', 'HOT'
      • Merge cold data areas of all partitions in a table.
        hbase(main):002:0> major_compact 'hot_cold_table', nil, 'NORMAL', 'COLD'
      • Merge hot and cold data areas of all partitions in a table.
        hbase(main):002:0> major_compact 'hot_cold_table', nil, 'NORMAL', 'ALL'
    • Java API
      • Merge hot data areas of all partitions in a table.
        Admin admin = connection.getAdmin();
        TableName tableName = TableName.valueOf("hot_cold_table");
        admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.HOT);
      • Merge cold data areas of all partitions in a table.
        Admin admin = connection.getAdmin();
        TableName tableName = TableName.valueOf("hot_cold_table");
        admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.COLD);
      • Merge hot and cold data areas of all partitions in a table.
        Admin admin = connection.getAdmin();
        TableName tableName = TableName.valueOf("hot_cold_table");
        admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.ALL);