Configuring HBase Cold and Hot Data Separation Using Java APIs
HBase supports cold and hot data separation. Cold and hot data can be stored on different media, which improves query efficiency and reduces storage costs. This section describes how to configure HBase cold and hot data separation using Java APIs.
Prerequisites
- You have created an HBase cluster by referring to Creating an HBase Cluster.
- You have installed an HBase client.
Step 1: Enabling HBase Cold and Hot Data Separation
- Log in to the CloudTable management console.
- Select a region in the upper left corner.
- Click Buy Cluster in the upper right corner.
- On the Buy Cluster page, set Database Engine to HBase and select Enable Hot/Cold in Advanced Feature. The cold and hot separation feature is enabled for the created cluster.
Figure 1 Enabling cold and hot data separation
Step 2: Specifying a Time Boundary for a Table
- Access the HBase cluster through the Java APIs.
- Set the time boundary for separating hot and cold data.
- Create a table that separately stores cold and hot data. COLD_BOUNDARY specifies the time boundary for separating cold and hot data. The time boundary is measured in seconds. In this example, new data is archived as cold data after one day.
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("hot_cold_table");
HTableDescriptor descriptor = new HTableDescriptor(tableName);
HColumnDescriptor cf = new HColumnDescriptor("f");
cf.setValue(HColumnDescriptor.COLD_BOUNDARY, "86400");
descriptor.addFamily(cf);
admin.createTable(descriptor);
- Disable cold and hot data separation.
HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
cf.setValue(HColumnDescriptor.COLD_BOUNDARY, null);
admin.modifyTable(tableName, descriptor);
- Enable cold and hot data separation for an existing table or change the time boundary. COLD_BOUNDARY specifies the time boundary for separating cold and hot data. The time boundary is measured in seconds. In this example, new data is archived as cold data after one day.
HTableDescriptor descriptor = admin.getTableDescriptor(tableName);
HColumnDescriptor cf = descriptor.getFamily("f".getBytes());
cf.setValue(HColumnDescriptor.COLD_BOUNDARY, "86400");
admin.modifyTable(tableName, descriptor);
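Because COLD_BOUNDARY is passed as a string holding a number of seconds, it can be convenient to derive the value from a duration rather than hard-coding it. The following is a minimal sketch that uses only the Java standard library. The class name ColdBoundary and the method toColdBoundary are hypothetical helpers introduced for illustration and are not part of the HBase or CloudTable API.

```java
import java.time.Duration;

// Hypothetical helper (not part of any HBase API): converts a hot-data
// retention period into the seconds string expected by COLD_BOUNDARY.
class ColdBoundary {

    static String toColdBoundary(Duration hotRetention) {
        if (hotRetention.isNegative() || hotRetention.isZero()) {
            throw new IllegalArgumentException("hot retention must be positive");
        }
        // COLD_BOUNDARY is measured in seconds.
        return Long.toString(hotRetention.getSeconds());
    }

    public static void main(String[] args) {
        System.out.println(toColdBoundary(Duration.ofDays(1))); // prints 86400
        System.out.println(toColdBoundary(Duration.ofDays(7))); // prints 604800
    }
}
```

The returned string could then be passed as the second argument of cf.setValue(HColumnDescriptor.COLD_BOUNDARY, ...).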
Step 3: Inserting Data
You write data to a table that separately stores cold and hot data in the same way as you write data to a standard table. New data is first stored in the hot storage (EVS disks). When the storage duration of the data exceeds the value specified by the COLD_BOUNDARY parameter, the system automatically moves the data to the cold storage (OBS) during the major compaction process.
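To make the timing concrete, the sketch below computes the earliest moment at which a cell could be moved to the cold storage. It uses only the Java standard library and assumes that the storage duration is measured from the cell timestamp, which defaults to the write time. The class and method names are hypothetical. The actual move happens only when a major compaction runs after that moment.

```java
import java.time.Instant;

// Hypothetical helper for illustration; not part of the HBase API.
class ColdEligibility {

    // Assumption: the storage duration is counted from the cell timestamp.
    static Instant eligibleForColdAt(Instant cellTimestamp, long coldBoundarySeconds) {
        return cellTimestamp.plusSeconds(coldBoundarySeconds);
    }

    public static void main(String[] args) {
        Instant written = Instant.parse("2024-01-01T00:00:00Z");
        // With COLD_BOUNDARY = 86400, the cell becomes eligible one day later.
        System.out.println(eligibleForColdAt(written, 86400L)); // prints 2024-01-02T00:00:00Z
    }
}
```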
Write data using the Java APIs.
Step 4: Querying Data
CloudTable HBase allows you to use a table to store cold and hot data. You can query data only from one table. You can configure TimeRange to specify the time range of the data that you want to query. The system automatically determines whether the target data is hot or cold based on the time range that you specify and chooses the optimal query mode. If the time range is not specified during the query, cold data will be queried. The throughput of reading cold data is lower than the throughput of reading hot data.
The cold storage is used only to archive data that is rarely accessed. If your cluster receives a large number of queries that hit cold data, check whether the time boundary (COLD_BOUNDARY) is set to an appropriate value. Query performance deteriorates if frequently accessed data is stored in the cold storage.
If you update a field in a row that is stored in the cold storage, the field is moved to the hot storage after the update. When this row is hit by a query that carries the HOT_ONLY hint or has a time range that is configured to hit hot data, only the updated field in the hot storage is returned. If you want the system to return the entire row, you must delete the HOT_ONLY hint from the query statement or make sure that the specified time range covers the time period from when this row is inserted to when this row is last updated. It is recommended that you do not update data that is stored in the cold storage.
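When choosing a time range, it can help to check on the client side whether the range reaches back past the cold boundary. The sketch below is a rough approximation under a stated assumption: data whose timestamp is older than the current time minus COLD_BOUNDARY is treated as cold. The rule that CloudTable applies internally is not documented here, and the class and method names are hypothetical.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical client-side check; not part of the HBase or CloudTable API.
class TimeRangeCheck {

    // Assumption: cells older than (now - COLD_BOUNDARY) may reside in cold storage.
    // minStampMs is the inclusive lower bound of the query time range, in milliseconds.
    static boolean mayTouchColdData(long minStampMs, long nowMs, long coldBoundarySeconds) {
        long coldCutoffMs = nowMs - coldBoundarySeconds * 1000L;
        return minStampMs < coldCutoffMs;
    }

    public static void main(String[] args) {
        long now = Instant.parse("2019-09-11T12:00:00Z").toEpochMilli();
        long boundary = 86400L; // one day, in seconds

        long lastHour = now - Duration.ofHours(1).toMillis();
        long lastWeek = now - Duration.ofDays(7).toMillis();

        System.out.println(mayTouchColdData(lastHour, now, boundary)); // prints false
        System.out.println(mayTouchColdData(lastWeek, now, boundary)); // prints true
        System.out.println(mayTouchColdData(0L, now, boundary));       // prints true
    }
}
```

A range that starts at 0 always reaches back past the boundary, so under this assumption such a query may read cold data.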
- Get
- The query that does not contain the HOT_ONLY hint may hit cold data.
Get get = new Get("row1".getBytes());
- The query that contains the HOT_ONLY hint hits only hot data.
Get get = new Get("row1".getBytes());
get.setAttribute(HBaseConstants.HOT_ONLY, Bytes.toBytes(true));
- Query data within a time range that is specified by the TimeRange parameter. The system determines whether the query hits cold or hot data based on the values of the TimeRange and COLD_BOUNDARY parameters.
Get get = new Get("row1".getBytes());
// The upper bound exceeds the int range, so it must be a long literal (L suffix).
get.setTimeRange(0, 1568203111265L);
TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.
- Range query scan
- The query that does not contain the HOT_ONLY hint may hit cold data.
TableName tableName = TableName.valueOf("chsTable");
Table table = connection.getTable(tableName);
Scan scan = new Scan();
ResultScanner scanner = table.getScanner(scan);
- The query that contains the HOT_ONLY hint hits only hot data.
Scan scan = new Scan();
scan.setAttribute(HBaseConstants.HOT_ONLY, Bytes.toBytes(true));
- Query data within a time range that is specified by the TimeRange parameter. The system determines whether the query hits cold or hot data based on the values of the TimeRange and COLD_BOUNDARY parameters.
Scan scan = new Scan();
// The upper bound exceeds the int range, so it must be a long literal (L suffix).
scan.setTimeRange(0, 1568203111265L);
TimeRange specifies the query time range. The time in the range is a UNIX timestamp, which is the number of milliseconds that have elapsed since the Unix epoch.
- Prioritizing hot data selection
CloudTable may look up both cold and hot data for SCAN queries, for example, queries that are submitted to search all records of a customer. The query results are paginated based on the timestamps of the data in descending order. In most cases, hot data appears before cold data. If the SCAN queries do not carry the HOT_ONLY hint, CloudTable must scan both cold and hot data, which increases the query response time. If you prioritize hot data selection, CloudTable queries hot data first and queries cold data only if you want to view more query results. This minimizes how often cold data is accessed and reduces the response time.
TableName tableName = TableName.valueOf("hot_cold_table");
Table table = connection.getTable(tableName);
Scan scan = new Scan();
// Prioritize hot data; cold data is read only when more results are requested.
scan.setAttribute(HBaseConstants.COLD_HOT_MERGE, Bytes.toBytes(true));
ResultScanner scanner = table.getScanner(scan);
- Major compaction
- Merge hot data areas of all partitions in a table.
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("hot_cold_table");
admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.HOT);
- Merge cold data areas of all partitions in a table.
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("hot_cold_table");
admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.COLD);
- Merge hot and cold data areas of all partitions in a table.
Admin admin = connection.getAdmin();
TableName tableName = TableName.valueOf("hot_cold_table");
admin.majorCompact(tableName, null, CompactType.NORMAL, CompactionScopeType.ALL);