Deleting Rows Based on Prefixes
When storing and managing massive data volumes, users often need to clear expired or unnecessary data. Apache HBase supports only single-row deletion, so deleting a large amount of data requires scanning the entire table, which is complex and inefficient. The GeminiDB HBase API lets you delete all rows whose keys match a given prefix. The deletion takes effect quickly and requires no prior scan, making it efficient and convenient for tasks such as range data clearance and historical data purging.
 
   Incorrect use of this function may have a significant impact on data. Before using this function, ensure that you have fully understood the following usage notes.
Usage Notes (Mandatory)
Although a success response indicates that the data has been deleted as expected, the data is not removed from storage immediately. Instead, it is only marked for deletion: the database gradually eliminates the marked data through background processes and clears the range tombstones generated during deletion.
To prevent these background operations from affecting database performance, you are advised to comply with the following conventions:
- Do not repeatedly delete and write the same data range within a short period of time. Frequent operations quickly accumulate tombstones and increase the background deletion workload.
- Do not initiate a large number of range deletions within a short period of time. Concentrated deletions generate a large number of tombstones, which may block normal read and write operations.
- Do not scan data that has been deleted by range. Scans must skip over the tombstones, which reduces query efficiency.
- Verify the conditions and range before a range deletion. Pre-verification prevents a large amount of data from being deleted by mistake and reduces the cost of subsequent restoration.
 
Typical Violations
- Initiating 50,000 prefix-based deletions within one day against a key range containing 1 billion records, while a large amount of data is written to the same range, can cause both of the issues above: rapid tombstone accumulation and blocked reads and writes.
- Initiating 100,000 or more large-scale prefix-based deletions within one day (for example, deleting millions or tens of millions of keys each time) will cause a surge in database load.
- Initiating unverified deletions with short prefixes (for example, 0 or a) can lead to unintended data loss, because short prefixes match a wide range of entries.
 
 
   Severe violations will increase read latency, cause failed requests, and degrade read and write performance. Check the service status in a timely manner, and verify the results in a test environment before the final deletion.
If the preceding issues occur for a large amount of data, stop using this function immediately and consult experts. In the upper right corner of the console, choose Service Tickets > Create Service Ticket to contact customer service.
Usage Guide
You mark a deletion request as prefix-based by adding an attribute to it. After a request is marked as a prefix-based deletion, only the key parameter is used for prefix matching; other parameters (such as the specified column and qualifier) are not involved in the execution logic. All data that matches the prefix is deleted.
Currently, prefix-based deletion can be called only through the Java HBase client. The following code example shows the key steps. After the deletion, all rows whose keys start with row1 are removed.
// Construct a Delete whose row key is used as the prefix to match.
Delete delete = new Delete(Bytes.toBytes("row1"));
// Mark the request as a prefix-based deletion.
delete.setAttribute("PREFIXDELETE", "true".getBytes(StandardCharsets.UTF_8));
// All rows whose keys start with "row1" are deleted.
table.delete(delete);
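For reference, the following is a minimal self-contained sketch of the same steps. It assumes a table named demo_table, a placeholder instance address, and the standard HBase client connection setting hbase.zookeeper.quorum; replace these with the values from your instance's connection guide.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Delete;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class PrefixDeleteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        // Replace with the connection address of your GeminiDB HBase instance.
        conf.set("hbase.zookeeper.quorum", "<instance-address>");
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(TableName.valueOf("demo_table"))) {
            // The row key passed to Delete is used as the prefix to match.
            Delete delete = new Delete(Bytes.toBytes("row1"));
            // Mark the request as a prefix-based deletion.
            delete.setAttribute("PREFIXDELETE", "true".getBytes(StandardCharsets.UTF_8));
            // All rows whose keys start with "row1" are deleted.
            table.delete(delete);
        }
    }
}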
FAQs
- Q: If a request times out or fails, has my data been deleted?

A: GeminiDB HBase APIs do not support transactions, so operation atomicity cannot be guaranteed. If a request fails, the target data may be deleted completely or only partially. If the request succeeds, all target data has been deleted. If a request fails due to network disconnection or other reasons, you are advised to retry it.
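The following is a minimal retry sketch, assuming the same imports and Table setup as the example above (plus java.io.IOException); the retry count and backoff values are illustrative only.
static void deleteWithRetry(Table table, String prefix, int maxRetries)
        throws IOException, InterruptedException {
    for (int attempt = 1; attempt <= maxRetries; attempt++) {
        try {
            Delete delete = new Delete(Bytes.toBytes(prefix));
            // Mark the request as a prefix-based deletion.
            delete.setAttribute("PREFIXDELETE", "true".getBytes(StandardCharsets.UTF_8));
            table.delete(delete);
            return; // Success: all rows matching the prefix have been deleted.
        } catch (IOException e) {
            if (attempt == maxRetries) {
                throw e; // Surface the error after the final attempt.
            }
            Thread.sleep(1000L * attempt); // Back off briefly before retrying.
        }
    }
}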
- Q: How can I delete a large amount of historical data based on prefixes?

A: First, determine the range of historical data to be deleted, and verify the effect of prefix-based deletion in a test environment to avoid errors. No more than 2,000 deletions per day are recommended; because a single prefix-based deletion can cover a massive volume of data, a small number of deletions is usually sufficient. While deleting data based on prefixes, monitor the read latency of your workloads, and stop the deletion immediately if any exception occurs, as in the sketch below.
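As a rough illustration of this advice, the following sketch paces a batch of prefix-based deletions whose prefixes have already been verified in a test environment. It assumes the same setup as the examples above (plus java.util.List); the prefixes and the pacing interval are illustrative, not prescriptive.
static void deleteHistoricalData(Table table, List<String> verifiedPrefixes)
        throws IOException, InterruptedException {
    for (String prefix : verifiedPrefixes) {
        Delete delete = new Delete(Bytes.toBytes(prefix));
        // Mark the request as a prefix-based deletion.
        delete.setAttribute("PREFIXDELETE", "true".getBytes(StandardCharsets.UTF_8));
        table.delete(delete);
        // Space out requests so that background tombstone cleanup can keep up.
        // Monitor read latency and stop immediately if it rises abnormally.
        Thread.sleep(60_000);
    }
}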
 