How Can I Delete Rows Based On Prefixes?

You can delete all rows whose keys match a given prefix on GeminiDB HBase instances. The deletion takes effect quickly, and you no longer need to scan data before deleting it. Apache HBase supports only single-row deletion, whereas GeminiDB HBase API can delete rows by prefix, which is more convenient and faster. This is especially useful for tasks such as clearing a range of data and purging historical data.

Incorrect use of this function may have a significant impact on data. Before using this function, ensure that you have fully understood the following usage notes.

Usage Notes (Mandatory)

After a prefix-based deletion, the system reports that the data has been deleted as expected. However, the data is not physically removed right away; it is only marked for deletion. The marked data is then removed gradually in the background by the storage layer, along with the range tombstones that the deletion generates. To keep this cleanup from affecting database performance, comply with the following conventions:

Do not repeatedly delete and write a single range of data in a short timeframe.
Do not delete massive volumes of data in a short timeframe.
Do not scan data that has been deleted.
Verify the range in advance to prevent accidental deletion of a large amount of data.
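
One way to act on the last convention is to reject overly short prefixes on the client before a deletion request is ever built. The following Java sketch is only an illustration: the PrefixGuard class, its isAcceptable method, and the 4-byte minimum are assumptions made here, not part of GeminiDB HBase API, and the threshold should be adapted to your own row key design.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical client-side pre-check; not part of GeminiDB HBase API.
final class PrefixGuard {
    // Assumed threshold; pick a value that fits your own row key design.
    static final int MIN_PREFIX_BYTES = 4;

    /** Returns true only if the prefix is non-null and at least MIN_PREFIX_BYTES long. */
    static boolean isAcceptable(String prefix) {
        if (prefix == null) {
            return false;
        }
        return prefix.getBytes(StandardCharsets.UTF_8).length >= MIN_PREFIX_BYTES;
    }

    public static void main(String[] args) {
        // Short prefixes such as "0" or "a" are rejected before any request is sent.
        for (String p : new String[] {"a", "0", "row1", "order#2023"}) {
            System.out.println(p + " -> " + (isAcceptable(p) ? "ok" : "rejected: prefix too short"));
        }
    }
}
```

A length check alone does not prove that a prefix is safe. It only catches the most obvious mistakes, so verification in a test environment is still needed.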

Typical Violations

Within a single data range, prefix-based deletions covering 1 billion keys are performed 50,000 times in a day while a large amount of data is being written to the same range.
Rows are deleted based on a given prefix 100,000 or more times within a day.
A large amount of data is deleted by mistake based on short prefixes (for example, 0 or a) without verification.

Severe violations increase read latency, cause requests to fail, and degrade read and write performance. Monitor the service status closely while using this function. Before running a deletion in production, verify the results in a test environment.

If the preceding issues occur for a large amount of data, stop using this function immediately and seek expert help from customer service: in the upper right corner of the console, choose Service Tickets > Create Service Ticket.

How to Use

To mark a deletion request as prefix-based, add an attribute to the request. Once the request is marked, only the row key takes effect, and it is treated as the prefix. Other parameters, such as a specified column or qualifier, are ignored. Data that matches the prefix is logically deleted immediately, although it is physically removed only gradually, as described in the usage notes.

Currently, this function can be used only through the Java HBase client. The following Java code deletes all rows whose keys start with row1.

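// The row key passed to Delete is treated as the prefix.
Delete delete = new Delete(Bytes.toBytes("row1"));
// The PREFIXDELETE attribute marks this request as a prefix-based deletion.
delete.setAttribute("PREFIXDELETE", "true".getBytes(StandardCharsets.UTF_8));
// table is an open org.apache.hadoop.hbase.client.Table instance.
table.delete(delete);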
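
It is worth checking which keys "starting with row1" actually covers before issuing such a request. The sketch below uses plain Java and no HBase classes. It assumes that the prefix is compared byte by byte against the leading bytes of each row key, which is how HBase row keys are normally compared; the PrefixMatch class and its startsWith method are hypothetical helpers introduced only for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper that illustrates byte-wise prefix matching on row keys.
final class PrefixMatch {
    /** Returns true if the leading bytes of rowKey are equal to prefix. */
    static boolean startsWith(byte[] rowKey, byte[] prefix) {
        if (rowKey.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (rowKey[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] prefix = "row1".getBytes(StandardCharsets.UTF_8);
        for (String key : new String[] {"row1", "row10", "row1_a", "row2", "row"}) {
            boolean match = startsWith(key.getBytes(StandardCharsets.UTF_8), prefix);
            System.out.println(key + " matches prefix row1: " + match);
        }
    }
}
```

Under this assumption, the prefix row1 also covers keys such as row10 and row1_a, not only the row named row1. If keys share leading characters, include a delimiter in the prefix or choose a longer one.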

FAQs

Q: If a request times out or fails, has my data been deleted?
A: GeminiDB HBase API does not provide transactions and cannot guarantee atomicity. If a request times out or fails, the target data may have been completely or partially deleted. If the request succeeds, all matching data is deleted. If a request fails, for example because of a network disconnection, retry it.
Q: How can I perform a large number of deletions for historical data based on prefixes?
A: First determine exactly which range of historical data is to be deleted, and verify the deletion in a test environment to prevent unexpected data loss. Keep prefix-based deletions to no more than 2,000 per day. Because each prefix-based deletion can cover a massive volume of data, a small number of deletions in a short period is normally enough to meet your requirements. Monitor the read latency continuously while data is being deleted, and stop deleting data immediately if anything abnormal occurs.
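
The recommended limit of 2,000 prefix-based deletions per day can also be enforced on the client. The following Java sketch shows one possible approach. The PrefixDeleteBudget class and its tryAcquire method are hypothetical names introduced here, and the counter is held in memory in a single process, so it would need shared storage if several clients issue deletions.

```java
import java.time.LocalDate;

// Hypothetical in-memory daily counter; not part of GeminiDB HBase API.
final class PrefixDeleteBudget {
    private final int dailyLimit;
    private LocalDate currentDay;
    private int usedToday;

    PrefixDeleteBudget(int dailyLimit) {
        this.dailyLimit = dailyLimit;
    }

    /** Consumes one deletion from the budget for the given day; returns false if none is left. */
    synchronized boolean tryAcquire(LocalDate today) {
        if (!today.equals(currentDay)) {
            // A new day starts with a fresh budget.
            currentDay = today;
            usedToday = 0;
        }
        if (usedToday >= dailyLimit) {
            return false;
        }
        usedToday++;
        return true;
    }

    public static void main(String[] args) {
        PrefixDeleteBudget budget = new PrefixDeleteBudget(2000);
        LocalDate day = LocalDate.of(2024, 1, 1);
        int allowed = 0;
        for (int i = 0; i < 2500; i++) {
            if (budget.tryAcquire(day)) {
                allowed++; // A real client would send the prefix-based Delete here.
            }
        }
        System.out.println("Deletions allowed on " + day + ": " + allowed);
    }
}
```

Calling tryAcquire before each prefix-based deletion, and skipping or postponing the request when it returns false, keeps a batch job within the recommended daily count.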
