Updated on 2024-10-09 GMT+08:00

Deleting HBase Data in Batches Using BulkLoad

Scenario

Rows need to be deleted in batches based on the row key naming rule, row key scope, field name, and field value.

Procedure

Run the following command to delete rows in the range row_start to row_stop and write the output to /output/destdir/:

hbase com.huawei.hadoop.hbase.tools.bulkload.DeleteData
  -Ddelete.rowkey.start="row_start"
  -Ddelete.rowkey.stop="row_stop"
  -Ddelete.hfile.output="/output/destdir/"
  -Ddelete.qualifier="cf1,cf0:vch,cf0:lng:1000"
  'table1'

  • -Ddelete.rowkey.start="row_start": indicates that the start row key is row_start.
  • -Ddelete.rowkey.stop="row_stop": indicates that the end row key is row_stop.
  • -Ddelete.hfile.output="/output/destdir/": indicates that the output results are written to /output/destdir/.
  • -Ddelete.qualifier="cf1,cf0:vch,cf0:lng:1000": indicates that all columns in column family cf1, the vch column in column family cf0, and the lng column in column family cf0 whose value is 1000 are to be deleted.

If transparent encryption is configured for HBase, see 7 for precautions on batch deletion.

Run the following command to load HFiles:

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles <path/for/output> <tablename>
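The two steps above can be sketched as a single script. This is a minimal dry-run sketch: the table name, row keys, and output path are illustrative placeholders taken from the example above, and the commands are echoed rather than executed so the flow can be inspected outside a cluster.

```shell
#!/bin/sh
# Sketch of the delete-then-load flow. Table name, row keys, and
# output path are illustrative placeholders, not fixed values.
TABLE="table1"
OUTPUT_DIR="/output/destdir/"

# Step 1: generate delete markers as HFiles (bulk delete).
DELETE_CMD="hbase com.huawei.hadoop.hbase.tools.bulkload.DeleteData \
  -Ddelete.rowkey.start=\"row_start\" \
  -Ddelete.rowkey.stop=\"row_stop\" \
  -Ddelete.hfile.output=\"${OUTPUT_DIR}\" \
  -Ddelete.qualifier=\"cf1,cf0:vch,cf0:lng:1000\" \
  '${TABLE}'"

# Step 2: load the generated HFiles into the table.
LOAD_CMD="hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles ${OUTPUT_DIR} ${TABLE}"

# Echo instead of executing so the sketch can be dry-run without a
# cluster; on a real cluster, run the two hbase commands directly.
echo "$DELETE_CMD"
echo "$LOAD_CMD"
```

Note that the path passed to LoadIncrementalHFiles must match the value given to -Ddelete.hfile.output in the first step.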

Precautions

  1. If an index has been created for a column qualifier, that field cannot be deleted in batches: batch deletion is not supported on indexed fields.
  2. If the output path delete.hfile.output is not specified, the execution results are written to the default location /tmp/deletedata/<table name>.
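The default-path rule in item 2 can be expressed with standard shell parameter expansion. This is only an illustration of the fallback behavior described above; the table name is a placeholder.

```shell
#!/bin/sh
# Illustrates the default output path when delete.hfile.output is
# not set. TABLE is an illustrative placeholder.
TABLE="table1"
HFILE_OUTPUT=""   # empty means the option was not specified

# Fall back to /tmp/deletedata/<table name> when unset or empty.
OUTPUT_PATH="${HFILE_OUTPUT:-/tmp/deletedata/${TABLE}}"
echo "$OUTPUT_PATH"   # → /tmp/deletedata/table1
```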