Updated on 2024-12-11 GMT+08:00

Modified and Deleted Data Can Still Be Queried by the Scan Command

Question

Why does the following scan command still return data that has been modified or deleted?

scan '<table_name>',{FILTER=>"SingleColumnValueFilter('<column_family>','<column>',=,'binary:<value>')"}

Answer

By default, a scan that uses SingleColumnValueFilter compares all versions of the specified column, including values that have been modified or deleted, so a row can match on an outdated value. In addition, if a row does not contain the specified column, the filter does not exclude it and HBase returns that row as well.

To compare only the latest value of the column and return only the rows that contain it, run the following command:

scan '<table_name>',{FILTER=>"SingleColumnValueFilter('<column_family>','<column>',=,'binary:<value>',true,true)"}

This command filters out rows that do not contain the column and compares only the latest version of the column value. As a result, values that were overwritten by a later modification and values that were deleted are no longer matched.
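The same scan can also be issued from a Java client. The following is a minimal sketch, assuming the HBase 2.x client library (hbase-client) is on the classpath and the cluster configuration (hbase-site.xml) can be loaded; older clients use CompareFilter.CompareOp in place of CompareOperator. The table, column family, column, and value names are placeholders, and the login step required by clusters with Kerberos authentication is omitted.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.CompareOperator;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.SingleColumnValueFilter;
import org.apache.hadoop.hbase.util.Bytes;

public final class LatestValueScan {
    public static void main(String[] args) throws IOException {
        // Placeholder names (assumptions): replace with your own table, family, column, and value.
        TableName tableName = TableName.valueOf("demo_table");
        byte[] family = Bytes.toBytes("cf");
        byte[] qualifier = Bytes.toBytes("column");
        byte[] value = Bytes.toBytes("value");

        // Equivalent of SingleColumnValueFilter('cf','column',=,'binary:value',true,true).
        SingleColumnValueFilter filter =
                new SingleColumnValueFilter(family, qualifier, CompareOperator.EQUAL, value);
        filter.setFilterIfMissing(true);   // skip rows that do not contain the column
        filter.setLatestVersionOnly(true); // compare only the newest version of the value

        Scan scan = new Scan();
        scan.setFilter(filter);

        Configuration conf = HBaseConfiguration.create();
        try (Connection connection = ConnectionFactory.createConnection(conf);
             Table table = connection.getTable(tableName);
             ResultScanner scanner = table.getScanner(scan)) {
            for (Result row : scanner) {
                // Print the row key of every row that passed the filter.
                System.out.println(Bytes.toString(row.getRow()));
            }
        }
    }
}
```

Setting both flags explicitly, rather than relying on defaults, keeps the behavior the same across client versions.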

The SingleColumnValueFilter constructor has the following signature:

SingleColumnValueFilter(final byte[] family, final byte[] qualifier, final CompareOp compareOp, ByteArrayComparable comparator, final boolean filterIfMissing, final boolean latestVersionOnly)

Parameter description:

  • family: indicates the column family of the column you want to query.
  • qualifier: indicates the column you want to query.
  • compareOp: indicates the comparison operator, such as = and >.
  • comparator: indicates the comparator type and the target value to compare against, for example, binary:<value>.
  • filterIfMissing: indicates whether a row is filtered if the column cannot be matched in this row. The default value is false.
  • latestVersionOnly: indicates whether only values of the latest version will be queried. The default value is false.
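
The effect of the last two parameters can be illustrated with a small in-memory model. This is a simplified sketch that follows the parameter descriptions above and does not use any HBase classes: the class name SingleColumnValueFilterModel and the method includesRow are hypothetical, only the = operator is modeled, and delete markers are not modeled. The stored versions of the tested column are passed newest first, and no arguments means the row does not contain the column.

```java
/** Simplified model of filterIfMissing and latestVersionOnly (hypothetical, not HBase code). */
final class SingleColumnValueFilterModel {
    private final String expected;
    private final boolean filterIfMissing;
    private final boolean latestVersionOnly;

    SingleColumnValueFilterModel(String expected, boolean filterIfMissing,
                                 boolean latestVersionOnly) {
        this.expected = expected;
        this.filterIfMissing = filterIfMissing;
        this.latestVersionOnly = latestVersionOnly;
    }

    /**
     * Decides whether a row is returned.
     *
     * @param versionsNewestFirst stored values of the tested column in one row,
     *                            newest first; empty if the row lacks the column
     */
    boolean includesRow(String... versionsNewestFirst) {
        if (versionsNewestFirst.length == 0) {
            // Column is missing: the row is kept unless filterIfMissing is true.
            return !filterIfMissing;
        }
        if (latestVersionOnly) {
            // Only the newest value takes part in the comparison.
            return expected.equals(versionsNewestFirst[0]);
        }
        // Any stored version may satisfy the comparison.
        for (String version : versionsNewestFirst) {
            if (expected.equals(version)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // The column was modified from "v1" to "v2"; the scan looks for "v1".
        String[] modified = {"v2", "v1"};
        SingleColumnValueFilterModel loose = new SingleColumnValueFilterModel("v1", false, false);
        SingleColumnValueFilterModel strict = new SingleColumnValueFilterModel("v1", true, true);

        System.out.println(loose.includesRow(modified));  // true: the old value "v1" still matches
        System.out.println(strict.includesRow(modified)); // false: only "v2" is compared
        System.out.println(loose.includesRow());          // true: row without the column is kept
        System.out.println(strict.includesRow());         // false: row without the column is dropped
    }
}
```

With both flags set to true, a row is returned only if it contains the column and the newest value of that column satisfies the comparison.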