Updated on 2024-07-23 GMT+08:00

Creating Indexes

Scenario

  • Use this tool to add and create indexes for a table that does not yet have any indexes.

Usage

Run the following command on the HBase client to add or create indexes for a table. The added or created indexes are in the ACTIVE state.

hbase org.apache.hadoop.hbase.hindex.global.mapreduce.GlobalTableIndexer -Dtablename.to.index='table' -Dindexspecs.to.add='idx1=>cf1:[c1->string],[c2]#idx2=>cf2:[c1->string],[c2]#idx3=>cf1:[c1];cf2:[c1]' -Dindexspecs.covered.family.to.add='idx2=>cf1' -Dindexspecs.covered.to.add='idx1=>cf1:[c3],[c4]' -Dindexspecs.coveredallcolumn.to.add='idx3=>true' -Dindexspecs.splitkeys.to.set='idx1=>[\x010,\x011,\x012]#idx2=>[\x01a,\x01b,\x01c]#idx3=>[\x01d,\x01e,\x01f]'
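Each property in the command is passed as a separate -Dname=value argument. The values contain characters such as '>', ';', '[' and ']' that a shell would otherwise treat as redirection, command separators, or glob patterns, which is why they are single-quoted. The following Python sketch assembles the same arguments as a list and uses the standard shlex.join to print a safely quoted command line. It is not part of the HBase client, and the helper name build_indexer_command is hypothetical.

```python
import shlex

# Fully qualified tool class, copied from the command above.
INDEXER = "org.apache.hadoop.hbase.hindex.global.mapreduce.GlobalTableIndexer"


def build_indexer_command(props: dict[str, str]) -> list[str]:
    """Hypothetical helper: turn {property: value} pairs into an argv list.

    Each property becomes one -Dname=value argument. Because the list is
    handed to the process directly (no shell), characters such as '>', ';',
    '#', and '[' need no quoting inside the list itself.
    """
    argv = ["hbase", INDEXER]
    for name, value in props.items():
        argv.append(f"-D{name}={value}")
    return argv


props = {
    "tablename.to.index": "table",
    "indexspecs.to.add": "idx1=>cf1:[c1->string],[c2]",
    "indexspecs.covered.to.add": "idx1=>cf1:[c3],[c4]",
}
argv = build_indexer_command(props)

# shlex.join single-quotes only the arguments that contain shell
# metacharacters, producing a line that is safe to paste into a POSIX shell.
print(shlex.join(argv))
```

On a node where the HBase client is installed, the same list could be passed to subprocess.run(argv, check=True), which starts the tool without going through a shell at all.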

The parameters are described as follows:

  • tablename.to.index: name of the data table for which an index is created
  • indexspecs.to.add: mapping between the index name and the index column in the data table (definition of index column)
  • (Optional) indexspecs.covered.to.add: column of the data table that is redundantly stored in an index table (definition of covering index column)
  • (Optional) indexspecs.covered.family.to.add: column family of the data table that is redundantly stored in an index table (definition of covering index column family)
  • (Optional) indexspecs.coveredallcolumn.to.add: all data in a data table that is redundantly stored in an index table (definition of all covering index columns)
  • (Optional) indexspecs.splitkeys.to.set: split keys used to pre-partition an index table. Specify this parameter to avoid hotspotting in the regions of the index table. Use the following characters to configure pre-partitioning:
    • '#' separates indexes.
    • '[]' contains splitkeys.
    • ',' separates splitkeys.

      Each split key used for pre-partitioning must start with \x01.

  • (Optional) indexspecs.to.addandbuild: generates index data while the index is being created. If the data table is large, do not use this parameter. Use an index data generation tool instead.
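The split-key rules for indexspecs.splitkeys.to.set can be checked before running the tool. The following Python sketch joins per-index split keys with the documented separators ('#' between indexes, '[]' around the keys, ',' between keys) and rejects any key that does not begin with \x01. It is a formatting aid only: the function build_splitkeys_spec is hypothetical, and it treats keys as the escaped text typed on the command line (as in the example command) without decoding them to bytes.

```python
def build_splitkeys_spec(splits: dict[str, list[str]]) -> str:
    """Hypothetical helper: format per-index split keys as one spec string.

    Keys are given in the escaped text form used on the command line, so the
    prefix check looks for the four characters backslash, 'x', '0', '1'.
    """
    parts = []
    for index, keys in splits.items():
        for key in keys:
            # Documented rule: every split key must start with \x01.
            if not key.startswith("\\x01"):
                raise ValueError(f"split key {key!r} of {index} must start with \\x01")
        # '[]' encloses an index's split keys and ',' separates the keys.
        parts.append(f"{index}=>[{','.join(keys)}]")
    # '#' separates indexes.
    return "#".join(parts)


spec = build_splitkeys_spec({
    "idx1": ["\\x010", "\\x011", "\\x012"],
    "idx2": ["\\x01a", "\\x01b", "\\x01c"],
})
print(spec)  # idx1=>[\x010,\x011,\x012]#idx2=>[\x01a,\x01b,\x01c]
```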

The values used in the preceding command are described as follows:

  • idx1, idx2, and idx3 are index names.
  • cf1 and cf2 are column family names.
  • c1, c2, c3, and c4 are column names.
  • string indicates a data type. The value can be STRING, INTEGER, FLOAT, LONG, DOUBLE, SHORT, BYTE, or CHAR.
  • '#' separates indexes, ';' separates column families, and ',' separates column qualifiers.
  • Each column name and its data type must be enclosed in '[]'.
  • A column name and its data type are separated by '->'.
  • If the data type of a column is not specified, the default data type (string) is used.
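To make the indexspecs.to.add syntax concrete, the following Python sketch parses a value into indexes, column families, and (column, type) entries using the separators listed above. It is an illustrative reading of the documented format, not the tool's own parser. The function parse_index_specs is hypothetical, and accepting type names in any letter case is an assumption, based on the example command using lowercase string while the list above gives uppercase names.

```python
ALLOWED_TYPES = {"STRING", "INTEGER", "FLOAT", "LONG", "DOUBLE", "SHORT", "BYTE", "CHAR"}


def parse_index_specs(text: str) -> dict[str, dict[str, list[tuple[str, str]]]]:
    """Hypothetical parser: {index: {family: [(column, TYPE), ...]}}."""
    specs = {}
    for index_def in text.split("#"):                # '#' separates indexes
        name, _, body = index_def.partition("=>")
        families = {}
        for family_def in body.split(";"):           # ';' separates families
            family, _, columns = family_def.partition(":")
            parsed = []
            for col_def in columns.split(","):       # ',' separates columns
                col_def = col_def.strip()
                if not (col_def.startswith("[") and col_def.endswith("]")):
                    raise ValueError(f"column must be enclosed in []: {col_def!r}")
                # '->' separates the column name from its optional type.
                column, _, dtype = col_def[1:-1].partition("->")
                # Default type is string. Assumption: type names are case-insensitive.
                dtype = (dtype or "STRING").upper()
                if dtype not in ALLOWED_TYPES:
                    raise ValueError(f"unsupported data type: {dtype}")
                parsed.append((column, dtype))
            families[family] = parsed
        specs[name] = families
    return specs


specs = parse_index_specs(
    "idx1=>cf1:[c1->string],[c2]#idx2=>cf2:[c1->string],[c2]#idx3=>cf1:[c1];cf2:[c1]"
)
print(specs["idx3"])  # {'cf1': [('c1', 'STRING')], 'cf2': [('c1', 'STRING')]}
```

Because [c2] carries no type, the sketch records it as STRING, matching the documented default.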