Updated on 2024-11-29 GMT+08:00

Creating Indexes

Scenarios

  • If a large amount of data exists in a table, you can add an index on a column to accelerate data queries.
  • For a table that does not have indexes, you can use this tool to create them.

How to Use

Run the following command on the HBase client to add indexes to a table:

hbase org.apache.hadoop.hbase.hindex.global.mapreduce.GlobalTableIndexer -Dtablename.to.index='table' -Dindexspecs.to.add='idx1=>cf1:[c1->string],[c2]#idx2=>cf2:[c1->string],[c2]#idx3=>cf1:[c1];cf2:[c1]' -Dindexspecs.covered.family.to.add='idx2=>cf1' -Dindexspecs.covered.to.add='idx1=>cf1:[c3],[c4]' -Dindexspecs.coveredallcolumn.to.add='idx3=>true' -Dindexspecs.splitkeys.to.set='idx1=>[\x010,\x011,\x012]#idx2=>[\x01a,\x01b,\x01c]#idx3=>[\x01d,\x01e,\x01f]'
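
Because the command is long, it can help to review it one option per line before running it. The following is a minimal sketch, not part of the tool: the function name create_indexes and the DRY_RUN variable are hypothetical, and the option names and values are copied unchanged from the command above. By default the function only prints the arguments.

```shell
# Sketch: wrap the invocation so it can be reviewed before it is run.
# create_indexes and DRY_RUN are illustrative names (assumptions), not tool features.
create_indexes() {
  # One -D option per line; values are single-quoted so the shell leaves
  # '#', ';', '[' and '\x01' untouched.
  set -- \
    -Dtablename.to.index='table' \
    -Dindexspecs.to.add='idx1=>cf1:[c1->string],[c2]#idx2=>cf2:[c1->string],[c2]#idx3=>cf1:[c1];cf2:[c1]' \
    -Dindexspecs.covered.family.to.add='idx2=>cf1' \
    -Dindexspecs.covered.to.add='idx1=>cf1:[c3],[c4]' \
    -Dindexspecs.coveredallcolumn.to.add='idx3=>true' \
    -Dindexspecs.splitkeys.to.set='idx1=>[\x010,\x011,\x012]#idx2=>[\x01a,\x01b,\x01c]#idx3=>[\x01d,\x01e,\x01f]'

  if [ "${DRY_RUN:-1}" = "1" ]; then
    # Dry run: print each argument on its own line.
    printf '%s\n' "$@"
  else
    hbase org.apache.hadoop.hbase.hindex.global.mapreduce.GlobalTableIndexer "$@"
  fi
}

create_indexes
```

Setting DRY_RUN=0 before calling create_indexes runs the real command on the HBase client with exactly the arguments that were printed.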

The parameters are described as follows:

  • tablename.to.index: name of the data table for which an index is created

    If the data table is empty when you use this parameter, the created index will be in ACTIVE state. Otherwise, the index will be in INACTIVE state.

  • (Optional) indexspecs.to.addandbuild: Index data is generated when the index is created. If the data table is large, do not use this parameter. Use an index data generation tool instead.

    Do not use this parameter together with indexspecs.to.add. When this parameter is used, the index will be in BUILDING state. After the index data is generated, it will be in ACTIVE state.

  • indexspecs.to.add: mapping between the index name and the index column in the data table (definition of index column)
  • (Optional) indexspecs.covered.to.add: column of the data table that is redundantly stored in an index table (definition of covering index column)
  • (Optional) indexspecs.covered.family.to.add: column family of the data table that is redundantly stored in an index table (definition of covering index column family)
  • (Optional) indexspecs.coveredallcolumn.to.add: whether all data in the data table is redundantly stored in an index table (definition of an index that covers all columns)
  • (Optional) indexspecs.splitkeys.to.set: pre-partition split keys of an index table. Specify this parameter to prevent hotspotting in the regions of the index table. Use the following characters to define the split keys:
    • '#': separates indexes.
    • '[]': encloses the split keys of one index.
    • ',': separates split keys.

      Each split key set for pre-partitioning must start with \x01.
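
The split key rules above can be applied mechanically. The following sketch assembles a value for indexspecs.splitkeys.to.set from an index name and a list of keys, adding the \x01 prefix to each key. The helper name splitkey_entry is hypothetical and is not part of the HBase tooling.

```shell
# Sketch: build one "name=>[k1,k2,...]" entry, prefixing every key with \x01.
# splitkey_entry is a hypothetical helper (assumption), not an HBase command.
splitkey_entry() {
  name=$1
  shift
  keys=''
  for k in "$@"; do
    # "\\x01" inside double quotes yields the literal characters \x01.
    keys="${keys:+$keys,}\\x01$k"
  done
  printf '%s=>[%s]\n' "$name" "$keys"
}

# Entries for different indexes are joined with '#'.
spec="$(splitkey_entry idx1 0 1 2)#$(splitkey_entry idx2 a b c)"
printf '%s\n' "-Dindexspecs.splitkeys.to.set=$spec"
# prints: -Dindexspecs.splitkeys.to.set=idx1=>[\x010,\x011,\x012]#idx2=>[\x01a,\x01b,\x01c]
```

When passing the printed option to the hbase command, enclose its value in single quotes so that the shell does not interpret '#', '[' or the backslashes.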

The values used in the preceding command are described as follows:
  • idx1, idx2, and idx3 are index names.
  • cf1 and cf2 are column family names.
  • c1, c2, c3, and c4 are column names.
  • string indicates a data type. The value can be STRING, INTEGER, FLOAT, LONG, DOUBLE, SHORT, BYTE, or CHAR.
  • '#' is used to separate indexes, ';' is used to separate column families, and ',' is used to separate column qualifiers.
  • The column name and its data type must be enclosed in '[]'.
  • Column names and their data types are separated by '->'.
  • If the data type of a column is not specified, the default data type (string) will be used.
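
To check an index specification before submitting it, the separator rules above can be used to break the string into its parts. The following is a minimal sketch that splits a value of indexspecs.to.add on '#' and prints each index name next to its column definition. The helper name list_indexes is hypothetical and is not part of the HBase tooling, and the sketch does not validate column families or data types.

```shell
# Sketch: list each index in a spec string as "index name<TAB>column definition".
# list_indexes is a hypothetical helper (assumption), not an HBase command.
list_indexes() {
  # '#' separates indexes, so turn each '#' into a line break first.
  printf '%s\n' "$1" | tr '#' '\n' | while IFS= read -r entry; do
    # Text before the first "=>" is the index name; the rest is the definition.
    printf '%s\t%s\n' "${entry%%=>*}" "${entry#*=>}"
  done
}

list_indexes 'idx1=>cf1:[c1->string],[c2]#idx2=>cf2:[c1->string],[c2]#idx3=>cf1:[c1];cf2:[c1]'
```

For the specification used in this section, the output contains three lines, one each for idx1, idx2, and idx3. The definition of idx3 spans two column families separated by ';'.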