Updated on 2024-05-29 GMT+08:00

Creating an Index

Scenarios

  • If a table holds a large amount of data, you can add an index on a column to speed up queries on that column.
  • For user tables that do not have indexes, this tool allows you to add and build indexes at the same time.

How to Use

Run the following command on the HBase client to add or create an index. After the command is executed, the specified index is added to the table.

hbase org.apache.hadoop.hbase.hindex.global.mapreduce.GlobalTableIndexer -Dtablename.to.index='table' -Dindexspecs.to.add='idx1=>cf1:[c1->string],[c2]#idx2=>cf2:[c1->string],[c2]#idx3=>cf1:[c1];cf2:[c1]' -Dindexspecs.covered.family.to.add='idx2=>cf1' -Dindexspecs.covered.to.add='idx1=>cf1:[c3],[c4]' -Dindexspecs.coveredallcolumn.to.add='idx3=>true' -Dindexspecs.splitkeys.to.set='idx1=>[\x010,\x011,\x012]#idx2=>[\x01a,\x01b,\x01c]#idx3=>[\x01d,\x01e,\x01f]'
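
Because the one-line form is hard to read, the same example can be kept in a small script with each option value in its own variable. This is only a sketch: the variable names are hypothetical, the values are copied from the command above, and the script prints the arguments instead of running the tool.

```shell
#!/bin/sh
# Sketch: the example command with each option value held in a variable.
# All variable names are hypothetical; the values are copied from the example.

TABLE='table'
INDEX_SPECS='idx1=>cf1:[c1->string],[c2]#idx2=>cf2:[c1->string],[c2]#idx3=>cf1:[c1];cf2:[c1]'
COVERED_FAMILIES='idx2=>cf1'
COVERED_COLUMNS='idx1=>cf1:[c3],[c4]'
COVERED_ALL='idx3=>true'
# Single quotes keep \x01 as literal text, exactly as in the example.
SPLIT_KEYS='idx1=>[\x010,\x011,\x012]#idx2=>[\x01a,\x01b,\x01c]#idx3=>[\x01d,\x01e,\x01f]'

# Print the command and its arguments, one per line, instead of running it.
printf '%s\n' \
  'hbase org.apache.hadoop.hbase.hindex.global.mapreduce.GlobalTableIndexer' \
  "-Dtablename.to.index=$TABLE" \
  "-Dindexspecs.to.add=$INDEX_SPECS" \
  "-Dindexspecs.covered.family.to.add=$COVERED_FAMILIES" \
  "-Dindexspecs.covered.to.add=$COVERED_COLUMNS" \
  "-Dindexspecs.coveredallcolumn.to.add=$COVERED_ALL" \
  "-Dindexspecs.splitkeys.to.set=$SPLIT_KEYS"
```

To run the tool rather than print the arguments, pass the same double-quoted arguments directly to the hbase command on a node where the HBase client is installed.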

The parameters are described as follows:

  • tablename.to.index: Name of the data table for which an index is created

    When this parameter is used to create an index, if the data table is empty, the created index will be in ACTIVE state. Otherwise, the index will be in INACTIVE state.

  • indexspecs.to.addandbuild (optional): Adds the index and generates its index data in the same run. This is not recommended for very large data tables; use an index data generation tool instead.

    This parameter and tablename.to.index cannot be used at the same time. While the index data is being generated, the index is in the BUILDING state. After the index data is generated, the index changes to the ACTIVE state.

  • indexspecs.to.add: Mapping between the index name and the column in the corresponding data table (definition of index column)
  • indexspecs.covered.to.add (optional): Columns of the data table that are redundantly stored in the index table (definition of covered columns)
  • indexspecs.covered.family.to.add (optional): Column families of the data table that are redundantly stored in the index table (definition of covered column families)
  • indexspecs.coveredallcolumn.to.add (optional): Whether all columns of the data table are redundantly stored in the index table (definition of covering all columns)
  • indexspecs.splitkeys.to.set (optional): Split points used to pre-partition an index table. Specify this parameter to keep regions of the index table from becoming hotspots. The format is as follows:
    • '#': separates indexes
    • '[]': encloses the split keys of one index
    • ',': separates split keys

      Each split key must start with \x01.
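
The split key format above can be assembled with a small helper, sketched below. The function name split_keys and the variable names are hypothetical and not part of the tool; the helper prepends the literal text \x01 to every key so that the rule on the key prefix is always met.

```shell
#!/bin/sh
# Sketch: build an indexspecs.splitkeys.to.set value.
# split_keys and the variable names are hypothetical helpers, not part of HBase.

# split_keys INDEX KEY... prints "INDEX=>[\x01KEY,\x01KEY,...]".
# \x01 is emitted as literal text (backslash, x, 0, 1), as in the example command.
split_keys() {
  name=$1
  shift
  keys=''
  for k in "$@"; do
    # Join keys with ',' and prefix each one with \x01.
    keys="${keys:+$keys,}\\x01$k"
  done
  printf '%s=>[%s]' "$name" "$keys"
}

# Entries for different indexes are joined with '#'.
SPLIT_KEYS="$(split_keys idx1 0 1 2)#$(split_keys idx2 a b c)#$(split_keys idx3 d e f)"

printf '%s\n' "$SPLIT_KEYS"
```

With the keys shown, the script prints the same value that the example command passes to indexspecs.splitkeys.to.set.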

The elements of the example command are as follows:
  • idx1, idx2, and idx3: index names
  • cf1 and cf2: column family names
  • c1, c2, c3, and c4: column names
  • string: data type. The value can be STRING, INTEGER, FLOAT, LONG, DOUBLE, SHORT, BYTE, or CHAR.
  • '#' is used to separate indexes, ';' is used to separate column families, and ',' is used to separate column qualifiers.
  • The column name and its data type must be included in '[]'.
  • Column names and their data types are separated by '->'.
  • If the data type of a specific column is not specified, the default data type (string) is used.
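
The separator rules above can be checked by building the indexspecs.to.add value piece by piece. The following is a minimal sketch under stated assumptions: the helper name column and the variable names are hypothetical, and the index, column family, and column names are taken from the example command.

```shell
#!/bin/sh
# Sketch: build an indexspecs.to.add value from its parts.
# The helper name "column" and all variable names are hypothetical.

# column NAME [TYPE] prints "[NAME->TYPE]", or "[NAME]" when TYPE is omitted
# (the column then uses the default data type, string).
column() {
  if [ -n "${2:-}" ]; then
    printf '[%s->%s]' "$1" "$2"
  else
    printf '[%s]' "$1"
  fi
}

# Column qualifiers within a column family are joined with ','.
IDX1="idx1=>cf1:$(column c1 string),$(column c2)"
IDX2="idx2=>cf2:$(column c1 string),$(column c2)"
# Column families within an index are joined with ';'.
IDX3="idx3=>cf1:$(column c1);cf2:$(column c1)"

# Indexes are joined with '#'.
INDEX_SPECS="${IDX1}#${IDX2}#${IDX3}"

printf '%s\n' "$INDEX_SPECS"
```

The printed value matches the one passed to indexspecs.to.add in the example command, so it can be substituted there inside single or double quotes.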