Updated on 2024-11-29 GMT+08:00

HBase Full-Text Index

Scenario

A mapping between an HBase table and a Solr index, defined in mapping.xml, provides a unified API for operating on HBase and Solr together. Indexes are stored in Solr, and raw data is stored in HBase. A query on Solr can then return the raw data from HBase directly.

Queries that set distrib=false are not supported.

Prerequisites

Solr and HBase have been installed.

Procedure

  1. Configure a config set.

    1. Obtain the initial config set template.

      solrctl confset --generate ./confWithHBase -confWithHBase

      vi confWithHBase/conf/managed-schema

      The uniqueKey in managed-schema must be consistent with the row key of the HBase table. For all other fields, set stored=false, so that only indexes are stored in Solr while raw data is stored in HBase.

      <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> 
      <field name="name" type="text_general" indexed="true" stored="false"/>
      <field name="sku" type="text_en_splitting_tight" indexed="true" stored="false" omitNorms="true"/>
      
      <uniqueKey>id</uniqueKey>

    2. Create a config set.

      solrctl confset --create test_conf ./confWithHBase
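
The managed-schema rule from step 1 (only the uniqueKey field is stored, every other field has stored=false) can be checked before the config set is created. The following is a minimal sketch that uses only the JDK XML parser. The class name SchemaCheck and its method are hypothetical helpers and are not part of Solr or of the product API. It treats a missing stored attribute as a violation, which is an assumption made here for strictness.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of Solr or of the HBase full-text index API.
public class SchemaCheck {

    // Parse an XML string into a DOM document, wrapping checked exceptions.
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse schema XML", e);
        }
    }

    // Return the names of fields, other than the uniqueKey field,
    // that are not explicitly declared with stored="false".
    public static List<String> storedNonKeyFields(String schemaXml) {
        Document doc = parse(schemaXml);
        NodeList keys = doc.getElementsByTagName("uniqueKey");
        String uniqueKey = keys.getLength() > 0 ? keys.item(0).getTextContent().trim() : "";

        List<String> offenders = new ArrayList<>();
        NodeList fields = doc.getElementsByTagName("field");
        for (int i = 0; i < fields.getLength(); i++) {
            Element field = (Element) fields.item(i);
            String name = field.getAttribute("name");
            boolean storedIsFalse = "false".equals(field.getAttribute("stored"));
            if (!name.equals(uniqueKey) && !storedIsFalse) {
                offenders.add(name);
            }
        }
        return offenders;
    }

    public static void main(String[] args) {
        // Schema fragment taken from the example in step 1, wrapped in a root element.
        String schema = "<schema>"
                + "<field name=\"id\" type=\"string\" indexed=\"true\" stored=\"true\" required=\"true\" multiValued=\"false\"/>"
                + "<field name=\"name\" type=\"text_general\" indexed=\"true\" stored=\"false\"/>"
                + "<field name=\"sku\" type=\"text_en_splitting_tight\" indexed=\"true\" stored=\"false\" omitNorms=\"true\"/>"
                + "<uniqueKey>id</uniqueKey>"
                + "</schema>";
        // An empty list means every non-key field is declared stored="false".
        System.out.println(storedNonKeyFields(schema));
    }
}
```

In practice the XML would be read from confWithHBase/conf/managed-schema instead of an inline string.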

  2. Configure the mapping.xml file for HBase tables and Solr collections.

    • The index fields in mapping.xml must be fields configured with indexed=true in the Solr collection schema. The non-index fields must also be defined in the Solr collection schema.
    • Each column in mapping.xml must be a column family and column of the HBase table specified by table.
    • <mapping table="test_tb"> specifies the name of the HBase table to be mapped to the Solr collection.

    <?xml version="1.0" encoding="utf-8" standalone="yes"?>
    <mapping table="test_tb">
       <index>
           <field name="name" column="I:n"/>
           <field name="alternative_names" column="I:a"/>
           <field name="latitude" column="I:la"/>
           <field name="longitude" column="I:ln"/>
           <field name="countrycode" column="I:x"/>
           <field name="population" column="I:p"/>
           <field name="elevation" column="I:e"/>
           <field name="timezone" column="I:t"/>
           <field name="lastupdate" column="I:las"/>
           <field name="text" column="I:tt"/>
       </index>
       <non-index>
           <field name="non_f1" column="I:n1"/>
           <field name="non_f2" column="I:n2"/>
           <field name="non_f3" column="I:n3"/>
           <field name="non_f4" column="I:n4"/>
       </non-index>
    </mapping>
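
To make the structure of mapping.xml concrete, the sketch below reads a mapping document with the JDK XML parser and lists the Solr field to HBase column pairs of the index and non-index sections. The class MappingReader is a hypothetical helper, not part of the product API. The check that each column has the form family:qualifier is an assumption inferred from the sample values such as I:n.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: not part of the HBase full-text index API.
public class MappingReader {

    // Parse an XML string into a DOM document, wrapping checked exceptions.
    private static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse mapping XML", e);
        }
    }

    // Return the HBase table name from the table attribute of the root <mapping> element.
    public static String table(String mappingXml) {
        return parse(mappingXml).getDocumentElement().getAttribute("table");
    }

    // Return "Solr field name -> HBase column" for one section: "index" or "non-index".
    public static Map<String, String> fields(String mappingXml, String section) {
        Document doc = parse(mappingXml);
        Map<String, String> result = new LinkedHashMap<>();
        NodeList sections = doc.getElementsByTagName(section);
        for (int i = 0; i < sections.getLength(); i++) {
            NodeList fields = ((Element) sections.item(i)).getElementsByTagName("field");
            for (int j = 0; j < fields.getLength(); j++) {
                Element field = (Element) fields.item(j);
                String column = field.getAttribute("column");
                // Assumed format: columnFamily:qualifier, as in the sample "I:n".
                if (!column.matches("[^:]+:.+")) {
                    throw new IllegalArgumentException("Bad column: " + column);
                }
                result.put(field.getAttribute("name"), column);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened version of the mapping.xml example above.
        String xml = "<?xml version=\"1.0\" encoding=\"utf-8\" standalone=\"yes\"?>"
                + "<mapping table=\"test_tb\">"
                + "<index><field name=\"name\" column=\"I:n\"/><field name=\"population\" column=\"I:p\"/></index>"
                + "<non-index><field name=\"non_f1\" column=\"I:n1\"/></non-index>"
                + "</mapping>";
        System.out.println("table: " + table(xml));
        System.out.println("index: " + fields(xml, "index"));
        System.out.println("non-index: " + fields(xml, "non-index"));
    }
}
```

The field names returned for the index section are the ones that must be configured with indexed=true in the Solr collection schema.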

  3. Create a collection.

    • Use AdminInterface to create a LunaAdmin instance, which is used to create and delete HBase tables and collections.
    • The java.lang.String mappingFileDirPath parameter is the path of the mapping.xml file.

    Main APIs:

    // Create an HBase table with the given descriptor and split keys
    void createTable(org.apache.hadoop.hbase.HTableDescriptor desc, byte[][] splitKeys)

    // Create an HBase table with the given descriptor and split keys, create a Solr collection
    // from the create request under the default Solr root path, and add the Solr index to the HBase table
    void createTable(org.apache.hadoop.hbase.HTableDescriptor desc, byte[][] splitKeys,
    org.apache.solr.client.solrj.request.CollectionAdminRequest.Create createRequest, java.lang.String mappingFileDirPath)

    // Add a Solr collection to an HBase table, with the default Solr root path
    void addCollection(org.apache.hadoop.hbase.TableName table, org.apache.solr.client.solrj.request.CollectionAdminRequest.Create createRequest, java.lang.String mappingFileDirPath)

    // Delete an HBase table, then delete the Solr collection of the table
    void deleteTable(org.apache.hadoop.hbase.TableName tableName)

    // Delete a Solr collection of an HBase table, with the default Solr root path
    void deleteCollection(org.apache.hadoop.hbase.TableName table, java.lang.String collection)

    // Delete all Solr collections of an HBase table, with the default Solr root path
    void deleteAllCollections(org.apache.hadoop.hbase.TableName table)

    // Check whether a Solr collection exists
    boolean collectionExists(java.lang.String collection)

    // Check whether an HBase table exists
    boolean tableExists(org.apache.hadoop.hbase.TableName tableName)

    // Get the descriptor of an HBase table
    org.apache.hadoop.hbase.HTableDescriptor getTableDescriptor(org.apache.hadoop.hbase.TableName tableName)

  4. Write data.

    Obtain the table handle from LunaAdmin through the AdminInterface API, then call the HBase put API to write data to the HBase table. As the data is written, the corresponding index is created in Solr based on the configuration in the mapping.xml file.

    // Get the table which handles write/read requests.
    org.apache.hadoop.hbase.client.Table getTable(org.apache.hadoop.hbase.TableName table)

  5. Query data.

    • Use the native Solr API to query data on Solr.
    • To disable the retrieval of raw data from HBase, add query.hbase=false to the query.
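
As an illustration of the query step, the sketch below builds a request URL for the standard Solr select handler and appends query.hbase=false when raw data should not be fetched from HBase. The class SolrQueryUrl, the host, the port, and the collection name are hypothetical placeholders. Only the query.hbase parameter comes from this document. The sketch never sets distrib=false, because such queries are not supported.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds a URL only and does not send the request.
public class SolrQueryUrl {

    // baseUrl:        Solr root URL, for example "http://host:8983/solr" (placeholder)
    // collection:     name of the Solr collection mapped to the HBase table
    // q:              Solr query string, for example "name:Beijing"
    // fetchFromHBase: when false, query.hbase=false is appended to the request
    public static String build(String baseUrl, String collection, String q, boolean fetchFromHBase) {
        StringBuilder url = new StringBuilder(baseUrl)
                .append('/').append(collection)
                .append("/select?q=")
                .append(URLEncoder.encode(q, StandardCharsets.UTF_8));
        if (!fetchFromHBase) {
            url.append("&query.hbase=false");
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // Placeholder host and collection; replace with values from the actual cluster.
        System.out.println(build("http://solr.example.com:8983/solr", "test_collection", "name:Beijing", true));
        System.out.println(build("http://solr.example.com:8983/solr", "test_collection", "name:Beijing", false));
    }
}
```

The resulting URL can be sent with any HTTP client, or the same parameters can be set on a SolrJ query object.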