Using the HBase Shell for Full-Text Indexing

This section describes how to use the HBase shell to create full-text indexes for HBase tables.

Prerequisites

The CloudTable cluster (HBase), the ECS instance that serves as the HBase client, and the CSS cluster (Elasticsearch engine) must be in the same VPC, subnet, and security group so that they can communicate with each other.

Full-Text Search Example

  1. Start the HBase shell to access a CloudTable cluster.

    For details about how to install and start the HBase shell, see Using HBase Shell to Access a Cluster.

  2. Execute the following statement in the HBase shell to create an HBase table:

    create 'hbase-es-table', {NAME => 'f', VERSIONS => 5},SPLITS => ['10', '20'], METADATA => {'hbase.index.es.enabled' => 'true', 'hbase.index.es.endpoint'=>'10.5.131.1:9200,10.5.131.2:9200','hbase.index.es.indexname'=>'product','hbase.index.es.schema' => '[{"name":"email","type":"text","hbaseQualifier":"f:email"}]' }

    For details about the schema definition of the METADATA field, see HBase Elasticsearch Schema Definition. Replace the value of hbase.index.es.endpoint in the preceding statement with the access address of your CSS cluster.

  3. In the HBase shell, run the following put commands to write three rows of data to the HBase table:

    put 'hbase-es-table', '001rowkey','f:email','how many apples'
    put 'hbase-es-table', '101rowkey','f:email','how much people'
    put 'hbase-es-table', '201rowkey','f:email','many time people'

  4. Exit the HBase shell, and run the following curl command to call the Elasticsearch Search API and search for the keyword how:

    curl -X GET "${ES_Cluster_IP:Port}/product/_search" -H 'Content-Type: application/json' -d '{ "stored_fields" : ["rowkey"], "query" : { "term" : { "email" : "how" } } }'

    Replace ${ES_Cluster_IP:Port} in the preceding command with the address to access the CSS cluster, for example, 10.5.131.1:9200.

    The search hits two documents (a document is the basic unit of indexed information and is expressed in JSON), and the rowkey field of each document is returned. The rowkey links each document in the Elasticsearch index to its source row in HBase. The result is as follows:

    {
      "took": 4,
      "timed_out": false,
      "_shards": { "total": 5, "successful": 5, "skipped": 0, "failed": 0 },
      "hits": {
        "total": 2,
        "max_score": 0.2876821,
        "hits": [
          {
            "_index": "product",
            "_type": "doc",
            "_id": "GB087WYB7F1t0X-xu3ZX",
            "_score": 0.2876821,
            "fields": { "rowkey": [ "MDAxcm93a2V5" ] }
          },
          {
            "_index": "product",
            "_type": "doc",
            "_id": "GR087WYB7F1t0X-xvHZ5",
            "_score": 0.2876821,
            "fields": { "rowkey": [ "MTAxcm93a2V5" ] }
          }
        ]
      }
    }

  5. Decode the returned rowkey values to obtain the rowkeys of the source data in HBase. You can use the following website:

    https://www.base64decode.org/

    The rowkey returned in step 4 is encoded by Base64.Encoder. Decode it with Base64.Decoder to obtain the rowkey in HBase. For example, MDAxcm93a2V5 decodes to 001rowkey.

  6. Restart the HBase shell and run the following get command to obtain the source data:

    get 'hbase-es-table','rowkey'

    Replace rowkey in the preceding command with a rowkey decoded in step 5, for example, 001rowkey.

    In Java application development, you can perform the operations in steps 3, 4, and 5 with a single function call. For details, see Querying Data in the CloudTable Service Developer Guide.
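
If you prefer not to paste rowkeys into a website, the decoding in step 5 can also be done locally. The following Python script is a minimal sketch, not part of CloudTable: it assumes that Python 3 is available on the ECS instance and that the rowkeys are printable UTF-8 strings, as in this example. The names SAMPLE_RESPONSE, decode_rowkeys, and to_get_commands are illustrative only. The script reads a search response like the one in step 4, decodes each rowkey, and prints the matching get commands for step 6.

```python
import base64
import json

# Trimmed copy of the search response from step 4 (only the parts used here).
SAMPLE_RESPONSE = """
{"hits": {"total": 2, "hits": [
  {"_index": "product", "fields": {"rowkey": ["MDAxcm93a2V5"]}},
  {"_index": "product", "fields": {"rowkey": ["MTAxcm93a2V5"]}}
]}}
"""


def decode_rowkeys(response_text):
    """Return the HBase rowkeys referenced by an Elasticsearch search response."""
    response = json.loads(response_text)
    rowkeys = []
    for hit in response["hits"]["hits"]:
        # Values requested through stored_fields are returned as lists.
        for encoded in hit.get("fields", {}).get("rowkey", []):
            # Assumption: the rowkey bytes are valid UTF-8 text.
            rowkeys.append(base64.b64decode(encoded).decode("utf-8"))
    return rowkeys


def to_get_commands(table, rowkeys):
    """Build one HBase shell get command per decoded rowkey."""
    return ["get '{}','{}'".format(table, rowkey) for rowkey in rowkeys]


if __name__ == "__main__":
    for command in to_get_commands("hbase-es-table", decode_rowkeys(SAMPLE_RESPONSE)):
        print(command)
```

To use it with live data, replace SAMPLE_RESPONSE with the output of the curl command in step 4, then paste the printed get commands into the HBase shell. Rowkeys that contain binary data would need a different representation than the UTF-8 decoding assumed here.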