Using the HBase Shell for Full-Text Indexing
This section describes how to use the HBase shell to create full-text indexes for HBase tables.
Prerequisites
The CloudTable cluster (HBase), the ECS instance (functioning as the HBase client), and the CSS cluster (Elasticsearch engine) must be in the same VPC, subnet, and security group to ensure network connectivity.
Full-text Search Example
- Start the HBase shell to access a CloudTable cluster.
For details about how to install and start the HBase shell, see Using HBase Shell to Access a Cluster.
- Execute the following statement in the HBase shell to create an HBase table:
create 'hbase-es-table', {NAME => 'f', VERSIONS => 5}, SPLITS => ['10', '20'], METADATA => {'hbase.index.es.enabled' => 'true', 'hbase.index.es.endpoint' => '10.5.131.1:9200,10.5.131.2:9200', 'hbase.index.es.indexname' => 'product', 'hbase.index.es.schema' => '[{"name":"email","type":"text","hbaseQualifier":"f:email"}]'}
For details about the schema definition of the METADATA field, see HBase Elasticsearch Schema Definition. Replace the value of hbase.index.es.endpoint in the preceding statement with the address used to access the CSS cluster.
- In the HBase shell, run the following put commands to write three rows of data to the HBase table:
put 'hbase-es-table', '001rowkey', 'f:email', 'how many apples'
put 'hbase-es-table', '101rowkey', 'f:email', 'how much people'
put 'hbase-es-table', '201rowkey', 'f:email', 'many time people'
- Exit the HBase shell, and run the following curl command to call Search APIs of Elasticsearch and search for the keyword how:
curl -X GET "${ES_Cluster_IP:Port}/product/_search" -H 'Content-Type: application/json' -d '
{
  "stored_fields": ["rowkey"],
  "query": {
    "term": { "email": "how" }
  }
}'
Replace ${ES_Cluster_IP:Port} in the preceding command with the address used to access the CSS cluster, for example, 10.5.131.1:9200.
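To search for other keywords without editing the JSON by hand, the request body can be generated from a shell function. The following is only a sketch: ES_ENDPOINT and build_term_query are hypothetical names chosen for illustration, and the keyword is assumed to contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# ES_ENDPOINT and build_term_query are hypothetical names, not part of
# CloudTable or CSS. Replace the address with that of your CSS cluster.
ES_ENDPOINT="10.5.131.1:9200"

# Print the JSON body of a term query on the email field.
# Assumption: the keyword contains no double quotes or backslashes.
build_term_query() {
  printf '{"stored_fields":["rowkey"],"query":{"term":{"email":"%s"}}}' "$1"
}

# Preview the request body for the keyword "how".
build_term_query how
echo

# To send the request, run:
# curl -X GET "${ES_ENDPOINT}/product/_search" \
#   -H 'Content-Type: application/json' -d "$(build_term_query how)"
```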
The search hits two documents (a document is the basic unit of information for indexing and is expressed in JSON) and returns the rowkey field of each document. The rowkey links the HBase source data to the Elasticsearch index data. The result is as follows:
{
  "took": 4,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.2876821,
    "hits": [
      {
        "_index": "product",
        "_type": "doc",
        "_id": "GB087WYB7F1t0X-xu3ZX",
        "_score": 0.2876821,
        "fields": {
          "rowkey": [ "MDAxcm93a2V5" ]
        }
      },
      {
        "_index": "product",
        "_type": "doc",
        "_id": "GR087WYB7F1t0X-xvHZ5",
        "_score": 0.2876821,
        "fields": {
          "rowkey": [ "MTAxcm93a2V5" ]
        }
      }
    ]
  }
}
- Decode the returned rowkey to obtain the rowkey of the source data in HBase:
The rowkey returned in step 4 is encoded with Base64 (Base64.Encoder). You can recover the original HBase rowkey by decoding it with Base64.Decoder. For example, MDAxcm93a2V5 decodes to 001rowkey.
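On a Linux client such as the ECS instance, the decoding can also be done from the command line. The following is a minimal sketch, assuming the GNU coreutils base64 tool is installed and the rowkeys are printable strings; decode_rowkey is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# decode_rowkey is a hypothetical helper, not part of HBase or CloudTable.
# It decodes one Base64-encoded rowkey taken from the "rowkey" field of a hit.
decode_rowkey() {
  printf '%s' "$1" | base64 -d
}

# Decode the two rowkey values from the sample search result.
for encoded in MDAxcm93a2V5 MTAxcm93a2V5; do
  decode_rowkey "$encoded"
  echo
done
```

For the sample result, this prints 001rowkey and 101rowkey, which are the rowkeys of the two rows whose email value contains the keyword how.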
- Restart the HBase shell and run the following get command to obtain the source data:
get 'hbase-es-table','rowkey'
Replace rowkey in the preceding command with the decoded rowkey, for example, 001rowkey.
In Java application development, you can perform steps 3, 4, and 5 with a single function call. For details, see Querying Data in the CloudTable Service Developer Guide.
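Outside a Java application, the lookup from search result to HBase row can be scripted in a similar spirit. The following is a rough sketch rather than a supported tool: rowkeys_to_gets is a hypothetical function that reads a search response on standard input and prints one HBase shell get command per hit. It assumes each hit carries a single Base64-encoded value in its rowkey field and that the decoded rowkeys are printable strings without single quotes.

```shell
#!/usr/bin/env bash
# rowkeys_to_gets is a hypothetical helper, not part of HBase or CloudTable.
# Usage: <search response on stdin> | rowkeys_to_gets <table name>
rowkeys_to_gets() {
  local table="$1"
  # Join the response into one line, pull out every "rowkey":[ "<value>" ]
  # occurrence, keep only <value>, then decode it and print a get command.
  tr -d '\n\r' |
    grep -o '"rowkey"[[:space:]]*:[[:space:]]*\[[[:space:]]*"[^"]*"' |
    sed 's/.*"\([^"]*\)"$/\1/' |
    while read -r encoded; do
      printf "get '%s','%s'\n" "$table" "$(printf '%s' "$encoded" | base64 -d)"
    done
}

# Example with a trimmed-down copy of the sample search result.
response='{"hits":{"hits":[{"fields":{"rowkey":["MDAxcm93a2V5"]}},{"fields":{"rowkey":["MTAxcm93a2V5"]}}]}}'
printf '%s' "$response" | rowkeys_to_gets 'hbase-es-table'
```

The printed get commands can then be pasted into the HBase shell. Text matching is used here only to keep the sketch free of extra dependencies; a JSON-aware tool would be more robust for real responses.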