
Overview of Full-Text Search

HBase-Elasticsearch stores user source data in HBase and uses the Elasticsearch search engine of Cloud Search Service (CSS) to add full-text search on top of the HBase key-value query capabilities. You define which HBase fields need full-text search based on service requirements. When you create an HBase table, it is automatically connected to the CSS cluster you specify and an index is created in Elasticsearch, where the index data is stored. In addition, the native HBase APIs (Put and Scan) support writing and querying index data.

How to Use

Using the HBase Shell for Full-Text Indexing
Java application development
You can develop an application to implement HBase Elasticsearch full-text search. For details, see Developing HBase Elasticsearch Full-Text Search Applications in the CloudTable Service Developer Guide.

Working Principles

As a big data storage service, CloudTable stores user data as bytes and provides efficient key-value random queries. To extend CloudTable with full-text search, you customize a schema that assigns a data type (generally the text type) to selected fields. CloudTable is well suited as the primary storage system for massive amounts of source data of any type, because it separates computing from storage, scales out easily, and stores data cost-effectively. CSS (Elasticsearch) holds only lightweight index data to support keyword search. The following figure shows the working principles.

Figure 1 Working principles

If you enable a full-text index for specified fields when creating an HBase table, HBase automatically synchronizes the full-text index data to CSS whenever data is written. The native HBase read API, Scan, supports common full-text searches in addition to its key-value reads. For more complex search capabilities, call the Elasticsearch APIs first and then the CloudTable read APIs to complete the service logic.
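
The two-step pattern described above (search the index first, then read the source rows by key) can be sketched as follows. This is only an illustration under stated assumptions: both stores are simulated with in-memory dictionaries instead of real Elasticsearch or CloudTable clients, the function names are hypothetical, and it assumes that each index document is identified by the HBase row key, which this page does not confirm.

```python
from typing import Dict, List

# Stand-in for the Elasticsearch index: document ID -> indexed text.
# Assumption: the document ID is the HBase row key.
INDEX: Dict[str, str] = {
    "row1": "hbase stores source data",
    "row2": "elasticsearch stores index data",
    "row3": "full-text search over hbase",
}

# Stand-in for the HBase table: row key -> full source row.
TABLE: Dict[str, Dict[str, str]] = {
    "row1": {"cf1:content": "hbase stores source data", "cf1:id": "1"},
    "row2": {"cf1:content": "elasticsearch stores index data", "cf1:id": "2"},
    "row3": {"cf1:content": "full-text search over hbase", "cf1:id": "3"},
}


def search_row_keys(keyword: str) -> List[str]:
    """Step 1: return the row keys whose indexed text contains the keyword as a token."""
    needle = keyword.lower()
    return sorted(
        key for key, text in INDEX.items() if needle in text.lower().split()
    )


def fetch_rows(row_keys: List[str]) -> List[Dict[str, str]]:
    """Step 2: key-value lookups of the source rows for the matched row keys."""
    return [TABLE[key] for key in row_keys if key in TABLE]


def full_text_query(keyword: str) -> List[Dict[str, str]]:
    """Run the index search, then read the matching source rows."""
    return fetch_rows(search_row_keys(keyword))
```

In a real application, step 1 would be an Elasticsearch query and step 2 a batch of HBase Get calls; only the ordering of the two steps is taken from the description above.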

Application Scenarios

Massive amounts of user service data require HBase to function as an online big data storage system that provides basic key-value queries with high efficiency, high concurrency, and low latency. Such data also contains many fields of many types, because the services that use it are diverse. For example, within one row of a table, some text fields need keyword-based full-text search, some fields serve as secondary indexes, and some fields are used for tag bitmap indexes. In this case, enable the Elasticsearch full-text search function for CloudTable while keeping the other service expansion capabilities. Examples:

A search website stores massive amounts of search information, user environment information, and basic information in real time, extracts user information based on product keywords, and resells the information to a third-party e-commerce platform.
An intelligent hospital's case system stores patients' medical treatment information, including basic information, health status, the doctor's occupational information, symptom descriptions, diagnosis results, and medicines. A hospital information platform searches for, or collects statistics on, patients with a treatment history by using keywords related to current epidemics, prohibited drugs, or technical breakthroughs. The results are used to track discharged patients or to contact patients about secondary diagnosis with new technologies and other innovative services.
A government's intelligent public opinion governance system stores massive amounts of data from users of mainstream media platforms, such as seditious speech, user information, and forwarding counts. It also searches for hot events in real time. If an event is a rumor, the system automatically informs the user about the authenticity of the event, the social impact of the content that the user published or forwarded, relevant legal provisions, and similar cases. This intelligent feedback mechanism deters rumormongers and guides public opinion.

HBase Elasticsearch Schema Definition

HBase uses metadata of a table to store the definition of the Elasticsearch schema.

**Table 1** Schema definition
Field	Description	Mandatory
hbase.index.es.enabled	Whether to create a full-text index for the HBase table in Elasticsearch. The value true indicates that the full-text index is created. The default value is false.	Yes
hbase.index.es.endpoint	Access address of the CSS cluster (Elasticsearch engine), for example, ip1:port,ip2:port.	Yes
hbase.index.es.indexname	Index name of the HBase table in Elasticsearch. The index name must be in lowercase.	Yes
hbase.index.es.shards	Number of index shards in Elasticsearch. The default value is 5. The value is an integer greater than or equal to 1.	No
hbase.index.es.replicas	Number of index replicas in Elasticsearch. The default value is 1. The value is an integer greater than or equal to 0.	No
hbase.index.es.schema	Field mapping between HBase and Elasticsearch. The value is a string in JSON array format. Each element contains the following fields. name: name of the field in Elasticsearch; type: type of the field in Elasticsearch; hbaseQualifier: HBase qualifier of the data source; analyzer: (optional) analyzer for fields of the text type. Typically, the ik_smart analyzer is used for Chinese text. The default analyzer is standard, which supports English text. Example: '[ {"name":"contentCh","type":"text","hbaseQualifier":"cf1:contentCh","analyzer":"ik_smart"}, {"name":"contentEng","type":"text","hbaseQualifier":"cf2:contentEng"},{"name":"id","type":"long","hbaseQualifier":"cf1:id"} ]'	Yes
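
The constraints in Table 1 can be checked before a table is created. The following is a minimal sketch, not part of any CloudTable SDK: the function name is hypothetical, and it assumes that the attribute values are passed as strings. It fills in the documented defaults for the optional attributes and rejects values that violate the stated rules.

```python
from typing import Dict

# Attributes marked as mandatory in Table 1.
MANDATORY_KEYS = (
    "hbase.index.es.enabled",
    "hbase.index.es.endpoint",
    "hbase.index.es.indexname",
    "hbase.index.es.schema",
)

# Defaults stated in Table 1 for the optional attributes.
# Assumption: values are carried as strings.
DEFAULTS = {
    "hbase.index.es.shards": "5",
    "hbase.index.es.replicas": "1",
}


def build_index_metadata(user_values: Dict[str, str]) -> Dict[str, str]:
    """Merge user values with the defaults and check the Table 1 constraints."""
    missing = [key for key in MANDATORY_KEYS if key not in user_values]
    if missing:
        raise ValueError(f"missing mandatory attributes: {missing}")

    metadata = {**DEFAULTS, **user_values}

    index_name = metadata["hbase.index.es.indexname"]
    if index_name != index_name.lower():
        raise ValueError("hbase.index.es.indexname must be in lowercase")
    if int(metadata["hbase.index.es.shards"]) < 1:
        raise ValueError("hbase.index.es.shards must be >= 1")
    if int(metadata["hbase.index.es.replicas"]) < 0:
        raise ValueError("hbase.index.es.replicas must be >= 0")
    return metadata


# Hypothetical example values; the endpoint addresses are placeholders.
EXAMPLE = {
    "hbase.index.es.enabled": "true",
    "hbase.index.es.endpoint": "192.0.2.10:9200,192.0.2.11:9200",
    "hbase.index.es.indexname": "contentindex",
    "hbase.index.es.schema": '[{"name":"id","type":"long","hbaseQualifier":"cf1:id"}]',
}
```

Calling `build_index_metadata(EXAMPLE)` returns the example values plus the default shard and replica counts.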

The type value of each schema field must be one of the data types supported by HBase-Elasticsearch full-text search: "text", "long", "integer", "short", "byte", "double", "float", and "boolean". text is the text type in Elasticsearch. Full-text search is typically applied to fields of the text type, and fields of the basic types support exact-match search.
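
Because the hbase.index.es.schema value is a JSON array held in a string, generating it from code avoids quoting mistakes. The sketch below uses hypothetical helper names and rebuilds the example from Table 1 while checking each field against the supported types. Rejecting an analyzer on a non-text field is an assumption drawn from the table's statement that analyzers apply to fields of the text type.

```python
import json
from typing import Dict, List, Optional

# Data types listed as supported for the schema "type" value.
SUPPORTED_TYPES = {
    "text", "long", "integer", "short", "byte", "double", "float", "boolean",
}


def make_field(
    name: str,
    es_type: str,
    hbase_qualifier: str,
    analyzer: Optional[str] = None,
) -> Dict[str, str]:
    """Build one schema element after checking its type and analyzer."""
    if es_type not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported type: {es_type}")
    if analyzer is not None and es_type != "text":
        # Assumption: analyzers are only meaningful for text fields.
        raise ValueError("analyzer can only be set for text fields")
    field = {"name": name, "type": es_type, "hbaseQualifier": hbase_qualifier}
    if analyzer is not None:
        field["analyzer"] = analyzer
    return field


def build_schema(fields: List[Dict[str, str]]) -> str:
    """Serialize the field list as the JSON array string used for the schema."""
    return json.dumps(fields)


# The example from Table 1, rebuilt with the helpers above.
SCHEMA = build_schema([
    make_field("contentCh", "text", "cf1:contentCh", analyzer="ik_smart"),
    make_field("contentEng", "text", "cf2:contentEng"),
    make_field("id", "long", "cf1:id"),
])
```

Omitting the analyzer argument leaves the key out of the element, so the field falls back to the default analyzer described in Table 1.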

