Using Elasticsearch for Data Search

Scenario Description

A women's clothing brand runs an e-commerce website and has been using traditional databases to power its product search function. However, as website traffic increases, these databases are struggling to keep up, leading to slow responses and low search accuracy. To improve the shopping experience for customers, the e-commerce website plans to use Cloud Search Service (CSS) to provide the product search function.

Assume that the e-commerce website has the data shown in Table 1:

**Table 1** Products sold by one e-commerce website
productName	size
Latest art shirts for women in autumn 2017	L
Latest art shirts for women in autumn 2017	M
Latest art shirts for women in autumn 2017	S
Latest jeans for women in spring 2018	M
Latest jeans for women in spring 2018	S
Latest casual pants for women in spring 2017	L
Latest casual pants for women in spring 2017	S

Procedure

The following describes how to use an Elasticsearch cluster to implement a website search function.

Before you start, make the necessary preparations. For details, see Preparations.

Step 1: Creating a Cluster: Create a non-security mode Elasticsearch cluster for data search.
Step 2: Logging In to Kibana: Log in to the cluster through Kibana.
Step 3: Creating an Index: Create indexes in the cluster through Kibana.
Step 4: Importing Data: Use an open-source Elasticsearch API to import data on Kibana.
Step 5: Searching for Data: Run a full-text search and aggregate the search results for data in the Elasticsearch cluster.
Step 6: Deleting Indexes: Delete indexes that you no longer need to reclaim resources.

Preparations

You have registered with Huawei Cloud and performed real-name authentication. Make sure your account is not frozen or in arrears.

If you do not have a Huawei Cloud account, perform the following operations to create one:

Visit the Huawei Cloud official website.
In the upper-right corner of the page, click Register and complete the registration as prompted.
Select the service agreement and click Enable.
Perform real-name authentication.
- If your account is an individual account, see Individual Real-Name Authentication.
- If your account is an enterprise account, see Enterprise Real-Name Authentication.

Step 1: Creating a Cluster

Create a non-security mode Elasticsearch cluster for data search.

Log in to the CSS management console.
In the navigation pane on the left, choose Clusters > Elasticsearch.
In the upper right corner, click Create Cluster. The new-version UI for creating a cluster is displayed by default.
Figure 1 Create Cluster (new version)

Select the cluster type and version.

**Table 2** Cluster configuration
Parameter	Example	Description
Cluster Type	Elasticsearch	Select Elasticsearch.
Cluster Version	7.10.2	Select a cluster version from the drop-down list. The built-in CSS vector search engine is available for Elasticsearch 7.6.2 and 7.10.2 clusters only. To use CSS vector databases, select either of these versions.

Configure basic settings, including the region, billing mode, and AZs.

**Table 3** Basic settings
Parameter	Example	Description
Region	Hong Kong, China	Select the region where the cluster is located. A region is the location of a physical data center. Regions are defined based on their geographical location and network latency. For lower network latency and quicker resource access, select the nearest region.
AZ	AZ 1	Select AZs associated with the cluster region. An AZ is a physical region where resources use independent power supplies and networks. AZs are physically isolated but interconnected through an internal network. A maximum of three AZs can be configured.
Billing Mode	Pay-per-use	Billing mode for the cluster, which can be Yearly/Monthly or Pay-per-use. Yearly/Monthly: You prepay for a yearly or monthly subscription. Pay-per-use (postpaid): You will be billed hourly by actual duration of use. Any partial hour of usage will be rounded up to one hour.

Configure data nodes.

Data nodes store indexed data in a cluster. If Master node and Client node are both unselected, data nodes will be used for all of the following purposes: cluster management, data storage, cluster access, and data analysis. To ensure reliability, a cluster should have at least three nodes.

Figure 2 Configuring data nodes

**Table 4** Configuring data nodes
Parameter	Example	Description
CPU Architecture	x86	Select the CPU architecture of the data nodes. x86 and Kunpeng nodes are supported. The architectures actually supported may vary depending on the regional environment.
Node Specifications	ess.spec-4u8g	Select the specifications of the data nodes. Click Available. On the displayed page, select a flavor that suits your needs. In the node flavor list, vCPUs \| Memory indicate the number of vCPUs and memory capacity available for each flavor, and Recommended Storage indicates the supported storage capacity range. The node flavors available may vary depending on the region you select.
Node Storage Type and Capacity	High I/O 100GB	Select the storage type and capacity of the data nodes. If the selected node flavor uses EVS disks, you need to further select Node Storage Type and Capacity based on service requirements. Available EVS disk types vary depending on your region. The value range of node storage capacity is determined by the node flavor you select. The value must be divisible by 20. Node storage capacity cannot be reduced once the cluster is created. Evaluate your long-term data needs and select an appropriate size. If the selected node flavor uses local disks, there is no need to select the node storage type, and the node storage capacity is a fixed value. Both of them are determined by the local disk specifications.
Nodes	1	Set the number of nodes in the cluster. If master nodes are configured, the number of data nodes ranges from 1 to 200. If no master nodes are configured, the number of data nodes ranges from 1 to 32. To ensure cluster availability, you should configure at least three data nodes.
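The constraints listed in Table 4 can be checked before you fill in the form. The following Python sketch is only an illustration: the function name and return format are hypothetical, and it covers just the rules stated above (storage divisible by 20, the node count limits, and the three-node recommendation). The storage range allowed for each flavor is not checked because it depends on the flavor you select.

```python
def check_data_node_settings(node_count, storage_gb, has_master_nodes=False):
    """Return (errors, warnings) for a proposed data node configuration.

    Hypothetical helper: it encodes only the rules quoted in Table 4.
    """
    errors, warnings = [], []

    # The upper limit on data nodes depends on whether master nodes exist.
    max_nodes = 200 if has_master_nodes else 32
    if not 1 <= node_count <= max_nodes:
        errors.append(f"node count must be between 1 and {max_nodes}")
    elif node_count < 3:
        warnings.append("fewer than three data nodes reduces availability")

    # Node storage capacity must be divisible by 20.
    if storage_gb % 20 != 0:
        errors.append("storage capacity must be divisible by 20")

    return errors, warnings


# The example values used in this guide: one node with 100 GB of storage.
errors, warnings = check_data_node_settings(node_count=1, storage_gb=100)
print("errors:", errors)
print("warnings:", warnings)
```

With the example values, the configuration is valid but triggers the availability warning, which is acceptable for a getting-started cluster.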

Keep Master Node, Client Node, and Cold Data Node unselected.
- Master nodes manage cluster-wide operations, including metadata, indexes, and shard allocation. For large-scale deployments, using dedicated master nodes enhances cluster stability, service availability, and centralized control.
- Client nodes route and coordinate search and index requests, offloading processing from data nodes for enhanced query performance and cluster scalability when there are heavy loads.
- Cold data nodes are used to store and query latency-insensitive data in large quantities. They offer an effective way to manage large datasets while cutting storage costs.

Configure network settings for the cluster, including the VPC, IP address, and security group.

Figure 3 Configuring network settings

**Table 5** Configuring network settings
Parameter	Example	Description
VPC	vpc-default	Select a VPC for the cluster for proper network isolation.
Subnet	subnet-default	Select a subnet for the cluster. A subnet improves network security by providing exclusive network resources that are isolated from other networks. Select a subnet in the current VPC.
IPv4 Address	Assign automatically	Assign IPv4 addresses to cluster nodes.
Security Group	default	Select a security group for the cluster. A security group serves as a virtual firewall that provides access control policies for clusters. The selected security group must allow all ports or port 9200 in the inbound direction. Otherwise, the cluster may be inaccessible to external services.
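If you later need to confirm that the security group allows inbound traffic on port 9200, a simple TCP connection test from a machine in the same VPC can help. The following Python sketch assumes you know the private IP address of a cluster node; the function name and the address in the usage comment are hypothetical.

```python
import socket


def port_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Usage (hypothetical node address; replace it with your own):
# print(port_reachable("192.168.0.10", 9200))
```

A result of False does not identify the cause. Check the security group rules, the subnet, and the cluster status together.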

Configure the security mode. As this topic serves as a quick reference guide only, the security mode is disabled to make the steps simpler.
- When the security mode is enabled, a cluster's communication is encrypted and access to the cluster requires user authentication.
- When it is disabled, access to the cluster requires no user authentication, and data will be transmitted in plaintext using HTTP. In this case, make sure the cluster is deployed in a secure environment. Do not expose the cluster's network interface to the public network.

Configure cluster management settings, such as the cluster name and enterprise project.

**Table 6** Cluster management
Parameter	Example	Description
Cluster Name	Sample-ESCluster	User-defined cluster name.
Add Description	Skip this setting.	Add a description for the cluster for easy recognition.
Enterprise Project	default	Associate the cluster with an enterprise project. An enterprise project groups cloud resources, so you can manage resources and members by project. The default project is default. If enterprise projects are enabled, you can select an enterprise project from the drop-down list.
Tags	None	Adding tags to clusters helps you identify and manage your cluster resources. Each cluster can have a maximum of 20 tags.

Click More Settings to expand it, and configure automatic snapshot creation and VPC Endpoint as required. Because this cluster is used only for getting started, keep the default settings, that is, keep both disabled.
Click Create Now.
Return to the cluster list and check the newly created cluster. If the cluster is created successfully, Cluster Status changes to Available.
Figure 4 Checking the cluster status

Step 2: Logging In to Kibana

After an Elasticsearch cluster is created, you can access the cluster through Kibana.

From the Elasticsearch cluster list, select the created Sample-ESCluster cluster and click Access Kibana in the Operation column to access the Kibana console.
In the left navigation pane on the Kibana console, click Dev Tools.
The left part of the console is the command input box, and the triangle icon in its upper-right corner is the execution button. The right part shows the execution result.
Figure 5 Kibana console

Step 3: Creating an Index

Create an index in the Elasticsearch cluster to store data.

Run the following command on Kibana to create an index named my_store:

PUT /my_store
{
  "settings": {
    "number_of_shards": 1
  },
  "mappings": {
    "properties": {
      "productName": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "size": {
        "type": "keyword"
      }
    }
  }
}

The command output is similar to the following:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_store"
}
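The same index can also be created from a program instead of Kibana. The following Python sketch uses only the standard library to build the equivalent PUT request. The endpoint address is a hypothetical placeholder, and the helper name is chosen for illustration. The request is built but not sent, so the sketch can be run anywhere.

```python
import json
import urllib.request

# Assumption: a node's private address. Clusters in non-security mode
# are accessed over plain HTTP on port 9200.
ES_ENDPOINT = "http://192.168.0.10:9200"  # hypothetical address

# Same settings and mappings as the Kibana command above.
INDEX_BODY = {
    "settings": {"number_of_shards": 1},
    "mappings": {
        "properties": {
            "productName": {"type": "text", "analyzer": "ik_smart"},
            "size": {"type": "keyword"},
        }
    },
}


def build_create_index_request(endpoint, index, body):
    """Build (but do not send) the PUT request that creates an index."""
    return urllib.request.Request(
        url=f"{endpoint}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


request = build_create_index_request(ES_ENDPOINT, "my_store", INDEX_BODY)
print(request.get_method(), request.full_url)

# To send it from a machine that can reach the cluster:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```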

Step 4: Importing Data

There are several ways to import data to an Elasticsearch cluster. In this example, we use an open-source Elasticsearch API to import data on Kibana.

On the Kibana console, run the following command to import data to the index named my_store:

POST /my_store/_bulk
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","size":"L"}
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","size":"M"}
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","size":"S"}
{"index":{}}
{"productName":"Latest jeans for women in spring 2018","size":"M"}
{"index":{}}
{"productName":"Latest jeans for women in spring 2018","size":"S"}
{"index":{}}
{"productName":"Latest casual pants for women in spring 2017","size":"L"}
{"index":{}}
{"productName":"Latest casual pants for women in spring 2017","size":"S"}

If the value of the errors field in the command output is false, the data is imported successfully.
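For larger data sets, the bulk request body is usually generated rather than typed by hand. The following Python sketch shows one possible way to build the same newline-delimited body from the rows in Table 1 and to check the errors field of a response. The function names are hypothetical, and the sample response passed to bulk_succeeded is a made-up stand-in rather than real cluster output.

```python
import json

# Rows from Table 1 as (productName, size) pairs.
PRODUCTS = [
    ("Latest art shirts for women in autumn 2017", "L"),
    ("Latest art shirts for women in autumn 2017", "M"),
    ("Latest art shirts for women in autumn 2017", "S"),
    ("Latest jeans for women in spring 2018", "M"),
    ("Latest jeans for women in spring 2018", "S"),
    ("Latest casual pants for women in spring 2017", "L"),
    ("Latest casual pants for women in spring 2017", "S"),
]


def build_bulk_body(rows):
    """Return the newline-delimited JSON body for POST /my_store/_bulk."""
    lines = []
    for name, size in rows:
        # Each document is preceded by an action line.
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps({"productName": name, "size": size}))
    # The _bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def bulk_succeeded(response):
    """Return True when the _bulk response reports no item-level errors."""
    return response.get("errors") is False


bulk_body = build_bulk_body(PRODUCTS)
print(bulk_body, end="")

# Illustrative stand-in for a parsed response, not real cluster output.
print(bulk_succeeded({"errors": False, "items": []}))
```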

Step 5: Searching for Data

Run a full-text search and aggregate the search results for data in the Elasticsearch cluster.

Full-text search

Suppose a customer on the e-commerce website wants to find items whose names include "spring jeans" and enters "spring jeans" in the search box.

Run the following command on Kibana:

GET /my_store/_search
{
  "query": {"match": {
    "productName": "spring jeans"
  }}
}

The command output is similar to the following:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.7965372,
    "hits" : [
      {
        "_index" : "my_store",
        "_type" : "_doc",
        "_id" : "9xf6VHIBfClt6SDjw7H5",
        "_score" : 1.7965372,
        "_source" : {
          "productName": "Latest jeans for women in spring 2018",
          "size" : "M"
        }
      },
      {
        "_index" : "my_store",
        "_type" : "_doc",
        "_id" : "-Bf6VHIBfClt6SDjw7H5",
        "_score" : 1.7965372,
        "_source" : {
          "productName": "Latest jeans for women in spring 2018",
          "size" : "S"
        }
      },
      {
        "_index" : "my_store",
        "_type" : "_doc",
        "_id" : "-Rf6VHIBfClt6SDjw7H5",
        "_score" : 0.5945667,
        "_source" : {
          "productName": "Latest casual pants for women in spring 2017",
          "size" : "L"
        }
      },
      {
        "_index" : "my_store",
        "_type" : "_doc",
        "_id" : "-hf6VHIBfClt6SDjw7H5",
        "_score" : 0.5945667,
        "_source" : {
          "productName": "Latest casual pants for women in spring 2017",
          "size" : "S"
        }
      }
    ]
  }
}

Elasticsearch supports IK word segmentation. The search command above segments "spring jeans" into "spring" and "jeans".
Elasticsearch supports full-text search. The command above searches for all items whose names include "spring" or "jeans".
Unlike traditional databases, Elasticsearch can return results in milliseconds by using inverted indexes.
Elasticsearch supports ranking by score. In the command output, the first two items contain both "spring" and "jeans", while the last two items contain only "spring". Therefore, the first two items rank higher than the last two because they are more relevant to the search terms.
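The points above can be illustrated with a toy inverted index. The following Python sketch is a simplification under stated assumptions: it splits text on whitespace instead of using the IK analyzer, and it scores a document by the number of query terms it contains rather than by the relevance formula Elasticsearch actually uses. It shows the idea, not the engine's implementation. For brevity it indexes the three distinct product names instead of all seven documents.

```python
from collections import defaultdict

PRODUCT_NAMES = [
    "Latest art shirts for women in autumn 2017",    # doc 0
    "Latest jeans for women in spring 2018",         # doc 1
    "Latest casual pants for women in spring 2017",  # doc 2
]


def tokenize(text):
    """Simplified segmentation: lowercase and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in enumerate(docs):
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return (doc_id, score) pairs for documents matching any query term."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        # Look up each term directly instead of scanning every document.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties are broken by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


index = build_inverted_index(PRODUCT_NAMES)
results = search(index, "spring jeans")
for doc_id, score in results:
    print(score, PRODUCT_NAMES[doc_id])
# 2 Latest jeans for women in spring 2018
# 1 Latest casual pants for women in spring 2017
```

The jeans item matches both terms and ranks first, the casual pants item matches only "spring", and the art shirts item is not returned, which mirrors the ordering in the command output.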

Aggregated result display

The e-commerce website displays aggregated results. For example, it groups the items that match "spring" by size so that you can see how many items are available in each size.

Run the following result aggregation command on Kibana:

GET /my_store/_search
{
  "query": {
    "match": {
      "productName": "Spring"
    }
  },
  "size": 0,
  "aggs": {
    "sizes": {
      "terms": {
        "field": "size"
      }
    }
  }
}

The command output is similar to the following:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [ ]
  },
  "aggregations" : {
    "sizes" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "S",
          "doc_count" : 2
        },
        {
          "key" : "L",
          "doc_count" : 1
        },
        {
          "key" : "M",
          "doc_count" : 1
        }
      ]
    }
  }
}
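To see what the sizes aggregation computes, the same grouping can be reproduced locally. The following Python sketch is an emulation under simplifying assumptions: it matches the keyword by whitespace-separated word rather than through the IK analyzer, and the function name is hypothetical. It filters the rows from Table 1 and counts the matches by size.

```python
from collections import Counter

# Rows from Table 1 as (productName, size) pairs.
PRODUCTS = [
    ("Latest art shirts for women in autumn 2017", "L"),
    ("Latest art shirts for women in autumn 2017", "M"),
    ("Latest art shirts for women in autumn 2017", "S"),
    ("Latest jeans for women in spring 2018", "M"),
    ("Latest jeans for women in spring 2018", "S"),
    ("Latest casual pants for women in spring 2017", "L"),
    ("Latest casual pants for women in spring 2017", "S"),
]


def sizes_for_keyword(rows, keyword):
    """Count matching rows per size, similar to a terms aggregation."""
    wanted = keyword.lower()
    counts = Counter(
        size for name, size in rows if wanted in name.lower().split()
    )
    # Largest bucket first; ties are broken by key in this sketch.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [{"key": key, "doc_count": count} for key, count in ordered]


buckets = sizes_for_keyword(PRODUCTS, "Spring")
for bucket in buckets:
    print(bucket)
# {'key': 'S', 'doc_count': 2}
# {'key': 'L', 'doc_count': 1}
# {'key': 'M', 'doc_count': 1}
```

Four rows match "spring", and the per-size counts agree with the buckets in the command output.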

Step 6: Deleting Indexes

If an index is no longer used, run the following command on Kibana to delete it and reclaim resources:

DELETE /my_store

The command output is similar to the following:

{
  "acknowledged" : true
}

Follow-up Operations

You can delete the cluster if you no longer need it.

After you delete a cluster, its data cannot be restored. Exercise caution when deleting a cluster.

Log in to the CSS management console.
In the navigation pane on the left, choose Clusters > Elasticsearch.
In the cluster list, locate the Sample-ESCluster cluster, and choose More > Delete in the Operation column.
In the confirmation dialog box, type in DELETE, and click OK.
