Updated on 2024-11-20 GMT+08:00

Using OpenSearch for Data Search

This section provides an example of how an e-commerce website uses an OpenSearch cluster to implement a product search function, including creating indexes, importing data, and searching for data.

Scenario Description

A women's clothing brand runs an e-commerce website. It has been using traditional databases to power a product search function for customers. However, as website traffic increases, these databases are struggling to keep up, leading to slow responses and low search accuracy. To improve the shopping experience for customers, the e-commerce website plans to use Cloud Search Service (CSS) to provide the product search function.

Assume that the e-commerce website has the following data:

{
"products":[
{"productName":"Latest elegant shirts in autumn 2017","size":"L"},
{"productName":"Latest elegant shirts in autumn 2017","size":"M"},
{"productName":"Latest elegant shirts in autumn 2017","size":"S"},
{"productName":"Latest jeans in spring 2018","size":"M"},
{"productName":"Latest jeans in spring 2018","size":"S"},
{"productName":"Latest casual pants in spring 2017","size":"L"},
{"productName":"Latest casual pants in spring 2017","size":"S"}
]
}

Procedure

The following describes how to use an OpenSearch cluster to implement a website search function.

Before you start, make the necessary preparations. For details, see Preparations.

  1. Step 1: Creating a Cluster: Create a non-security mode OpenSearch cluster for data search.
  2. Step 2: Importing Data: Use an open-source Elasticsearch API to import data on OpenSearch Dashboards.
  3. Step 3: Searching for Data: Run a full-text search and aggregate the results on data in the OpenSearch cluster.
  4. Step 4: Deleting Indexes: Delete indexes that you no longer need to reclaim resources.

Preparations

You have registered with Huawei Cloud and performed real-name authentication. Make sure your account is not frozen or in arrears.

If you do not have a Huawei Cloud account, perform the following operations to create one:
  1. Visit the Huawei Cloud official website.
  2. In the upper right corner of the page, click Register and complete the registration as prompted.
  3. Select the service agreement and click Enable.
  4. Perform real-name authentication.

Step 1: Creating a Cluster

Create a non-security mode OpenSearch cluster for data search.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > OpenSearch.
  3. Click Create Cluster in the upper right corner. The Create Cluster page is displayed.
  4. Configure Billing Mode and AZ for the cluster.
    Table 1 Billing mode and AZ parameters

    | Parameter | Description | Example Value |
    | --- | --- | --- |
    | Billing Mode | Select Yearly/Monthly or Pay-per-use. Yearly/Monthly: You pay for the cluster by year or month, in advance. The service duration ranges from one month to three years. If you plan to use a cluster for more than nine months, you are advised to purchase a yearly package for a better price. Pay-per-use: You are billed by actual duration of use, with a billing cycle of one hour. For example, 58 minutes of usage will be rounded up to an hour and billed. | Pay-per-use |
    | Region | Select the region where the cluster is located. ECSs in different regions cannot communicate with each other over an intranet. For lower network latency and quicker resource access, select the nearest region. | Hong Kong, China |
    | AZ | Select AZs associated with the cluster region. A maximum of three AZs can be configured. | AZ 1 |


  5. Configure basic cluster information.
    Figure 1 Configuring cluster information
    Table 2 Basic configuration parameters

    | Parameter | Description | Example Value |
    | --- | --- | --- |
    | Cluster Type | Choose OpenSearch. | OpenSearch |
    | Version | Select a cluster version from the drop-down list box. | 1.3.6 |
    | Name | Cluster name, which contains 4 to 32 characters. Only letters, numbers, hyphens (-), and underscores (_) are allowed and the value must start with a letter. | Sample-OSCluster |


  6. Configure the cluster's node specifications.
    Figure 2 Configuring the cluster's node specifications
    Table 3 Specification parameters

    | Parameter | Description | Example Value |
    | --- | --- | --- |
    | Nodes | Number of nodes in a cluster. Select a number from 1 to 32. | 1 |
    | CPU Architecture | x86. The supported type is determined by the actual regional environment. | x86 |
    | Node Specifications | Select the specifications of cluster nodes. | ess.spec-4u8g |
    | Node Storage Type | Select the storage type of cluster nodes. | Common I/O |
    | Node Storage Capacity | Node storage capacity. Its value range varies with node specifications. The node storage capacity must be a multiple of 20. | 40 GB |
    | Master node | The Master node manages all node tasks in the cluster. | Unselect it. |
    | Client node | Client nodes receive and coordinate external requests, such as search and write requests. | Unselect it. |
    | Cold data node | Cold data nodes are used to store data that is not particularly sensitive to query latency in large quantities. | Unselect it. |

  7. Set the enterprise project.

    When creating a CSS cluster, you can bind an enterprise project to the cluster if you have enabled the enterprise project function. In this example, default, the default enterprise project, is selected.

  8. Click Next: Network. Configure the cluster network.
    Figure 3 Configuring networking
    Table 4 Network configuration parameters

    | Parameter | Description | Example Value |
    | --- | --- | --- |
    | VPC | Specify a VPC to isolate the cluster's network. NOTE: The VPC must contain CIDRs. Otherwise, cluster creation will fail. By default, a VPC contains CIDRs. | vpc-default |
    | Subnet | A subnet provides dedicated network resources that are isolated from other networks, improving network security. | subnet-default |
    | Security Group | A security group serves as a virtual firewall that provides access control policies for clusters. NOTE: To enable cluster access, ensure that the security group allows port 9200. | default |
    | Security Mode | After the security mode is enabled, communication is encrypted and authentication is required for the cluster. | Disable |

  9. Click Next: Advanced Settings. Configure automatic snapshot creation and other functions.

    This cluster is used only for getting started. Cluster snapshots and advanced functions are not required.

  10. Click Next: Confirm. Check the configuration and click Next to create a cluster.
  11. Click Back to Cluster List to switch to the Clusters page. The cluster you created is now in the cluster list and its status is Creating. If the cluster is successfully created, its status changes to Available.
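
The naming rule in Table 2 and the storage rule in Table 3 can be checked before you fill in the console form. The following is a minimal sketch only: the function names are hypothetical and are not part of any CSS SDK, and the storage check covers only the multiple-of-20 rule, because the valid range depends on the node specifications you select.

```python
import re

# Rule from Table 2: 4 to 32 characters; only letters, digits, hyphens,
# and underscores; the first character must be a letter.
_NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_-]{3,31}")


def is_valid_cluster_name(name: str) -> bool:
    """Return True if the name satisfies the cluster naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


def is_valid_storage_capacity(capacity_gb: int) -> bool:
    """Return True if the capacity (in GB) is a positive multiple of 20."""
    return capacity_gb > 0 and capacity_gb % 20 == 0
```

With the example values used in this section, `is_valid_cluster_name("Sample-OSCluster")` and `is_valid_storage_capacity(40)` both return `True`.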

Step 2: Importing Data

There are many ways to import data to an OpenSearch cluster. In this example, we use an open-source Elasticsearch API to import data on OpenSearch Dashboards.

  1. On the OpenSearch cluster management page, select the created Sample-OSCluster cluster and click Access Kibana in the Operation column to access the OpenSearch Dashboards console.
  2. In the OpenSearch Dashboards navigation pane on the left, choose Dev Tools.
    The text box on the left is the input box. The triangle icon in the upper right corner of the input box is the command execution button. The text box on the right is the result output box.
    Figure 4 Console page
  3. On the Console page, run the following command to create an index named my_store:
    PUT /my_store
    {
      "settings": {
        "number_of_shards": 1
      },
      "mappings": {
        "properties": {
          "productName": {
            "type": "text",
            "analyzer": "ik_smart"
          },
          "size": {
            "type": "keyword"
          }
        }
      }
    }

    The command output is similar to the following:

    {
      "acknowledged" : true,
      "shards_acknowledged" : true,
      "index" : "my_store"
    }
  4. On the Console page, run the following command to import data to the index named my_store:
    POST /my_store/_bulk
    {"index":{}}
    {"productName":"Latest elegant shirts in autumn 2017","size":"L"}
    {"index":{}}
    {"productName":"Latest elegant shirts in autumn 2017","size":"M"}
    {"index":{}}
    {"productName":"Latest elegant shirts in autumn 2017","size":"S"}
    {"index":{}}
    {"productName":"Latest jeans in spring 2018","size":"M"}
    {"index":{}}
    {"productName":"Latest jeans in spring 2018","size":"S"}
    {"index":{}}
    {"productName":"Latest casual pants in spring 2017","size":"L"}
    {"index":{}}
    {"productName":"Latest casual pants in spring 2017","size":"S"}

    If the value of the errors field in the command output is false, the data is imported successfully.
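
The bulk request body is newline-delimited JSON: each product takes one action line followed by one document line, and the body must end with a newline. If you generate the body from application data rather than typing it, the steps might look like the following sketch. The helper names `build_bulk_body` and `failed_items` are hypothetical, and the code only builds and inspects data; it does not contact a cluster.

```python
import json

PRODUCTS = [
    {"productName": "Latest elegant shirts in autumn 2017", "size": "L"},
    {"productName": "Latest elegant shirts in autumn 2017", "size": "M"},
    {"productName": "Latest elegant shirts in autumn 2017", "size": "S"},
    {"productName": "Latest jeans in spring 2018", "size": "M"},
    {"productName": "Latest jeans in spring 2018", "size": "S"},
    {"productName": "Latest casual pants in spring 2017", "size": "L"},
    {"productName": "Latest casual pants in spring 2017", "size": "S"},
]


def build_bulk_body(documents):
    """Return a bulk body: one action line, then one document line, per item."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {}}))  # empty action: the cluster assigns the ID
        lines.append(json.dumps(doc))
    # The bulk API expects the body to end with a newline character.
    return "\n".join(lines) + "\n"


def failed_items(bulk_response):
    """Return the per-item results that contain an "error" object."""
    if not bulk_response.get("errors"):
        return []
    failures = []
    for item in bulk_response.get("items", []):
        for result in item.values():  # each item has a single key, such as "index"
            if "error" in result:
                failures.append(result)
    return failures
```

Passing the parsed command output to `failed_items` returns an empty list when `errors` is `false`, which matches the success check described above.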

Step 3: Searching for Data

Run a full-text search and aggregate the results on data in the OpenSearch cluster.

  • Full-text search

    If you access the e-commerce website and want to search for items whose names include "spring jeans", enter "spring jeans" to begin your search.

    Run the following command on OpenSearch Dashboards:

    GET /my_store/_search
    {
      "query": {
        "match": {
          "productName": "spring jeans"
        }
      }
    }

    The command output is similar to the following:

    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4,
          "relation" : "eq"
        },
        "max_score" : 1.7965372,
        "hits" : [
          {
            "_index" : "my_store",
            "_type" : "_doc",
            "_id" : "9xf6VHIBfClt6SDjw7H5",
            "_score" : 1.7965372,
            "_source" : {
              "productName": "Latest jeans in spring 2018",
              "size" : "M"
            }
          },
          {
            "_index" : "my_store",
            "_type" : "_doc",
            "_id" : "-Bf6VHIBfClt6SDjw7H5",
            "_score" : 1.7965372,
            "_source" : {
              "productName": "Latest jeans in spring 2018",
              "size" : "S"
            }
          },
          {
            "_index" : "my_store",
            "_type" : "_doc",
            "_id" : "-Rf6VHIBfClt6SDjw7H5",
            "_score" : 0.5945667,
            "_source" : {
              "productName": "Latest casual pants in spring 2017",
              "size" : "L"
            }
          },
          {
            "_index" : "my_store",
            "_type" : "_doc",
            "_id" : "-hf6VHIBfClt6SDjw7H5",
            "_score" : 0.5945667,
            "_source" : {
              "productName": "Latest casual pants in spring 2017",
              "size" : "S"
            }
          }
        ]
      }
    }
    
    • OpenSearch supports IK word segmentation. The command above segments "spring jeans" into "spring" and "jeans".
    • OpenSearch supports full-text search. The command above searches for the information about all items whose names include "spring" or "jeans".
    • Unlike traditional databases, OpenSearch can return results in milliseconds by using inverted indexes.
    • OpenSearch supports ranking by score. In the command output, the names of the first two items contain both "spring" and "jeans", while the names of the last two items contain only "spring". Therefore, the first two items rank higher than the last two because they match more of the search keywords.
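
To make the points above concrete, the following toy sketch builds an inverted index over the sample product names and ranks matches by the number of query terms they contain. It is a simplification under stated assumptions: whitespace splitting stands in for the IK analyzer, and counting matched terms stands in for the relevance score, which OpenSearch computes with a more elaborate formula (BM25 by default). All function names are hypothetical.

```python
from collections import defaultdict

PRODUCTS = [
    {"productName": "Latest elegant shirts in autumn 2017", "size": "L"},
    {"productName": "Latest elegant shirts in autumn 2017", "size": "M"},
    {"productName": "Latest elegant shirts in autumn 2017", "size": "S"},
    {"productName": "Latest jeans in spring 2018", "size": "M"},
    {"productName": "Latest jeans in spring 2018", "size": "S"},
    {"productName": "Latest casual pants in spring 2017", "size": "L"},
    {"productName": "Latest casual pants in spring 2017", "size": "S"},
]


def tokenize(text):
    """Stand-in for an analyzer: lowercase the text and split on whitespace."""
    return text.lower().split()


def build_inverted_index(docs):
    """Map each term to the set of document positions that contain it."""
    index = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for term in tokenize(doc["productName"]):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return (doc_id, matched_term_count) pairs, best match first."""
    scores = defaultdict(int)
    for term in set(tokenize(query)):
        # Looking up a term touches only the documents that contain it.
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
```

Calling `search(build_inverted_index(PRODUCTS), "spring jeans")` returns the two jeans items first (two matched terms each), followed by the two casual pants items (one matched term each), which mirrors the ordering in the command output.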
  • Aggregated result display

    The e-commerce website can also display aggregated results. For example, it can group the items that match "spring" by size and show how many items are available in each size.

    Run the following result aggregation command on OpenSearch Dashboards:

    GET /my_store/_search
    {
      "query": {
        "match": { "productName": "spring" }
      },
      "size": 0,
      "aggs": {
        "sizes": {
          "terms": { "field": "size" }
        }
      }
    }

    The command output is similar to the following:

    {
      "took" : 3,
      "timed_out" : false,
      "_shards" : {
        "total" : 1,
        "successful" : 1,
        "skipped" : 0,
        "failed" : 0
      },
      "hits" : {
        "total" : {
          "value" : 4,
          "relation" : "eq"
        },
        "max_score" : null,
        "hits" : [ ]
      },
      "aggregations" : {
        "sizes" : {
          "doc_count_error_upper_bound" : 0,
          "sum_other_doc_count" : 0,
          "buckets" : [
            {
              "key" : "S",
              "doc_count" : 2
            },
            {
              "key" : "L",
              "doc_count" : 1
            },
            {
              "key" : "M",
              "doc_count" : 1
            }
          ]
        }
      }
    }
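
Conceptually, the terms aggregation filters the documents with the query and then counts them per distinct value of the size field. The following sketch reproduces the bucket counts locally from the sample data. It illustrates the idea only and does not reflect how OpenSearch implements aggregations; the function name is hypothetical, and whitespace splitting again stands in for the analyzer.

```python
from collections import Counter

PRODUCTS = [
    {"productName": "Latest elegant shirts in autumn 2017", "size": "L"},
    {"productName": "Latest elegant shirts in autumn 2017", "size": "M"},
    {"productName": "Latest elegant shirts in autumn 2017", "size": "S"},
    {"productName": "Latest jeans in spring 2018", "size": "M"},
    {"productName": "Latest jeans in spring 2018", "size": "S"},
    {"productName": "Latest casual pants in spring 2017", "size": "L"},
    {"productName": "Latest casual pants in spring 2017", "size": "S"},
]


def size_buckets(docs, keyword):
    """Count matching documents per size, largest bucket first."""
    matching = [
        doc for doc in docs
        if keyword.lower() in doc["productName"].lower().split()
    ]
    counts = Counter(doc["size"] for doc in matching)
    # most_common() sorts by count in descending order.
    return [{"key": size, "doc_count": n} for size, n in counts.most_common()]
```

For the keyword "spring", the sketch yields a count of 2 for size S and 1 each for sizes L and M, the same counts as in the command output. The order of buckets with equal counts may differ from the order the cluster returns.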

Step 4: Deleting Indexes

If you no longer need an index, run the following command on OpenSearch Dashboards to delete it and reclaim resources:

DELETE /my_store

The command output is similar to the following:

{
  "acknowledged" : true
}
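
The console commands in this section are plain HTTP requests, so they can also be sent from a script. The following is a minimal sketch using only the Python standard library. It assumes a non-security mode cluster (plain HTTP, no authentication) reachable on port 9200 from a machine in the same VPC; the address in `CLUSTER_ENDPOINT` is a hypothetical placeholder, and the helper names are not part of any CSS SDK.

```python
import json
import urllib.request

# Hypothetical private address of a cluster node; replace it with your own.
CLUSTER_ENDPOINT = "http://192.168.0.10:9200"


def build_request(method, path, body=None):
    """Build an HTTP request for the cluster, with an optional JSON body."""
    data = None
    headers = {}
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        CLUSTER_ENDPOINT + path, data=data, headers=headers, method=method
    )


def send(request, timeout=10):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `send(build_request("DELETE", "/my_store"))` would issue the delete command shown above. `send` raises an exception if the cluster is unreachable or returns an HTTP error status.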

Follow-up Operations

You can delete the cluster if you no longer need it.

After you delete a cluster, its data cannot be restored. Exercise caution when deleting a cluster.

  1. Log in to the CSS management console.
  2. In the navigation pane on the left, choose Clusters > OpenSearch.
  3. In the cluster list, locate the Sample-OSCluster cluster, and choose More > Delete in the Operation column.
  4. In the confirmation dialog box, type in DELETE, and click OK.