Updated on 2025-10-21 GMT+08:00

Using Elasticsearch for Vector Search

Semantic (vector-based) search maps unstructured data, such as text and images, to a high-dimensional vector space, where content similarity is measured by the distance between vectors. Compared with traditional keyword-based search, vector search can significantly improve recall and accuracy for such data. It is widely used in real-world applications such as image search, video retrieval, facial recognition, and personalized ad targeting.

This topic walks through an example of implementing vector search on a CSS Elasticsearch cluster. It covers the basics of the CSS vector database: creating a vector index, importing vector data, and performing vector searches.

Scenario Description

An e-commerce platform demands more accurate product search—this is where vector search comes in. A deep learning model is used to convert product images into semantic vectors. These vectors, along with structured product information such as product names and prices, are stored in an Elasticsearch cluster. The following search options are available:

Pure vector search: Provide an image and search for products that are most similar to this image.
Filter: Search for similar products within a specified price range.
Combined search: Search by keywords as well as vector similarity.

Assume that the e-commerce website has the vector data shown in Table 1:

**Table 1** Products sold by one e-commerce website
productName	image_vector	price
Latest art shirts for women in autumn 2017	[1.0, 1.0]	100.0
Latest art shirts for women in autumn 2017	[1.0, 2.0]	200.0
Latest art shirts for women in autumn 2017	[1.0, 3.0]	300.0
Latest jeans for women in spring 2018	[10.0, 20.0]	100.0
Latest jeans for women in spring 2018	[10.0, 30.0]	200.0
Latest casual pants for women in spring 2017	[100.0, 200.0]	100.0
Latest casual pants for women in spring 2017	[100.0, 300.0]	200.0
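
To make "most similar" concrete, the following minimal Python sketch computes the Euclidean distance between a query vector and each image_vector in Table 1. It only illustrates the distance metric on the sample data and says nothing about how the CSS vector engine is implemented internally. The names PRODUCTS, euclidean_distance, and distances_to are made up for this illustration.

```python
import math

# Sample rows from Table 1: (productName, image_vector, price)
PRODUCTS = [
    ("Latest art shirts for women in autumn 2017", [1.0, 1.0], 100.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 2.0], 200.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 3.0], 300.0),
    ("Latest jeans for women in spring 2018", [10.0, 20.0], 100.0),
    ("Latest jeans for women in spring 2018", [10.0, 30.0], 200.0),
    ("Latest casual pants for women in spring 2017", [100.0, 200.0], 100.0),
    ("Latest casual pants for women in spring 2017", [100.0, 300.0], 200.0),
]


def euclidean_distance(a, b):
    """Return the straight-line (L2) distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distances_to(query):
    """Return (distance, price, productName) for every product, nearest first."""
    rows = [
        (euclidean_distance(query, vector), price, name)
        for name, vector, price in PRODUCTS
    ]
    # sorted() is stable, so products at the same distance keep table order.
    return sorted(rows, key=lambda row: row[0])


if __name__ == "__main__":
    for distance, price, name in distances_to([1.0, 2.0]):
        print(f"{distance:8.3f}  {price:6.1f}  {name}")
```

For the query vector [1.0, 2.0], the three art shirt rows are the nearest neighbors, because their vectors lie within a distance of 1 of the query, while the jeans and casual pants vectors are much farther away.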

Procedure

The following describes how to use an Elasticsearch cluster to implement a vector search function.

Before you start, make the necessary preparations. For details, see Preparations.

Step 1: Creating an Elasticsearch Cluster: Create a non-security mode Elasticsearch cluster for vector search.
Step 2: Logging In to Kibana: Log in to the cluster through Kibana.
Step 3: Creating a Vector Index: Create a vector index to store vector data.
Step 4: Importing Vector Data: Use an open-source Elasticsearch API to import data.
Step 5: Vector Search: Perform a pure vector search and combined search in the Elasticsearch cluster.
Step 6: Deleting Indexes: Delete indexes that you no longer need to reclaim resources.

Preparations

You have registered with Huawei Cloud and performed real-name authentication. Make sure your account is not frozen or in arrears.

If you do not have a Huawei Cloud account, perform the following operations to create one:

Visit the Huawei Cloud official website.
In the upper-right corner of the page, click Register and complete the registration as prompted.
Select the service agreement and click Enable.
Perform real-name authentication.
- If your account is an individual account, see Individual Real-Name Authentication.
- If your account is an enterprise account, see Enterprise Real-Name Authentication.

Step 1: Creating an Elasticsearch Cluster

Create a non-security mode Elasticsearch cluster for vector search.

Log in to the CSS management console.
In the navigation pane on the left, choose Clusters > Elasticsearch.
In the upper right corner, click Create Cluster. The new-version UI for creating a cluster is displayed by default.
Figure 1 Create Cluster (new version)

Select the cluster type and version.

**Table 2** Cluster configuration
Parameter	Example	Description
Cluster Type	Elasticsearch	Select Elasticsearch.
Cluster Version	7.10.2	Select a cluster version from the drop-down list. The built-in CSS vector search engine is available for Elasticsearch 7.6.2 and 7.10.2 clusters only. To use CSS vector databases, select either of these versions.

Configure basic settings, including the region, billing mode, and AZs.

**Table 3** Basic settings
Parameter	Example	Description
Region	Hong Kong, China	Select the region where the cluster is located. A region is the location of a physical data center. Regions are defined based on their geographical location and network latency. For lower network latency and quicker resource access, select the nearest region.
AZ	AZ 1	Select AZs associated with the cluster region. An AZ is a physical region where resources use independent power supplies and networks. AZs are physically isolated but interconnected through an internal network. A maximum of three AZs can be configured.
Billing Mode	Pay-per-use	Billing mode for the cluster, which can be Yearly/Monthly or Pay-per-use. Yearly/Monthly: You prepay for a yearly or monthly subscription. Pay-per-use (postpaid): You will be billed hourly by actual duration of use. Any partial hour of usage will be rounded up to one hour.

Configure data nodes.

Data nodes store indexed data in a cluster. If Master node and Client node are both unselected, data nodes will be used for all of the following purposes: cluster management, data storage, cluster access, and data analysis. To ensure reliability, a cluster should have at least three nodes.

Figure 2 Configuring data nodes

**Table 4** Configuring data nodes
Parameter	Example	Description
CPU Architecture	x86	Select the CPU architecture of the data nodes. x86 and Kunpeng nodes are supported. The architectures actually supported may vary depending on the regional environment.
Node Specifications	ess.spec-4u8g	Select the specifications of the data nodes. Click Available. On the displayed page, select a flavor that suits your needs. In the node flavor list, vCPUs \| Memory indicate the number of vCPUs and memory capacity available for each flavor, and Recommended Storage indicates the supported storage capacity range. The node flavors available may vary depending on the region you select.
Node Storage Type and Capacity	High I/O 100GB	Select the storage type and capacity of the data nodes. If the selected node flavor uses EVS disks, you need to further select Node Storage Type and Capacity based on service requirements. Available EVS disk types vary depending on your region. The value range of node storage capacity is determined by the node flavor you select. The value must be divisible by 20. Node storage capacity cannot be reduced once the cluster is created. Evaluate your long-term data needs and select an appropriate size. If the selected node flavor uses local disks, there is no need to select the node storage type, and the node storage capacity is a fixed value. Both of them are determined by the local disk specifications.
Nodes	1	Set the number of nodes in the cluster. If master nodes are configured, the number of data nodes ranges from 1 to 200. If no master nodes are configured, the number of data nodes ranges from 1 to 32. To ensure cluster availability, you should configure at least three data nodes.

Keep Master Node, Client Node, and Cold Data Node unselected.
- Master nodes manage cluster-wide operations, including metadata, indexes, and shard allocation. For large-scale deployments, using dedicated master nodes enhances cluster stability, service availability, and centralized control.
- Client nodes route and coordinate search and index requests, offloading processing from data nodes for enhanced query performance and cluster scalability when there are heavy loads.
- Cold data nodes are used to store and query latency-insensitive data in large quantities. They offer an effective way to manage large datasets while cutting storage costs.

Configure network settings for the cluster, including the VPC, IP address, and security group.

Figure 3 Configuring network settings

**Table 5** Configuring network settings
Parameter	Example	Description
VPC	vpc-default	Select a VPC for the cluster for proper network isolation.
Subnet	subnet-default	Select a subnet for the cluster. A subnet improves network security by providing exclusive network resources that are isolated from other networks. Select a subnet in the current VPC.
IPv4 Address	Assign automatically	Assign IPv4 addresses to cluster nodes.
Security Group	default	Select a security group for the cluster. A security group serves as a virtual firewall that provides access control policies for clusters. The selected security group must allow all ports or port 9200 in the inbound direction. Otherwise, the cluster may be inaccessible to external services.

Configure the security mode. As this topic serves as a quick reference guide only, the security mode is disabled to make the steps simpler.
- When the security mode is enabled, a cluster's communication is encrypted and access to the cluster requires user authentication.
- When it is disabled, access to the cluster requires no user authentication, and data will be transmitted in plaintext using HTTP. In this case, make sure the cluster is deployed in a secure environment. Do not expose the cluster's network interface to the public network.

Configure cluster management settings, such as the cluster name and enterprise project.

**Table 6** Cluster management
Parameter	Example	Description
Cluster Name	Sample-ESCluster	User-defined cluster name.
Add Description	Skip this setting.	Add a description for the cluster for easy recognition.
Enterprise Project	default	Associate the cluster with an enterprise project. An enterprise project groups cloud resources, so you can manage resources and members by project. The default project is default. If enterprise projects are enabled, you can select an enterprise project from the drop-down list.
Tags	None	Adding tags to clusters helps you identify and manage your cluster resources. Each cluster can have a maximum of 20 tags.

Click More Settings to expand it, and configure automatic snapshot creation and VPC Endpoint as required. Because this cluster is used only for getting started, keep the default settings, that is, keep both disabled.
Click Create Now.
Return to the cluster list and check the newly created cluster. If the cluster is created successfully, Cluster Status changes to Available.
Figure 4 Checking the cluster status

Step 2: Logging In to Kibana

After an Elasticsearch cluster is created, you can access the cluster through Kibana.

From the Elasticsearch cluster list, select the created Sample-ESCluster cluster and click Access Kibana in the Operation column to access the Kibana console.
In the left navigation pane on the Kibana console, click Dev Tools.
The left part of the console is the command input box, and the triangle icon in its upper-right corner is the execution button. The right part shows the execution result.
Figure 5 Kibana console

Step 3: Creating a Vector Index

Create a vector index in the Elasticsearch cluster to store vector data.

Run the following command on Kibana to create a vector index named my_store:

PUT /my_store 
{
  "settings": {       		//Index-level settings
    "index": {
      "vector": true  		//Enable vector search
    }
  },
  "mappings": {       		//Define document field structure and types
    "properties": {
      "productName": {    	//Product name field (text)
        "type": "text",   	//Standard text, full-text search supported
        "analyzer": "ik_smart"  //Use the ik_smart tokenizer (for the Chinese language)
      },
      "image_vector": {   	//Image vector field
        "type": "vector", 	//Declare the vector type
        "dimension": 2,   	//Number of vector dimensions. For simplicity, this example uses two-dimensional vectors. In practice, the dimensionality must match the model's output; 512- and 768-dimensional vectors are common.
        "indexing": true, 	//Enable vector indexing to support semantic similarity-based search.
        "algorithm": "GRAPH",  	//Create an approximate nearest neighbor (ANN) index.
        "metric": "euclidean"  	//Use the Euclidean distance to measure similarity.
      },
      "price": {                //Product price field
        "type": "float"         //Floating-point number type, which supports range queries and numerical calculation.
      }
    }
  }
}

The command output is similar to the following:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_store"
}
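
In practice the dimension value depends on the embedding model, so it can help to generate the index body rather than edit it by hand. The following Python sketch builds the same request body as the command above with a configurable dimension. The helper name build_vector_index_body is hypothetical. Only the "GRAPH" algorithm and "euclidean" metric values appear in this topic, so treat any other value you pass as an assumption to verify against the CSS documentation.

```python
import json


def build_vector_index_body(dimension, metric="euclidean", algorithm="GRAPH"):
    """Build the body for PUT /my_store, mirroring the mapping in this topic."""
    if not isinstance(dimension, int) or dimension < 1:
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {"index": {"vector": True}},  # enable vector search
        "mappings": {
            "properties": {
                "productName": {"type": "text", "analyzer": "ik_smart"},
                "image_vector": {
                    "type": "vector",
                    "dimension": dimension,  # must match the model output size
                    "indexing": True,
                    "algorithm": algorithm,
                    "metric": metric,
                },
                "price": {"type": "float"},
            }
        },
    }


if __name__ == "__main__":
    # Example: a body for 768-dimensional vectors (an assumed model output size).
    print(json.dumps(build_vector_index_body(768), indent=2))
```

The printed JSON contains no comments, so it can be pasted after PUT /my_store in Kibana Dev Tools as is.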

Step 4: Importing Vector Data

There are several ways to import data to an Elasticsearch cluster. In this example, we use an open-source Elasticsearch API to import data on Kibana.

On the Kibana console, run the following command to import vector data to the index named my_store:

POST /my_store/_bulk
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 1.0],"price":100.0}
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 2.0],"price":200.0}
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 3.0],"price":300.0}
{"index":{}}
{"productName":"Latest jeans for women in spring 2018","image_vector":[10.0, 20.0],"price":100.0}
{"index":{}}
{"productName":"Latest jeans for women in spring 2018","image_vector":[10.0, 30.0],"price":200.0}
{"index":{}}
{"productName":"Latest casual pants for women in spring 2017","image_vector":[100.0, 200.0],"price":100.0}
{"index":{}}
{"productName":"Latest casual pants for women in spring 2017","image_vector":[100.0, 300.0],"price":200.0}

If the value of the errors field in the command output is false, the data is imported successfully.
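
For larger datasets, the bulk body is usually generated by a script. The following Python sketch serializes a list of documents into the newline-delimited format used by the _bulk command above, and checks the errors field of a response. The function names are hypothetical, and the response passed to them is assumed to be the parsed JSON returned by the _bulk API.

```python
import json


def build_bulk_body(docs):
    """Serialize documents into the newline-delimited body expected by _bulk."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))  # action line, ID auto-generated
        lines.append(json.dumps(doc))            # document source line
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


def bulk_succeeded(response):
    """Return True when the bulk response reports no per-item errors."""
    return response.get("errors") is False


def failed_items(response):
    """Return the error objects of items that failed, if any."""
    failures = []
    for item in response.get("items", []):
        result = item.get("index", {})
        if "error" in result:
            failures.append(result["error"])
    return failures


if __name__ == "__main__":
    docs = [
        {
            "productName": "Latest art shirts for women in autumn 2017",
            "image_vector": [1.0, 1.0],
            "price": 100.0,
        },
    ]
    print(build_bulk_body(docs), end="")
```

When errors is true, inspecting the failed items individually is more useful than retrying the whole batch, because the other documents in the request may already have been indexed.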

Step 5: Vector Search

Perform a pure vector search and a hybrid search (vector search with a price filter) in the Elasticsearch cluster.

Pure vector search

A user provides a product image and searches for products that are similar to it. A vectorization model first converts the image into a feature vector, and this vector is then used to perform a similarity-based search.

Run the following search command on Kibana:

GET /my_store/_search
{
  "size": 3,  			//Specify the return of the top-3 most relevant results
  "_source": { 
    "excludes": "image_vector"  //Exclude the image_vector field in the returned result
  }, 
  "query": {
    "vector": {  		//Enable vector search
      "image_vector": {  	//Specify the target vector field name (must be consistent with the index mapping)
        "vector": [1.0, 2.0],  	//The feature vector to be queried (Here a simplified example is provided. The actual dimensions should be consistent with the model output.)
        "topk": 3  	        //Return the top-3 most relevant results
      }
    }
  }
}
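
In an application, the query vector usually comes from the same model that produced the stored vectors. The following Python sketch builds the request body shown above from an arbitrary vector and checks its length against the index dimension first. build_vector_query and its index_dimension parameter are hypothetical helpers for illustration, not part of any CSS or Elasticsearch client.

```python
import json


def build_vector_query(query_vector, topk=3, index_dimension=2):
    """Build the body for GET /my_store/_search using the vector query clause."""
    if len(query_vector) != index_dimension:
        raise ValueError(
            f"query vector has {len(query_vector)} dimensions, "
            f"but the index expects {index_dimension}"
        )
    return {
        "size": topk,                             # number of hits to return
        "_source": {"excludes": "image_vector"},  # omit the vector from results
        "query": {
            "vector": {
                "image_vector": {
                    "vector": [float(x) for x in query_vector],
                    "topk": topk,
                }
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_vector_query([1.0, 2.0]), indent=2))
```

Checking the dimension before sending the request gives a clearer error than a rejected query when the model and the index mapping are out of sync.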

The following shows an example of the query result. Elasticsearch ranks the results based on the similarity score between the query vector and the stored vectors.

{
  "took" : 1,  		//The query took 1 millisecond (took is reported in milliseconds)
  "timed_out" : false,  //The query did not time out
  "_shards" : {  	//Shard execution result
    "total" : 1,  	//Total number of shards
    "successful" : 1,  	//Number of shards successfully executed
    "skipped" : 0,  	//Number of shards skipped
    "failed" : 0  	//Number of shards failed
  },
  "hits" : {
    "total" : {  	//Total number of document matches
      "value" : 3,  	//Three exact matches (eq indicates an exact count)
      "relation" : "eq"
    },
    "max_score" : 1.0,  //Highest similarity score (depending on the distance algorithm used in the vector space).
    "hits" : [  	//List of matched documents (in descending order of similarity scores)
      {
        "_index" : "my_store",          //Index that contains the document
        "_type" : "_doc",  		//Document type, which is a fixed value
        "_id" : "JPL5r5YBWkpNKdSUkRc9",  //Unique document ID
        "_score" : 1.0,  	//Similarity score of the current document
        "_source" : {  			//Stored original documents (image_vector excluded)
          "price" : 200.0,
          "productName" : "Latest art shirts for women in autumn 2017"
        }
      },
      //... (Other results follow the same structure, with a lower similarity score)
    ]
  }
}
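
Client code typically needs only a few fields from this response. The following Python sketch extracts the score, price, and product name from each hit. SAMPLE_RESPONSE reproduces only the single hit printed above (the remaining hits are elided in the example output), and summarize_hits is a hypothetical helper name.

```python
# Parsed form of the example response above, limited to the one hit it shows.
SAMPLE_RESPONSE = {
    "took": 1,
    "timed_out": False,
    "_shards": {"total": 1, "successful": 1, "skipped": 0, "failed": 0},
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my_store",
                "_type": "_doc",
                "_id": "JPL5r5YBWkpNKdSUkRc9",
                "_score": 1.0,
                "_source": {
                    "price": 200.0,
                    "productName": "Latest art shirts for women in autumn 2017",
                },
            },
        ],
    },
}


def summarize_hits(response):
    """Return (score, price, productName) for each hit, in response order."""
    return [
        (hit["_score"], hit["_source"]["price"], hit["_source"]["productName"])
        for hit in response["hits"]["hits"]
    ]


if __name__ == "__main__":
    for score, price, name in summarize_hits(SAMPLE_RESPONSE):
        print(f"score={score}  price={price}  {name}")
```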

Hybrid search

A user provides a product image and searches for similar products within a specified price range. This is a hybrid search: a vector search combined with filtering by price range.

Run the following search command on Kibana:

GET /my_store/_search
{
  "size": 3,                     //Specify the return of the top-3 most relevant results
  "_source": { 
    "excludes": "image_vector"  //Exclude the image_vector field in the returned result
  }, 
  "query": {
    "vector": {                  //Enable vector search
      "image_vector": {  	 //Specify the target vector field name (must be consistent with the index mapping)
        "vector": [1.0, 2.0],   //The feature vector to be queried (Here a simplified example is provided. The actual dimensions should be consistent with the model output.)
        "topk": 3,  	            //Return the top-3 most relevant results
        "filter": {              //Hybrid filters (filtering before calculating similarity)
          "range": {             //Filter by price range
            "price": {
              "lte": 300         //Retain only items priced at or below 300 (specific currency).
            }
          }
        }
      }
    }
  }
}

The query is processed as follows:

Fetch all items priced at or below 300 (in the applicable currency) from the target index.
Calculate the similarity between the image_vector field of each of these items and the feature vector of the user-provided image.
Rank the results by similarity score in descending order.
Return the top 3 most relevant results, with the image_vector field removed and only core information such as the product name and price retained.
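
The query process can be imitated in plain Python on the Table 1 data. This is a brute-force sketch of the filter-then-rank logic only. It ranks by ascending Euclidean distance rather than by the similarity score the cluster returns, and it does not reflect how the cluster executes the query internally. The function name hybrid_search and its max_price parameter are hypothetical.

```python
import math

# Documents from Table 1.
PRODUCTS = [
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 1.0], "price": 100.0},
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 2.0], "price": 200.0},
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 3.0], "price": 300.0},
    {"productName": "Latest jeans for women in spring 2018", "image_vector": [10.0, 20.0], "price": 100.0},
    {"productName": "Latest jeans for women in spring 2018", "image_vector": [10.0, 30.0], "price": 200.0},
    {"productName": "Latest casual pants for women in spring 2017", "image_vector": [100.0, 200.0], "price": 100.0},
    {"productName": "Latest casual pants for women in spring 2017", "image_vector": [100.0, 300.0], "price": 200.0},
]


def hybrid_search(query_vector, max_price, topk=3):
    """Filter by price, rank by Euclidean distance, and return the top-k items."""
    # 1. Keep only items priced at or below max_price.
    candidates = [p for p in PRODUCTS if p["price"] <= max_price]
    # 2. Compute the distance from the query vector to each remaining item.
    scored = [(math.dist(query_vector, p["image_vector"]), p) for p in candidates]
    # 3. Rank: a smaller distance means a more similar item.
    scored.sort(key=lambda pair: pair[0])
    # 4. Return the top-k items without the image_vector field.
    return [
        {"productName": p["productName"], "price": p["price"]}
        for _, p in scored[:topk]
    ]


if __name__ == "__main__":
    for item in hybrid_search([1.0, 2.0], max_price=300):
        print(item)
```

With max_price=300 every sample item passes the filter, so the result matches the pure vector search. Lowering the limit, for example to 100, removes the two closest art shirt rows and lets the nearest jeans and casual pants items into the top 3.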

The command output is similar to the following:

{
  "took" : 1,  		        //The query took 1 millisecond (took is reported in milliseconds)
  "timed_out" : false,	//The query didn't time out.
  "_shards" : {			//Shard execution result
    "total" : 1,  	   //Total number of shards
    "successful" : 1,  //Number of shards successfully executed
    "skipped" : 0,      //Number of shards skipped
    "failed" : 0  	   //Number of shards failed
  },
  "hits" : {
    "total" : {  	                //Total number of document matches
      "value" : 3,      //Three exact matches (eq indicates an exact count)
      "relation" : "eq"
    },
    "max_score" : 1.0,  //Highest similarity score (depending on the distance algorithm used in the vector space).
    "hits" : [  	        //List of matched documents (in descending order of similarity scores)
      {
        "_index" : "my_store",  //Index that contains the document
        "_type" : "_doc",       //Document type
        "_id" : "JPL5r5YBWkpNKdSUkRc9", //Unique document ID
        "_score" : 1.0,  	//Similarity score of the current document (normalized)
        "_source" : {  	    //Stored original documents (image_vector excluded)
          "price" : 200.0,
          "productName" : "Latest art shirts for women in autumn 2017"
        }
      },
      //... (Other results follow the same structure, with a lower similarity score)
    ]
  }
}

Step 6: Deleting Indexes

If an index is no longer used, run the following command on Kibana to delete it to reclaim resources:

DELETE /my_store

The command output is similar to the following:

{
  "acknowledged" : true
}

Follow-up Operations

You can delete the cluster if you no longer need it.

After you delete a cluster, its data cannot be restored. Exercise caution when deleting a cluster.

Log in to the CSS management console.
In the navigation pane on the left, choose Clusters > Elasticsearch.
In the cluster list, locate the Sample-ESCluster cluster, and choose More > Delete in the Operation column.
In the confirmation dialog box, type in DELETE, and click OK.
