Using Elasticsearch for Vector Search
Semantic (vector-based) search maps unstructured data, such as text and images, to high-dimensional vector spaces, where content similarity is measured by distances between vectors. Compared with traditional keyword-based search, this semantic similarity-based search, or vector search, can significantly improve recall and accuracy. Vector search is now widely used in real-world applications, such as image search, video retrieval, facial recognition, and personalized ad targeting.
This topic provides an example of implementing vector search through a CSS Elasticsearch cluster. Through this example, you will learn how to use the CSS vector database, including creating vector indexes, importing vector data, and performing a vector search.
Scenario Description
An e-commerce platform needs more accurate product search, which is where vector search comes in. A deep learning model is used to convert product images into semantic vectors. These vectors, along with structured product information such as product names and prices, are stored in an Elasticsearch cluster. The following search options are available:
- Pure vector search: Provide an image and search for products that are most similar to this image.
- Filter: Search for similar products within a specified price range.
- Combined search: Search by keywords as well as vector similarity.
Assume that the e-commerce website has the following vector data:
{
  "products": [
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 1.0], "price": 100.0},
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 2.0], "price": 200.0},
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 3.0], "price": 300.0},
    {"productName": "Latest jeans for women in spring 2018", "image_vector": [10.0, 20.0], "price": 100.0},
    {"productName": "Latest jeans for women in spring 2018", "image_vector": [10.0, 30.0], "price": 200.0},
    {"productName": "Latest casual pants for women in spring 2017", "image_vector": [100.0, 200.0], "price": 100.0},
    {"productName": "Latest casual pants for women in spring 2017", "image_vector": [100.0, 300.0], "price": 200.0}
  ]
}
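To make "similarity is measured by distances between vectors" concrete, the following minimal Python sketch ranks the sample products above by Euclidean distance to a query vector. It is a brute-force illustration only and does not reflect how the CSS vector engine is implemented; the names PRODUCTS and rank_by_distance are hypothetical.

```python
import math

# Sample data from the scenario above: (productName, image_vector, price).
PRODUCTS = [
    ("Latest art shirts for women in autumn 2017", [1.0, 1.0], 100.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 2.0], 200.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 3.0], 300.0),
    ("Latest jeans for women in spring 2018", [10.0, 20.0], 100.0),
    ("Latest jeans for women in spring 2018", [10.0, 30.0], 200.0),
    ("Latest casual pants for women in spring 2017", [100.0, 200.0], 100.0),
    ("Latest casual pants for women in spring 2017", [100.0, 300.0], 200.0),
]


def rank_by_distance(query, products, k):
    """Return the k products closest to `query` as (distance, name, price)."""
    scored = [(math.dist(query, vec), name, price) for name, vec, price in products]
    # A smaller distance means a more similar product; ties keep input order.
    scored.sort(key=lambda item: item[0])
    return scored[:k]


if __name__ == "__main__":
    for dist, name, price in rank_by_distance([1.0, 2.0], PRODUCTS, k=3):
        print(f"distance={dist:.2f}  price={price:.1f}  {name}")
```

With the query vector [1.0, 2.0], the three art shirts come first because their vectors lie closest to the query.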
Procedure
The following describes how to use an Elasticsearch cluster to implement a vector search function.
Before you start, make the necessary preparations. For details, see Preparations.
- Step 1: Creating an Elasticsearch Cluster: Create a non-security mode Elasticsearch cluster for vector search.
- Step 2: Logging In to Kibana: Log in to the cluster through Kibana.
- Step 3: Creating a Vector Index: Create a vector index to store vector data.
- Step 4: Importing Vector Data: Use an open-source Elasticsearch API to import data.
- Step 5: Vector Search: Perform a pure vector search and combined search in the Elasticsearch cluster.
- Step 6: Deleting Indexes: Delete indexes that you no longer need to reclaim resources.
Preparations
You have registered with Huawei Cloud and performed real-name authentication. Make sure your account is not frozen or in arrears.
- Visit the Huawei Cloud official website.
- In the upper right corner of the page, click Register and complete the registration as prompted.
- Select the service agreement and click Enable.
- Perform real-name authentication.
- If your account is an individual account, see Individual Real-Name Authentication.
- If your account is an enterprise account, see Enterprise Real-Name Authentication.
Step 1: Creating an Elasticsearch Cluster
Create a non-security mode Elasticsearch cluster for vector search.
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the upper right corner, click Create Cluster. The new-version UI for creating a cluster is displayed by default.
Figure 1 Create Cluster (new version)
- Select the cluster type and version.
Table 1 Cluster configuration
| Parameter | Example | Description |
| --- | --- | --- |
| Cluster Type | Elasticsearch | Select Elasticsearch. |
| Version | 7.10.2 | Select a cluster version from the drop-down list. The built-in CSS vector search engine is available for Elasticsearch 7.6.2 and 7.10.2 clusters only. To use CSS vector databases, select either of these versions. |
- Configure basic settings, including the billing mode, current region and AZ.
Table 2 Basic settings
| Parameter | Example | Description |
| --- | --- | --- |
| Billing Mode | Pay-per-use | Billing mode for the cluster, which can be Yearly/Monthly or Pay-per-use. Yearly/Monthly: You prepay for a yearly or monthly subscription. Pay-per-use (postpaid): You are billed hourly by actual duration of use. Any partial hour of usage is rounded up to one hour. |
| Region | Hong Kong, China | Select the region where the cluster is located. A region is the location of a physical data center. Regions are defined based on their geographical location and network latency. For lower network latency and quicker resource access, select the nearest region. |
| AZ | AZ 1 | Select AZs associated with the cluster region. An AZ is a physical region where resources use independent power supplies and networks. AZs are physically isolated but interconnected through an internal network. A maximum of three AZs can be configured. |
- Configure data nodes.
Data nodes store indexed data in a cluster. If Master node and Client node are both unselected, data nodes will be used for all of the following purposes: cluster management, data storage, cluster access, and data analysis. To ensure reliability, a cluster should have at least three nodes.
Figure 2 Configuring data nodes
Table 3 Configuring data nodes
| Parameter | Example | Description |
| --- | --- | --- |
| CPU Architecture | x86 | Select the CPU architecture of the data nodes. x86 and Kunpeng nodes are supported. The architectures actually supported may vary depending on the regional environment. |
| Node Specifications | ess.spec-4u8g | Select the specifications of the data nodes. Click Available. On the displayed page, select a flavor that suits your needs. In the node flavor list, vCPUs \| Memory indicates the number of vCPUs and memory capacity available for each flavor, and Recommended Storage indicates the supported storage capacity range. The node flavors available may vary depending on the region you select. |
| Node Storage Type and Capacity | High I/O, 40 GB | Select the storage type and capacity of the data nodes. If the selected node flavor uses EVS disks, select Node Storage Type and Capacity based on service requirements: available EVS disk types vary depending on your region; the value range of node storage capacity is determined by the node flavor you select, and the value must be divisible by 20; node storage capacity cannot be reduced once the cluster is created, so evaluate your long-term data needs and select an appropriate size. If the selected node flavor uses local disks, there is no need to select the node storage type, and the node storage capacity is a fixed value; both are determined by the local disk specifications. |
| Nodes | 1 | Set the number of nodes in the cluster. If master nodes are configured, the number of data nodes ranges from 1 to 200. If no master nodes are configured, the number of data nodes ranges from 1 to 32. To ensure cluster availability, you should configure at least three data nodes. |
- Keep Master Node, Client Node, and Cold Data Node unselected.
- Master nodes manage cluster-wide operations, including metadata, indexes, and shard allocation. For large-scale deployments, using dedicated master nodes enhances cluster stability, service availability, and centralized control.
- Client nodes route and coordinate search and index requests, offloading processing from data nodes for enhanced query performance and cluster scalability when there are heavy loads.
- Cold data nodes are used to store and query latency-insensitive data in large quantities. They offer an effective way to manage large datasets while cutting storage costs.
- Configure network settings for the cluster, including the VPC, IP address, and security group.
Figure 3 Configuring network settings
Table 4 Configuring network settings
| Parameter | Example | Description |
| --- | --- | --- |
| VPC | vpc-default | Select a VPC for the cluster for proper network isolation. |
| Subnet | subnet-default | Select a subnet for the cluster. A subnet improves network security by providing exclusive network resources that are isolated from other networks. Select a subnet in the current VPC. |
| IPv4 Address | Assign automatically | Assign IPv4 addresses to cluster nodes. |
| Security Group | default | Select a security group for the cluster. A security group serves as a virtual firewall that provides access control policies for clusters. The selected security group must allow all ports or port 9200 in the inbound direction. Otherwise, the cluster may be inaccessible to external services. |
- Configure the security mode. As this topic serves as a quick reference guide only, the security mode is disabled to make the steps simpler.
- When the security mode is enabled, a cluster's communication is encrypted and access to the cluster requires user authentication.
- When it is disabled, access to the cluster requires no user authentication, and data will be transmitted in plaintext using HTTP. In this case, make sure the cluster is deployed in a secure environment. Do not expose the cluster's network interface to the public network.
- Configure cluster management settings, such as the cluster name and enterprise project.
Table 5 Cluster management
| Parameter | Example | Description |
| --- | --- | --- |
| Name | Sample-ESCluster | User-defined cluster name. The cluster name must start with a letter and can contain 4 to 32 characters. Only letters, digits, hyphens (-), and underscores (_) are allowed. |
| Enterprise Project | default | Associate the cluster with an enterprise project. An enterprise project groups cloud resources, so you can manage resources and members by project. The default project is default. If enterprise projects are enabled, you can select an enterprise project from the drop-down list. |
| Tags | None | Adding tags to clusters helps you identify and manage your cluster resources. Each cluster can have a maximum of 20 tags. |
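The cluster naming rule in the table above can be checked before you fill in the form. The following Python sketch is one possible reading of that rule; it assumes that "letters" means ASCII letters, and is_valid_cluster_name is a hypothetical helper, not part of any CSS SDK.

```python
import re

# Starts with a letter, 4 to 32 characters in total,
# only letters, digits, hyphens (-), and underscores (_).
# Assumption: "letters" is read as ASCII letters A-Z and a-z.
_CLUSTER_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_-]{3,31}")


def is_valid_cluster_name(name):
    """Return True if `name` satisfies the naming rule described above."""
    return _CLUSTER_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["Sample-ESCluster", "abc", "1cluster", "my cluster"]:
        print(candidate, is_valid_cluster_name(candidate))
```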
- Set automatic snapshot creation. Toggle off Automatic Snapshot Creation to disable it. This topic does not cover snapshot creation, as it serves as a quick reference guide only.
- Retain the default setting for VPC Endpoint (that is, disabled).
- Click Create Now.
- Return to the cluster list and check the newly created cluster. If the cluster is created successfully, Cluster Status changes to Available.
Figure 4 Checking the cluster status
Step 2: Logging In to Kibana
After an Elasticsearch cluster is created, you can access the cluster through Kibana.
- From the Elasticsearch cluster list, select the created Sample-ESCluster cluster and click Access Kibana in the Operation column to access the Kibana console.
- In the left navigation pane on the Kibana console, click Dev Tools.
The text box on the left is the input box. The triangle icon in the upper right corner of the input box is the command execution button. The text box on the right is the result output box.
Figure 5 Kibana console
Step 3: Creating a Vector Index
Create a vector index in the Elasticsearch cluster to store vector data.
PUT /my_store
{
  "settings": {                      // Index-level settings
    "index": {
      "vector": true                 // Enable vector search
    }
  },
  "mappings": {                      // Define document field structure and types
    "properties": {
      "productName": {               // Product name field (text)
        "type": "text",              // Standard text, full-text search supported
        "analyzer": "ik_smart"       // Use the ik_smart tokenizer (for the Chinese language)
      },
      "image_vector": {              // Image vector field
        "type": "vector",            // Declare the vector type
        // Vector dimensions. There are only two dimensions in this example.
        // In reality, much higher dimensions, such as 512 and 768, are used.
        "dimension": 2,
        "indexing": true,            // Enable vector indexing to support semantic similarity-based search
        "algorithm": "GRAPH",        // Create an approximate nearest neighbor (ANN) index
        "metric": "euclidean"        // Use the Euclidean distance to measure similarity
      },
      "price": {                     // Product price field
        "type": "float"              // Floating-point number type, which supports range queries and numerical calculation
      }
    }
  }
}
The command output is similar to the following:
{ "acknowledged" : true, "shards_acknowledged" : true, "index" : "my_store" }
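If you prefer to create the index from a script instead of Kibana, the request can be sketched in Python as follows. The body fields mirror the PUT /my_store example above. The helper names build_vector_index_body and create_index, and the base URL shape http://<node-ip>:9200, are assumptions for illustration; plain HTTP without authentication applies only to a non-security mode cluster reachable from your network.

```python
import json
import urllib.request


def build_vector_index_body(dimension, metric="euclidean", algorithm="GRAPH"):
    """Build the index body used in this example for a given vector dimension."""
    return {
        "settings": {"index": {"vector": True}},
        "mappings": {
            "properties": {
                "productName": {"type": "text", "analyzer": "ik_smart"},
                "image_vector": {
                    "type": "vector",
                    "dimension": dimension,
                    "indexing": True,
                    "algorithm": algorithm,
                    "metric": metric,
                },
                "price": {"type": "float"},
            }
        },
    }


def create_index(base_url, index_name, body):
    """Send PUT <base_url>/<index_name> and return the parsed JSON response."""
    request = urllib.request.Request(
        url=f"{base_url}/{index_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


if __name__ == "__main__":
    # Only prints the body; call create_index(...) against a reachable cluster.
    print(json.dumps(build_vector_index_body(dimension=2), indent=2))
```

The dimension passed here must match the output size of the model that produces the vectors.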
Step 4: Importing Vector Data
There are several ways to import data to an Elasticsearch cluster. In this example, we use an open-source Elasticsearch API to import data on Kibana.
POST /my_store/_doc/_bulk
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 1.0],"price":100.0}
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 2.0],"price":200.0}
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 3.0],"price":300.0}
{"index":{}}
{"productName":"Latest jeans for women in spring 2018","image_vector":[10.0, 20.0],"price":100.0}
{"index":{}}
{"productName":"Latest jeans for women in spring 2018","image_vector":[10.0, 30.0],"price":200.0}
{"index":{}}
{"productName":"Latest casual pants for women in spring 2017","image_vector":[100.0, 200.0],"price":100.0}
{"index":{}}
{"productName":"Latest casual pants for women in spring 2017","image_vector":[100.0, 300.0],"price":200.0}
If the value of the errors field in the command output is false, the data is imported successfully.
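The bulk API expects newline-delimited JSON: one action line followed by one document line per item, with a newline after the last line. The following Python sketch builds such a payload from a list of documents and checks the errors field of a response. The helper names build_bulk_payload and bulk_succeeded are hypothetical.

```python
import json


def build_bulk_payload(docs):
    """Return an NDJSON bulk body: an {"index": {}} action line before each document."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {}}))
        lines.append(json.dumps(doc))
    # The bulk body must end with a newline character.
    return "\n".join(lines) + "\n"


def bulk_succeeded(response):
    """Return True when the bulk response reports no item-level errors."""
    return response.get("errors") is False


if __name__ == "__main__":
    docs = [
        {"productName": "Latest art shirts for women in autumn 2017",
         "image_vector": [1.0, 1.0], "price": 100.0},
        {"productName": "Latest jeans for women in spring 2018",
         "image_vector": [10.0, 20.0], "price": 100.0},
    ]
    print(build_bulk_payload(docs), end="")
```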
Step 5: Vector Search
Perform a pure vector search and combined search in the Elasticsearch cluster.
- Pure vector search
A user provides a product image and searches for products that are similar to this image. The image's feature vector is first obtained through a vectorization model, and this vector is then used to perform a similarity-based search in the cluster.
Run the following search command on Kibana:
GET /my_store/_search
{
  "size": 3,                         // Specify the return of the top-3 most relevant results
  "_source": {
    "excludes": "image_vector"       // Exclude the image_vector field in the returned result
  },
  "query": {
    "vector": {                      // Enable vector search
      "image_vector": {              // Specify the target vector field name (must be consistent with the index mapping)
        // The feature vector to be queried (a simplified example is provided here;
        // the actual dimensions should be consistent with the model output)
        "vector": [1.0, 2.0],
        "topk": 3                    // Return the top-3 most relevant results
      }
    }
  }
}
The following shows an example of the query result. Elasticsearch ranks the results based on the similarity score between the query vector and the stored vectors.
{
  "took" : 1,                        // Time the query took, in milliseconds
  "timed_out" : false,               // The query did not time out
  "_shards" : {                      // Shard execution result
    "total" : 1,                     // Total number of shards
    "successful" : 1,                // Number of shards successfully executed
    "skipped" : 0,                   // Number of shards skipped
    "failed" : 0                     // Number of shards failed
  },
  "hits" : {
    "total" : {                      // Total number of document matches
      "value" : 3,                   // Three matches
      "relation" : "eq"              // eq indicates an exact count
    },
    "max_score" : 1.0,               // Highest similarity score (depending on the distance algorithm used in the vector space)
    "hits" : [                       // List of matched documents (in descending order of similarity scores)
      {
        "_index" : "my_store",       // Index that contains the document
        "_type" : "_doc",            // Document type, which is a fixed value
        "_id" : "JPL5r5YBWkpNKdSUkRc9",  // Unique document ID
        "_score" : 1.0,              // Similarity score of the current document
        "_source" : {                // Stored original document (image_vector excluded)
          "price" : 200.0,
          "productName" : "Latest art shirts for women in autumn 2017"
        }
      },
      //... (Other results follow the same structure, with a lower similarity score)
    ]
  }
}
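When the search is issued from application code, the request body and the response handling can be sketched in Python as follows. The query structure follows the example above; build_vector_query and summarize_hits are hypothetical helper names, and SAMPLE_RESPONSE contains only the one hit shown in the example output.

```python
def build_vector_query(field, vector, topk):
    """Build a pure vector search body that omits the vector field from results."""
    return {
        "size": topk,
        "_source": {"excludes": field},
        "query": {"vector": {field: {"vector": vector, "topk": topk}}},
    }


def summarize_hits(response):
    """Return (score, productName, price) for each hit, in the order returned."""
    return [
        (hit["_score"], hit["_source"].get("productName"), hit["_source"].get("price"))
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the example response above (first hit only).
SAMPLE_RESPONSE = {
    "took": 1,
    "timed_out": False,
    "hits": {
        "total": {"value": 3, "relation": "eq"},
        "max_score": 1.0,
        "hits": [
            {
                "_index": "my_store",
                "_id": "JPL5r5YBWkpNKdSUkRc9",
                "_score": 1.0,
                "_source": {
                    "price": 200.0,
                    "productName": "Latest art shirts for women in autumn 2017",
                },
            }
        ],
    },
}

if __name__ == "__main__":
    print(build_vector_query("image_vector", [1.0, 2.0], topk=3))
    for score, name, price in summarize_hits(SAMPLE_RESPONSE):
        print(score, price, name)
```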
- Hybrid search
A user provides a product image and searches for products that are similar to this image while specifying a price range. A combined search is performed: vector search plus filtering by price range.
Run the following search command on Kibana:
GET /my_store/_search
{
  "size": 3,                         // Specify the return of the top-3 most relevant results
  "_source": {
    "excludes": "image_vector"       // Exclude the image_vector field in the returned result
  },
  "query": {
    "vector": {                      // Enable vector search
      "image_vector": {              // Specify the target vector field name (must be consistent with the index mapping)
        // The feature vector to be queried (a simplified example is provided here;
        // the actual dimensions should be consistent with the model output)
        "vector": [1.0, 2.0],
        "topk": 3,                   // Return the top-3 most relevant results
        "filter": {                  // Hybrid filters (filtering before calculating similarity)
          "range": {                 // Filter by price range
            "price": {
              "lte": 300             // Retain only items priced at or below 300 (specific currency)
            }
          }
        }
      }
    }
  }
}
Query process: Fetch all items whose price is lower than or equal to 300 (specific currency) from the target index; calculate the similarity between the image_vector field of each of these items and the feature vector of the user-provided image; rank the results by similarity score in descending order; return the top 3 most relevant results, with the image_vector field removed and only core information such as product name and price retained.
The command output is similar to the following:
{
  "took" : 1,                        // Time the query took, in milliseconds
  "timed_out" : false,               // The query did not time out
  "_shards" : {                      // Shard execution result
    "total" : 1,                     // Total number of shards
    "successful" : 1,                // Number of shards successfully executed
    "skipped" : 0,                   // Number of shards skipped
    "failed" : 0                     // Number of shards failed
  },
  "hits" : {
    "total" : {                      // Total number of document matches
      "value" : 3,                   // Three matches
      "relation" : "eq"              // eq indicates an exact count
    },
    "max_score" : 1.0,               // Highest similarity score (depending on the distance algorithm used in the vector space)
    "hits" : [                       // List of matched documents (in descending order of similarity scores)
      {
        "_index" : "my_store",       // Index that contains the document
        "_type" : "_doc",            // Document type
        "_id" : "JPL5r5YBWkpNKdSUkRc9",  // Unique document ID
        "_score" : 1.0,              // Similarity score of the current document (normalized)
        "_source" : {                // Stored original document (image_vector excluded)
          "price" : 200.0,
          "productName" : "Latest art shirts for women in autumn 2017"
        }
      },
      //... (Other results follow the same structure, with a lower similarity score)
    ]
  }
}
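The filter-then-rank process described for this query can be reproduced offline on the sample data. The following Python sketch is a brute-force illustration that assumes results are ordered by Euclidean distance, as configured in the index mapping; it does not reproduce the engine's ANN index or its score values. filtered_top_k is a hypothetical helper, and the price limit of 100 in the demo is chosen only so that the filter visibly removes items.

```python
import math

ITEMS = [
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 1.0], "price": 100.0},
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 2.0], "price": 200.0},
    {"productName": "Latest art shirts for women in autumn 2017", "image_vector": [1.0, 3.0], "price": 300.0},
    {"productName": "Latest jeans for women in spring 2018", "image_vector": [10.0, 20.0], "price": 100.0},
    {"productName": "Latest jeans for women in spring 2018", "image_vector": [10.0, 30.0], "price": 200.0},
    {"productName": "Latest casual pants for women in spring 2017", "image_vector": [100.0, 200.0], "price": 100.0},
    {"productName": "Latest casual pants for women in spring 2017", "image_vector": [100.0, 300.0], "price": 200.0},
]


def filtered_top_k(items, query, max_price, k):
    """Filter by price first, then rank the remaining items by distance to `query`."""
    candidates = [item for item in items if item["price"] <= max_price]
    # sorted() is stable, so items at equal distance keep their input order.
    ranked = sorted(candidates, key=lambda item: math.dist(query, item["image_vector"]))
    # Drop image_vector from the output, as the query does with _source.excludes.
    return [
        {"productName": item["productName"], "price": item["price"]}
        for item in ranked[:k]
    ]


if __name__ == "__main__":
    for row in filtered_top_k(ITEMS, [1.0, 2.0], max_price=100, k=3):
        print(row)
```

With a limit of 100, only one product of each kind passes the filter, so the three results span all three product names rather than only the art shirts.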
Step 6: Deleting Indexes
If an index is no longer used, run the following command on Kibana to delete it to reclaim resources:
DELETE /my_store
The command output is similar to the following:
{ "acknowledged" : true }
Follow-up Operations
You can delete the cluster if you no longer need it.

After you delete a cluster, its data cannot be restored. Exercise caution when deleting a cluster.
- Log in to the CSS management console.
- In the navigation pane on the left, choose Clusters > Elasticsearch.
- In the cluster list, locate the Sample-ESCluster cluster, and choose More > Delete in the Operation column.
- In the confirmation dialog box, type in DELETE, and click OK.
Related Documents
- For more information about the CSS vector database, see Optimizing the Write and Query Performance of Vector Search.
- To learn more about the CSS vector database performance, see Testing the Performance of CSS's Elasticsearch Vector Search.