Using Elasticsearch for Vector Search

Semantic (vector-based) search maps unstructured data, such as text and images, to high-dimensional vector spaces, where content similarity is measured by the distance between vectors. Compared with traditional keyword-based search, this similarity-based search, or vector search, can significantly improve recall and accuracy. Vector search is now widely used in real-world applications, such as image search, video retrieval, facial recognition, and personalized ad targeting.

This topic provides an example of implementing vector search in a CSS Elasticsearch cluster. Through this example, you learn how to use the CSS vector database, including how to create a vector index, import vector data, and perform a vector search.

Scenario Description

An e-commerce platform needs more accurate product search, and this is where vector search comes in. A deep learning model is used to convert product images into semantic vectors. These vectors, along with structured product information such as product names and prices, are stored in an Elasticsearch cluster. The following search options are available:

Pure vector search: Provide an image and search for products that are most similar to this image.
Filter: Search for similar products within a specified price range.
Combined search: Search by keywords as well as vector similarity.

Assume that the e-commerce website has the vector data shown in Table 1:

**Table 1** Products sold by one e-commerce website
productName	image_vector	price
Latest art shirts for women in autumn 2017	[1.0, 1.0]	100.0
Latest art shirts for women in autumn 2017	[1.0, 2.0]	200.0
Latest art shirts for women in autumn 2017	[1.0, 3.0]	300.0
Latest jeans for women in spring 2018	[10.0, 20.0]	100.0
Latest jeans for women in spring 2018	[10.0, 30.0]	200.0
Latest casual pants for women in spring 2017	[100.0, 200.0]	100.0
Latest casual pants for women in spring 2017	[100.0, 300.0]	200.0

Procedure

The following describes how to use an Elasticsearch cluster to implement a vector search function.

Step 1: Creating an Elasticsearch Cluster: Create a non-security mode Elasticsearch cluster for vector search.
Step 2: Logging In to Kibana: Log in to the cluster through Kibana.
Step 3: Creating a Vector Index: Create a vector index to store vector data.
Step 4: Importing Vector Data: Use an open-source Elasticsearch API to import data.
Step 5: Vector Search: Perform a pure vector search and combined search in the Elasticsearch cluster.
Step 6: Deleting Indexes: Delete indexes that you no longer need to reclaim resources.

Step 1: Creating an Elasticsearch Cluster

Create a non-security mode Elasticsearch cluster for vector search.

Step 2: Logging In to Kibana

After an Elasticsearch cluster is created, you can access the cluster through Kibana.

From the Elasticsearch cluster list, select the created Sample-ESCluster cluster and click Access Kibana in the Operation column to access the Kibana console.
In the left navigation pane on the Kibana console, click Dev Tools.
The left part of the console is the command input box, and the triangle icon in its upper-right corner is the execution button. The right part shows the execution result.
Figure 1 Kibana console

Step 3: Creating a Vector Index

Create a vector index in the Elasticsearch cluster to store vector data.

Run the following command on Kibana to create a vector index named my_store:

PUT /my_store 
{
  "settings": {       		//Index-level settings
    "index": {
      "vector": true  		//Enable vector search
    }
  },
  "mappings": {       		//Define document field structure and types
    "properties": {
      "productName": {    	//Product name field (text)
        "type": "text",   	//Standard text, full-text search supported
        "analyzer": "ik_smart"  //Use the ik_smart tokenizer (for the Chinese language)
      },
      "image_vector": {   	//Image vector field
        "type": "vector", 	//Declare the vector type
        "dimension": 2,   	//Number of vector dimensions. For simplicity, this example uses two-dimensional vectors. In practice, the dimensionality should match the model's output dimensionality; 512- and 768-dimensional vectors are commonly used.
        "indexing": true, 	//Enable vector indexing to support semantic similarity-based search.
        "algorithm": "GRAPH",  	//Create an approximate nearest neighbor (ANN) index.
        "metric": "euclidean"  	//Use the Euclidean distance to measure similarity.
      },
      "price": {                //Product price field
        "type": "float"         //Floating-point number type, which supports range queries and numerical calculation.
      }
    }
  }
}

The command output is similar to the following:

{
  "acknowledged" : true,
  "shards_acknowledged" : true,
  "index" : "my_store"
}
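
The dimension comment in the command above notes that real models usually produce 512- or 768-dimensional vectors. As a minimal sketch, the same index body could be generated in Python with the dimension as a parameter. The helper name build_vector_index_body is hypothetical; the setting and mapping keys are copied from the PUT /my_store command, and nothing else about the API is assumed.

```python
import json


def build_vector_index_body(dimension, metric="euclidean"):
    """Return the Step 3 index body with a configurable vector dimension.

    Hypothetical helper: it only reuses the keys and values shown in the
    PUT /my_store command above.
    """
    if dimension < 1:
        raise ValueError("dimension must be a positive integer")
    return {
        "settings": {"index": {"vector": True}},
        "mappings": {
            "properties": {
                "productName": {"type": "text", "analyzer": "ik_smart"},
                "image_vector": {
                    "type": "vector",
                    "dimension": dimension,  # must match the model's output size
                    "indexing": True,
                    "algorithm": "GRAPH",
                    "metric": metric,
                },
                "price": {"type": "float"},
            }
        },
    }


# Two dimensions reproduce the demo index; a real model might need 512 or 768.
body = build_vector_index_body(2)
print(json.dumps(body, indent=2))
```

The printed JSON can be pasted after PUT /my_store in Kibana. Generating the body this way keeps the dimension in one place, so it can be tied to the embedding model's output size.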

Step 4: Importing Vector Data

There are several ways to import data to an Elasticsearch cluster. In this example, the open-source Elasticsearch _bulk API is used to import data on Kibana.

On the Kibana console, run the following command to import vector data to the index named my_store:

POST /my_store/_bulk
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 1.0],"price":100.0}
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 2.0],"price":200.0}
{"index":{}}
{"productName":"Latest art shirts for women in autumn 2017","image_vector":[1.0, 3.0],"price":300.0}
{"index":{}}
{"productName":"Latest jeans for women in spring 2018","image_vector":[10.0, 20.0],"price":100.0}
{"index":{}}
{"productName":"Latest jeans for women in spring 2018","image_vector":[10.0, 30.0],"price":200.0}
{"index":{}}
{"productName":"Latest casual pants for women in spring 2017","image_vector":[100.0, 200.0],"price":100.0}
{"index":{}}
{"productName":"Latest casual pants for women in spring 2017","image_vector":[100.0, 300.0],"price":200.0}

If the value of the errors field in the command output is false, the data is imported successfully.
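
For larger datasets, the bulk body is usually generated rather than typed by hand. The following is a minimal sketch, assuming the rows of Table 1 are available as Python tuples. The names PRODUCTS, build_bulk_body, and import_succeeded are hypothetical, and the sketch only builds the request body and checks a response dictionary; it does not contact a cluster.

```python
import json

# Rows from Table 1: (productName, image_vector, price)
PRODUCTS = [
    ("Latest art shirts for women in autumn 2017", [1.0, 1.0], 100.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 2.0], 200.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 3.0], 300.0),
    ("Latest jeans for women in spring 2018", [10.0, 20.0], 100.0),
    ("Latest jeans for women in spring 2018", [10.0, 30.0], 200.0),
    ("Latest casual pants for women in spring 2017", [100.0, 200.0], 100.0),
    ("Latest casual pants for women in spring 2017", [100.0, 300.0], 200.0),
]


def build_bulk_body(rows):
    """Build the newline-delimited JSON body for POST /my_store/_bulk."""
    lines = []
    for name, vector, price in rows:
        lines.append(json.dumps({"index": {}}))  # action line
        lines.append(json.dumps(
            {"productName": name, "image_vector": vector, "price": price}
        ))  # document line
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


def import_succeeded(response):
    """Return True when the bulk response reports no errors."""
    return response.get("errors") is False


bulk_body = build_bulk_body(PRODUCTS)
print(bulk_body)
```

Each document contributes two lines, an action line and a source line, which matches the command shown above.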

Step 5: Vector Search

Perform a pure vector search and combined search in the Elasticsearch cluster.

Pure vector search

A user provides a product image and searches for products that are similar to this image. The image's feature vector is first obtained through a vectorization model, and this vector is then used to perform a similarity-based search in the cluster.

Run the following search command on Kibana:

GET /my_store/_search
{
  "size": 3,  			//Specify the return of the top-3 most relevant results
  "_source": { 
    "excludes": "image_vector"  //Exclude the image_vector field in the returned result
  }, 
  "query": {
    "vector": {  		//Enable vector search
      "image_vector": {  	//Specify the target vector field name (must be consistent with the index mapping)
        "vector": [1.0, 2.0],  	//The feature vector to be queried (Here a simplified example is provided. The actual dimensions should be consistent with the model output.)
        "topk": 3  	        //Return the top-3 most relevant results
      }
    }
  }
}

The following shows an example of the query result. Elasticsearch ranks the results based on the similarity score between the query vector and the stored vectors.

{
  "took" : 1,  		//The query took 1 millisecond
  "timed_out" : false,  //The query did not time out
  "_shards" : {  	//Shard execution result
    "total" : 1,  	//Total number of shards
    "successful" : 1,  	//Number of shards successfully executed
    "skipped" : 0,  	//Number of shards skipped
    "failed" : 0  	//Number of shards failed
  },
  "hits" : {
    "total" : {  	//Total number of document matches
      "value" : 3,  	//Three exact matches (eq indicates an exact count)
      "relation" : "eq"
    },
    "max_score" : 1.0,  //Highest similarity score (depending on the distance algorithm used in the vector space).
    "hits" : [  	//List of matched documents (in descending order of similarity scores)
      {
        "_index" : "my_store",          //Index that contains the document
        "_type" : "_doc",  		//Document type, which is a fixed value
        "_id" : "JPL5r5YBWkpNKdSUkRc9",  //Unique document ID
        "_score" : 1.0,  	//Similarity score of the current document
        "_source" : {  			//Stored original documents (image_vector excluded)
          "price" : 200.0,
          "productName" : "Latest art shirts for women in autumn 2017"
        }
      },
      //... (Other results follow the same structure, with a lower similarity score)
    ]
  }
}
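
To see why the document with price 200.0 ranks first, the ranking can be reproduced offline. The following is a minimal sketch that ranks the Table 1 vectors by exact Euclidean distance to the query vector [1.0, 2.0]. It is a brute-force comparison, not the GRAPH approximate algorithm the index uses, and it does not reproduce the cluster's _score values. The names PRODUCTS, euclidean, and top_k are hypothetical.

```python
import math

# Rows from Table 1: (productName, image_vector, price)
PRODUCTS = [
    ("Latest art shirts for women in autumn 2017", [1.0, 1.0], 100.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 2.0], 200.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 3.0], 300.0),
    ("Latest jeans for women in spring 2018", [10.0, 20.0], 100.0),
    ("Latest jeans for women in spring 2018", [10.0, 30.0], 200.0),
    ("Latest casual pants for women in spring 2017", [100.0, 200.0], 100.0),
    ("Latest casual pants for women in spring 2017", [100.0, 300.0], 200.0),
]


def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query, rows, k):
    """Return the k rows whose vectors are closest to the query vector."""
    # sorted() is stable, so rows at equal distance keep their table order.
    return sorted(rows, key=lambda row: euclidean(query, row[1]))[:k]


nearest = top_k([1.0, 2.0], PRODUCTS, 3)
for name, vector, price in nearest:
    print(name, vector, price, euclidean([1.0, 2.0], vector))
```

The stored vector [1.0, 2.0] is at distance 0 from the query, so the 200.0 shirt comes first; the 100.0 and 300.0 shirts follow at distance 1 each.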

Combined search (vector search with filtering)

A user provides a product image and searches for products that are similar to this image while specifying a price range. A combined search is performed: vector search plus filtering by price range.

Run the following search command on Kibana:

GET /my_store/_search
{
  "size": 3,                     //Specify the return of the top-3 most relevant results
  "_source": { 
    "excludes": "image_vector"  //Exclude the image_vector field in the returned result
  }, 
  "query": {
    "vector": {                  //Enable vector search
      "image_vector": {  	 //Specify the target vector field name (must be consistent with the index mapping)
        "vector": [1.0, 2.0],   //The feature vector to be queried (Here a simplified example is provided. The actual dimensions should be consistent with the model output.)
        "topk": 3,  	            //Return the top-3 most relevant results
        "filter": {              //Hybrid filters (filtering before calculating similarity)
          "range": {             //Filter by price range
            "price": {
              "lte": 300         //Retain only items priced at or below 300 (specific currency).
            }
          }
        }
      }
    }
  }
}

Query process:

Fetch all items whose price is lower than or equal to 300 (specific currency) from the target index.
Calculate the similarity between the image_vector field of each of these items and the feature vector of the user-provided image.
Rank the results by similarity score in descending order.
Return the top 3 most relevant results, with the image_vector field removed and only core information such as product name and price retained.

The command output is similar to the following:

{
  "took" : 1,  		        //The query took 1 millisecond
  "timed_out" : false,	//The query did not time out
  "_shards" : {			//Shard execution result
    "total" : 1,  	   //Total number of shards
    "successful" : 1,  //Number of shards successfully executed
    "skipped" : 0,      //Number of shards skipped
    "failed" : 0  	   //Number of shards failed
  },
  "hits" : {
    "total" : {  	                //Total number of document matches
      "value" : 3,      //Three exact matches (eq indicates an exact count)
      "relation" : "eq"
    },
    "max_score" : 1.0,  //Highest similarity score (depending on the distance algorithm used in the vector space).
    "hits" : [  	        //List of matched documents (in descending order of similarity scores)
      {
        "_index" : "my_store",  //Index that contains the document
        "_type" : "_doc",       //Document type
        "_id" : "JPL5r5YBWkpNKdSUkRc9", //Unique document ID
        "_score" : 1.0,  	//Similarity score of the current document (normalized)
        "_source" : {  	    //Stored original documents (image_vector excluded)
          "price" : 200.0,
          "productName" : "Latest art shirts for women in autumn 2017"
        }
      },
      //... (Other results follow the same structure, with a lower similarity score)
    ]
  }
}
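
The query process for the filtered search (filter by price, rank by similarity, return the top results without the vector field) can be sketched offline as follows. This is a brute-force illustration over the Table 1 data, not the cluster's implementation. The names PRODUCTS, squared_distance, and filtered_top_k are hypothetical, and the price bound of 100 in the second call is an illustrative value that does not appear in the query above.

```python
# Rows from Table 1: (productName, image_vector, price)
PRODUCTS = [
    ("Latest art shirts for women in autumn 2017", [1.0, 1.0], 100.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 2.0], 200.0),
    ("Latest art shirts for women in autumn 2017", [1.0, 3.0], 300.0),
    ("Latest jeans for women in spring 2018", [10.0, 20.0], 100.0),
    ("Latest jeans for women in spring 2018", [10.0, 30.0], 200.0),
    ("Latest casual pants for women in spring 2017", [100.0, 200.0], 100.0),
    ("Latest casual pants for women in spring 2017", [100.0, 300.0], 200.0),
]


def squared_distance(a, b):
    """Squared Euclidean distance; it orders results the same as the distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_top_k(query, rows, k, max_price):
    """Filter by price first, then rank the remaining rows by distance."""
    candidates = [row for row in rows if row[2] <= max_price]
    candidates.sort(key=lambda row: squared_distance(query, row[1]))
    # Drop the vector from the returned records, as "_source.excludes" does.
    return [
        {"productName": name, "price": price}
        for name, _vector, price in candidates[:k]
    ]


# Same bound as the query above: every product costs 300 or less.
within_300 = filtered_top_k([1.0, 2.0], PRODUCTS, 3, max_price=300)
# Hypothetical tighter bound to show the filter changing the result set.
within_100 = filtered_top_k([1.0, 2.0], PRODUCTS, 3, max_price=100)
print(within_300)
print(within_100)
```

With the bound of 300, no product in Table 1 is filtered out, so the result matches the pure vector search. With the tighter bound, only the three products priced at 100.0 remain, and the nearest shirt is followed by the jeans and the casual pants.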

Step 6: Deleting Indexes

If an index is no longer used, run the following command on Kibana to delete it to reclaim resources:

DELETE /my_store

The command output is similar to the following:

{
  "acknowledged" : true
}

Follow-up Operations

You can delete the cluster if you no longer need it.

After you delete a cluster, its data cannot be restored. Exercise caution when deleting a cluster.

Log in to the CSS management console.
In the navigation pane on the left, choose Clusters > Elasticsearch.
In the cluster list, locate the Sample-ESCluster cluster, and choose More > Delete in the Operation column.
In the confirmation dialog box, type in DELETE, and click OK.