
Testing the Performance of CSS's Elasticsearch Vector Search

Updated on 2025-01-06 GMT+08:00

Scenarios

CSS's vector search engine provides a fully managed, high-performance distributed vector database service. To facilitate performance and pressure testing of the vector search service, and to provide accurate references for product selection and resource configuration, this document describes performance testing solutions for CSS's Elasticsearch vector search service based on open-source datasets and open-source pressure testing tools.

Preparations

  • Create an Elasticsearch vector database. For details, see Creating an Elasticsearch Cluster.

    Set the number of nodes to 3 and the node specifications to 4 vCPUs | 16 GB under General computing. (The test data volume is relatively small, and the CPU specifications should be kept close to those used in third-party performance benchmarks.) Select Ultra-high I/O for node storage, and keep the security mode disabled.

  • Obtain test datasets.
    • sift-128-euclidean: 128 dimensions, 1 million base records, measured using the Euclidean distance.
    • cohere-768-cosine: 768 dimensions, 1 million base records, measured using the cosine distance.
    • gist-960-euclidean: 960 dimensions, 1 million base records, measured using the Euclidean distance.

    You can download sift-128-euclidean and gist-960-euclidean at https://github.com/erikbern/ann-benchmarks. To use the cohere-768-cosine dataset, submit a service ticket.

    Figure 1 Downloading sift-128-euclidean and gist-960-euclidean
  • Prepare the testing tools.
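
As a quick sanity check before testing, you can verify that a dataset file follows the HDF5 layout the scripts below expect ("train" for base vectors, "test" for query vectors, "neighbors" for ground truth). The snippet is a sketch that builds a tiny synthetic file in that layout; the file name and sizes are illustrative, not part of the official datasets.

```python
import h5py
import numpy as np

# Write a tiny synthetic dataset in the same HDF5 layout as the
# ann-benchmarks files (train/test/neighbors). File name and sizes
# are illustrative only.
with h5py.File("tiny-euclidean.hdf5", "w") as f:
    f.create_dataset("train", data=np.random.rand(100, 128).astype("float32"))
    f.create_dataset("test", data=np.random.rand(10, 128).astype("float32"))
    f.create_dataset("neighbors", data=np.random.randint(0, 100, size=(10, 100)))

# Read it back the same way load_test_data() in base_test_example.py does.
with h5py.File("tiny-euclidean.hdf5", "r") as f:
    base = f["train"][:]       # base vectors to index
    queries = f["test"][:]     # query vectors
    gts = f["neighbors"][:]    # ground-truth neighbor IDs per query
print(base.shape, queries.shape, gts.shape)
```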

Performance Testing Procedure

  1. Create an ECS for installing the pressure testing tool and executing test scripts. For details, see Purchasing and Using a Linux ECS.
    • This ECS and the Elasticsearch cluster created for the testing purpose must be within the same VPC and security group.
    • You may use a different client server instead. Whichever server you use, make sure it is in the same VPC as the Elasticsearch cluster.
  2. Upload the test dataset to the ECS.
  3. Upload the data writing and recall testing scripts to the ECS, and run the following command:
    pip install h5py
    pip install elasticsearch==7.10
    
    python3 base_test_example.py

    This command creates a vector index for testing, writes the test data, and returns the average recall of the queries.

  4. Install wrk on the ECS.
  5. On the ECS, prepare the query request file used for the pressure testing to simulate real-world traffic. See Script prepare_query.py for an example.
    pip install h5py
    
    python3 prepare_query.py
  6. Prepare the wrk pressure testing configuration script on the ECS. See Script perf.lua for an example. Modify the query request file name, cluster address, and index name in the script as needed.
  7. Run the following command on the ECS to perform pressure testing on CSS's vector search service:
    wrk -c60 -t60 -d10m -s perf.lua http://x.x.x.x:9200 
    • -t: number of pressure testing threads.
    • -c: number of server connections.
    • -d: pressure testing duration (10m means 10 minutes).
    • -s: wrk pressure testing configuration script.
    • x.x.x.x: address of the Elasticsearch cluster.

    Obtain the test result from the command output, where Requests/sec indicates the query throughput in QPS.

    Figure 2 Test result example
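
If you capture the wrk output to a file, the QPS figure can be extracted programmatically from the Requests/sec line. The sample text below is illustrative output, not from an actual run.

```python
import re

# Sample wrk output (illustrative, not from an actual run).
sample = """Running 10m test @ http://x.x.x.x:9200
  60 threads and 60 connections
Requests/sec:  15562.47
Transfer/sec:      5.10MB
"""

# wrk reports throughput on the "Requests/sec" line; that value is the QPS.
match = re.search(r"Requests/sec:\s*([\d.]+)", sample)
qps = float(match.group(1))
print(qps)
```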

Performance Testing Solutions

  • GRAPH indexes

    For a database of millions of records, GRAPH indexing is recommended.

    • Testing solution 1: Use datasets of varying numbers of data dimensions to test the maximum QPS supported by the vector database when the top-10 recall rate reaches 99%. Perform the test on each dataset using both default parameters and performance-tuned ones. By tuning the build parameters, you can optimize the graph index structure to enhance query performance while maintaining the same recall rate.
      Test result:
      Table 1 GRAPH index test result 1 (efc and shrink are build parameters; ef and max_scan_num are query parameters)

      | Dataset            | efc | shrink | ef  | max_scan_num | QPS   | Recall |
      |--------------------|-----|--------|-----|--------------|-------|--------|
      | sift-128-euclidean | 200 | 1.0    | 84  | 10000        | 15562 | 0.99   |
      | sift-128-euclidean | 500 | 0.8    | 50  | 10000        | 17332 | 0.99   |
      | cohere-768-cosine  | 200 | 1.0    | 154 | 10000        | 3232  | 0.99   |
      | cohere-768-cosine  | 500 | 0.95   | 106 | 10000        | 3821  | 0.99   |
      | gist-960-euclidean | 200 | 1.0    | 800 | 19000        | 860   | 0.99   |
      | gist-960-euclidean | 500 | 0.9    | 400 | 15000        | 1236  | 0.99   |

      Conclusion: On all these datasets, the vector search service reaches a recall rate of 99% or higher using default parameters. Further tuning the build and query parameters slightly increases the index build overhead but improves query performance at the same recall rate.

    • Testing solution 2: Use the same dataset to test the vector search service's query performance under different recall rates by tuning index parameters. This solution uses the Cohere dataset to test the maximum QPS of the cluster under different top-10 recall rates—99%, 98%, and 95%, respectively.
      Test result:
      Table 2 GRAPH index test result 2 (efc is a build parameter; ef is a query parameter)

      | Dataset           | efc | ef  | QPS  | Recall |
      |-------------------|-----|-----|------|--------|
      | cohere-768-cosine | 500 | 128 | 3687 | 0.99   |
      | cohere-768-cosine | 500 | 80  | 5320 | 0.98   |
      | cohere-768-cosine | 500 | 36  | 9028 | 0.95   |

      Conclusion: With fixed index building parameters for the same cluster, tuning the ef parameter can achieve different query precisions. Slightly sacrificing the recall rate can significantly boost QPS.

  • GRAPH_PQ indexes

    Graph-based indexes typically need to reside in memory to ensure query performance. When vector dimensionality or the data volume is high, memory becomes a crucial cost and performance factor: high-dimensional vectors and large datasets demand significantly more memory, which increases storage costs and directly affects indexing efficiency and response times. In such scenarios, the GRAPH_PQ indexing algorithm is recommended.

    Testing solution: Use the high-dimensional COHERE and GIST datasets to test the cluster's maximum QPS at a top-10 recall rate of 95%, and compare the memory overhead with that of GRAPH indexes.

    Test result:
    Table 3 GRAPH_PQ index test result (efc and fragment_num are build parameters; ef and topk are query parameters)

    | Dataset            | efc | fragment_num | ef  | topk | QPS  | Recall | Memory (GRAPH_PQ) | Memory (GRAPH) |
    |--------------------|-----|--------------|-----|------|------|--------|-------------------|----------------|
    | cohere-768-cosine  | 200 | 64           | 85  | 130  | 8723 | 0.95   | 332 MB            | 3.3 GB         |
    | gist-960-euclidean | 200 | 120          | 200 | 360  | 4267 | 0.95   | 387 MB            | 4.0 GB         |

    Conclusion: The result shows that GRAPH_PQ indexing can achieve the same or similar precision and QPS as GRAPH indexing while reducing the memory overhead by more than a factor of 10. By combining graph indexing with quantization, the GRAPH_PQ algorithm used by CSS's vector search service significantly reduces memory overhead and increases per-node data capacity.
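
As a minimal sketch, a GRAPH_PQ index can be created by adapting the create() helper in base_test_example.py: switch the algorithm name and add a fragment_num build parameter. The values below follow Table 3; treat the exact field set as an assumption and verify it against Creating Vector Indexes in an Elasticsearch Cluster.

```python
# Index mapping sketch for a GRAPH_PQ index on the cohere-768-cosine
# dataset, adapted from create() in base_test_example.py. The
# "fragment_num" field and parameter values follow Table 3 and are
# assumptions to verify against the CSS vector index reference.
index_mapping = {
    "settings": {
        "index": {"vector": True},
        "number_of_shards": 1,
        "number_of_replicas": 2,
    },
    "mappings": {
        "properties": {
            "id": {"type": "integer"},
            "vec": {
                "type": "vector",
                "indexing": True,
                "dimension": 768,
                "algorithm": "GRAPH_PQ",  # quantized graph index
                "metric": "cosine",
                "efc": 200,
                "fragment_num": 64,       # PQ fragment count, from Table 3
            },
        }
    },
}
print(index_mapping["mappings"]["properties"]["vec"]["algorithm"])
```

Pass this mapping to es_client.indices.create() the same way create() does.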

Table 4 describes the indexing parameters used above. For more information about the build parameters, see Creating Vector Indexes in an Elasticsearch Cluster. For more information about the query parameters, see Using Vector Indexes for Data Search in an Elasticsearch Cluster.

Table 4 Description of index parameters

| Type            | Parameter    | Description                                                                                                   |
|-----------------|--------------|---------------------------------------------------------------------------------------------------------------|
| Build parameter | efc          | Size of the neighbor queue during HNSW graph build. A larger value means higher precision but a slower build. Default: 200. |
| Build parameter | shrink       | Pruning coefficient during HNSW build. Default: 1.0f.                                                          |
| Build parameter | fragment_num | Number of fragments. Default: 0, in which case the plugin sets the fragment count automatically based on the vector length. |
| Query parameter | ef           | Size of the neighbor queue during a query. A larger value means higher precision but a slower query. Default: 200. |
| Query parameter | max_scan_num | Maximum number of nodes scanned during a query. A larger value means higher precision but a slower query. Default: 10000. |
| Query parameter | topk         | Number of top-k results returned for a query.                                                                  |

Script base_test_example.py

# -*- coding: UTF-8 -*-
import json
import time

import h5py
from elasticsearch import Elasticsearch
from elasticsearch import helpers

def get_client(hosts: list, user: str = None, password: str = None):
    if user and password:
        return Elasticsearch(hosts, http_auth=(user, password), verify_certs=False, ssl_show_warn=False)
    else:
        return Elasticsearch(hosts)

# For more information about the index parameters, see Creating Vector Indexes in an Elasticsearch Cluster.
def create(es_client, index_name, shards, replicas, dim, algorithm="GRAPH",
           metric="euclidean", neighbors=64, efc=200, shrink=1.0):
    index_mapping = {
        "settings": {
            "index": {
                "vector": True
            },
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
        "mappings": {
            "properties": {
                "id": {
                    "type": "integer"
                },
                "vec": {
                    "type": "vector",
                    "indexing": True,
                    "dimension": dim,
                    "algorithm": algorithm,
                    "metric": metric,
                    "neighbors": neighbors,
                    "efc": efc,
                    "shrink": shrink,
                }
            }
        }
    }
    es_client.indices.create(index=index_name, body=index_mapping)
    print(f"Create index success! Index name: {index_name}")

def write(es_client, index_name, vectors, bulk_size=1000):
    print("Start write! Index name: " + index_name)
    start = time.time()
    for i in range(0, len(vectors), bulk_size):
        actions = [{
            "_index": index_name,
            "id": i + j,
            "vec": v.tolist()
        } for j, v in enumerate(vectors[i: i + bulk_size])]
        helpers.bulk(es_client, actions, request_timeout=180)
    print(f"Write success! Docs count: {len(vectors)}, total cost: {time.time() - start:.2f} seconds")
    merge(es_client, index_name)

def merge(es_client, index_name, seg_cnt=1):
    print(f"Start merge! Index name: {index_name}")
    start = time.time()
    es_client.indices.forcemerge(index=index_name, max_num_segments=seg_cnt, request_timeout=7200)
    print(f"Merge success! Total cost: {time.time() - start:.2f} seconds")

# For more information about the query parameters, see Using Vector Indexes for Data Search in an Elasticsearch Cluster.
def query(es_client, index_name, queries, gts, size=10, k=10, ef=200, msn=10000):
    print("Start query! Index name: " + index_name)
    i = 0
    precision = []
    for vec in queries:
        hits = set()
        dsl = {
            "size": size,
            "stored_fields": ["_none_"],
            "docvalue_fields": ["id"],
            "query": {
                "vector": {
                    "vec": {
                        "vector": vec.tolist(),
                        "topk": k,
                        "ef": ef,
                        "max_scan_num": msn
                    }
                }
            }
        }
        res = es_client.search(index=index_name, body=json.dumps(dsl))
        for hit in res['hits']['hits']:
            hits.add(int(hit['fields']['id'][0]))
        precision.append(len(hits.intersection(set(gts[i, :size]))) / size)
        i += 1
    print(f"Query complete! Average precision: {sum(precision) / len(precision)}")

def load_test_data(src):
    hdf5_file = h5py.File(src, "r")
    base_vectors = hdf5_file["train"]
    query_vectors = hdf5_file["test"]
    ground_truths = hdf5_file["neighbors"]
    return base_vectors, query_vectors, ground_truths

def test_sift(es_client):
    index_name = "index_sift_graph"
    vectors, queries, gts = load_test_data(r"sift-128-euclidean.hdf5")
    # Adjust the number of shards and replicas, indexing algorithm, and index parameters based on the testing requirements. In our example here, one shard and two replicas are configured for performance testing.
    create(es_client, index_name, shards=1, replicas=2, dim=128)
    write(es_client, index_name, vectors)
    query(es_client, index_name, queries, gts)

if __name__ == "__main__":
    # Change the value to the address of the CSS cluster.
    client = get_client(['http://x.x.x.x:9200'])
    test_sift(client)

Script prepare_query.py

import base64
import json
import struct

import h5py

def prepare_query(src, dst, size=10, k=10, ef=200, msn=10000, metric="euclidean", rescore=False, use_base64=True):
    """
This function is used to read query vectors from the source data files in HDF5 format and generate a complete query request body for performance testing.
    :param src: path of the source data files in HDF5 format.
    :param dst: destination file path.
    :param size: number of query results returned.
    :param k: number of top-k similar results returned by querying a segment-level index.
    :param ef: specifies the queue size used during a query.
    :param msn: specifies max_scan_num.
    :param metric: metric used for result rescoring, such as euclidean, cosine, and inner_product.
    :param rescore: whether to use rescoring. You can enable rescoring for GRAPH_PQ indexes.
    :param use_base64: whether to use Base64-encoded vector data.
    """
    hdf5_file = h5py.File(src, "r")
    query_vectors = hdf5_file["test"]
    with open(dst, "w", encoding="utf8") as fw:
        for vec in query_vectors:
            query_template = {
                "size": size,
                "stored_fields": ["_none_"],
                "docvalue_fields": ["id"],
                "query": {
                    "vector": {
                        "vec": {
                            "vector": vec.tolist() if not use_base64 else floats2base64(vec),
                            "topk": k,
                            "ef": ef,
                            "max_scan_num": msn,
                        }
                    }
                }
            }
            if rescore:
                # "rescore" is a top-level key in the search body, alongside
                # "query", so attach it to the request template itself.
                query_template["rescore"] = {
                    "window_size": k,
                    "vector_rescore": {
                        "field": "vec",
                        "vector": vec.tolist() if not use_base64 else floats2base64(vec),
                        "metric": metric
                    }
                }
            fw.write(json.dumps(query_template))
            fw.write("\n")

def floats2base64(vector):
    data = struct.pack('<{}f'.format(len(vector)), *vector)
    return base64.b64encode(data).decode()

if __name__ == "__main__":
    # Change the value to the data file address.
    prepare_query(r"/path/to/sift-128-euclidean.hdf5", r"requests.txt")

Script perf.lua

local random = math.random

local reqs = {}
local cnt = 0

-- Rename the query request file used for pressure testing as needed.
for line in io.lines("requests.txt") do
    table.insert(reqs, line)
    cnt = cnt + 1
end


local addrs = {}
local counter = 0
function setup(thread)
   local append = function(host, port)
      for i, addr in ipairs(wrk.lookup(host, port)) do
         if wrk.connect(addr) then
            addrs[#addrs+1] = addr
         end
      end
   end

   if #addrs == 0 then
      -- Change the value to the cluster address.
      append("x.x.x.x", 9200)
      append("x.x.x.x", 9200)
      append("x.x.x.x", 9200)
   end

   local index = counter % #addrs + 1
   counter = counter + 1
   thread.addr = addrs[index]
end

-- Change the index name as needed.
wrk.path = "/index_sift_graph/_search?request_cache=false&preference=_local"
wrk.method = "GET"
wrk.headers["Content-Type"] = "application/json"


function request()
    return wrk.format(wrk.method, wrk.path, wrk.headers, reqs[random(cnt)])
end
