Optimizing Vector Cluster Performance

This topic explains how to optimize the write and query performance of a CSS vector database.

Optimizing Write Performance

Writing vector data incurs three major overheads: replica synchronization, index refresh, and segment merging. When index data is written in real time, frequent index refresh operations generate a large number of small segments. This triggers frequent vector index build and merge operations, which consume excessive CPU/IO resources. You can try the following solutions to optimize write performance.

Solution 1: temporarily disable replicas

Description
Temporarily disable replicas during data ingestion and enable them after data ingestion is complete. Use this solution when importing historical data in batches or performing a full update (for example, when initializing a vector database).

Operation

Set the number of replicas:

```
PUT my_index/_settings
{
    "number_of_replicas": 0
}
```

Result
Write performance is enhanced by avoiding real-time vector index building on replica nodes.
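
The disable-then-restore sequence can be scripted so that the replica count is not left at 0 after ingestion. The following is a minimal sketch that only builds the two requests. The helper name replica_toggle_requests and the restored replica count of 1 are assumptions for illustration, and sending the requests is left to whichever HTTP client you use.

```python
import json


def replica_toggle_requests(index, restored_replicas=1):
    """Return the settings requests to run before and after bulk ingestion.

    restored_replicas is a hypothetical value; use the replica count
    that the index normally runs with.
    """
    path = f"{index}/_settings"
    before = ("PUT", path, {"number_of_replicas": 0})
    after = ("PUT", path, {"number_of_replicas": restored_replicas})
    return before, after


if __name__ == "__main__":
    # Print both requests in the same shape as the console examples.
    for method, path, body in replica_toggle_requests("my_index"):
        print(method, path)
        print(json.dumps(body, indent=4))
```

Run the first request before ingestion starts and the second one only after ingestion has finished.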

Solution 2: adjust the refresh interval

Description
Set the index refresh interval to 120s or longer. This reduces the number of small segments generated by frequent index refreshes, and also the vector index building overhead caused by segment merges. You can also disable automatic index refresh by setting the refresh interval to -1. Use this solution in high-throughput write scenarios (for example, when writing vectorized log data).

Operation

Set refresh_interval:

```
PUT my_index/_settings
{
    "refresh_interval": "120s"
}
```

Result
Index refreshes occur less frequently. This reduces the number of small segments and the overhead of segment merges, leading to enhanced write performance.
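
To see why a longer interval helps, compare how many refreshes can occur during one ingestion window. The sketch below is simple arithmetic, not a measurement. It assumes continuous writes, so that every scheduled refresh has new data to turn into segments, and it uses a 1s interval (the stock Elasticsearch default, assumed here as the baseline). The function name is illustrative.

```python
def max_refreshes(ingest_seconds, refresh_interval_seconds):
    """Upper bound on scheduled refreshes during an ingestion window."""
    if refresh_interval_seconds == -1:
        # -1 disables automatic refresh.
        return 0
    if refresh_interval_seconds <= 0:
        raise ValueError("interval must be positive or -1")
    return ingest_seconds // refresh_interval_seconds


if __name__ == "__main__":
    one_hour = 3600
    for interval in (1, 120):
        count = max_refreshes(one_hour, interval)
        print(f"{interval}s interval: up to {count} refreshes per hour")
```

Under these assumptions, a one-hour ingestion allows up to 3600 refreshes at 1s but only 30 at 120s, which is why far fewer small segments need to be built and merged.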

Solution 3: increase indexing threads

Description
Increasing the number of threads for vector index building accelerates the indexing process. However, too many such threads will compete with queries for resources. Use this solution when there are sufficient CPU resources but the write latency is high, such as in GPU-accelerated environments.

Operation

The default value of native.vector.index_threads is 4. Change this value as needed.
```
PUT _cluster/settings
{
  "persistent": {
    "native.vector.index_threads": 8
  }
}
```
Result
Vector index building is accelerated, and the performance of concurrent writes is enhanced.

Optimizing Query Performance

Query performance is affected by three factors: the number of segments, the off-heap memory circuit breaker, and field recall. Too many segments reduce search efficiency. When off-heap memory is insufficient, vector index data is frequently swapped in and out of memory. Recalling all fields increases the load during the fetch phase. You can optimize query performance by addressing these factors.

Solution 1: perform force merge

Description
After batch data ingestion, run a force merge to reduce the number of segments. Typically, you should perform this operation after data ingestion and before data query (for example, after a scheduled batch ingestion).

Operation

Perform the force merge operation:

```
POST my_index/_forcemerge?max_num_segments=1
```

Result
The segments in each shard are merged into a single segment. This reduces the file scanning overhead and speeds up queries.

Solution 2: adjust the upper limit of the segment size

Description
During batch writes, the maximum size of segments generated by the system is 5 GB. You can increase this upper limit to reduce the number of segments generated after automatic merging. Typically, you should perform this operation before batch data ingestion starts.

Operation

Increase the maximum segment size:

```
PUT my_index/_settings
{
  "index.merge.policy.max_merged_segment": "10gb"
}
```

Result
Increasing the maximum segment size reduces the number of segments and thus improves query performance.
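
The effect of the limit can be estimated with a ceiling division. The sketch below computes a lower bound on the number of segments in a shard for a given maximum segment size. The 40 GB shard size and the function name are hypothetical, and real segment counts are usually higher because smaller segments that have not yet been merged also exist.

```python
def min_segments(shard_size_gb, max_merged_segment_gb):
    """Lower bound on segment count: ceil(shard size / max segment size)."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-shard_size_gb // max_merged_segment_gb)


if __name__ == "__main__":
    shard_size_gb = 40  # hypothetical shard size
    for limit_gb in (5, 10):
        count = min_segments(shard_size_gb, limit_gb)
        print(f"max_merged_segment={limit_gb}gb: at least {count} segments")
```

For the hypothetical 40 GB shard, the bound drops from 8 segments at 5 GB to 4 segments at 10 GB.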

Solution 3: adjust the circuit breaker limit for off-heap memory

Description
When the off-heap memory required by vector indexes exceeds the circuit breaker limit, the index cache manager frequently swaps index data in and out of the cache, which slows down queries. You can raise the circuit breaker limit to reduce circuit breaking caused by insufficient memory (indicated by CircuitBreakingException in the logs).

Operation

The default circuit breaker limit for off-heap memory is 80%. You can adjust this limit as required.
```
PUT _cluster/settings
{
  "persistent": {
    "native.cache.circuit_breaker.cpu.limit": "85%"
  }
}
```
Result
Vector index data is less likely to be swapped out of memory, which reduces query jitter.
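
A quick capacity check can indicate whether raising the limit is likely to help. The sketch below assumes that the limit is applied as a percentage of the off-heap memory available to vector indexes on a node. The source states only the default of 80%, so treat this model, the function name, and the sample sizes as illustrative.

```python
def fits_under_limit(index_bytes, off_heap_bytes, limit_percent):
    """Return True if the vector index data fits under the breaker limit."""
    # Integer arithmetic keeps the comparison exact.
    limit_bytes = off_heap_bytes * limit_percent // 100
    return index_bytes <= limit_bytes


if __name__ == "__main__":
    GIB = 1024 ** 3
    off_heap = 32 * GIB    # hypothetical off-heap memory on one node
    index_size = 27 * GIB  # hypothetical vector index size on that node
    for limit in (80, 85):
        fits = fits_under_limit(index_size, off_heap, limit)
        print(f"limit {limit}%: fits = {fits}")
```

If the index data does not fit even under a higher limit, raising the limit alone is unlikely to stop the swapping.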

Solution 4: optimize field recall

Description
If the query result needs to return only a few fields that are either keywords or numeric values, you can use the docvalue_fields parameter to fetch them. Use this method if only numeric or enumerated metadata (such as product IDs and class labels) needs to be fetched. It can significantly reduce overhead during the fetch phase.

Operation

Use the docvalue_fields parameter to fetch only specific fields:

```
POST my_index/_search
{
  "size": 2,
  "stored_fields": ["_none_"],
  "docvalue_fields": ["my_label"],
  "query": {
    "vector": {
      "my_vector": {
        "vector": [1, 1],
        "topk": 2
      }
    }
  }
}
```

Result
There is no need to parse the entire _source document. Column-oriented storage (docvalues) reduces the overhead during the fetch phase and improves query performance.
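
On the client side, the same request can be built programmatically and the returned values read from the fields section of each hit. This is a minimal sketch: the helper names are hypothetical, the request body mirrors the example above, and the sample response is a hand-written fragment trimmed to the parts used here, not real output.

```python
def build_vector_search(vector_field, vector, topk, docvalue_fields):
    """Build a search body that skips _source and returns only doc values."""
    return {
        "size": topk,
        "stored_fields": ["_none_"],
        "docvalue_fields": list(docvalue_fields),
        "query": {
            "vector": {vector_field: {"vector": list(vector), "topk": topk}}
        },
    }


def extract_docvalues(response, field):
    """Collect the values of one doc-value field from a search response."""
    values = []
    for hit in response["hits"]["hits"]:
        # Each doc-value field is returned as a list under "fields".
        values.extend(hit.get("fields", {}).get(field, []))
    return values


if __name__ == "__main__":
    body = build_vector_search("my_vector", [1, 1], 2, ["my_label"])
    print(body)
    # Hypothetical response fragment for illustration only.
    sample = {"hits": {"hits": [
        {"fields": {"my_label": ["red"]}},
        {"fields": {"my_label": ["blue"]}},
    ]}}
    print(extract_docvalues(sample, "my_label"))
```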

Setting Cache Timeout

When the cluster's memory resources are insufficient, data is updated frequently, or high data freshness is required, you can enable automatic cache expiration to have inactive data cleared from the cache. This helps to optimize system performance, ensure data consistency, and improve query stability.

Run the following command to set cache timeout:

```
PUT _cluster/settings
{
  "persistent": {
    "native.cache.expiry.enabled": "true",
    "native.cache.expiry.time": "30m"
  }
}
```

**Table 1** Parameter description

| Parameter | Type | Description |
| --- | --- | --- |
| native.cache.expiry.enabled | Boolean | Whether to enable automatic cache expiration. Value range: true (enable automatic cache expiration; inactive data in the cache is cleared) or false (default value; disable automatic cache expiration). |
| native.cache.expiry.time | String | Timeout of inactive cache items. This parameter takes effect only when native.cache.expiry.enabled=true. Value: a time string, for example, 24h (24 hours) or 30m (30 minutes). Default value: 24h. |