Managing the Vector Search Cache
For vector search systems that handle hundreds of millions of vectors, millisecond-level latency requires keeping many high-dimensional vector indexes in memory. Unlike traditional Elasticsearch or OpenSearch implementations, which rely heavily on JVM heap memory, the CSS vector search engine is written in C++ and uses off-heap memory for better performance. Without effective lifecycle management, large-scale deployments may experience:
- Out-of-memory (OOM) errors, if inactive (cold) indexes accumulate and occupy too much memory.
- Unstable query latency, caused by frequent swap-in and swap-out of cached index segments.
To address these problems, the CSS vector database implements a set of off-heap memory management policies that keep search performance stable under heavy loads:
- Real-time monitoring of memory usage watermarks.
- Index preloading to mitigate high first-query latency.
- Dynamic cache reclamation: the cache is cleared automatically based on predefined idle timeout periods or usage thresholds.
How the Feature Works
The CSS vector database divides cluster physical memory into JVM heap memory and off-heap memory. The management policies for off-heap memory are as follows:
- On the first query that hits a vector segment, the segment is loaded from disk into off-heap memory. Such queries may experience high latency.
- Once the data is resident in off-heap memory, all subsequent queries are served directly from the cache, enabling millisecond-level response times.
- When off-heap memory is full or a segment's idle timeout period expires, inactive segments are evicted from off-heap memory, which keeps query performance stable under heavy loads.
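The lifecycle described in the list above (load on first hit, serve from cache, evict when full or idle) can be illustrated with a toy model. This is a minimal sketch, not the CSS implementation: the `SegmentCache` class, its parameters, and the least-recently-used eviction order are assumptions made for illustration, since the text only states that inactive segments are evicted.

```python
from collections import OrderedDict


class SegmentCache:
    """Toy model of an off-heap segment cache (illustrative, not CSS internals)."""

    def __init__(self, capacity_kb, idle_timeout_s):
        self.capacity_kb = capacity_kb
        self.idle_timeout_s = idle_timeout_s
        self.segments = OrderedDict()  # segment id -> (size_kb, last_access_s)
        self.hits = self.misses = self.evictions = 0

    def used_kb(self):
        return sum(size for size, _ in self.segments.values())

    def expire_idle(self, now_s):
        """Evict segments whose idle time has reached the timeout."""
        for seg_id, (_, last) in list(self.segments.items()):
            if now_s - last >= self.idle_timeout_s:
                del self.segments[seg_id]
                self.evictions += 1

    def query(self, seg_id, size_kb, now_s):
        """Return 'hit' or 'miss'; a miss stands in for loading from disk."""
        self.expire_idle(now_s)
        if seg_id in self.segments:
            self.hits += 1
            self.segments[seg_id] = (size_kb, now_s)
            self.segments.move_to_end(seg_id)  # mark as most recently used
            return "hit"
        self.misses += 1
        # Assumed policy: evict least recently used segments until the new one fits.
        while self.segments and self.used_kb() + size_kb > self.capacity_kb:
            self.segments.popitem(last=False)
            self.evictions += 1
        self.segments[seg_id] = (size_kb, now_s)
        return "miss"


cache = SegmentCache(capacity_kb=100, idle_timeout_s=60)
results = [
    cache.query("seg-a", 60, now_s=0),    # first hit: loaded from disk
    cache.query("seg-a", 60, now_s=10),   # served from the cache
    cache.query("seg-b", 60, now_s=20),   # cache full: seg-a is evicted
    cache.query("seg-b", 60, now_s=100),  # idle for 80 s: seg-b expired, reloaded
]
print(results, cache.evictions)
```

In this model, the first and last queries pay the load cost, which mirrors why cold or expired segments tend to show higher latency than resident ones.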
Monitoring Cache Status
To troubleshoot performance bottlenecks, check each cluster node's off-heap memory utilization and cache hit rate.
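The stats response documented in this section reports raw hit and miss counters rather than a hit rate. As a minimal sketch, assuming the hit rate is defined as `cpu_hit_count / (cpu_hit_count + cpu_miss_count)`, the per-node figures could be derived as follows. The `node_cache_summary` helper and the sample values are hypothetical; only the field names come from the documented response.

```python
def node_cache_summary(stats):
    """Summarize per-node cache counters from a GET /_vector/stats response body."""
    summary = {}
    for node_id, node in stats.get("nodes", {}).items():
        hits = node.get("cpu_hit_count", 0)
        misses = node.get("cpu_miss_count", 0)
        lookups = hits + misses
        summary[node_id] = {
            # Assumed definition: hits / (hits + misses); None when no lookups yet.
            "hit_rate": hits / lookups if lookups else None,
            "memory_usage_kb": node.get("cpu_query_memory_usage", 0),
            "capacity_reached": node.get("cpu_cache_capacity_reached", False),
        }
    return summary


# Hypothetical counter values, shaped like the documented response.
sample = {
    "nodes": {
        "node-1": {
            "cpu_cache_capacity_reached": False,
            "cpu_hit_count": 90,
            "cpu_miss_count": 10,
            "cpu_query_memory_usage": 2048,
        }
    }
}
summary = node_cache_summary(sample)
print(summary)
```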
Run the following command to monitor the cache status:
GET /_vector/stats
Example response:
{
"_nodes" : { # Node information
"total" : 1, # Total number of nodes
"successful" : 1, # Number of successful nodes
"failed" : 0 # Number of failed nodes
},
"cluster_name" : "css-d3a7", # Cluster name
"cpu_circuit_breaker_triggered" : false, # Whether circuit breaking is triggered
"nodes" : {
"cAHmVUZTR9ON7t6jxcDCkg" : { # Node UUID
"cpu_cache_capacity_reached" : false, # Whether the off-heap memory usage of the current node reaches the upper limit
"cpu_eviction_count" : 0, # Number of segment-level cache swap-outs on the current node
"cpu_hit_count" : 0, # Number of segment-level cache hits on the current node
"cpu_load_exception_count" : 0, # Number of segment-level index loading failures on the current node
"cpu_load_success_count" : 0, # Number of segment-level index loading successes on the current node
"cpu_miss_count" : 0, # Number of segment-level cache misses on the current node
"cpu_query_memory_usage" : 0, # Off-heap memory usage on the current node, in KB
"cpu_total_load_time" : 0 # Total time loading segments to the off-heap memory on the current node, in ms
}
}
}
Preloading Frequently Accessed Indexes
Preload frequently accessed or newly ingested indexes into off-heap memory to avoid the high first-query latency caused by loading data from disk.
Run the following command to preload a specified index:
PUT /_vector/warmup/{index_name}
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| index_name | String | N/A | Specifies one or more vector indexes. |
Example response:
{
"_shards" : {
"total" : 1,
"successful" : 1,
"failed" : 0
}
}
Configuring an Automatic Cache Clearing Policy
When data is frequently updated or memory resources become constrained, you can enable automatic cache clearing to evict inactive segments and reclaim off-heap memory, which keeps query performance stable under heavy loads.
Run the following command to enable automatic eviction of segments that have exceeded their idle timeout period:
PUT _cluster/settings
{
"persistent": {
"native.cache.expiry.enabled": "true",
"native.cache.expiry.time": "30m"
}
}
| Parameter | Type | Default Value | Description |
|---|---|---|---|
| native.cache.expiry.enabled | Boolean | false | Whether to enable automatic cache clearing. When enabled, inactive segments are evicted automatically once their idle timeout period expires. The value can be true or false. |
| native.cache.expiry.time | String | 24h | Idle timeout period for evicting inactive segments. Value format: number + unit. Examples: 24h (24 hours), 30m (30 minutes). |
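When the settings body is generated from a script, it may help to validate the idle timeout before sending it. The following is a minimal sketch: `build_expiry_settings` is a hypothetical helper, and accepting the `s` and `d` units is an assumption based on Elasticsearch-style time units, since only `h` and `m` appear in this document.

```python
import json
import re

# Assumption: only the "h" and "m" units appear in this document; "s" and "d"
# are accepted here on the assumption that Elasticsearch-style time units apply.
_TIME_VALUE = re.compile(r"^[1-9]\d*(s|m|h|d)$")


def build_expiry_settings(enabled, idle_timeout="24h"):
    """Build a request body for PUT _cluster/settings (hypothetical helper)."""
    if not _TIME_VALUE.match(idle_timeout):
        raise ValueError(f"expected number + unit, e.g. '30m', got {idle_timeout!r}")
    return {
        "persistent": {
            "native.cache.expiry.enabled": "true" if enabled else "false",
            "native.cache.expiry.time": idle_timeout,
        }
    }


body = build_expiry_settings(True, "30m")
print(json.dumps(body, indent=2))
```

The printed body has the same shape as the request shown in this section and can be sent with any HTTP client.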
Manually Clearing the Cache
When off-heap memory reaches its capacity, the system automatically manages data through a "swap-in and swap-out" process. However, frequent, high-volume cache churn can impact query performance. After deleting indexes or switching workloads, you can manually reclaim off-heap memory occupied by inactive index segments to ensure query performance for hot data indexes.
- Clear the full cache:
PUT /_vector/clear/cache
- Clear specified indexes from the cache:
PUT /_vector/clear/cache/{index_name}
Example response:
{
"acknowledged" : "true"
}