Updated on 2025-05-29 GMT+08:00

Vector Database Parameters

maintenance_work_mem

Parameter description: Specifies the maximum amount of memory used in maintenance operations.

Parameter type: integer.

Unit: KB

Value range: 1024 to 2147483647

Default value: 65536 (that is, 64 MB).

Setting method: This is a USERSET parameter. Set it based on instructions provided in 17.2-Table 1 GUC parameter types. If the value is given without a unit, it is interpreted in KB: for example, 1024 sets maintenance_work_mem to 1024 KB. If the value is 1MB, maintenance_work_mem is set to 1 MB. When a unit is specified, it must be KB, MB, or GB.
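The unit rule above can be illustrated with a small sketch. The helper name to_kb is hypothetical, and the code only models the rule as described here (no unit means KB; units are KB, MB, or GB; the result must fall within the documented range). It is not the server's own parser, which may accept other spellings.

```python
import re

# Multipliers to convert each documented unit to KB.
_UNIT_TO_KB = {"KB": 1, "MB": 1024, "GB": 1024 * 1024}

# Documented value range of maintenance_work_mem, in KB.
_MIN_KB = 1024
_MAX_KB = 2147483647


def to_kb(value: str) -> int:
    """Convert a maintenance_work_mem setting such as '1024' or '1MB' to KB."""
    match = re.fullmatch(r"\s*(\d+)\s*(KB|MB|GB)?\s*", value)
    if match is None:
        raise ValueError(f"unsupported value: {value!r}")
    number, unit = match.groups()
    # A value without a unit is interpreted as KB.
    kb = int(number) * _UNIT_TO_KB[unit or "KB"]
    if not _MIN_KB <= kb <= _MAX_KB:
        raise ValueError(f"{value!r} is outside the range {_MIN_KB} to {_MAX_KB} KB")
    return kb


print(to_kb("1024"))  # value without a unit
print(to_kb("1MB"))   # value with a unit
```

Under this model, "1024" and "1MB" both resolve to 1024 KB, and the default of 64 MB resolves to 65536 KB.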

Setting suggestion: The value of this parameter must be greater than the memory required for data sampling during vector index building. For the gsivfflat index, the memory required for sampling is estimated at max(nlist, nlist2) x dim x 0.2 KB. For the gsdiskann index, when pq is enabled, the memory required for sampling is dim x 80 KB.
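As a rough aid, the two estimates above can be computed before an index is built. This is a minimal sketch: the function names are hypothetical, rounding the result up to a whole KB is an assumption, and no estimate is returned for cases the text does not cover (for example, gsdiskann without pq).

```python
import math
from typing import Optional


def sampling_memory_kb(index_type: str, dim: int, nlist: int = 0,
                       nlist2: int = 0, pq_enabled: bool = False) -> Optional[int]:
    """Estimate the sampling memory (KB) needed while building a vector index."""
    if index_type == "gsivfflat":
        # max(nlist, nlist2) x dim x 0.2 KB; dividing by 5 avoids float drift.
        return math.ceil(max(nlist, nlist2) * dim / 5)
    if index_type == "gsdiskann" and pq_enabled:
        # dim x 80 KB when pq is enabled.
        return dim * 80
    # No estimate is documented for other combinations.
    return None


def is_sufficient(maintenance_work_mem_kb: int, required_kb: int) -> bool:
    """maintenance_work_mem must be strictly greater than the sampling memory."""
    return maintenance_work_mem_kb > required_kb


default_kb = 65536  # default maintenance_work_mem (64 MB)

small = sampling_memory_kb("gsivfflat", dim=128, nlist=1000)
large = sampling_memory_kb("gsivfflat", dim=960, nlist=4096)
print(small, is_sufficient(default_kb, small))
print(large, is_sufficient(default_kb, large))
```

With these illustrative inputs, the first index fits within the default setting, while the second needs about 768 MB, so maintenance_work_mem would have to be raised before the index is built.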

Risks and impacts of improper settings: If the value is too small, indexes cannot be created and other services that require large memory will fail.

diskann_probe_ncandidates

Parameter description: Specifies the size of the candidate set when the gsdiskann index is used to retrieve vectors.

Parameter type: integer.

Unit: none

Value range: 1 to 32768

Default value: 128

Setting method: This is a USERSET parameter. Set it based on instructions provided in 17.2-Table 1 GUC parameters.

Setting suggestion:

  • Retain the default value unless experiments show that another setting works better for your data.
  • diskann_probe_ncandidates can be set for, and takes effect on, individual queries that use the gsdiskann index. Set it only at the session level using SET. Setting it globally using gs_guc is not recommended.
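The session-level approach can be sketched as a small helper that checks the documented range and builds the statement to run in the current session. The helper name is hypothetical, and the statement form SET name = value is assumed to follow the usual SQL syntax for SET. Check the SET reference for your version before relying on it.

```python
def session_set_statement(name: str, value: int,
                          low: int = 1, high: int = 32768) -> str:
    """Build a session-level SET statement after validating the value range."""
    if not low <= value <= high:
        raise ValueError(f"{name} must be between {low} and {high}, got {value}")
    # Assumed syntax: SET <parameter> = <value>;
    return f"SET {name} = {value};"


# Try a larger candidate set for the queries in this session only.
print(session_set_statement("diskann_probe_ncandidates", 256))
```

Because the statement applies only to the current session, other sessions keep the default value of 128.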

Risks and impacts of improper settings: If the value is too large, query performance deteriorates. If the value is too small, the recall rate is insufficient.

gsivfflat_probes

Parameter description: Specifies the number of inverted tables to be searched when the gsivfflat index is used to retrieve vectors. If this value exceeds the total number of inverted tables of the gsivfflat index, the entire table is searched. The total number of inverted tables of the gsivfflat index is specified by the ivf_nlist parameter when the index is created.

Parameter type: integer.

Unit: none

Value range: 1 to 32768

Default value: 5

Setting method: This is a USERSET parameter. Set it based on instructions provided in 17.2-Table 1 GUC parameters.

Setting suggestion: Set this parameter to 3% of the value of ivf_nlist when the index is created. You are advised to obtain the optimal parameter configuration through experiments.
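The 3% rule can be turned into a starting value for experiments. This is a minimal sketch: the function name is hypothetical, and rounding to the nearest integer and clamping to the documented range of 1 to 32768 are assumptions, since the text does not say how fractional results are handled.

```python
def suggested_probes(ivf_nlist: int) -> int:
    """Starting value for gsivfflat_probes: about 3% of ivf_nlist."""
    if ivf_nlist < 1:
        raise ValueError("ivf_nlist must be at least 1")
    # Round 3% of ivf_nlist to the nearest integer using integer arithmetic.
    candidate = (3 * ivf_nlist + 50) // 100
    # Keep the result inside the documented value range.
    return max(1, min(32768, candidate))


for nlist in (10, 200, 1000):
    print(nlist, suggested_probes(nlist))
```

The result is only a first guess. Measure recall and latency around it to find the setting that suits your data.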

Risks and impacts of improper settings: If this parameter is set to a larger value, the search takes a longer time but the search result is more accurate.

gsivfflat_secondary_probes

Parameter description: Specifies the number of level-2 inverted tables to be searched when the gsivfflat index is used to retrieve vectors. If this value exceeds the total number of level-2 inverted tables, the entire table is searched. The total number of level-2 inverted tables is specified by the ivf_nlist2 parameter when the index is created.

Parameter type: integer.

Unit: none

Value range: 1 to 32768

Default value: 5

Setting method: This is a USERSET parameter. Set it based on instructions provided in 17.2-Table 1 GUC parameters.

Setting suggestion: You are advised to set this parameter to a value between 1/4 and 1/2 of the value of ivf_nlist2. You are advised to obtain the optimal parameter configuration through experiments.
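The suggested interval can be computed as a pair of bounds to sweep during experiments. This is a minimal sketch: the function name is hypothetical, and rounding down and clamping to the documented range of 1 to 32768 are assumptions.

```python
from typing import Tuple


def secondary_probe_range(ivf_nlist2: int) -> Tuple[int, int]:
    """Bounds for gsivfflat_secondary_probes: 1/4 to 1/2 of ivf_nlist2."""
    if ivf_nlist2 < 1:
        raise ValueError("ivf_nlist2 must be at least 1")
    # Round down, then keep both bounds inside the documented value range.
    low = min(32768, max(1, ivf_nlist2 // 4))
    high = max(low, min(32768, ivf_nlist2 // 2))
    return low, high


low, high = secondary_probe_range(100)
print(low, high)
```

Values between the two bounds are candidates to test. Larger values trade longer search time for more accurate results.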

Risks and impacts of improper settings: If this parameter is set to a larger value, the search takes a longer time but the search result is more accurate.

gsivfflat_secondary_probes works in the same way as gsivfflat_probes, but it takes effect only when the vector index is a double-layer index. A double-layer index speeds up queries. You are advised to obtain the optimal parameter configuration through experiments.

enable_vectordb

Parameter description: Specifies whether vector indexes can be created and whether vector indexes can be added, modified, and queried. For details about the functions of a vector database, see "Using a Vector Database" in Vector Database Developer Guide.

Parameter type: Boolean.

Unit: none

Value range:

  • on: allowed.
  • off: not allowed.

Default value: off

Setting method: This is a SIGHUP parameter. Set it based on instructions provided in 17.2-Table 1 GUC parameters.

Setting suggestion: If users are not allowed to use vector database functions such as vector index, set this parameter to off. Otherwise, set this parameter to on.

Risks and impacts of improper settings: The parameter value determines whether users can use the vector database functions. If the parameter is set incorrectly, users may be able to use vector database functions that are meant to be unavailable to them.

vectordb_node_retrieval_num_percent

Parameter description: Specifies the ratio of DNs that are queried during a vector query in the K-means distribution. A lower ratio gives higher performance but lower accuracy.

Parameter type: floating-point.

Unit: none

Value range: 0 to 1

Default value: 1.0

Setting method: This is a USERSET parameter. Set it based on instructions provided in 17.2-Table 1 GUC parameters.

Setting suggestion: Set it to 1.0. If the recall rate (the ratio of correct scan results to the full scan results) decreases, you can increase the value of this parameter.

If the search distance is the same as the distribution key distance, you are advised to retain the default value to improve the search performance. If the search distance is different from the distribution key distance, you are advised to set the value to 1.0 to ensure the recall rate.
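When tuning this parameter through experiments, the recall rate defined above (correct scan results divided by full scan results) can be measured as follows. This is a minimal sketch: the function name and the sample IDs are hypothetical, and it assumes each result is identified by a unique ID.

```python
from typing import Iterable


def recall_rate(returned_ids: Iterable[int], full_scan_ids: Iterable[int]) -> float:
    """Ratio of correct results returned to the results of a full scan."""
    exact = set(full_scan_ids)
    if not exact:
        raise ValueError("the full scan result must not be empty")
    # A returned result is correct if the full scan also returns it.
    correct = exact & set(returned_ids)
    return len(correct) / len(exact)


# Hypothetical IDs: three of the four full-scan results were returned.
print(recall_rate([1, 2, 3, 5], [1, 2, 3, 4]))
```

Comparing this value across several settings of vectordb_node_retrieval_num_percent shows how much accuracy is traded for search time.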

Risks and impacts of improper settings: This parameter controls the number of nodes that are searched during vector search. A smaller value means a smaller search range and a shorter search time, but the returned result may not be optimal. A larger value means that more nodes are searched, so the returned result is more accurate but the search takes longer. Consider how centralized the vector data is when setting this parameter. If data is centralized (data redistribution was performed recently), you are advised to set this parameter to a small value. If data is scattered (data redistribution has not been performed for a long time), you are advised to set this parameter to a large value. If the search distance is inconsistent with the distribution key distance and the parameter value is less than 1.0, the data search range is incorrect and the search recall rate is low.

vectordb_need_copy_auto_redistribute

Parameter description: Specifies whether to create a center point during a copy operation when the data table of the vector distribution key does not have a center point.

Parameter type: Boolean.

Unit: none

Value range:

  • on: enabled.
  • off: disabled.

Default value: on

Setting method: This is a USERSET parameter. Set it based on instructions provided in "Configuring GUC Parameters > Setting Parameters" in Administrator Guide.

Setting suggestion: If automatic redistribution is not required during data copy, set this parameter to off. Otherwise, set this parameter to on. However, when the number of DNs is less than two, automatic redistribution will not be performed even if this parameter is set to on.

Risks and impacts of improper settings: If this function is enabled, data is redistributed during the copy operation. The redistribution takes a short time, which increases the copy time. If this function is disabled, no center point is generated and data is not clustered.

Table 1 GUC parameters related to the vector database

| GUC Parameter | Level | Value Range/Default Value | Description |
| --- | --- | --- | --- |
| maintenance_work_mem | Session | [1024, 2147483647]/[1MB, 2048GB) (65536/64MB) | Maximum amount of memory used in maintenance operations. The default unit is KB. |
| diskann_probe_ncandidates | Session | [1, 32768] (128) | Size of the candidate set when the gsdiskann index is used to retrieve vectors. |
| gsivfflat_probes | Session | [1, 32768] (5) | Number of inverted tables to be searched. If it exceeds the total number of inverted tables, the entire table is searched. |
| gsivfflat_secondary_probes | Session | [1, 32768] (5) | Number of level-2 inverted tables to be searched. If it exceeds the total number of level-2 inverted tables, the entire table is searched. |
| enable_vectordb | Global parameter (SIGHUP) | [off, on] (off) | An advanced feature that specifies whether vector indexes can be created and whether vector indexes can be added, modified, and queried. |
| vectordb_node_retrieval_num_percent | Session | [0, 1] (0.5) | Ratio of DNs that are queried during a vector query in the K-means distribution. A lower ratio gives higher performance but lower accuracy. |
| vectordb_need_copy_auto_redistribute | Session | [off, on] (on) | Specifies whether to create a center point during the copy operation when the data table of the vector distribution key does not have a center point. |