Updated on 2024-11-29 GMT+08:00

Creating an Index

Prerequisites

  • If you use vector retrieval in Ranger authentication mode, you have added the vector permission to the all-index policy. For details, see Authentication Based on Ranger.
  • In ACL authentication mode, the elasticsearch user has permission by default on all vector retrieval interfaces except the unregister interface.

    The unregister interface is used to delete dictionaries. You can delete only the dictionaries that you created.

Procedure

The following example creates an index named my_index that contains a vector field named my_vector. A graph index is built on the field, and Euclidean distance is used to measure similarity. For details about the parameters, see Table 1.

PUT my_index
{
  "settings": {
    "index": {
      "vector": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "vector",
        "dimension": 2,
        "indexing": true,
        "algorithm": "GRAPH",
        "metric": "euclidean"
      }
    }
  }
}
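
The same request can also be prepared from a script. The following is a minimal sketch in Python, assuming a cluster reachable at the hypothetical address http://localhost:9200 without authentication. The helper names build_index_body and build_create_request are illustrative only and are not part of any product API. The sketch prepares the request but does not send it.

```python
import json
import urllib.request

# Hypothetical endpoint (an assumption); replace with your own cluster address.
ES_URL = "http://localhost:9200"


def build_index_body(field, dimension, algorithm="GRAPH", metric="euclidean"):
    """Return the index-creation body shown above for one vector field."""
    return {
        "settings": {"index": {"vector": True}},
        "mappings": {
            "properties": {
                field: {
                    "type": "vector",
                    "dimension": dimension,
                    "indexing": True,
                    "algorithm": algorithm,
                    "metric": metric,
                }
            }
        },
    }


def build_create_request(index, body, base_url=ES_URL):
    """Prepare (but do not send) the PUT <index> request."""
    return urllib.request.Request(
        url=f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


body = build_index_body("my_vector", dimension=2)
request = build_create_request("my_index", body)
print(request.get_method(), request.full_url)
# To send it, call urllib.request.urlopen(request) against a live cluster.
```

Keeping body construction separate from sending makes it possible to inspect or log the JSON before it reaches the cluster. Authentication headers, if your cluster requires them, would need to be added to the request.
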
Table 1 Parameter description

Parameter: settings

  Sub-parameter: vector
  Remarks: To enable vector index acceleration, set this parameter to true.

Parameter: mappings

  Sub-parameter: type
  Remarks: Field type. If this parameter is set to vector, the field is a vector field.

  Sub-parameter: dimension
  Remarks: Vector data dimension. The value ranges from 0 to 4096.

  Sub-parameter: indexing
  Remarks: Whether to enable index acceleration. Index acceleration is disabled by default.

    • true: index acceleration is enabled.
    • false: index acceleration is disabled.

  Sub-parameter: algorithm
  Remarks: Index algorithm. This parameter is valid only when indexing is set to true. The default value is GRAPH. Options:

    • FLAT: brute-force algorithm that computes the distance between the target vector and every stored vector in turn. It is computationally expensive but achieves a recall rate of 100%, so it suits scenarios that require very high recall.
    • GRAPH: graph index built on a self-developed HNSW algorithm. It is mainly used in scenarios that require high performance and high precision and where the data volume is less than 10 million records.
    • GRAPH_PQ: algorithm that combines HNSW with a product quantization (PQ) index. It reduces the storage overhead of the original vectors, which allows HNSW to support searches over hundreds of millions of records.

  Sub-parameter: metric
  Remarks: Metric used to measure the distance between vectors. The default value is euclidean. Options:

    • euclidean
    • inner_product
    • cosine
    • hamming
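
To make the metric options and the FLAT algorithm more concrete, the following sketch computes the four measures using their textbook definitions and runs a brute-force search over a small in-memory list. It is an illustration in plain Python, not the product's implementation. The function names, the sample data, the treatment of hamming inputs as bit lists, and the ranking convention (smaller is closer for euclidean and hamming, larger is closer for inner_product and cosine) are all assumptions, and the engine may return differently scaled scores.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    # Undefined for zero vectors; this sketch assumes non-zero inputs.
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm


def hamming(a, b):
    # Number of positions at which the two (bit) sequences differ.
    return sum(1 for x, y in zip(a, b) if x != y)


# Metric name -> (function, True if a larger value means "more similar").
METRICS = {
    "euclidean": (euclidean, False),
    "inner_product": (inner_product, True),
    "cosine": (cosine, True),
    "hamming": (hamming, False),
}


def flat_search(query, vectors, k, metric="euclidean"):
    """Brute force: score the query against every vector, keep the best k."""
    func, larger_is_better = METRICS[metric]
    scored = [(func(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(key=lambda pair: pair[0], reverse=larger_is_better)
    return [i for _, i in scored[:k]]


# Illustrative 2-dimensional data, matching "dimension": 2 in the example index.
vectors = [[1.0, 1.0], [2.0, 2.5], [5.0, 1.0], [0.0, 4.0]]
query = [1.0, 2.0]
print(flat_search(query, vectors, k=2))
print(flat_search(query, vectors, k=2, metric="cosine"))
```

Because every stored vector is compared with the query, the result is exact, which is why a brute-force search reaches 100% recall. The cost grows linearly with the number of vectors, and graph-based indexes are designed to avoid that cost.
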