Updated on 2024-10-14 GMT+08:00

pgvector

Introduction

RDS for PostgreSQL supports the pgvector extension, which allows for vector data type and vector similarity search. This extension supports:

  • Exact and approximate nearest neighbor search
  • L2 distance, inner product, and cosine distance
  • Any language with a PostgreSQL client

For more information, see official pgvector documentation.

Supported Versions

This extension is available to the latest minor versions of RDS for PostgreSQL 12 and later versions. You can run the following SQL statement to check whether your DB instance supports this extension:

SELECT * FROM pg_available_extension_versions WHERE name = 'vector';

If this extension is not supported, upgrade the minor version of your DB instance or upgrade the major version using dump and restore.

For details about the extensions supported by RDS for PostgreSQL, see Supported Extensions.

Extension Installation and Uninstallation

  • Installing the extension
    SELECT control_extension ('create', 'vector');
  • Deleting the extension
    SELECT control_extension ('drop', 'vector');

For more information, see Installing and Uninstalling an Extension on the RDS Console and Installing and Uninstalling an Extension Using SQL Commands.

Basic Functions

  • Creating a vector column with 3 dimensions
    CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
  • Inserting vectors
    INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
  • Getting the nearest neighbors by L2 distance
    SELECT * FROM items ORDER BY embedding <-> '[3,1,2]';
  • Getting the nearest neighbors by cosine distance
    SELECT * FROM items ORDER BY embedding <=> '[3,1,2]';
  • Getting the nearest neighbors by inner product

    <#> returns the negative inner product since PostgreSQL only supports ASC order index scans on operators.

    SELECT * FROM items ORDER BY embedding <#> '[3,1,2]';

Advanced Functions

  • Getting the distance
    SELECT embedding <-> '[3,1,2]' AS distance FROM items;
    SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
    SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
  • Averaging vectors
    SELECT AVG(embedding) FROM items;
  • Exact search providing perfect recall

    You can add an index to use approximate nearest neighbor search, which trades some recall for performance.

    CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 1);
    INSERT INTO items (embedding) VALUES ('[1,2,4]');
    SELECT * FROM items ORDER BY embedding <-> '[3,3,3]';