Updated on 2024-04-11 GMT+08:00

Using pgvector

Introduction

RDS for PostgreSQL supports the pgvector plugin, which allows for vector data type and vector similarity search. This plugin supports:

  • Exact and approximate nearest neighbor search
  • L2 distance, inner product, and cosine distance
  • Any language with a PostgreSQL client

For more information, see official pgvector documentation.

Supported Versions

This plugin is available to the latest minor versions of RDS for PostgreSQL 12 and later versions. You can run the following SQL statement to check whether your DB instance supports this plugin:

SELECT * FROM pg_available_extension_versions WHERE name = 'vector';

If this plugin is not supported, upgrade the minor version of your DB instance or upgrade the major version using dump and restore.

For details about the plugins supported by RDS for PostgreSQL, see Supported Plugins.

Plugin Installation and Uninstallation

  • Installing the plugin
    SELECT control_extension ('create', 'vector');
  • Deleting the plugin
    SELECT control_extension ('drop', 'vector');

For more information, see Installing and Uninstalling a Plugin on the RDS Console and Installing and Uninstalling a Plugin Using SQL Commands.

Basic Usage

  • Creating a vector column with 3 dimensions
    CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(3));
  • Inserting vectors
    INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]');
  • Getting the nearest neighbors by L2 distance
    SELECT * FROM items ORDER BY embedding <-> '[3,1,2]';
  • Getting the nearest neighbors by cosine distance
    SELECT * FROM items ORDER BY embedding <=> '[3,1,2]';
  • Getting the nearest neighbors by inner product

    <#> returns the negative inner product since PostgreSQL only supports ASC order index scans on operators.

    SELECT * FROM items ORDER BY embedding <#> '[3,1,2]';

Advanced Usage

  • Getting the distance
    SELECT embedding <-> '[3,1,2]' AS distance FROM items;
    SELECT (embedding <#> '[3,1,2]') * -1 AS inner_product FROM items;
    SELECT 1 - (embedding <=> '[3,1,2]') AS cosine_similarity FROM items;
  • Averaging vectors
    SELECT AVG(embedding) FROM items;
  • Exact search providing perfect recall

    You can add an index to use approximate nearest neighbor search, which trades some recall for performance.

    CREATE INDEX ON items USING ivfflat (embedding vector_l2_ops) WITH (lists = 1);
    INSERT INTO items (embedding) VALUES ('[1,2,4]');
    SELECT * FROM items ORDER BY embedding <-> '[3,3,3]';