Updated on 2025-09-04 GMT+08:00

User Preference Recommendation Systems

Scenarios

Recommendation systems are widely used on Internet platforms and in traditional industries. They analyze user behavior, preferences, and context information to provide personalized content recommendations. Common scenarios include:

  • E-commerce and retail: Products are recommended based on users' browsing and purchase history (for example, "You May Like").
  • Streaming media and content platforms: Content (such as movies and short videos) is recommended based on watching records and ratings.
  • Travel services: Destinations are recommended based on user preferences.
  • App recommendation: Apps are recommended based on users' app download and usage habits.

This section uses movies as an example to describe how to design a recommendation system database.

Prerequisites

You have installed the pg_trgm extension.

Design and Implementation

The pg_trgm extension of PostgreSQL provides string similarity calculation based on trigrams. You can use this extension to build a simple user recommendation system.

  1. Connect to your RDS for PostgreSQL instance.
  2. Create a test table on the instance.
    CREATE TABLE movies (
        id INT,                     -- Movie ID
        title VARCHAR(255),         -- Movie name
        description TEXT,           -- Movie description
        genres VARCHAR(255)[],      -- Movie genre
        year INTEGER                -- Movie year
    );
  3. Search for movies whose titles are similar to a specified movie title.
    -- Create a GIN index to accelerate similarity queries.
    CREATE INDEX movies_title_gin_idx ON movies USING gin(title gin_trgm_ops);
    
    SELECT 
        m2.id, 
        m2.title, 
        similarity(m1.title, m2.title) AS similarity_score
    FROM 
        movies m1, 
        movies m2
    WHERE 
        m1.id = 123 AND                       -- Target movie ID
        m1.id != m2.id AND
        similarity(m1.title, m2.title) > 0.3  -- Similarity threshold
    ORDER BY 
        similarity_score DESC
    LIMIT 10;
  4. Search for movies based on the similarity of movie descriptions.
    -- Create a GIN index for descriptions.
    CREATE INDEX movies_description_gin_idx ON movies USING gin(description gin_trgm_ops);
    
    SELECT 
        m2.id, 
        m2.title, 
        similarity(m1.description, m2.description) AS similarity_score
    FROM 
        movies m1, 
        movies m2
    WHERE 
        m1.id = 123 AND
        m1.id != m2.id AND
        similarity(m1.description, m2.description) > 0.1  -- The description similarity threshold can be set to a small value.
    ORDER BY 
        similarity_score DESC
    LIMIT 10;
  5. Search for movies based on multiple features such as the title, description, and genre.
    SELECT 
        m2.id, 
        m2.title,
        (
            0.5 * similarity(m1.title, m2.title) + 
            0.3 * similarity(m1.description, m2.description) +
            0.2 * (SELECT COUNT(*) FROM unnest(m1.genres) AS g1 
                    JOIN unnest(m2.genres) AS g2 ON g1 = g2)::FLOAT / 
                  GREATEST(array_length(m1.genres, 1), array_length(m2.genres, 1))
        ) AS combined_similarity
    FROM 
        movies m1, 
        movies m2
    WHERE 
        m1.id = 123 AND
        m1.id != m2.id
    ORDER BY 
        combined_similarity DESC
    LIMIT 10;
  6. Limit the search scope based on the user configuration. For example, run the following SQL statements to search for only movies of the same type:
    SELECT m2.id, m2.title, similarity(m1.title, m2.title) AS similarity_score
    FROM movies m1, movies m2
    WHERE m1.id = 123 AND
          m1.id != m2.id AND
          m1.genres && m2.genres AND  -- Must have at least one same genre.
          similarity(m1.title, m2.title) > 0.3
    ORDER BY similarity_score DESC
    LIMIT 10;