Updated on 2024-07-18 GMT+08:00

Instance Usage Suggestions

Database Connection

RDS for PostgreSQL uses a process-per-connection architecture: each client connection is served by a dedicated backend process.

  • Set max_connections depending on the type of your application. Use the parameter settings provided on pgtune as examples:
    • Set max_connections to 200 for web applications.
    • Set max_connections to 300 for OLTP applications.
    • Set max_connections to 40 for data warehouses.
    • Set max_connections to 20 for desktop applications.
    • Set max_connections to 100 for hybrid applications.
  • Limit the maximum number of connections allowed for a single user based on workload requirements.
    ALTER ROLE <role_name> CONNECTION LIMIT <limit>;
  • Set the number of active connections to two to three times the number of vCPUs.
  • Avoid long transactions, which may block autovacuum and affect database performance.
  • Release persistent connections periodically, because long-lived connections can accumulate large caches and cause high memory consumption.
  • Check that your application framework does not automatically start transactions and then leave them open without performing any operations.
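
The sizing guidance above can be sketched as a small lookup plus one line of arithmetic. This is a minimal illustration, not an official tool: the dictionary keys and the function names are hypothetical, and the values are simply the pgtune examples and the "two to three times the number of vCPUs" rule listed above.

```python
from typing import Tuple

# Example max_connections values from the list above (pgtune presets).
# The dictionary keys are illustrative names, not RDS or pgtune identifiers.
PGTUNE_MAX_CONNECTIONS = {
    "web": 200,
    "oltp": 300,
    "data_warehouse": 40,
    "desktop": 20,
    "hybrid": 100,
}


def suggested_max_connections(app_type: str) -> int:
    """Return the example max_connections value for an application type."""
    return PGTUNE_MAX_CONNECTIONS[app_type]


def active_connection_range(vcpus: int) -> Tuple[int, int]:
    """Return the (low, high) target for active connections: 2x to 3x the vCPUs."""
    if vcpus <= 0:
        raise ValueError("vcpus must be a positive integer")
    return 2 * vcpus, 3 * vcpus


if __name__ == "__main__":
    print(suggested_max_connections("oltp"))  # 300
    print(active_connection_range(8))         # (16, 24)
```

For an 8-vCPU instance this gives a target of 16 to 24 active connections, which is far below max_connections. A connection pool on the application side is one common way to keep the active count within that range.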

Read Replicas

  • Avoid long transactions, which may cause query conflicts and delay WAL replay.
  • Configure hot_standby_feedback for instances requiring real-time data and set max_standby_streaming_delay to a value appropriate for your workload.
  • Monitor long transactions, long-lived connections, and replication delay, and address any issues promptly.
  • Ensure that applications connected to a read replica can be switched to other nodes because read replicas are single-node instances incapable of providing high availability.
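
One way to monitor long transactions is to read pid and xact_start from the standard pg_stat_activity view and flag sessions whose transaction has been open longer than a chosen threshold. The sketch below covers only the filtering step. The function name and the 10-minute threshold are assumptions for illustration, and fetching the rows (for example, through a database driver) is left out.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

# Query whose result rows this helper expects. pg_stat_activity is a standard
# PostgreSQL view; xact_start is NULL for sessions with no open transaction.
LONG_XACT_QUERY = "SELECT pid, xact_start FROM pg_stat_activity"


def find_long_transactions(
    rows: Iterable[Tuple[int, Optional[datetime]]],
    now: datetime,
    threshold: timedelta,
) -> List[int]:
    """Return the pids whose transaction has been open longer than threshold.

    `now` and each xact_start must both be timezone-aware (or both naive).
    """
    return [
        pid
        for pid, xact_start in rows
        if xact_start is not None and now - xact_start > threshold
    ]


if __name__ == "__main__":
    now = datetime(2024, 7, 18, 12, 0, tzinfo=timezone.utc)
    sample = [
        (101, now - timedelta(minutes=45)),  # open for 45 minutes
        (102, now - timedelta(seconds=5)),   # just started
        (103, None),                         # no open transaction
    ]
    # Hypothetical threshold: report anything open for more than 10 minutes.
    print(find_long_transactions(sample, now, timedelta(minutes=10)))  # [101]
```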

Reliability and Availability

  • Select primary/standby DB instances for production databases.
  • Keep vCPU, memory, and storage usage below 70% for production databases to prevent problems such as out-of-memory (OOM) errors and full storage.
  • Deploy primary and standby instances in different AZs to improve availability.
  • Set the time window for automated backup to off-peak hours. Do not disable full backup.
  • Configure asynchronous replication between primary and standby DB instances to prevent workloads on the primary instance from being blocked due to a fault on the standby instance.

Logical Replication

  • Keep the name of a logical replication slot shorter than 40 bytes to prevent full backup failures.
  • Delete replication slots that are no longer used for logical replication to prevent database bloat.
  • Replication slots will be lost after a primary/standby switchover is performed (due to an instance class change, a minor version upgrade, or a host failure). When this occurs, you need to create replication slots again.
  • Use failover slots to prevent replication slot loss after a primary/standby switchover or instance reboot. Failover slots apply to RDS for PostgreSQL 12.6 and later minor versions, and to all minor versions of RDS for PostgreSQL 13 and 14.
  • When using logical replication, avoid long transactions and resolve abandoned two-phase transactions (commit them or roll them back) promptly to prevent accumulated WAL logs from occupying too much storage space.
  • When using logical replication, avoid using a large number of sub-transactions (such as savepoints and exceptions in a transaction) to prevent high memory usage.
  • When using DRS to synchronize or migrate data, delete the logical replication slots in databases that are rarely accessed, or add heartbeat tables that periodically advance the replication slots, to prevent WAL logs from accumulating.
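
The slot naming guideline above can be checked before a slot is created. The following is a minimal sketch with a hypothetical function name. It combines the 40-byte guideline from this page with the PostgreSQL rule that slot names may contain only lowercase letters, digits, and underscores.

```python
import re

# Guideline from this page: keep the slot name shorter than 40 bytes.
MAX_SLOT_NAME_BYTES = 40

# PostgreSQL allows only lowercase letters, digits, and underscores in slot names.
_SLOT_NAME_PATTERN = re.compile(r"[a-z0-9_]+")


def is_safe_slot_name(name: str) -> bool:
    """Return True if the name is a valid slot name and is under the byte limit."""
    if not _SLOT_NAME_PATTERN.fullmatch(name):
        return False
    return len(name.encode("utf-8")) < MAX_SLOT_NAME_BYTES


if __name__ == "__main__":
    for candidate in ("drs_slot_1", "a" * 40, "Bad-Name"):
        print(candidate, is_safe_slot_name(candidate))
```

Only the first candidate passes: the second is exactly 40 bytes long, and the third contains characters that PostgreSQL does not accept in slot names.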

Database Age

  • Definition of database age:
    • Database age is a PostgreSQL-specific concept. It is the latest transaction ID minus the oldest transaction ID in the database.
    • As defined in the Multi-Version Concurrency Control (MVCC) mechanism of RDS for PostgreSQL, the maximum age allowed for a database is 2 billion transactions. When a database reaches the maximum age, it is forcibly shut down. In this case, contact technical support to vacuum the database.
    • To view the age of a database, run the following SQL statement:

      select datname, age(datfrozenxid) from pg_database;

  • You are advised to use the db_max_age metric to monitor the database age and set the alarm threshold to 1 billion.
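
To make the two thresholds concrete, the sketch below classifies an age value, such as one returned by age(datfrozenxid), against the 1 billion alarm threshold and the 2 billion maximum described above. The function names and the status labels are hypothetical.

```python
ALARM_THRESHOLD = 1_000_000_000  # recommended alarm threshold from this page
MAX_AGE = 2_000_000_000          # maximum age before a forced shutdown


def classify_database_age(age: int) -> str:
    """Map a database age to an illustrative status label."""
    if age < 0:
        raise ValueError("age must not be negative")
    if age >= MAX_AGE:
        return "critical"
    if age >= ALARM_THRESHOLD:
        return "alarm"
    return "ok"


def remaining_transactions(age: int) -> int:
    """Return how many more transactions fit before the maximum age is reached."""
    return max(MAX_AGE - age, 0)


if __name__ == "__main__":
    age = 1_250_000_000
    print(classify_database_age(age))   # alarm
    print(remaining_transactions(age))  # 750000000
```

Alarming at 1 billion leaves roughly another billion transactions of headroom in which to vacuum the affected tables before the limit is reached.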

Stability

  • Commit or roll back two-phase transactions in a timely manner to prevent database bloat.
  • Change table structures, for example, by adding fields or indexes, during off-peak hours.
  • To create indexes during peak hours, use CREATE INDEX CONCURRENTLY to avoid blocking DML operations on the table.
  • Before modifying the structure of a table during peak hours, verify in a test environment that the change does not cause the table to be rewritten.
  • Configure a lock wait timeout duration for DDL operations to avoid blocking operations on related tables.
  • Partition your database if its capacity exceeds 2 TB.
  • If a frequently accessed table contains more than 20 million records or its size exceeds 10 GB, split the table or create partitions.
  • Ensure that the number of tables in a single instance does not exceed 20,000, and that the number of tables in a single database does not exceed 4,000.
  • To prevent replication exceptions on the standby instance or read replicas, keep the data write speed of the primary instance below 50 MB/s. The standby instance or read replica replays WAL logs in a single process at a maximum speed of 50 MB/s to 70 MB/s.
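
The write-speed limit follows from simple arithmetic: whenever the primary writes faster than the standby can replay, the unreplayed backlog grows by the difference every second. The sketch below works through that calculation. The function names and the sample rates are illustrative assumptions, and real replay speed varies with the workload.

```python
def replay_backlog_mb(write_mb_per_s: float, replay_mb_per_s: float, duration_s: float) -> float:
    """Return the WAL backlog (MB) built up while writes outpace replay."""
    surplus = write_mb_per_s - replay_mb_per_s
    return max(surplus, 0.0) * duration_s


def catch_up_seconds(backlog_mb: float, write_mb_per_s: float, replay_mb_per_s: float) -> float:
    """Return the seconds needed to drain a backlog once writes slow down.

    Returns infinity if replay has no spare capacity over the write rate.
    """
    spare = replay_mb_per_s - write_mb_per_s
    if spare <= 0:
        return float("inf")
    return backlog_mb / spare


if __name__ == "__main__":
    # Assumed scenario: a 10-minute bulk load at 80 MB/s against 50 MB/s replay.
    backlog = replay_backlog_mb(80, 50, 600)
    print(backlog)                            # 18000 MB behind after the load
    print(catch_up_seconds(backlog, 20, 50))  # 600.0 seconds at 20 MB/s writes
```

In this scenario the standby is about 18 GB behind after the load and needs another 10 minutes to catch up, even with writes reduced to 20 MB/s.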

Routine O&M

  • Periodically download and view slow query logs on the Logs page to identify and resolve performance issues in a timely manner.
  • Periodically check the resource usage of your database. If the resources are insufficient, scale up your instance specifications in a timely manner.
  • Before deleting or modifying records, run a SELECT statement with the same conditions to confirm which rows will be affected.
  • After a large amount of data is deleted or updated in a table, run VACUUM on the table.
  • Note the number of available replication slots and ensure that at least one replication slot is available for database backup.
  • Remove any replication slots that are no longer used to prevent the replication slots from blocking log reclaiming.
  • Do not use unlogged tables because data in these tables will be lost after a database exception (such as OOM or underlying faults) or primary/standby switchover.
  • Do not run VACUUM FULL on system catalogs. If necessary, run VACUUM instead. Running VACUUM FULL on system catalogs causes the instance to reboot and remain unreachable for a long time.

Security

  • Avoid enabling access to your database from the Internet. If you do need to enable Internet access, bind an EIP to your DB instance and configure a whitelist.
  • Use SSL to connect to your DB instance.