Troubleshooting Abnormal Connections and Active Connections

Description

When the number of database connections reaches the upper limit allowed for a DB instance, subsequent connections will be rejected. Abnormal changes in the number of connections and the number of active connections can indicate workload changes and database status to some extent.

RDS for PostgreSQL provides two related metrics:

Database connections: number of backend connections to a DB instance.
Active connections: number of active connections to a DB instance.

Cause Analysis

Connection pool parameter modification and workload change are considered as normal changes.

If you cannot determine whether there are normal changes or abruptly high concurrency, check database connection information.

The number of database connections suddenly decreases and then is restored to its normal range.
Possible cause: Some connections are interrupted, or the DB instance reboots unexpectedly due to an OOM problem or crash.
The number of database connections increases suddenly or reaches the maximum.
Possible cause: The number of new connections is greater than that of closed connections per unit time.
The concurrency is reduced and connections are not released in a timely manner if any of the following issues occurs:
- There are slow SQL statements.
- There are lock conflicts.
- There are long-running transactions.

Troubleshooting

Query database connection information.

You can sort out the database connection information using the information combination in the pg_stat_activity view.

The following shows an example.

-- Database connection information is filtered by database name, username, client IP address, and status and then sorted by the number of client connections in descending order.
SELECT datname, usename, client_addr, state, count(*) AS client_number 
FROM pg_stat_activity 
WHERE state <> 'idle' 
GROUP BY datname, usename, client_addr, state 
ORDER BY client_number DESC;

The preceding query result shows where the most connections come from. In this way, you can identify workload changes and high concurrency.

Check whether the maximum number of connections has been reached.
If your instance cannot be connected and the following error logs are generated, the number of database connections reaches the upper limit.
```
FATAL: remaining connection slots are reserved for non-replication superuser connections.
FATAL: sorry, too many clients already.
```
Check for any abnormal reboot.
1. Check the memory usage metric for any abnormal changes in memory usage.
2. Download error logs of the corresponding time period by referring to Viewing and Downloading Error Logs.
3. Use the keyword killed or the database system is in recovery mode to determine the time when the reboot occurred.
Slow SQL statements
In most cases, slow SQL statements are accompanied by high CPU usage. You can locate slow SQL statements by referring to Troubleshooting High CPU Usage.

Lock conflicts

Query the lock status of the current database connection using the pg_stat_activity view and the pg_blocking_pids function.

-- Query the lock statuses of the earliest five PIDs (client connections) for the current transaction.
SELECT pg_blocking_pids(pid), array_length(pg_blocking_pids(pid), 1) blocking_num, * 
FROM pg_stat_activity 
WHERE pid IN (select pid FROM pg_stat_activity WHERE state <> 'idel' 
AND xact_start IS NOT NULL ORDER BY xact_start DESC LIMIT 5) 
AND pid <> pg_backend_pid() 
ORDER BY blocking_num DESC NULLS LAST;

Based on the preceding query result, determine whether there are many lock conflicts that prevent the connections from being released.

Long-running transactions
Locate long-running transactions by referring to Troubleshooting Long-Running Transactions.

Solution

Normal workload change or increased concurrency
Upgrade the DB instance specifications.
Excessive database connections
Temporary solution:
1. Run SQL statements to release idle connections as user root.
  For example, to query the idle connection of the user user, run the following SQL statement to obtain the process ID:
  
  select * from pg_stat_activity where state = 'idle' and usename = 'user';
  
  Release the idle connection.
  
  select pg_terminate_backend(pid);
2. Increase the value of max_connections and reboot your instance for the new value to be applied.
Permanent solution:
1. Reduce database connections.
2. If database connections cannot be reduced, upgrade the instance specifications.
OOM exception or crash
If the memory usage is high for a long time, upgrade the instance specifications or optimize the workloads to reduce the resident memory usage. If the DB instance reboots due to SQL statements, optimize the SQL statements.
Slow SQL statements
Locate the slow SQL statements and optimize them.
Lock conflicts
Check whether applications can be disconnected. If yes, use the pg_cancel_backend function to disconnect the applications from the database.
Long-running transactions
Handle long-running transactions by referring to Troubleshooting Long-Running Transactions.