Troubleshooting SQL Statements That Have Been Executed for 3s or 5s

Description

If there are SQL statements that have been executed for 3s or 5s in a DB instance, the SQL statements are running slowly and cannot obtain results in a timely manner.

Cause Analysis

The possible causes are as follows:

A SQL statement is blocked and waits for the execution of other transactions.
The usage of system resources such as CPUs is high.
There are no indexes.
Sorting in a SQL statement slows down the I/O execution.

Troubleshooting

Blocked SQL statement

Find the process waiting for a lock.

SELECT locktype, relation, relation::regclass AS relname, virtualxid, transactionid, virtualtransaction, pid, mode, granted FROM pg_locks where granted = 'f';

Obtain the PID of this process from 1 and find the process that blocks this process (value of the pg_blocking_pids field in the query result).

select pg_blocking_pids(pid), array_length(pg_blocking_pids(pid), 1) blocking_num, * from pg_stat_activity  where pid = <pid from step 1> order by blocking_num;

Query the blocked SQL statement.

select * from pg_stat_activity where pid = <pid from step 2>;

Determine the blocked SQL statement, for example, update xxx;. View the PID of the blocked process through the pg_stat_activity view.
- pg_blocking_pids: ID of the blocking process
- wait_event_type: type of the wait event
```
select pg_blocking_pids(pid), array_length(pg_blocking_pids(pid), 1) blocking_num, *
from pg_stat_activity 
where query like '%update xxx%' 
and pid <> pg_backend_pid()
order by blocking_num desc NULLS LAST;
```
After finding the PID of the blocked SQL statement, you can run the SELECT pg_terminate_backend($PID); command to end the process.

High usage of system resources
If SQL statements are running slowly but no blocking is found, you can view monitoring metrics.

On the management console, check whether the CPU usage is always high in the last seven days.

Figure 1 Viewing CPU usage

If the CPU usage remains above 80%, check the disk read throughput and disk write throughput metrics to see whether read and write operations are performed continuously.

Figure 2 Viewing the disk throughput

Find the SQL statement that is running slowly and check whether its execution is also slow during off-peak hours. If it runs quickly during off-peak hours, the slow execution of the SQL statement is caused by high system resource usage.

Missing indexes

Locate the slow SQL statement and view the execution plan by running explain (ANALYZE, VERBOSE, BUFFERS).

xxx1:15:33.140 +08:10.51.11.21(53162):xxx:LOG:  duration: 22035.462 ms  plan:
    Query Text:  xxxxxxxxxxxxxx
    Nested Loop  (cost=1.07..892.50 rows=1 width=806)
      Join Filter: ((xxx))
      ->  Nested Loop  (cost=1.07..25.41 rows=1 width=565)
            ->  Nested Loop  (cost=0.66..16.97 rows=1 width=523)
                  ->  Index Scan using xxx  (cost=0.25..8.54 rows=1 width=318)
                        Filter: ((itype = 1) AND (dr = 0))
                  ->  Index Scan using xxxx  (cost=0.41..8.43 rows=1 width=247)
                        Index Cond: (xxx)
            ->  Index Scan using xxx  (cost=0.41..8.43 rows=1 width=84)
                  Index Cond: (xxx)
      ->  Seq Scan on xxx log2  (cost=0.00..867.00 rows=1 width=66)

The execution plan shows that a full table scan was performed on a table and took the most time.

Seq Scan on xxx log2

Confirm whether:

The table has indexes.
A large amount of data was added, deleted, modified, or queried during the execution of the SQL statement.

Re-create indexes and run the SQL statement again to check whether the problem persists.

Sorting used in a SQL statement

If the running SQL statement contains the GROUP BY operation, similar to the following:

select xx1,xx2,xx3,xx4,xx5,xx6
        from tbl_xxx
        GROUP BY xx1,xx2,xx3,xx4,xx5,xx6;

the execution can take a long time. You can view the execution time of the SQL statement in the execution plan by running explain (ANALYZE, VERBOSE, BUFFERS).

xxxx  (cost=439756.01..470149.19 rows=189957 width=28) (actual time=18072.697..20311.874 rows=323770 loops=1)
  Group Key: xxxx, xxxx
  Filter: (xxxxxx)
  Rows Removed by Filter: 192
  ->  Sort  (cost=439756.01..444504.94 rows=1899574 width=20) (actual time=18072.671..19960.595 rows=1834158 loops=1)
        Sort Key: xxxxxxxxxx
        Sort Method: external merge  Disk: 61056kB
        ->  Result  (cost=0.00..163739.61 rows=1899574 width=20) (actual time=0.009..927.709 rows=1834158 loops=1)
              ->  Append  (cost=0.00..144743.87 rows=1899574 width=20) (actual time=0.008..791.301 rows=1834158 loops=1)
                    ->  Seq Scan on xxxx  (cost=0.00..0.00 rows=1 width=212) (actual time=0.004..0.004 rows=0 loops=1)

Sort Method: external merge Disk: 61056kB indicates that the sorting operation occupies the disk I/O. In this case, you can run the following SQL statement to disable the sequential query and check the execution plan again:

set enable_seqscan = off;

If the sorting operation still uses the disk I/O, you can adjust the value of work_mem to increase the query memory.

Set work_mem to match your workloads. If the value is too large, an OOM problem may occur.

Solution

Blocked SQL statement
When you find the blocked SQL statement, run SELECT pg_terminate_backend($PID); to end the process.

After it is ended, perform 1 again to check whether there is any other lock conflict.
High usage of system resources
Optimize workloads to reduce the number of concurrent requests.
Missing indexes
Re-create indexes and use the indexes for query.
Sorting used in a SQL statement
Increase the value of work_mem to reduce the I/O execution time for sorting.