Updated on 2024-12-30 GMT+08:00

High CPU Usage

Scenario

The SQL statement execution of a TaurusDB instance slowed down at about 16:08, and a timeout error was reported.

Troubleshooting

  1. Check the CPU usage. In this example, the CPU usage of the instance soared to 100% at about 16:08 and remained at the high line.
    Figure 1 CPU usage
  2. Check the QPS, slow SQL queries, and active connections. The QPS and active connections increased sharply at about 16:08 and a large number of slow SQL queries were generated.
    Figure 2 QPS
    Figure 3 Active connections
    Figure 4 Slow SQL queries
  3. Check the InnoDB logical read rate. The logical read rate of InnoDB also increased sharply around 16:08, and the pattern was similar to that for the slow SQL queries.
    Figure 5 InnoDB logical read rate
  4. Log in to the instance and check real-time sessions. There were a large number of sessions executing SELECT COUNT(*).

    Run EXPLAIN to check the execution plan of the SQL statement. It was found that the SQL statement was not indexed and the entire table was scanned with rows to be scanned reaching over 350,000.

  5. Check the table structure. Only the IDX_XX_USERID index was added for the is_deleted field. Therefore, no index was available for the preceding query. After an index was added for the idx_user_id field, the CPU usage of the instance decreased to a normal level at about 16:37 and services were recovered.

Solution

  1. Before deploying new workloads, use EXPLAIN and SQL diagnosis tools to analyze the execution plans of key SQL statements and add indexes based on the optimization suggestions to avoid full table scanning.
  2. If the high CPU usage is caused by a large number of concurrent requests, upgrade the DB instance specifications or use exclusive resources to avoid CPU contention, or create read replicas to reduce the read pressure of the primary instance.
  3. Use show processlist to view the current session information to locate the fault. Any session whose status is Sending data, Copying to tmp table, Copying to tmp table on disk, Sorting result, or Using filesort may have performance problems.
  4. In emergency scenarios, enable SQL statement concurrency control or kill sessions to temporarily limit the number of slow SQL queries.