Help Center/ Data Admin Service/ Best Practices/ Fixing High CPU Usage on DAS
Updated on 2024-12-30 GMT+08:00

Fixing High CPU Usage on DAS

High CPU usage is caused by the following reasons:

  • Slow SQL queries

    Slow SQL queries are inefficient because a large amount of data causes high I/O usage and low QPS. You can view the CPU usage, QPS, and row read rate on the DAS performance page. To solve the issues, you can kill sessions or optimize indexes through SQL diagnosis.

  • High-concurrent connections

    When QPS of a DB instance increases, it has to process a large number of concurrent operations at the same time, and the CPU usage increases accordingly. On the DAS performance page, you can view metrics such as QPS, number of active connections, and CPU usage. To solve the issues, you can use functions such as SQL statement concurrency control and auto flow control or kill sessions. After core services are restored, you can evaluate whether the issues were caused by service anomaly and adjust the services if necessary. You are advised to upgrade instance specifications if they cannot meet increasing service demands.

Solution (MySQL as an Example)

If a Huawei Cloud database pushes an alarm indicating that CPU usage of the DB instance increases sharply, the SRE will:

  1. Log in to the DAS console.
  2. Click in the upper left corner and select a region and project.
  3. Click in the upper left corner. Choose Databases > Data Admin Service.
  4. In the navigation pane, choose Intelligent O&M > Instance List.

    Alternatively, on the Overview page, click Go to Intelligent O&M.

  5. In the upper right corner of the instance list, filter instances by engine, name, or IP address. Click Details to go to the Dashboard tab page.

    Figure 1 Intelligent O&M

  6. Click the Performance tab.

    Figure 2 Performance monitoring

  7. Check key metrics, such as current active connections, total connections, QPS, slow query logs, and CPU usage.

    Figure 3 Current Active Connections
    Figure 4 Total Connections
    Figure 5 QPS
    Figure 6 Slow Query Logs
    Figure 7 CPU Usage

  8. Verify the CPU usage increases sharply due to a sudden surge of slow query logs and high-concurrent connections.
  9. Click the SQL tab and then Slow Query Logs to view details.

    Figure 8 Slow Query Logs

    A large number of same slow query logs are sent in a short period of time. Executing a single SQL statement takes about 1 minute. The SRE and service department decide to restrict this type of SQL statement and restore other core services of the database.

  10. Click the SQL Statement Concurrency Control tab and enable Concurrency Control.
  11. Create an SQL concurrency control rule, select SELECT for SQL Type and Using SQL statements for Keyword Generation, enter the original SQL statement, click Generate Keyword, set Max. Concurrency to 1, and kill the existing sessions that meet the rule.

    Figure 9 SQL type
    Figure 10 Selecting a SQL statement concurrency control rule

    The SQL statement concurrency control rule has taken effect and the SQL statements that match the rule are intercepted.

  12. Click the Performance tab and check the database performance metrics. The CPU usage starts to decrease and the service department reports that core services have recovered.

    Figure 11 Performance metrics

    After the SQL statement concurrency control takes effect, no same slow query is logged.

    Figure 12 Slow queries over time
    Figure 13 Performance metrics