Solving the High CPU Usage Problem

If the CPU usage of a DDS instance is high or close to 100%, data reads and writes slow down, affecting your services.

This section describes how to analyze the requests currently being processed and the slow queries recorded by the database. After the analysis and optimization, your queries should be reasonable and use indexes efficiently.

If CPU resources are frequently exhausted, the instance may have reached its capacity limit. In that case, evaluate whether to change the instance specifications or increase the number of shards.

Analyzing Requests Being Processed

  1. Connect to a DB instance through Mongo Shell.

    For details, see sections "Connecting to a DB Instance Through a Public Network" and "Connecting to a DB Instance Through a Private Network" in Document Database Service Getting Started.

  2. Run the following command to view the current operations on the database:

    db.currentOp()

    Command output:

    {
            "raw" : {
                    "shard0001" : {
                            "inprog" : [
                                    {
                                            "desc" : "StatisticsCollector",
                                            "threadId" : "140323686905600",
                                            "active" : true,
                                            "opid" : 9037713,
                                            "op" : "none",
                                            "ns" : "",
                                            "query" : {
     
                                            },
                                            "numYields" : 0,
                                            "locks" : {
     
                                            },
                                            "waitingForLock" : false,
                                            "lockStats" : {
     
                                            }
                                    },
                                    {
                                            "desc" : "conn2607",
                                            "threadId" : "140323415066368",
                                            "connectionId" : 2607,
                                            "client" : "172.16.36.87:37804",
                                            "appName" : "MongoDB Shell",
                                            "active" : true,
                                            "opid" : 9039588,
                                            "secs_running" : 0,
                                            "microsecs_running" : NumberLong(63),
                                            "op" : "command",
                                            "ns" : "admin.",
                                            "query" : {
                                                    "currentOp" : 1
                                             },
                                            "numYields" : 0,
                                            "locks" : {
     
                                            },
                                            "waitingForLock" : false,
                                            "lockStats" : {
     
                                            }
                                    }
                            ],
                            "ok" : 1
                    },
        ...
    }
    • client: IP address of the client that sends the request
    • opid: unique operation ID
    • secs_running: execution time elapsed, in seconds. If the value of this field is large, check whether the request is reasonable.
    • microsecs_running: execution time elapsed, in microseconds. If the value of this field is large, check whether the request is reasonable.
    • op: operation type. The operation can be query, insert, update, remove, or command.
    • ns: namespace (database and collection) that the operation targets
    • For details about other fields, see the description of the db.currentOp() command in the official MongoDB documentation.
  3. Based on the command output, check whether there are requests that take a long time to process.
    • If the CPU usage is low during routine operations and becomes high only when certain operations run, focus on the requests that take a long time to execute.
    • If an abnormal request is found, find the opid of that request in the command output and run db.killOp(opid) to terminate it.
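The check above can also be scripted. The following is a minimal sketch in plain JavaScript (not a DDS or MongoDB API): it walks a db.currentOp()-style document such as the one shown in step 2 and collects the opids of long-running operations. The 10-second threshold and the sample values are illustrative assumptions.

```javascript
// Collect opids of active operations running longer than a threshold.
// `result` mimics the shape of db.currentOp() output on a sharded instance.
function findLongRunningOps(result, thresholdSecs) {
  const opids = [];
  for (const shard of Object.values(result.raw)) {
    for (const op of shard.inprog) {
      if (op.active && (op.secs_running || 0) >= thresholdSecs) {
        opids.push(op.opid);
      }
    }
  }
  return opids;
}

// Example currentOp-style document (values are illustrative).
const result = {
  raw: {
    shard0001: {
      inprog: [
        { desc: "conn2607", opid: 9039588, active: true, secs_running: 0 },
        { desc: "conn2710", opid: 9041001, active: true, secs_running: 125 }
      ],
      ok: 1
    }
  }
};

console.log(findLongRunningOps(result, 10)); // [ 9041001 ]
// Each returned opid could then be passed to db.killOp(opid) in the shell.
```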

Analyzing Slow Queries

On DDS, slow query profiling is enabled by default. Operations that take longer than 100 ms are automatically recorded in the system.profile collection of the corresponding database.

  1. Connect to a DB instance through Mongo Shell.

    For details, see sections "Connecting to a DB Instance Through a Public Network" and "Connecting to a DB Instance Through a Private Network" in Document Database Service Getting Started.

  2. Run the following command to enter the specified database (using the test database as an example):

    use test

  3. Check whether slow queries have been recorded in system.profile.

    show collections;

    • If the command output contains system.profile, slow queries have been recorded. Go to the next step.
      mongos> show collections
      system.profile
      test
    • If the command output does not contain system.profile, no slow queries have been recorded, and the database does not require slow query analysis.
      mongos> show collections
      test
  4. Check the slow query logs in the database.

    db.system.profile.find().pretty()

  5. Analyze slow query logs to find the cause of the high CPU usage.

    The following is an example of a slow query log. It shows a request that scanned the entire collection (1,561,632 documents) without using an index.

    {
            "op" : "query",
            "ns" : "taiyiDatabase.taiyiTables$10002e",
            "query" : {
                    "find" : "taiyiTables",
                    "filter" : {
                            "filed19" : NumberLong("852605039766")
                    },
                    "shardVersion" : [
                            Timestamp(1, 1048673),
                            ObjectId("5da43185267ad9c374a72fd5")
                    ],
                    "chunkId" : "10002e"
            },
            "keysExamined" : 0,
            "docsExamined" : 1561632,
            "cursorExhausted" : true,
            "numYield" : 12335,
            "locks" : {
                    "Global" : {
                            "acquireCount" : {
                                    "r" : NumberLong(24672)
                            }
                    },
                    "Database" : {
                            "acquireCount" : {
                                    "r" : NumberLong(12336)
                            }
                    },
                    "Collection" : {
                            "acquireCount" : {
                                    "r" : NumberLong(12336)
                            }
                    }
            },
            "nreturned" : 0,
            "responseLength" : 157,
            "protocol" : "op_command",
            "millis" : 44480,
            "planSummary" : "COLLSCAN",
            "execStats" : {
                    "stage" : "SHARDING_FILTER",
                    "nReturned" : 0,
                    "executionTimeMillisEstimate" : 43701,
                    "works" : 1561634,
                    "advanced" : 0,
                    "needTime" : 1561633,
                    "needYield" : 0,
                    "saveState" : 12335,
                    "restoreState" : 12335,
                    "isEOF" : 1,
                    "invalidates" : 0,
                    "chunkSkips" : 0,
                    "inputStage" : {
                            "stage" : "COLLSCAN",
                            "filter" : {
                                    "filed19" : {
                                            "$eq" : NumberLong("852605039766")
                                    }
                            },
                            "nReturned" : 0,
                            "executionTimeMillisEstimate" : 43590,
                            "works" : 1561634,
                            "advanced" : 0,
                            "needTime" : 1561633,
                            "needYield" : 0,
                            "saveState" : 12335,
                            "restoreState" : 12335,
                            "isEOF" : 1,
                            "invalidates" : 0,
                            "direction" : "forward",
                            "docsExamined" : 1561632
                    }
            },
            "ts" : ISODate("2019-10-14T10:49:52.780Z"),
            "client" : "172.16.36.87",
            "appName" : "MongoDB Shell",
            "allUsers" : [
                    {
                            "user" : "__system",
                            "db" : "local"
                    }
            ],
            "user" : "__system@local"
    }
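The diagnostic fields in a log entry like the one above can also be checked programmatically. The following is a minimal sketch in plain JavaScript (not a DDS or MongoDB API); the 10,000-documents-per-result heuristic is an arbitrary assumption.

```javascript
// Inspect a system.profile entry and report likely causes of high CPU usage.
function diagnoseProfileEntry(entry) {
  const issues = [];
  if (entry.planSummary && entry.planSummary.includes("COLLSCAN")) {
    issues.push("full collection scan (COLLSCAN)");
  }
  // Many documents examined for few results suggests a missing or poor index.
  if ((entry.docsExamined || 0) > 10000 * Math.max(entry.nreturned || 0, 1)) {
    issues.push("docsExamined far exceeds nreturned");
  }
  if (entry.hasSortStage) {
    issues.push("in-memory sort (no index supports the sort)");
  }
  return issues;
}

// Values taken from the slow query log above.
const entry = {
  planSummary: "COLLSCAN",
  docsExamined: 1561632,
  nreturned: 0,
  millis: 44480
};

console.log(diagnoseProfileEntry(entry));
```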

    Pay attention to the following keywords in slow query logs:

    • Full collection (full table) scan: COLLSCAN

      When a request (such as a query, update, or delete) requires a full collection scan, it occupies a large amount of CPU resources. If a slow query log contains the keyword COLLSCAN, the query may be consuming excessive CPU resources.

      If such requests are frequent, you are advised to create indexes on the fields being queried.

    • Full collection (full table) scan: docsExamined

      You can view the value of docsExamined to check the number of documents scanned. A larger value indicates higher CPU usage.

    • Inappropriate index: IXSCAN and keysExamined

      An excessive number of indexes can affect the write and update performance.

      If your application has more write operations, creating indexes may increase write latency.

      You can view the value of keysExamined to check the number of index keys scanned in a query. A larger value indicates higher CPU usage.

      If the index is inappropriate or matches a large number of results, the CPU usage remains high and execution is slow.

      Example: In the following collection, the a field has only two distinct values (1 and 2), while the b field has many distinct values.

      {a: 1, b: 1}
      {a: 1, b: 2}
      {a: 1, b: 3}
      ......
      { a: 1, b: 100000}
      {a: 2, b: 1}
      {a: 2, b: 2}
      {a: 2, b: 3}
      ......
      {a: 2, b: 100000}

      The following shows how different indexes serve the query {a: 1, b: 2}.

      db.collection.createIndex({a: 1}): inefficient, because the a field has too many duplicate values for the index to narrow the scan.
      db.collection.createIndex({a: 1, b: 1}): efficient, because the index supports equality matches on both a and b.
      db.collection.createIndex({b: 1}): efficient, because the b field has few duplicate values.
      db.collection.createIndex({b: 1, a: 1}): efficient, because the leading b field has few duplicate values.

      For the differences between {a: 1} and {b: 1, a: 1}, see the official MongoDB documentation.

    • Sorting a large amount of data: SORT and hasSortStage

      If a query involves sorting, the hasSortStage field of its system.profile entry is true. If the sort cannot be satisfied by an index, the query results are sorted in memory, which occupies a large amount of CPU resources. In this scenario, create indexes on the fields that are frequently sorted.

      If entries in the system.profile collection contain the SORT stage, you can use indexes to improve sorting performance.
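    The selectivity reasoning in the index example above can be illustrated with a quick cardinality check. The following is a minimal sketch in plain JavaScript; the sample data mirrors the a/b collection sketched earlier and is illustrative only.

```javascript
// Count distinct values of each field to estimate index selectivity:
// the more distinct values a field has, the better it narrows a query.
function distinctCounts(docs, fields) {
  const counts = {};
  for (const f of fields) {
    counts[f] = new Set(docs.map(d => d[f])).size;
  }
  return counts;
}

// Sample data shaped like the example collection:
// a has 2 distinct values, b has many.
const docs = [];
for (let b = 1; b <= 1000; b++) {
  docs.push({ a: 1, b });
  docs.push({ a: 2, b });
}

console.log(distinctCounts(docs, ["a", "b"])); // { a: 2, b: 1000 }
// b is far more selective, so an index whose leading field is b
// (e.g. {b: 1} or {b: 1, a: 1}) serves the query {a: 1, b: 2} efficiently.
```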

    Other operations, such as index creation and aggregation (a combination of traversal, query, update, and sort operations), can also occupy a large amount of CPU resources, so the preceding analysis applies to them as well. For more information about profiling, see the official MongoDB documentation.
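When many entries accumulate in system.profile, it can help to see which collections contribute most of the slow query time. The following is a minimal sketch in plain JavaScript (not a DDS or MongoDB API) that totals the millis field per namespace; the entries and namespaces are hypothetical.

```javascript
// Aggregate slow query time (millis) by namespace to find the
// collections that contribute most to CPU usage.
function timeByNamespace(profileEntries) {
  const totals = {};
  for (const e of profileEntries) {
    totals[e.ns] = (totals[e.ns] || 0) + (e.millis || 0);
  }
  return totals;
}

// Hypothetical system.profile entries.
const entries = [
  { ns: "test.orders", millis: 44480 },
  { ns: "test.orders", millis: 1200 },
  { ns: "test.users", millis: 300 }
];

console.log(timeByNamespace(entries)); // { 'test.orders': 45680, 'test.users': 300 }
```

The namespaces with the largest totals are the first candidates for the index and query optimizations described above.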