Help Center> Document Database Service> User Guide> Performance Tuning> Load Imbalance of Cluster Instances
Updated on 2022-10-27 GMT+08:00

Load Imbalance of Cluster Instances

It is common that load is imbalanced between shard nodes in a cluster instance. If the shard key is incorrectly selected, no chunk is preset, and the load balancing speed between shard nodes is lower than the data insertion speed, load imbalance may occur.

This section describes how to fix load imbalance.

Fault Locating

  1. Connect to a database from the client.
  2. Run the following command to check the shard information:

    sh.status()

    mongos> sh.status() 
     \--- Sharding Status --- 
       sharding version: { 
             "_id" : 1, 
             "minCompatibleVersion" : 5, 
             "currentVersion" : 6, 
             "clusterId" : ObjectId("60f9d67ad4876dd0fe01af84") 
       } 
       shards: 
             {  "_id" : "shard_1",  "host" : "shard_1/172.16.51.249:8637,172.16.63.156:8637",  "state" : 1 } 
             {  "_id" : "shard_2",  "host" : "shard_2/172.16.12.98:8637,172.16.53.36:8637",  "state" : 1 } 
       active mongoses: 
             "4.0.3" : 2 
       autosplit: 
             Currently enabled: yes 
       balancer: 
             Currently enabled:  yes 
             Currently running:  yes 
             Collections with active migrations: 
                     test.coll started at Wed Jul 28 2021 11:40:41 GMT+0000 (UTC) 
             Failed balancer rounds in last 5 attempts:  0 
             Migration Results for the last 24 hours: 
                     300 : Success 
       databases: 
             {  "_id" : "test",  "primary" : "shard_2",  "partitioned" : true,  "version" : {  "uuid" : UUID("d612d134-a499-4428-ab21-b53e8f866f67"),  "lastMod" : 1 } } 
                     test.coll 
                             shard key: { "_id" : "hashed" } 
                             unique: false 
                             balancing: true 
                             chunks: 
                                     shard_1 20 
                                     shard_2 20
    • databases lists databases for which you enable enableSharding.
    • test.coll is the collection namespace. test indicates the name of the database where the collection is located, and coll indicates the name of the collection for which sharding is enabled.
    • shard key is the shard key of the previous collection. _id: indicates that the shard is hashed based on _id. _id: -1 indicates that the shard is sharded based on the range of _id.
    • chunks indicates the distribution of shards.

  3. Analyze the shard information based on the query result in 2.

    1. If no shard information is queried, the collections are not sharded.

      Run the following command to enable sharding:

      mongos> sh.enableSharding("<database>") 
      mongos> use admin 
      mongos> db.runCommand({shardcollection:"<database>.<collection>",key:{"keyname":<value> }})
    2. If an improper shard key is selected, the load may be imbalanced. For example, if a large number of requests are processed on a range of shards, the load on these shards is heavier than other shards, causing load imbalance.

      You can redesign the shard key, for example, changing ranged sharding to hashed sharding.

      mongos> db.runCommand({shardcollection:"<database>.<collection>",key:{"keyname":<value> }})
    3. If a large amount of data is inserted and the data volume exceeds the load capacity of a single shard, shard imbalance occurs and the storage usage of the primary shard is too high.

      You can run the following command to check the network connection of the server and ceck whether the amount of data transmitted by each network adapter reaches the upper limit.

      sar -n DEV 1  //1 is the interval.
      Average: IFACE  rxpck/s  txpck/s    rxkB/s    txkB/s rxcmp/s txcmp/s rxmcst/s %ifutil 
      Average:    lo  1926.94  1926.94  25573.92  25573.92    0.00    0.00     0.00    0.00 
      Average:  A1-0     0.00     0.00      0.00      0.00    0.00    0.00     0.00    0.00 
      Average:  A1-1     0.00     0.00      0.00      0.00    0.00    0.00     0.00    0.00 
      Average:  NIC0     5.17     1.48      0.44      0.92    0.00    0.00     0.00    0.00 
      Average:  NIC1     0.00     0.00      0.00      0.00    0.00    0.00     0.00    0.00 
      Average:  A0-0  8173.06 92420.66  97102.22 133305.09    0.00    0.00     0.00    0.00 
      Average:  A0-1 11431.37  9373.06 156950.45    494.40    0.00    0.00     0.00    0.00 
      Average:  B3-0     0.00     0.00      0.00      0.00    0.00    0.00     0.00    0.00 
      Average:  B3-1     0.00     0.00      0.00      0.00    0.00    0.00     0.00    0.00
      • rxkB/s is the number of KBs received per second.
      • txkB/s is the number of KBs sent per second.

      After the check is complete, press Ctrl+Z to exit.

      If the network load is too high, analyze MQL statements, optimize the roadmap, reduce bandwidth consumption, and increase specifications to expand network throughput.

      • Check whether there are sharded collections that do not carry ShardKey. In this case, requests are broadcast, which increases the bandwidth consumption.
      • Control the number of concurrent threads on the client to reduce the network bandwidth traffic.
      • If the problem persists, increase instance specifications in a timely manner. High-specification nodes can provide higher network throughput.