Updated on 2024-05-13 GMT+08:00

Design Rules

Naming

  • The name of a database object (database name, table name, field name, or index name) must start with a lowercase letter, followed by letters or digits. The name cannot exceed 32 bytes.
  • The database name cannot contain special characters ("".$\/*?~#:|") or the null character (\0), and cannot be a system database name such as admin, local, or config.
  • The collection name can contain only letters and underscores (_) and cannot be prefixed with "system". The total length of <Database name>.<Collection name> cannot exceed 120 characters.
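
The naming rules above can be checked before an object is created. The following Python sketch is only an illustration: the function names are hypothetical, the forbidden-character set is one reading of the list above, and "followed by a letter or digit" is interpreted as "all remaining characters are letters or digits".

```python
import re

# Limits and reserved names taken from the naming rules above.
MAX_OBJECT_NAME_BYTES = 32
MAX_NAMESPACE_CHARS = 120
SYSTEM_DATABASES = {"admin", "local", "config"}
# Assumed reading of the special-character list, plus the null character.
FORBIDDEN_DB_CHARS = set('".$\\/*?~#:|') | {"\0"}


def is_valid_object_name(name: str) -> bool:
    """General rule: lowercase first letter, then letters or digits, at most 32 bytes."""
    if len(name.encode("utf-8")) > MAX_OBJECT_NAME_BYTES:
        return False
    return re.fullmatch(r"[a-z][A-Za-z0-9]*", name) is not None


def is_valid_database_name(name: str) -> bool:
    """Database rule: no forbidden characters and not a system database name."""
    if not name or name in SYSTEM_DATABASES:
        return False
    return not any(ch in FORBIDDEN_DB_CHARS for ch in name)


def is_valid_collection_name(database: str, collection: str) -> bool:
    """Collection rule: letters and underscores only, no "system" prefix,
    and <database>.<collection> at most 120 characters."""
    if re.fullmatch(r"[A-Za-z_]+", collection) is None:
        return False
    if collection.startswith("system"):
        return False
    return len(f"{database}.{collection}") <= MAX_NAMESPACE_CHARS
```

For example, is_valid_collection_name("shop", "order_items") passes all three collection checks, while "system_users" is rejected because of its prefix.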

Index

You can use indexes to avoid full table scans and improve query performance.

  • A column index can have up to 512 bytes, an index name can have up to 64 characters, and a composite index can have up to 16 columns.
  • The total length of <Database name>.<Collection name>.$<Index name> cannot exceed 128 characters.
  • Create indexes for fields with high selectivity. Indexes on fields with low selectivity may return large result sets and should be avoided.
  • Write operations on a collection will trigger more I/O operations on indexes in the collection. Ensure that the number of indexes in a collection does not exceed 32.
  • Do not create indexes that will not be used. Unused indexes that are loaded into memory waste memory. In addition, indexes that become useless after service logic changes must be deleted in a timely manner.
  • Indexes must be created in the background instead of in the foreground.
  • An index must be created for the sort key. If a composite index is created, the column sequence of the index must be the same as that of the sort key. Otherwise, the index will not be used.
  • Do not create an index based on the leading-edge column of a composite index. If the leading-edge column of a composite index is the column used in another index, the smaller index can be removed. For example, a composite index based on "firstname" and "lastname" can be used for queries on "firstname". In this case, creating another firstname-based index is unnecessary.
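
Several of the index rules above are mechanical enough to check in a review script. The Python sketch below is a hedged illustration, not a DDS tool: the function names and the index representation (index name mapped to an ordered list of (field, direction) pairs) are assumptions made for this example. It flags indexes that are a leading prefix of a composite index and checks the documented length and count limits.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: ordered (field, direction) pairs per index.
IndexKeys = List[Tuple[str, int]]

# Limits taken from the index rules above.
MAX_INDEXES_PER_COLLECTION = 32
MAX_INDEX_NAME_CHARS = 64
MAX_COMPOSITE_COLUMNS = 16
MAX_INDEX_NAMESPACE_CHARS = 128


def find_redundant_indexes(indexes: Dict[str, IndexKeys]) -> List[str]:
    """Return names of indexes whose keys are a leading prefix of another index."""
    redundant = []
    for name, keys in indexes.items():
        for other_name, other_keys in indexes.items():
            if name == other_name:
                continue
            # A strictly shorter index that matches the start of a longer one.
            if len(keys) < len(other_keys) and other_keys[:len(keys)] == keys:
                redundant.append(name)
                break
    return redundant


def check_index_limits(database: str, collection: str,
                       indexes: Dict[str, IndexKeys]) -> List[str]:
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(indexes) > MAX_INDEXES_PER_COLLECTION:
        problems.append(f"{len(indexes)} indexes exceed {MAX_INDEXES_PER_COLLECTION}")
    for name, keys in indexes.items():
        if len(name) > MAX_INDEX_NAME_CHARS:
            problems.append(f"index name '{name}' is longer than {MAX_INDEX_NAME_CHARS} characters")
        if len(keys) > MAX_COMPOSITE_COLUMNS:
            problems.append(f"index '{name}' has more than {MAX_COMPOSITE_COLUMNS} columns")
        # Full index namespace: <Database name>.<Collection name>.$<Index name>
        if len(f"{database}.{collection}.${name}") > MAX_INDEX_NAMESPACE_CHARS:
            problems.append(f"namespace of index '{name}' is longer than {MAX_INDEX_NAMESPACE_CHARS} characters")
    return problems
```

With the "firstname" and "lastname" example above, an index on ("firstname", 1) alone is reported as redundant next to a composite index on ("firstname", 1), ("lastname", 1). The prefix check compares directions as well as field names, which is a simplification chosen for this sketch.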

Sharding

You can shard collections to maximize cluster performance. For details, see Sharding a Collection.

Suggestions for sharding collections:

  • In scenarios where the data volume is large (more than one million rows) and the write/read ratio is high, sharding is recommended if the data volume increases with the service volume.
  • If you shard a collection using a hashed shard key, pre-splitting the chunks of the sharded collection can help reduce the impact of automatic balancing and splitting on service running.
  • If sharding is enabled for a non-empty collection, the time window for enabling the balancer must be set during off-peak hours. Otherwise, conflicts may occur during data balancing between shards and service performance will be affected.
  • If you want to perform a sort query based on the shard key and new data is evenly distributed based on the shard key, you can use ranged sharding. In other scenarios, you can use hashed sharding.
  • Properly design shard keys to prevent a large amount of data from using the same shard key, which may lead to jumbo chunks.
  • If a sharded cluster is used, you must run flushRouterConfig after running dropDatabase. For details, see How Do I Prevent dds mongos Cache Problem?
  • Update requests from the service must match the shard key. For a sharded collection, an update request fails with the error "An upsert on a sharded collection must contain the shard key and have the simple collation" in the following scenarios:
    • The filter field of the update request does not contain the shard key field and the value of multi is false.
    • The set field does not contain the shard key and the value of upsert is true.
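
The two failure scenarios above can be expressed as a pre-check on the client side. The following Python sketch models only those two conditions exactly as listed. The function name and argument layout are hypothetical, and the actual server-side validation may be more nuanced than this.

```python
from typing import Any, Dict, Iterable


def violates_shard_key_rule(filter_doc: Dict[str, Any],
                            update_doc: Dict[str, Any],
                            shard_key: Iterable[str],
                            multi: bool = False,
                            upsert: bool = False) -> bool:
    """Return True if the update request matches one of the two scenarios above."""
    key_fields = list(shard_key)
    set_fields = update_doc.get("$set", {})

    # Scenario 1: the filter lacks a shard key field and multi is false.
    filter_missing_key = any(field not in filter_doc for field in key_fields)
    if filter_missing_key and not multi:
        return True

    # Scenario 2: the set field lacks a shard key field and upsert is true.
    set_missing_key = any(field not in set_fields for field in key_fields)
    if set_missing_key and upsert:
        return True

    return False
```

For a collection sharded on a hypothetical "user_id" field, a request with filter {"status": "new"} and multi set to false is flagged, while the same request with filter {"user_id": 7} is not.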