Oplog Size Planning and Optimization

Scenarios

This section provides guidance on properly configuring DDS oplog parameters to mitigate risks such as primary/secondary replication exceptions or limitations in point-in-time recovery. Correct oplog configuration helps ensure system stability and reliable data restoration.

Oplog Basics

The operations log (oplog) is the basis of replica set replication. In a cluster instance, each shard and the config node is itself a replica set and relies on its own oplog to record and propagate data changes. The oplog is a special capped collection that functions as a ring buffer and continuously records all operations that modify data in your database. When the oplog is full, the oldest entries are overwritten by new ones. The primary node executes write operations and then records them in its own oplog. Secondary nodes asynchronously pull these oplog entries and replay them locally to maintain data consistency. A write operation that fails or does not actually modify any data typically does not generate an oplog entry.
In a newly created instance, the oplog occupies 10% of the storage space by default. When you scale up the storage and the original oplog size is less than 100 GB, the oplog is resized to the smaller of two values: the storage space multiplied by oplogSizePercent, or 100 GB. If the original oplog size is greater than or equal to 100 GB, the size remains unchanged.
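The resize rule above can be expressed as a short calculation. The following is a minimal sketch, not DDS code: the function name `resizedOplogGB` is hypothetical, and it assumes oplogSizePercent is given as a percentage value (for example, 10 for 10%). Adjust the arithmetic if your parameter group expresses the value differently.

```javascript
// Hypothetical helper illustrating the scale-up resize rule described above.
// Assumption: oplogSizePercent is a percentage value (10 means 10%).
const OPLOG_CAP_GB = 100;

function resizedOplogGB(originalOplogGB, newStorageGB, oplogSizePercent) {
  // An oplog that is already 100 GB or larger keeps its size.
  if (originalOplogGB >= OPLOG_CAP_GB) {
    return originalOplogGB;
  }
  // Otherwise the new size is the smaller of (storage x percent) and 100 GB.
  const byPercent = (newStorageGB * oplogSizePercent) / 100;
  return Math.min(byPercent, OPLOG_CAP_GB);
}

console.log(resizedOplogGB(50, 600, 10));   // 60
console.log(resizedOplogGB(50, 2000, 10));  // 100
console.log(resizedOplogGB(120, 2000, 10)); // 120
```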
Oplog entries are stored in the local.oplog.rs collection and are idempotent. That is, applying an oplog entry produces the same result regardless of whether it is applied once or multiple times.
The oplog grows at a rate roughly proportional to the system's speed in processing write requests. If a single operation affects multiple documents, multiple corresponding oplog entries are generated.
The oplog window refers to the time interval between the oldest and newest entries in the oplog. Primary/secondary replication depends on this time window. A secondary node can synchronize properly only if the oplog window of its source contains the required oplog entries.
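To make the window requirement concrete, the sketch below compares a secondary's replication lag with the oplog window of its sync source. It assumes an object shaped like the result of the shell helper `db.getReplicationInfo()`, whose `timeDiff` field reports the window in seconds. The function name `canCatchUp` and the safety margin are illustrative assumptions, not part of DDS.

```javascript
// Illustrative check: can a lagging secondary still find the entries it
// needs in its sync source's oplog?
// `info` is assumed to look like the result of db.getReplicationInfo(),
// where timeDiff is the oplog window in seconds.
function canCatchUp(info, lagSeconds, safetyMargin = 0.8) {
  // Treat only part of the window as usable so that the oldest entries
  // are not overwritten while the secondary is still reading them.
  const usableWindowSeconds = info.timeDiff * safetyMargin;
  return lagSeconds < usableWindowSeconds;
}

// Sample values for illustration only (a 1-hour window).
const sampleInfo = { timeDiff: 3600 };
console.log(canCatchUp(sampleInfo, 600));  // true: 10 minutes behind
console.log(canCatchUp(sampleInfo, 5400)); // false: 90 minutes behind
```

The margin is a design choice: a secondary that is almost a full window behind is at risk even though the entries it needs still exist at the moment of the check.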

Checking the Oplog Size

Use either of the following methods to check the oplog size.

Method 1: Use storage analysis to view the storage space distribution and the oplog size.
Figure 1 Viewing storage space distribution
Method 2: Connect to the database to check the oplog size.
- Connect to the database. For details, see Connecting to a DDS Instance.
- Run the following command to check the oplog size and window:
```
rs.printReplicationInfo()
```
  In the following command output, configured oplog size indicates the total size allocated for the oplog and log length start to end indicates the oplog window.
  Figure 2 Command output
- Run the following command to query the size of storage actually occupied by the oplog (default unit: bytes):
```
db.getSiblingDB("local").oplog.rs.stats().storageSize
```
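The byte values returned by `stats()` are easier to read after conversion. The sketch below assumes a stats object that exposes `storageSize` and, for a capped collection such as the oplog, `maxSize`, both in bytes. The helper name `describeOplogStats` is hypothetical, and the sample numbers are made up.

```javascript
// Hypothetical helper: convert byte counts from oplog.rs.stats() into GB.
// Assumption: `stats` has storageSize and maxSize in bytes (maxSize is
// expected for capped collections such as the oplog).
const BYTES_PER_GB = 1024 * 1024 * 1024;

function describeOplogStats(stats) {
  return {
    storageGB: stats.storageSize / BYTES_PER_GB,
    maxGB: stats.maxSize / BYTES_PER_GB,
  };
}

// In the shell, the input would come from:
//   db.getSiblingDB("local").oplog.rs.stats()
// Sample values for illustration only.
const sampleStats = { storageSize: 5 * BYTES_PER_GB, maxSize: 20 * BYTES_PER_GB };
const summary = describeOplogStats(sampleStats);
console.log(summary.storageGB); // 5
console.log(summary.maxGB);     // 20
```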

For DDS cluster instances, the oplog size of shards cannot be queried through the dds mongos node. Use storage analysis, or connect directly to a shard and run the commands described in the second method above. For details about how to obtain the shard IP address, see Enabling IP Addresses of Shard and Config Nodes.

Oplog Consumption Analysis

Plan the oplog size for a replica set based on workload patterns and system monitoring data. The following table lists the workload patterns that may require a larger oplog size.

**Table 1** Workload patterns that may require a larger oplog size
| Workload Pattern | Description |
| --- | --- |
| Batch updates | A single batch operation is broken down into multiple document-level updates, causing oplog entries to grow rapidly. For example, updating 1,000 documents generates 1,000 oplog entries, which significantly accelerates oplog space consumption. |
| Frequent alternating insertions and deletions | Insertions and deletions are recorded individually in the oplog even if the data volume does not increase. For example, 1,000 "insert-delete" cycles per second generate 2,000 oplog entries per second, causing constant write pressure. |
| Frequent in-place updates | Frequent updates (such as status field changes) to the same document do not increase storage usage, but each update generates a separate oplog entry. For example, 100 updates per second to a single document generate 100 oplog entries per second even though the stored data size does not change. |
| High-concurrency writes | When the workload involves frequent write operations (such as thousands of insertions, updates, or deletions per second), the oplog write rate increases significantly. Because the oplog functions as a ring buffer that records all write operations in chronological order, intense write activity quickly cycles through the oplog space and shortens the oplog window. |
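
As a rough illustration of the patterns in Table 1, the sketch below estimates how many oplog entries per second a workload generates. The function and field names are hypothetical, and counting one entry per affected document is the simplification used in the table, not an exact model.

```javascript
// Rough estimate of oplog entries per second for a mixed workload.
// All field names are hypothetical; each affected document is counted
// as one oplog entry, as in the examples in Table 1.
function estimateOplogEntriesPerSecond(workload) {
  const {
    batchUpdatesPerSecond = 0,        // batch update statements per second
    documentsPerBatch = 0,            // documents modified by each batch
    insertDeleteCyclesPerSecond = 0,  // one insert followed by one delete
    inPlaceUpdatesPerSecond = 0,      // single-document updates per second
  } = workload;

  const fromBatches = batchUpdatesPerSecond * documentsPerBatch;
  // Each cycle records one insert and one delete.
  const fromCycles = insertDeleteCyclesPerSecond * 2;
  const fromInPlace = inPlaceUpdatesPerSecond;
  return fromBatches + fromCycles + fromInPlace;
}

console.log(estimateOplogEntriesPerSecond({
  batchUpdatesPerSecond: 1,
  documentsPerBatch: 1000,
  insertDeleteCyclesPerSecond: 1000,
  inPlaceUpdatesPerSecond: 100,
})); // 3100
```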

Backup Policies

Full backup
Hidden nodes cannot synchronize data during backup. If the backup duration exceeds the oplog window, the oplog entries that a hidden node still needs will already have been overwritten on the primary node by the time the backup finishes, so the hidden node cannot catch up with the primary node through replication.
To avoid this issue, take the following measures:
- Modify the backup policy to schedule backup tasks during off-peak hours. For details, see Configuring an Automated Backup Policy.
- Adjust the oplog size to ensure that the oplog window adequately covers the backup duration. For details, see Size Planning Suggestions.
- Enable CBR snapshot backup to minimize the lock duration on hidden nodes. For details about how to enable it, see Configuring an Automated Backup Policy.

Incremental backup
If the oplog generation rate exceeds 250 GB/h (or 75 MB/s), the incremental backup process may lag behind, leaving some points in time unavailable for restoration. You can estimate the rate by dividing the oplog size by the oplog window. For example, if an instance has a 20 GB oplog with a 1-hour window, the estimated oplog generation rate is 20 GB/h.
To prevent incremental backup issues caused by excessive write speeds, consider the following measures:
- Throttling write concurrency: Properly manage the number of concurrent write threads to prevent a surge in data writes from overwhelming the system within a short period.
- Adjusting write concern: Change the write concern from {w:1} to {w:"majority"}. A write is then acknowledged to the client only after a majority of nodes have confirmed it, which paces writers to the replication speed and enhances data reliability and consistency.
- Optimizing data distribution: For replica set instances, consider migrating data to cluster instances. Distributing data across multiple shards can fully use their storage and compute power. This improves write efficiency and system scalability.
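
The rate estimate used in the incremental backup example can be computed directly. This is a minimal sketch: the function names are hypothetical, the 250 GB/h limit is the figure quoted above, and dividing the oplog size by the window length assumes that the oplog is full and the write rate is roughly steady.

```javascript
// Threshold quoted in this section for incremental backup to keep up.
const INCREMENTAL_BACKUP_LIMIT_GB_PER_HOUR = 250;

// Estimate the oplog generation rate from its size and window.
// Assumes a full oplog and a roughly steady write rate.
function oplogRateGBPerHour(oplogSizeGB, windowHours) {
  if (windowHours <= 0) {
    throw new RangeError("windowHours must be positive");
  }
  return oplogSizeGB / windowHours;
}

function mayOutpaceIncrementalBackup(rateGBPerHour) {
  return rateGBPerHour > INCREMENTAL_BACKUP_LIMIT_GB_PER_HOUR;
}

const rate = oplogRateGBPerHour(20, 1);
console.log(rate);                              // 20
console.log(mayOutpaceIncrementalBackup(rate)); // false
// A 100 GB oplog that lasts only 15 minutes implies 400 GB/h.
console.log(mayOutpaceIncrementalBackup(oplogRateGBPerHour(100, 0.25))); // true
```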

Size Planning Suggestions

Plan the oplog size based on two critical dimensions.

| Dimension | Description |
| --- | --- |
| Oplog generation rate | Estimate the rate by dividing the oplog size by the oplog window. For example, if an instance has a 20 GB oplog with a 1-hour window, the estimated oplog generation rate is 20 GB/h. |
| Backup duration | The time required for creating a backup depends on the data volume of your instance. Estimate it from the average backup speed (for example, 60 MB/s), and ensure that the backup duration does not exceed 60% of the oplog window. |
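
Combining the two dimensions gives a rough sizing calculation. The sketch below is an illustration under stated assumptions rather than an official formula: the function names are hypothetical, 1 GB is taken as 1,024 MB, the oplog generation rate is treated as steady, and the 60 MB/s speed and 60% ratio are the example figures from the table.

```javascript
// Hypothetical sizing helpers based on the two dimensions above.
// Assumptions: 1 GB = 1024 MB; steady oplog generation rate.
function backupDurationHours(dataSizeGB, backupSpeedMBPerSec = 60) {
  const seconds = (dataSizeGB * 1024) / backupSpeedMBPerSec;
  return seconds / 3600;
}

function suggestedOplogGB(dataSizeGB, oplogRateGBPerHour, options = {}) {
  const { backupSpeedMBPerSec = 60, maxWindowShare = 0.6 } = options;
  const duration = backupDurationHours(dataSizeGB, backupSpeedMBPerSec);
  // The backup should take no more than maxWindowShare of the window.
  const requiredWindowHours = duration / maxWindowShare;
  return oplogRateGBPerHour * requiredWindowHours;
}

// Example: 1,000 GB of data, oplog generated at 20 GB/h.
console.log(backupDurationHours(1000).toFixed(2));  // 4.74
console.log(suggestedOplogGB(1000, 20).toFixed(1)); // 158.0
```

In this example the backup takes about 4.74 hours, so the window should be at least about 7.9 hours, which at 20 GB/h corresponds to roughly 158 GB of oplog.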

If you need to increase the oplog size, use either of the following methods:

- If storage capacity is sufficient but the oplog allocation is inadequate, increase the oplog size by modifying oplogSizePercent in the parameter group. For details, see Modifying DDS Instance Parameters.
  Before modifying oplogSizePercent, ensure that you have enough storage.
- If storage capacity is insufficient and you cannot increase the oplog size by modifying oplogSizePercent, scale up the storage for your instance. For details, see Scaling Up Storage Space.
