Updated on 2025-10-28 GMT+08:00

Automatic Recovery of Extended Primary/Standby Replication Delay

Scenario

The primary/standby replication delay of a DB instance was long, kept increasing for a period of time, and then automatically recovered.

Possible Causes

According to "Primary/Standby Replication Delay Scenarios and Solutions" and "How Primary/Standby Replication Works", this problem is caused by large transactions or DDL operations that take a long time to replay on the standby node or read replica.

To confirm the cause, analyze the full logs or slow query logs and check whether large transactions or DDL operations were executed before the delay started to increase.

As shown in the following figure, the slow query logs recorded a DDL operation that added an index to a table containing hundreds of millions of records. The operation took about one day to execute. While the DDL operation was being replayed on the read replica or standby node, the replication delay kept increasing. After the replay was complete, the replication delay dropped back to the normal range.
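The slow query log check described above can be sketched as a small script. This is a minimal illustration, not a tool provided by the service. It assumes that the slow query log has been downloaded as text in the MySQL-style format, where a "# Query_time:" header line precedes each statement. The function name find_long_ddl, the 60-second threshold, the list of DDL keywords, and the sample log content are all hypothetical. It only flags DDL statements and does not detect large transactions.

```python
import re

# Statement prefixes treated as DDL in this sketch (an assumed, non-exhaustive list).
DDL_PATTERN = re.compile(r"^(ALTER|CREATE|DROP|TRUNCATE|RENAME)\b", re.IGNORECASE)
# Header line that MySQL-style slow query logs write before each statement.
QUERY_TIME_PATTERN = re.compile(r"^# Query_time:\s*([0-9.]+)")

# Hypothetical sample log content, used only to demonstrate the function.
SAMPLE_LOG = """\
# Time: 2025-01-01T00:00:00.000000Z
# User@Host: app[app] @  [10.0.0.1]  Id:    12
# Query_time: 86012.500000  Lock_time: 0.000100 Rows_sent: 0  Rows_examined: 0
SET timestamp=1735689600;
ALTER TABLE orders ADD INDEX idx_created_at (created_at);
# Time: 2025-01-02T01:00:00.000000Z
# User@Host: app[app] @  [10.0.0.1]  Id:    13
# Query_time: 2.300000  Lock_time: 0.000050 Rows_sent: 1  Rows_examined: 500000
SET timestamp=1735779600;
SELECT * FROM orders WHERE customer_id = 42;
"""


def find_long_ddl(log_text, min_seconds=60.0):
    """Return (query_time_seconds, statement) pairs for DDL entries
    whose recorded Query_time is at least min_seconds."""
    hits = []
    query_time = None
    for line in log_text.splitlines():
        header = QUERY_TIME_PATTERN.match(line)
        if header:
            query_time = float(header.group(1))
            continue
        stripped = line.strip()
        # Skip blank lines and the other comment headers.
        if not stripped or stripped.startswith("#"):
            continue
        # Skip session bookkeeping lines written before the statement.
        if stripped.upper().startswith(("SET TIMESTAMP", "USE ")):
            continue
        if query_time is None:
            # Continuation line of a statement that was already examined.
            continue
        if query_time >= min_seconds and DDL_PATTERN.match(stripped):
            hits.append((query_time, stripped))
        # Only the first line of each statement is examined.
        query_time = None
    return hits


if __name__ == "__main__":
    for seconds, statement in find_long_ddl(SAMPLE_LOG):
        print(f"{seconds / 3600:.1f} h  {statement}")
```

A DDL statement reported with an execution time close to the period during which the delay kept increasing is a likely cause, because the standby node or read replica needs a comparable amount of time to replay it.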

Solution

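  • Wait until the DDL operation has been replayed on the standby node or read replica. The replication delay then recovers automatically.
  • Perform DDL operations, such as adding indexes, during off-peak hours.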