Updated on 2024-09-13 GMT+08:00

What Do I Do If Replay Speed of Standby DNs Cannot Catch Up with Write Speed of Primary DN?

Symptom

When a DB instance is under heavy load, standby DNs may replay logs more slowly than the primary DN writes them. Over a long run, unreplayed logs accumulate on the standby DNs. If the primary DN then fails, the standby DN must replay this backlog before it can take over, so data restoration takes a long time and the database is unavailable in the meantime, which severely affects system availability.

Solution

GaussDB provides ultimate RTO, which minimizes the data recovery time after the primary DN fails and so improves availability.

To use ultimate RTO, submit an application by choosing Service Tickets > Create Service Ticket in the upper right corner of the console.

Precautions

Ultimate RTO focuses only on whether the RTO of the standby DN meets the requirements. It has no built-in flow control and relies on the recovery_time_target parameter for flow control instead.
Ultimate RTO uses multiple page redo threads to accelerate replay. When replay on the standby DN has caught up with the primary DN and the standby DN is otherwise idle, the CPU usage of a single page redo thread is about 15% (the actual value depends on the hardware and parameter configuration). Total CPU usage of replay on the standby DN = CPU usage of a single page redo thread × Number of page redo threads. Because more threads are started, CPU and memory consumption is higher than with parallel replay or serial replay.
Ultimate RTO supports read on standby nodes. Because queries read historical data pages, query performance on the standby DNs is lower than on the primary DN, and also lower than read on standby nodes under parallel redo. However, query blocking is alleviated.
The replay speed of DDL logs is much slower than that of page modification logs. Frequent DDL operations may increase the primary/standby latency.
When the I/O or CPU usage of a node is too high, the performance of replay and of read on standby nodes deteriorates significantly. It is recommended that I/O and CPU usage each stay at or below 70%.
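
The CPU formula and the 70% recommendation above can be turned into a quick sizing check. The following Python snippet is only a minimal sketch: the function names, the thread count of 8, and the sample usage figures are hypothetical and do not come from GaussDB. The 15% per-thread figure is the approximate value quoted above and should be replaced with a measurement from your own environment. The result is expressed in the same unit as the per-thread figure you supply.

```python
def estimate_replay_cpu(per_thread_cpu_pct: float, num_page_redo_threads: int) -> float:
    """Total replay CPU usage = per-thread usage x number of page redo threads."""
    if per_thread_cpu_pct < 0 or num_page_redo_threads < 0:
        raise ValueError("inputs must be non-negative")
    return per_thread_cpu_pct * num_page_redo_threads


def within_recommended_usage(cpu_pct: float, io_pct: float, limit_pct: float = 70.0) -> bool:
    """Return True if both node-level CPU and I/O usage are at or below the limit."""
    return cpu_pct <= limit_pct and io_pct <= limit_pct


if __name__ == "__main__":
    # Hypothetical example: 8 page redo threads at roughly 15% each.
    total = estimate_replay_cpu(15.0, 8)
    print(f"Estimated replay CPU usage: {total:.1f}%")

    # Hypothetical node-level readings taken from your monitoring system.
    node_cpu_pct, node_io_pct = 62.0, 75.0
    if not within_recommended_usage(node_cpu_pct, node_io_pct):
        print("Node usage exceeds the recommended 70%; replay and standby reads may slow down.")
```

An estimate like this only indicates how much headroom to reserve on the standby DN before enabling ultimate RTO. The actual per-thread usage varies with hardware and parameter configuration, as noted above.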
