Updated on 2025-05-22 GMT+08:00

Common Faults

Excessive CPU, Memory, Bandwidth, or Connection Usage of a DCS Instance

  • Check: Use Cloud Eye to view the CPU, memory, bandwidth, and connection usage of the DCS instance.
  • Recovery:
    1. Change the instance specifications to expand resources based on service requirements.
    2. Enable overload protection at the application layer so that high-priority services keep running smoothly. Switch services that do not require high performance back to the original data source.
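
The overload protection in step 2 could be sketched as follows. This is a minimal illustration, not a DCS API: `OverloadGate`, `Priority`, `report_usage`, and the 0.8 threshold are hypothetical names and values, and the usage figure is assumed to be fed in from monitoring (for example, a metric read from Cloud Eye).

```python
from enum import IntEnum
from typing import Callable, TypeVar

T = TypeVar("T")


class Priority(IntEnum):
    LOW = 0
    HIGH = 1


class OverloadGate:
    """Hypothetical gate that sends low-priority requests to the original
    data source while the cache instance is reported as overloaded."""

    def __init__(self, usage_threshold: float = 0.8) -> None:
        # Assumed threshold: fraction of CPU, memory, bandwidth, or
        # connections in use above which the instance counts as overloaded.
        self.usage_threshold = usage_threshold
        self.current_usage = 0.0

    def report_usage(self, usage: float) -> None:
        # Called by whatever polls the monitoring metrics.
        self.current_usage = usage

    def call(
        self,
        priority: Priority,
        from_cache: Callable[[], T],
        from_origin: Callable[[], T],
    ) -> T:
        overloaded = self.current_usage >= self.usage_threshold
        if overloaded and priority is Priority.LOW:
            # Shed low-priority load so high-priority services keep the cache.
            return from_origin()
        return from_cache()
```

With this shape, high-priority requests keep using the cache under load, while low-priority requests fall back to the original data source until the reported usage drops below the threshold.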

Failed to Connect to a Backend DCS Instance

  • Check: The network connection to the DCS instance fails.
  • Recovery:
    1. If the failure is temporary, for example, a master/standby switchover is in progress, reconnect to the DCS instance at the application layer. For details, see RES09 Retries After Failures.
    2. If the connection to an overloaded DCS instance failed, rectify the fault by referring to "Excessive CPU, Memory, Bandwidth, or Connection Usage of a DCS Instance."
    3. For non-temporary failures, switch the data source at the application layer back to the original one. This prevents cache faults from interrupting services.
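
Steps 1 and 3 above might be combined as in the sketch below: retry the connection a bounded number of times with exponential backoff, and treat exhaustion as a non-temporary failure. `connect_with_retry` and its parameters are hypothetical, and the code catches Python's built-in `ConnectionError` for illustration; a real Redis client library typically raises its own exception type, which would be caught instead.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.1,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Try to connect up to `attempts` times with exponential backoff.

    Returns the connection, or None if every attempt failed, in which
    case the caller is expected to use the original data source.
    """
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError:
            # Likely temporary (e.g. a master/standby switchover):
            # wait base_delay, 2*base_delay, 4*base_delay, ... then retry.
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # Retries exhausted: treat as a non-temporary failure.
    return None
```

A `None` result is the signal for the application to switch its data source back to the original one rather than keep waiting on the cache.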

Occasionally Failed to Read Data from or Write Data to a DCS Instance

  • Check: Data fails to be read or written. Occasional timeout errors are normal in Redis and can be caused by network connectivity issues or client timeout settings.
  • Recovery:
    1. If this is a temporary failure, for example, the DCS instance is undergoing a master/standby switchover, retry at the application layer. For details, see RES09 Retries After Failures.
    2. If reads or writes fail because the DCS instance is overloaded, rectify the fault by referring to "Excessive CPU, Memory, Bandwidth, or Connection Usage of a DCS Instance."
    3. For non-temporary failures, switch the data source at the application layer back to the original one. This prevents cache faults from interrupting services.
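
For reads, the retry-then-fall-back behavior described above could look like the following sketch. `read_with_fallback`, `cache_get`, and `origin_get` are hypothetical names; the built-in `TimeoutError` and `ConnectionError` stand in for whatever exceptions the actual client library raises.

```python
from typing import Callable, Optional


def read_with_fallback(
    key: str,
    cache_get: Callable[[str], Optional[str]],
    origin_get: Callable[[str], str],
    retries: int = 2,
) -> str:
    """Read from the cache, retrying occasional failures, and fall back
    to the original data source if the cache stays unavailable or
    does not hold the key."""
    for _ in range(retries + 1):
        try:
            value = cache_get(key)
        except (TimeoutError, ConnectionError):
            # Occasional timeouts are expected; try again.
            continue
        if value is not None:
            return value
        # Cache miss is not a fault, so stop retrying.
        break
    # Cache unavailable or key missing: read from the original source.
    return origin_get(key)
```

Keeping the retry count small bounds the added latency, so a persistent cache fault degrades to reads from the original data source instead of interrupting the service.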