Events Supported by Event Monitoring
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| ECS | Start auto recovery | Major | If a host is faulty, the ECSs on it are automatically migrated to a properly running host. The ECSs are restarted during the migration. | Wait for the event to end and check whether services are affected. | Services may be interrupted. |
| ECS | Stop auto recovery | Major | After the automatic migration is complete, the ECS is restored. | This event indicates that the ECS has been restored to normal and is working properly. | No impact. |
| ECS | Auto recovery timeout (being processed on the backend) | Major | The operation of migrating the ECS to a normal host timed out. | Migrate services to other ECSs. | Services are interrupted. |
| ECS | GPU link fault | Critical | The GPU of the host running the ECS is faulty or is recovering from a fault. | Deploy service applications in HA mode. After the GPU fault is rectified, check whether services are restored. | Services are interrupted. |
| ECS | FPGA link fault | Critical | The FPGA of the host running the ECS is faulty or is recovering from a fault. | Deploy service applications in HA mode. After the FPGA fault is rectified, check whether services are restored. | Services are interrupted. |
| ECS | Improper ECS running | Major | The ECS is faulty, or its NIC is abnormal, causing the ECS to run abnormally. | Deploy service applications in HA mode. After the fault is rectified, check whether services are restored. | Services are interrupted. |
| ECS | Improper ECS running recovered | Major | The ECS has been restored to normal status. | Wait for the ECS status to become normal and check whether services are affected. | No impact. |
| ECS | Delete ECS | Major | The ECS is deleted. | Check whether the deletion was performed intentionally by a user. | Services are interrupted. |
| ECS | Reboot ECS | Minor | The ECS is restarted. | Check whether the restart was performed intentionally by a user. | Services are interrupted. |
| ECS | Stop ECS | Minor | The ECS is stopped. | | Services are interrupted. |
| ECS | Delete NIC | Major | The ECS NIC is deleted. | | Services may be interrupted. |
| ECS | Modify ECS specifications | Minor | The ECS specifications have been modified. | | Services are interrupted. |

Once a physical host running ECSs breaks down, the ECSs are automatically migrated to a functioning physical host. The ECSs are restarted during the migration.
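As a sketch of how the events in the table above might be triaged programmatically, the snippet below filters event records by severity and impact. The record structure (`source`, `name`, `severity`, `interrupts_services` fields) is a hypothetical local representation, not an API of the monitoring service:

```python
# Hypothetical event records mirroring the ECS table above; the field
# names are assumptions for this sketch, not a real event schema.
CRITICAL, MAJOR, MINOR = "Critical", "Major", "Minor"

def needs_immediate_action(event: dict) -> bool:
    """Treat Critical events, and Major events whose impact interrupts
    services, as requiring immediate attention."""
    if event["severity"] == CRITICAL:
        return True
    return event["severity"] == MAJOR and event.get("interrupts_services", False)

events = [
    {"source": "ECS", "name": "Start auto recovery", "severity": MAJOR,
     "interrupts_services": True},
    {"source": "ECS", "name": "Stop auto recovery", "severity": MAJOR,
     "interrupts_services": False},
    {"source": "ECS", "name": "GPU link fault", "severity": CRITICAL,
     "interrupts_services": True},
]

urgent = [e["name"] for e in events if needs_immediate_action(e)]
print(urgent)  # ['Start auto recovery', 'GPU link fault']
```

A real integration would populate `events` from whatever event feed or alarm notification channel is configured for event monitoring.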
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| BMS | Reboot BMS | Major | The BMS is restarted. | | Services are interrupted. |
| BMS | Unexpected restart | Major | The BMS restarts unexpectedly. | | Services are interrupted. |
| BMS | Stop BMS | Major | The BMS is stopped. | | Services are interrupted. |
| BMS | Unexpected shutdown | Major | The BMS stops unexpectedly. | | Services are interrupted. |
| BMS | Network interruption | Major | The BMS network is interrupted. | | Services are interrupted. |
| BMS | PCIe error | Major | The PCIe device or mainboard on the BMS is faulty. | | Network or disk read/write services are affected. |
| BMS | Disk fault | Major | The disk backplane or a disk on the BMS is faulty. | | Data read/write services are affected, or the BMS cannot be started. |
| BMS | EVS error | Major | The BMS fails to connect to the EVS disk. | | Data read/write services are affected, or the BMS cannot be started. |
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| EIP | EIP bandwidth overflow | Major | The required bandwidth exceeds the purchased bandwidth, which may slow down the network or cause packet loss. NOTE: This event is available only in the CN North-Beijing1, CN East-Shanghai1, CN East-Shanghai2, and CN South-Guangzhou regions. | Check whether the EIP bandwidth keeps increasing and whether services are normal. Expand the capacity if necessary. | The network becomes slow or packets are lost. |
| EIP | Release EIP | Minor | The EIP is released. | Check whether the resource was deleted by mistake. | The server cannot access the Internet. |
| EIP | EIP blocked | Critical | If the required bandwidth exceeds 5 GB, packets are discarded. This may be caused by DDoS attacks. | Replace the EIP to prevent services from being affected. Locate and rectify the fault. | Services are affected. |
| EIP | EIP unblocked | Critical | The EIP has been unblocked. | Use the original EIP again. | No impact. |
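The "EIP bandwidth overflow" condition above can be sketched as a simple comparison between observed usage and the purchased bandwidth. The sample units (Mbit/s) and the early-warning threshold are assumptions for illustration:

```python
# Sketch of overflow detection, assuming (hypothetically) bandwidth
# samples in Mbit/s and a purchased bandwidth limit in the same unit.

def bandwidth_overflow(samples_mbit, purchased_mbit, threshold=0.95):
    """Flag overflow when peak usage approaches the purchased bandwidth.
    A threshold slightly below 1.0 gives early warning before the
    network slows down or packets are lost."""
    return max(samples_mbit) >= purchased_mbit * threshold

print(bandwidth_overflow([40, 85, 99], 100))  # True  (peak 99 >= 95)
print(bandwidth_overflow([40, 60, 70], 100))  # False
```

In practice the samples would come from the bandwidth metric of the EIP rather than a hard-coded list.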
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| CBR | Backup failed | Critical | Failed to create the backup. | Manually create a backup or contact customer service. | Data loss may occur. |
| CBR | Restoration failed | Critical | Failed to restore the resource using a backup. | Use another backup to restore the resource or contact customer service. | Data loss may occur. |
| CBR | Backup deletion failed | Critical | Failed to delete the backup. | Try again later or contact customer service. | Charging may be abnormal. |
| CBR | Vault deletion failed | Critical | Failed to delete the vault. | Try again later or contact customer service. | Charging may be abnormal. |
| CBR | Replication failed | Critical | Failed to replicate the backup. | Try again later or contact customer service. | Data loss may occur. |
| CBR | Backup succeeded | Major | The backup was created successfully. | None | No impact. |
| CBR | Restoration succeeded | Major | The resource was restored successfully using a backup. | Check whether the data was successfully restored. | No impact. |
| CBR | Backup deletion succeeded | Major | The backup was deleted successfully. | None | No impact. |
| CBR | Vault deletion succeeded | Major | The vault was deleted successfully. | None | No impact. |
| CBR | Replication succeeded | Major | The backup was replicated successfully. | None | No impact. |
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| RDS | DB instance creation failure | Major | A DB instance fails to be created because the number of disks is insufficient, the quota is insufficient, or underlying resources are exhausted. | Check the disk quantity and quota. Release resources and create the DB instance again. | DB instances cannot be created. |
| RDS | Full backup failure | Major | A single full backup failure does not affect the files that have been successfully backed up, but it prolongs the incremental backup time during point-in-time recovery (PITR). | Create a manual backup again. | Backup fails. |
| RDS | Primary/secondary switchover failure | Major | The standby DB instance does not take over services from the primary DB instance due to network or server failures. The original primary DB instance continues to provide services within a short time. | Check whether the connection between the application and the database is re-established. | No impact. |
| RDS | Abnormal replication status | Major | The possible causes are as follows: 1. The replication delay between the primary and standby DB instances is too long, which usually occurs when a large amount of data is written to databases or a large transaction is performed. During peak hours, data may be blocked. 2. The network between the primary and standby DB instances is disconnected. | Submit a service ticket. | This event does not interrupt data reads or writes on the DB instance, and your applications are not affected. |
| RDS | Replication status recovered | Major | The replication delay between the primary and standby DB instances is within the normal range, or the network connection between them has been restored. | No action is required. | No impact. |
| RDS | Faulty DB instance | Major | A single or primary DB instance is faulty due to a disaster or a server failure. | Check whether an automated backup policy has been configured for the DB instance and submit a service ticket. | The database service may be unavailable. |
| RDS | DB instance recovered | Major | RDS uses high availability tools to rebuild the standby DB instance for disaster recovery. After the recovery, this event is reported. | No action is required. | No impact. |
| RDS | Changing to primary/secondary DB instances failure | Major | A fault occurs when you create the standby DB instance or configure synchronization between the primary and standby DB instances, possibly because resources are insufficient in the data center where the standby DB instance is located. | Submit a service ticket. | This event does not interrupt data reads or writes on the DB instance, and your applications are not affected. |
| Event Source | Event Name | Event Severity |
|---|---|---|
| RDS | Reset administrator password | Major |
| RDS | Operate DB instance | Major |
| RDS | Delete DB instance | Minor |
| RDS | Modify backup policy | Minor |
| RDS | Change parameter group | Minor |
| RDS | Delete parameter group | Minor |
| RDS | Reset parameter group | Minor |
| RDS | Change database port | Major |
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| DDS | DB instance creation failure | Major | A DB instance fails to be created because the number of disks is insufficient, the quota is insufficient, or underlying resources are exhausted. | Check the disk quantity and quota. Release resources and create the DB instance again. | DB instances cannot be created. |
| DDS | Abnormal replication status | Major | The possible causes are as follows: 1. The replication delay between the primary and standby DB instances is too long, which usually occurs when a large amount of data is written to databases or a large transaction is performed. During off-peak hours, the replication delay gradually decreases. 2. The network between the primary and standby DB instances is disconnected. | Submit a service ticket. | This event does not interrupt data reads or writes on the DB instance, and your applications are not affected. |
| DDS | Replication status recovered | Major | The replication delay between the primary and standby DB instances is within the normal range, or the network connection between them has been restored. | No action is required. | No impact. |
| DDS | Faulty DB instance | Major | This is a key alarm event and is reported when an instance is faulty due to a disaster or a server failure. | Submit a service ticket. | The database service may be unavailable. |
| DDS | DB instance recovered | Major | If a disaster occurs, an HA tool automatically or manually rectifies the fault. After the fault is rectified, this event is reported. | No action is required. | No impact. |
| DDS | Faulty node | Major | This is a key alarm event and is reported when a database node is faulty due to a disaster or a server failure. | Check whether the database service is available and submit a service ticket. | The database service may be unavailable. |
| DDS | Node recovered | Major | If a disaster occurs, an HA tool automatically or manually rectifies the fault. After the fault is rectified, this event is reported. | No action is required. | No impact. |
| DDS | Primary/standby switchover or failover | Major | This event is reported when a primary/standby switchover or a failover is triggered. | No action is required. | No impact. |
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| NoSQL | Failed to create a DB instance | Major | The DB instance quota or underlying resources are insufficient. | Release instances that are no longer used and try provisioning again, or submit a service ticket to increase the quota. | DB instances cannot be created. |
| NoSQL | Failed to modify the specifications | Major | The underlying resources are insufficient. | Submit a service ticket. The O&M personnel will coordinate resources in the background, and then you can change the specifications again. | Services are interrupted. |
| NoSQL | Failed to add a node | Major | The underlying resources are insufficient. | Submit a service ticket. The O&M personnel will coordinate resources in the background, and then you can delete the node that failed to be added and add a new one. | No impact. |
| NoSQL | Failed to delete a node | Major | The underlying resources fail to be released. | Delete the node again. | No impact. |
| NoSQL | Failed to scale up the storage space | Major | The underlying resources are insufficient. | Submit a service ticket. The O&M personnel will coordinate resources in the background, and then you can scale up the storage space again. | Services may be interrupted. |
| NoSQL | Failed to reset the password | Major | Resetting the password times out. | Reset the password again. | No impact. |
| NoSQL | Failed to modify a parameter group | Major | Modifying a parameter group times out. | Modify the parameter group again. | No impact. |
| NoSQL | Failed to set the backup policy | Major | The database connection is abnormal. | Set the backup policy again. | No impact. |
| NoSQL | Failed to create a manual backup | Major | The backup files fail to be exported or uploaded. | Submit a service ticket to the O&M personnel. | Data cannot be backed up. |
| NoSQL | Failed to create an automated backup | Major | The backup files fail to be exported or uploaded. | Submit a service ticket to the O&M personnel. | Data cannot be backed up. |
| NoSQL | Faulty DB instance | Major | This is a key alarm event and is reported when an instance is faulty due to a disaster or a server failure. | Submit a service ticket. | The database service may be unavailable. |
| NoSQL | DB instance recovered | Major | If a disaster occurs, NoSQL provides an HA tool to automatically or manually rectify the fault. After the fault is rectified, this event is reported. | No action is required. | No impact. |
| NoSQL | Faulty node | Major | This is a key alarm event and is reported when a database node is faulty due to a disaster or a server failure. | Check whether the database service is available and submit a service ticket. | The database service may be unavailable. |
| NoSQL | Node recovered | Major | If a disaster occurs, NoSQL provides an HA tool to automatically or manually rectify the fault. After the fault is rectified, this event is reported. | No action is required. | No impact. |
| NoSQL | Primary/standby switchover or failover | Major | This event is reported when a primary/standby switchover or a failover is triggered. | No action is required. | No impact. |
| NoSQL | HotKeyOccurs | Major | The primary key is improperly configured, so hotspot data is concentrated in a single partition; or the application design is improper, causing frequent read and write operations on a single key. | 1. Choose a proper partition key. 2. Add a service cache so that the application reads hotspot data from the cache first. | The service request success rate is affected, and cluster performance and stability may also be affected. |
| NoSQL | BigKeyOccurs | Major | The primary key design is improper, and a single partition contains too many records or too much data, causing unbalanced node loads. | 1. Choose a proper partition key. 2. Add a new partition key for hashing the data. | As the data in the large partition grows, cluster stability deteriorates. |
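The "hash the partition key" mitigation suggested for HotKeyOccurs and BigKeyOccurs can be sketched as follows. This is a minimal illustration assuming the application controls its key layout; the key format (`logical_key#bucket`) and bucket count are hypothetical choices:

```python
# Appending a bounded hash suffix splits one hot or oversized logical
# partition across NUM_BUCKETS physical partitions; readers must then
# fan out across all suffixes to reassemble the logical partition.
import hashlib

NUM_BUCKETS = 8  # tunable: more buckets = better spread, wider read fan-out

def physical_partition_key(logical_key: str, record_id: str) -> str:
    """Derive a physical partition key by appending a stable hash bucket."""
    bucket = int(hashlib.sha256(record_id.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{logical_key}#{bucket}"

def all_partitions(logical_key: str) -> list:
    """All keys a reader must query to cover the whole logical partition."""
    return [f"{logical_key}#{b}" for b in range(NUM_BUCKETS)]

# One hot logical key ("user:42") now spreads across up to 8 partitions.
keys = {physical_partition_key("user:42", f"order-{i}") for i in range(1000)}
print(len(keys) <= NUM_BUCKETS)  # True
```

The trade-off is that point reads of a single record are unchanged (the record ID determines the bucket), but scans of the whole logical partition now need one query per bucket.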
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| GaussDB (for MySQL) | Failed to create a DB instance | Major | DB instances fail to be created because the quota is insufficient or underlying resources are exhausted. | Check the DB instance quota. Release resources and create the DB instance again. | DB instances cannot be created. |
| GaussDB (for MySQL) | Read replica promotion failure | Major | The read replica fails to be promoted to the primary DB instance due to network or server failures. The original primary DB instance takes over services within a short time. | Submit a service ticket. | The read replica fails to be promoted to the primary DB instance. |
| GaussDB (for MySQL) | Read replica creation failure | Major | Read replicas fail to be created because the quota is insufficient or underlying resources are exhausted. | Check the read replica quota. Release resources and create read replicas again. | Read replicas fail to be created. |
| GaussDB (for MySQL) | Instance class change failure | Major | DB instance classes fail to be changed because the quota is insufficient or underlying resources are exhausted. | Submit a service ticket. | DB instance classes fail to be changed. |
| Event Source | Event Name | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|
| GaussDB (for openGauss) | Process Status Alarm | Major | A key process exits. Key processes include the CMS/CMA, ETCD, GTM, CN, and DN processes. | Wait until the process automatically recovers or a primary/standby failover is automatically performed. Check whether services are recovered. If not, contact SRE engineers. | If a process on a primary node is faulty, services are interrupted and then rolled back. If a process on a standby node is faulty, services are not affected. |
| GaussDB (for openGauss) | Component Status Alarm | Major | A key component does not respond. Key components include the CMA, ETCD, GTM, CN, and DN components. | Wait until the process automatically recovers or a primary/standby failover is automatically performed. Check whether services are recovered. If not, contact SRE engineers. | If a process on a primary node does not respond, services do not respond either. If a process on a standby node is faulty, services are not affected. |
| GaussDB (for openGauss) | Cluster Status Alarm | Major | The cluster is abnormal. For example, the cluster is read-only, the majority of ETCDs are faulty, or cluster resources are unevenly distributed. | Contact SRE engineers. | If the cluster is read-only, only read-only requests are processed. If the majority of ETCDs are faulty, the cluster is unavailable. If cluster resources are unevenly distributed, cluster performance and reliability deteriorate. |
| GaussDB (for openGauss) | Hardware Resource Alarm | Major | A major hardware fault occurs in the cluster, such as a damaged disk or a GTM network communication fault. | Contact SRE engineers. | Some or all services are affected. |
| GaussDB (for openGauss) | Status Transition Alarm | Major | One of the following events occurs in the cluster: a DN build failure, a forcible DN promotion, a primary/standby DN switchover/failover, or a primary/standby GTM switchover/failover. | Wait until the fault is automatically rectified and check whether services are recovered. If not, contact SRE engineers. | Some services are interrupted. |
| GaussDB (for openGauss) | Other Abnormal Alarm | Major | The disk usage exceeds the threshold. | Monitor service changes and scale up storage space as needed. | If the used storage space exceeds the threshold, storage space cannot be scaled up. |
| Event Source | Event Name | Event Severity |
|---|---|---|
| VPC | Delete VPC | Major |
| VPC | Modify VPC | Minor |
| VPC | Delete subnet | Minor |
| VPC | Modify subnet | Minor |
| VPC | Modify bandwidth | Minor |
| VPC | Delete VPN | Major |
| VPC | Modify VPN | Minor |
| Event Source | Event Name | Event Severity |
|---|---|---|
| EVS | Update disk | Minor |
| EVS | Expand disk | Minor |
| EVS | Delete disk | Major |
| Event Source | Event Name | Event Severity |
|---|---|---|
| IAM | Login | Minor |
| IAM | Logout | Minor |
| IAM | Change password | Major |
| IAM | Create user | Minor |
| IAM | Delete user | Major |
| IAM | Update user | Minor |
| IAM | Create user group | Minor |
| IAM | Delete user group | Major |
| IAM | Update user group | Minor |
| IAM | Create identity provider | Minor |
| IAM | Delete identity provider | Major |
| IAM | Update identity provider | Minor |
| IAM | Update metadata | Minor |
| IAM | Update security policy | Major |
| IAM | Add credential | Major |
| IAM | Delete credential | Major |
| IAM | Create project | Minor |
| IAM | Update project | Minor |
| IAM | Suspend project | Major |
| Event Source | Event Name | Event Severity |
|---|---|---|
| KMS | Disable key | Major |
| KMS | Schedule key deletion | Minor |
| KMS | Retire grant | Major |
| KMS | Revoke grant | Major |
| Event Source | Event Name | Event Severity |
|---|---|---|
| OBS | Delete bucket | Major |
| OBS | Delete bucket policy | Major |
| OBS | Set bucket ACL | Minor |
| OBS | Set bucket policy | Minor |