Events Supported by Event Monitoring
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| ECS | Recovery started | startAutoRecovery | Major | ECSs on a faulty host are automatically migrated to a properly running host. During the migration, the ECSs are restarted. | Wait for the event to end and check whether services are affected. | Services may be interrupted. |
| | Recovery succeeded | endAutoRecovery | Major | The ECS was recovered after the automatic migration. | This event indicates that the ECS has been recovered and is working properly. | None |
| | Auto recovery timeout (being processed on the backend) | faultAutoRecovery | Major | Migrating the ECS to a normal host timed out. | Migrate services to other ECSs. | Services are interrupted. |
| | GPU link fault | GPULinkFault | Critical | The GPU of the host running the ECS was faulty or was recovering from a fault. | Deploy service applications in HA mode. After the GPU fault is rectified, check whether services are restored. | Services are interrupted. |
| | FPGA link fault | FPGALinkFault | Critical | The FPGA of the host running the ECS was faulty or was recovering from a fault. | Deploy service applications in HA mode. After the FPGA fault is rectified, check whether services are restored. | Services are interrupted. |
| | ECS or NIC exceptions occurred | vmIsRunningImproperly | Major | The ECS was faulty or the ECS NIC was abnormal. | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | Services are interrupted. |
| | ECS or NIC exceptions handled | vmIsRunningImproperlyRecovery | Major | The ECS was restored to the normal status. | Wait for the ECS status to become normal and check whether services are affected. | None |
| | ECS deleted | deleteServer | Major | The ECS was deleted. | Check whether the deletion was performed intentionally by a user. | Services are interrupted. |
| | ECS restarted | rebootServer | Minor | The ECS was restarted. | Check whether the restart was performed intentionally by a user. | Services are interrupted. |
| | ECS stopped | stopServer | Minor | The ECS was stopped. NOTE: This event is reported only after CTS is enabled. | | Services are interrupted. |
| | NIC deleted | deleteNic | Major | The ECS NIC was deleted. | | Services may be interrupted. |
| | ECS resized | resizeServer | Minor | The ECS was resized. | | Services are interrupted. |
| | GuestOS restarted | RestartGuestOS | Minor | The guest OS was restarted. | Contact O&M personnel. | Services may be interrupted. |
| | ECS failure due to abnormal host processes | VMFaultsByHostProcessExceptions | Critical | The processes of the host accommodating the ECS were abnormal. | Contact O&M personnel. | The ECS is faulty. |
| | Startup failure | faultPowerOn | Major | The ECS failed to start. | Start the ECS again. If the problem persists, contact O&M personnel. | The ECS cannot start. |
| | Live migration started | liveMigrationStarted | Major | The host where the ECS resides may be faulty. The ECS is live migrated in advance to prevent service interruptions caused by a host breakdown. | Wait for the event to end and check whether services are affected. | Services may be interrupted for less than 1s. |
| | Live migration completed | liveMigrationCompleted | Major | The ECS was restored to normal after the live migration. | Check whether services are running properly. | None |
| | Live migration failure | liveMigrationFailed | Major | An error occurred during the live migration of an ECS. | Check whether services are running properly. | In rare cases, services may be interrupted. |
| | Host breakdown risk | hostMayCrash | Major | The host where the ECS resides may break down, and the risk cannot be prevented through live migration. | Migrate services running on the ECS, then delete or stop the ECS. Start the ECS only after the O&M personnel eliminate the risk. | The host may break down, causing service interruption. |
Once a physical host running ECSs breaks down, the ECSs are automatically migrated to a functional physical host. During the migration, the ECSs will be restarted.
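As a rough illustration of how the paired startAutoRecovery/endAutoRecovery events above can be consumed, the sketch below pairs the two timestamps to bound the window in which services may have been interrupted. The event-record shape (event_id, resource_id, time fields) is an assumption for illustration, not an official notification format.

```python
# A minimal sketch (not an official SDK call) of pairing auto-recovery events:
# startAutoRecovery marks the beginning of a migration and endAutoRecovery its
# completion, so the two timestamps bound the possible interruption window.
from datetime import datetime

def recovery_windows(events):
    """Yield (ecs_id, started, ended) for completed auto-recovery cycles."""
    pending = {}
    for ev in sorted(events, key=lambda e: e["time"]):
        if ev["event_id"] == "startAutoRecovery":
            pending[ev["resource_id"]] = ev["time"]
        elif ev["event_id"] == "endAutoRecovery" and ev["resource_id"] in pending:
            yield ev["resource_id"], pending.pop(ev["resource_id"]), ev["time"]

# Hypothetical records:
sample = [
    {"event_id": "startAutoRecovery", "resource_id": "ecs-01",
     "time": datetime(2024, 1, 1, 10, 0)},
    {"event_id": "endAutoRecovery", "resource_id": "ecs-01",
     "time": datetime(2024, 1, 1, 10, 4)},
]
for ecs, start, end in recovery_windows(sample):
    print(f"{ecs}: recovered in {(end - start).seconds // 60} min")
```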
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| EIP | EIP bandwidth exceeded | EIPBandwidthOverflow | Major | The used bandwidth exceeded the purchased bandwidth, which may slow down the network or cause packet loss. The value of this event is the maximum value in a monitoring period, and the values of the EIP inbound and outbound bandwidth are the values at a specific time point in the period. Metrics: egressDropBandwidth (dropped outbound packets, in bytes); egressAcceptBandwidth (accepted outbound packets, in bytes); egressMaxBandwidthPerSec (peak outbound bandwidth, in bit/s); ingressAcceptBandwidth (accepted inbound packets, in bytes); ingressMaxBandwidthPerSec (peak inbound bandwidth, in bit/s); ingressDropBandwidth (dropped inbound packets, in bytes). | Check whether the EIP bandwidth keeps increasing and whether services are normal. Increase the bandwidth if necessary. | The network becomes slow or packets are lost. |
| | EIP released | deleteEip | Minor | The EIP was released. | Check whether the EIP was released by mistake. | The server that has the EIP bound cannot access the Internet. |
| | EIP blocked | blockEIP | Critical | The used bandwidth of an EIP exceeded 5 Gbit/s, so the EIP was blocked and packets were discarded. Such an event may be caused by DDoS attacks. | Replace the EIP to prevent services from being affected. Locate and rectify the fault. | Services are impacted. |
| | EIP unblocked | unblockEIP | Critical | The EIP was unblocked. | Use the previous EIP again. | None |
| | EIP traffic scrubbing started | ddosCleanEIP | Major | Traffic scrubbing on the EIP was started to prevent DDoS attacks. | Check whether the EIP was attacked. | Services may be interrupted. |
| | EIP traffic scrubbing ended | ddosEndCleanEip | Major | Traffic scrubbing on the EIP to prevent DDoS attacks ended. | Check whether the EIP was attacked. | Services may be interrupted. |
| | QoS bandwidth exceeded | EIPBandwidthRuleOverflow | Major | The used QoS bandwidth exceeded the allocated bandwidth, which may slow down the network or cause packet loss. The value of this event is the maximum value in a monitoring period, and the values of the EIP inbound and outbound bandwidth are the values at a specific time point in the period. Metrics: egressDropBandwidth (dropped outbound packets, in bytes); egressAcceptBandwidth (accepted outbound packets, in bytes); egressMaxBandwidthPerSec (peak outbound bandwidth, in bit/s); ingressAcceptBandwidth (accepted inbound packets, in bytes); ingressMaxBandwidthPerSec (peak inbound bandwidth, in bit/s); ingressDropBandwidth (dropped inbound packets, in bytes). | Check whether the EIP bandwidth keeps increasing and whether services are normal. Increase the bandwidth if necessary. | The network becomes slow or packets are lost. |
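The metric names attached to the two bandwidth-overflow events above distinguish dropped from accepted traffic, so a notification handler can decide whether an overflow actually caused packet loss. Below is a minimal sketch under the assumption that the metrics arrive as a flat dict keyed by the documented names; the payload shape and thresholds are illustrative, not an official API.

```python
# A minimal sketch: interpret an EIPBandwidthOverflow / EIPBandwidthRuleOverflow
# event from the metric names documented above (bytes for packet counters,
# bit/s for peak bandwidth). The input dict shape is an assumption.

def bandwidth_overflow_summary(metrics: dict) -> str:
    """Summarize whether an overflow event involved packet loss."""
    dropped = metrics.get("egressDropBandwidth", 0) + metrics.get("ingressDropBandwidth", 0)
    peak_out = metrics.get("egressMaxBandwidthPerSec", 0)
    peak_in = metrics.get("ingressMaxBandwidthPerSec", 0)
    if dropped > 0:
        return (f"Packet loss: {dropped} bytes dropped; peak out {peak_out} bit/s, "
                f"peak in {peak_in} bit/s. Consider increasing the bandwidth.")
    return f"No drops recorded; peak out {peak_out} bit/s, peak in {peak_in} bit/s."

# Example with hypothetical values:
print(bandwidth_overflow_summary({
    "egressDropBandwidth": 1_048_576,        # bytes dropped outbound
    "egressAcceptBandwidth": 9_437_184,      # bytes accepted outbound
    "egressMaxBandwidthPerSec": 210_000_000, # bit/s
    "ingressDropBandwidth": 0,
    "ingressAcceptBandwidth": 4_194_304,
    "ingressMaxBandwidthPerSec": 95_000_000,
}))
```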
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| AAD | DDoS attack events | ddosAttackEvents | Major | A DDoS attack occurred on the AAD protected lines. | Assess the impact on services based on the attack traffic and attack type. If the attack traffic exceeds your purchased elastic bandwidth, switch to another line or increase your bandwidth. | Services may be interrupted. |
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| CBR | Failed to create the backup | backupFailed | Critical | The backup failed to be created. | Manually create a backup or contact customer service. | Data loss may occur. |
| | Failed to restore the resource using a backup | restorationFailed | Critical | The resource failed to be restored using a backup. | Restore the resource using another backup or contact customer service. | Data loss may occur. |
| | Failed to delete the backup | backupDeleteFailed | Critical | The backup failed to be deleted. | Try again later or contact customer service. | Charging may be abnormal. |
| | Failed to delete the vault | vaultDeleteFailed | Critical | The vault failed to be deleted. | Try again later or contact technical support. | Charging may be abnormal. |
| | Replication failure | replicationFailed | Critical | The backup failed to be replicated. | Try again later or contact technical support. | Data loss may occur. |
| | The backup is created successfully | backupSucceeded | Major | The backup was created. | None | None |
| | Resource restoration using a backup succeeded | restorationSucceeded | Major | The resource was restored using a backup. | Check whether the data is successfully restored. | None |
| | The backup is deleted successfully | backupDeletionSucceeded | Major | The backup was deleted. | None | None |
| | The vault is deleted successfully | vaultDeletionSucceeded | Major | The vault was deleted. | None | None |
| | Replication success | replicationSucceeded | Major | The backup was replicated successfully. | None | None |
| | Client offline | agentOffline | Critical | The backup client was offline. | Ensure that the Agent status is normal and the backup client can connect to Huawei Cloud. | Backup tasks may fail. |
| | Client online | agentOnline | Major | The backup client was online. | None | None |
| Event Source | Event Name | Event ID | Event Severity |
|---|---|---|---|
| RDS | Reset administrator password | resetPassword | Major |
| | Operate DB instance | instanceAction | Major |
| | Delete DB instance | deleteInstance | Minor |
| | Modify backup policy | setBackupPolicy | Minor |
| | Change parameter group | updateParameterGroup | Minor |
| | Delete parameter group | deleteParameterGroup | Minor |
| | Reset parameter group | resetParameterGroup | Minor |
| | Change database port | changeInstancePort | Major |
| | Primary/standby switchover or failover | PrimaryStandbySwitched | Major |
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| DDS | DB instance creation failure | DDSCreateInstanceFailed | Major | A DDS instance failed to be created due to insufficient disks, quota, or underlying resources. | Check the disk quantity and quota. Release resources and create the DDS instance again. | DDS instances cannot be created. |
| | Replication failed | DDSAbnormalReplicationStatus | Major | The possible causes are as follows: | Submit a service ticket. | Your applications are not affected because this event does not interrupt data read and write. |
| | Replication recovered | DDSReplicationStatusRecovered | Major | The replication delay between the primary and standby instances is within the normal range, or the network connection between them has been restored. | No action is required. | None |
| | DB instance failed | DDSFaultyDBInstance | Major | This event is a key alarm event and is reported when an instance is faulty due to a disaster or a server failure. | Submit a service ticket. | The database service may be unavailable. |
| | DB instance recovered | DDSDBInstanceRecovered | Major | If a disaster occurs, NoSQL provides an HA tool to automatically or manually rectify the fault. After the fault is rectified, this event is reported. | No action is required. | None |
| | Faulty node | DDSFaultyDBNode | Major | This event is a key alarm event and is reported when a database node is faulty due to a disaster or a server failure. | Check whether the database service is available and submit a service ticket. | The database service may be unavailable. |
| | Node recovered | DDSDBNodeRecovered | Major | If a disaster occurs, NoSQL provides an HA tool to automatically or manually rectify the fault. After the fault is rectified, this event is reported. | No action is required. | None |
| | Primary/standby switchover or failover | DDSPrimaryStandbySwitched | Major | A primary/standby switchover was performed or a failover was triggered. | No action is required. | None |
| | Insufficient data disk space | DDSRiskyDataDiskUsage | Major | The data disk space is insufficient. | Expand the disk capacity. For details, see "Scaling Up Storage Space" in the user guide of the corresponding service. | The instance is set to read-only and data cannot be written to it. |
| | Data disk expanded and restored to writable | DDSDataDiskUsageRecovered | Major | The data disk capacity has been expanded and the data disk is writable again. | No action is required. | None |
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| NoSQL | DB instance creation failed | NoSQLCreateInstanceFailed | Major | The instance quota or underlying resources are insufficient. | Release the instances that are no longer used and try to provision them again, or submit a service ticket to adjust the quota. | DB instances cannot be created. |
| | Specifications modification failed | NoSQLResizeInstanceFailed | Major | The underlying resources are insufficient. | Submit a service ticket. After the O&M personnel coordinate resources in the background, change the specifications again. | Services are interrupted. |
| | Node adding failed | NoSQLAddNodesFailed | Major | The underlying resources are insufficient. | Submit a service ticket. After the O&M personnel coordinate resources in the background, delete the node that failed to be added and add a new one. | None |
| | Node deletion failed | NoSQLDeleteNodesFailed | Major | The underlying resources failed to be released. | Delete the node again. | None |
| | Storage space scale-up failed | NoSQLScaleUpStorageFailed | Major | The underlying resources are insufficient. | Submit a service ticket. After the O&M personnel coordinate resources in the background, scale up the storage space again. | Services may be interrupted. |
| | Password reset failed | NoSQLResetPasswordFailed | Major | Resetting the password timed out. | Reset the password again. | None |
| | Parameter group change failed | NoSQLUpdateInstanceParamGroupFailed | Major | Changing the parameter group timed out. | Change the parameter group again. | None |
| | Backup policy configuration failed | NoSQLSetBackupPolicyFailed | Major | The database connection is abnormal. | Configure the backup policy again. | None |
| | Manual backup creation failed | NoSQLCreateManualBackupFailed | Major | The backup files failed to be exported or uploaded. | Submit a service ticket to the O&M personnel. | Data cannot be backed up. |
| | Automated backup creation failed | NoSQLCreateAutomatedBackupFailed | Major | The backup files failed to be exported or uploaded. | Submit a service ticket to the O&M personnel. | Data cannot be backed up. |
| | Faulty DB instance | NoSQLFaultyDBInstance | Major | This event is a key alarm event and is reported when an instance is faulty due to a disaster or a server failure. | Submit a service ticket. | The database service may be unavailable. |
| | DB instance recovered | NoSQLDBInstanceRecovered | Major | If a disaster occurs, NoSQL provides an HA tool to automatically or manually rectify the fault. After the fault is rectified, this event is reported. | No action is required. | None |
| | Faulty node | NoSQLFaultyDBNode | Major | This event is a key alarm event and is reported when a database node is faulty due to a disaster or a server failure. | Check whether the database service is available and submit a service ticket. | The database service may be unavailable. |
| | Node recovered | NoSQLDBNodeRecovered | Major | If a disaster occurs, NoSQL provides an HA tool to automatically or manually rectify the fault. After the fault is rectified, this event is reported. | No action is required. | None |
| | Primary/standby switchover or failover | NoSQLPrimaryStandbySwitched | Major | This event is reported when a primary/standby switchover is performed or a failover is triggered. | No action is required. | None |
| | HotKey occurred | HotKeyOccurs | Major | The primary key is improperly configured, so hotspot data is concentrated in one partition. An improper application design causes frequent read and write operations on a single key. | 1. Choose a proper partition key. 2. Add a service cache so that applications read hotspot data from the cache first. | The service request success rate is affected, and cluster performance and stability may also be affected. |
| | BigKey occurred | BigKeyOccurs | Major | The primary key design is improper, and a single partition holds too many records or too much data, causing unbalanced node loads. | 1. Choose a proper partition key. 2. Add a new partition key for hashing data (see the sketch after this table). | As the data in the large partition grows, cluster stability deteriorates. |
| | Insufficient storage space | NoSQLRiskyDataDiskUsage | Major | The storage space is insufficient. | Scale up the storage space. For details, see "Scaling Up Storage Space" in the corresponding user guide. | The instance is set to read-only and data cannot be written to it. |
| | Data disk expanded and being writable | NoSQLDataDiskUsageRecovered | Major | The data disk capacity has been expanded and the data disk is writable again. | No operation is required. | None |
| | Index creation failed | NoSQLCreateIndexFailed | Major | The service load exceeds what the instance specifications can handle, and creating indexes consumes additional instance resources. As a result, responses become slow or even freeze, and index creation times out. | Select instance specifications that match the service load. Create indexes during off-peak hours and in the background, and create only the indexes that are required. | The index fails to be created or is incomplete and therefore invalid. Delete the index and create a new one. |
| | Write speed decreased | NoSQLStallingOccurs | Major | The write speed is close to the maximum write capability allowed by the cluster scale and instance specifications. As a result, the database flow control mechanism is triggered and requests may fail. | 1. Adjust the cluster scale or node specifications based on the maximum write rate of services. 2. Measure the maximum write rate of your services. | The success rate of service requests is affected. |
| | Data write stopped | NoSQLStoppingOccurs | Major | Data is written so fast that the maximum write capability allowed by the cluster scale and instance specifications is reached. As a result, the database flow control mechanism is triggered and requests may fail. | 1. Adjust the cluster scale or node specifications based on the maximum write rate of services. 2. Measure the maximum write rate of your services. | The success rate of service requests is affected. |
| | Database restart failed | NoSQLRestartDBFailed | Major | The instance status is abnormal. | Submit a service ticket to the O&M personnel. | The DB instance status may be abnormal. |
| | Restoration to new DB instance failed | NoSQLRestoreToNewInstanceFailed | Major | The underlying resources are insufficient. | Submit a service ticket so that the O&M personnel can coordinate resources in the background and add new nodes. | Data cannot be restored to a new DB instance. |
| | Restoration to existing DB instance failed | NoSQLRestoreToExistInstanceFailed | Major | The backup file failed to be downloaded or restored. | Submit a service ticket to the O&M personnel. | The current DB instance may be unavailable. |
| | Backup file deletion failed | NoSQLDeleteBackupFailed | Major | The backup files failed to be deleted from OBS. | Delete the backup files again. | None |
| | Failed to enable Show Original Log | NoSQLSwitchSlowlogPlainTextFailed | Major | The DB engine does not support this function. | Refer to the GaussDB NoSQL User Guide to ensure that the DB engine supports Show Original Log, then submit a service ticket to the O&M personnel. | None |
| | EIP binding failed | NoSQLBindEipFailed | Major | The node status is abnormal, an EIP is already bound to the node, or the EIP to be bound is invalid. | Check whether the node is normal and whether the EIP is valid. | The DB instance cannot be accessed from the Internet. |
| | EIP unbinding failed | NoSQLUnbindEipFailed | Major | The node status is abnormal, or the EIP has already been unbound from the node. | Check whether the node and EIP statuses are normal. | None |
| | Parameter modification failed | NoSQLModifyParameterFailed | Major | The parameter value is invalid. | Check whether the parameter value is within the valid range and submit a service ticket to the O&M personnel. | None |
| | Parameter group application failed | NoSQLApplyParameterGroupFailed | Major | The instance status is abnormal, so the parameter group cannot be applied. | Submit a service ticket to the O&M personnel. | None |
| | Failed to enable or disable SSL | NoSQLSwitchSSLFailed | Major | Enabling or disabling SSL timed out. | Try again or submit a service ticket. Do not change the connection mode. | The connection mode cannot be changed. |
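The BigKey/HotKey rows above recommend "adding a new partition key for hashing data." One common way to realize that advice is to append a deterministic hash bucket to a hot logical key so its reads and writes spread over several partitions. The sketch below is illustrative only; the key naming scheme and bucket count are assumptions, not part of the service.

```python
# A minimal sketch of the "hash suffix" fix suggested above for BigKey/HotKey
# events: spread one hot logical key across N sub-partitions by appending a
# deterministic bucket suffix. Names and bucket count are illustrative.
import hashlib

NUM_BUCKETS = 16  # assumption: tune to the observed hotspot size

def bucketed_partition_key(logical_key: str, item_id: str) -> str:
    """Derive a partition key that spreads writes for one hot logical key."""
    bucket = int(hashlib.md5(item_id.encode()).hexdigest(), 16) % NUM_BUCKETS
    return f"{logical_key}#{bucket}"

# Writes for the same logical key now land on 16 partitions instead of one:
print(bucketed_partition_key("user:42", "order-0001"))  # e.g. user:42#7
# Reads for the logical key must fan out across all buckets:
all_keys = [f"user:42#{b}" for b in range(NUM_BUCKETS)]
```

The trade-off is that point reads for the logical key become a fan-out across all buckets, so the bucket count should stay as small as the hotspot allows.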
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| GaussDB(for MySQL) | Incremental backup failure | TaurusIncrementalBackupInstanceFailed | Major | The network between the instance and the management plane (or OBS) is disconnected, or the backup environment created for the instance is abnormal. | Submit a service ticket. | Backup jobs fail. |
| | Read replica creation failure | addReadonlyNodesFailed | Major | The quota is insufficient or underlying resources are exhausted. | Check the read replica quota. Release resources and create read replicas again. | Read replicas fail to be created. |
| | DB instance creation failure | createInstanceFailed | Major | The instance quota or underlying resources are insufficient. | Check the instance quota. Release resources and create instances again. | DB instances fail to be created. |
| | Read replica promotion failure | activeStandBySwitchFailed | Major | The read replica failed to be promoted to primary due to network or server failures. The original primary node takes over services quickly. | Submit a service ticket. | The read replica fails to be promoted to the primary node. |
| | Instance specifications change failure | flavorAlterationFailed | Major | The quota is insufficient or underlying resources are exhausted. | Submit a service ticket. | Instance specifications fail to be changed. |
| | Faulty DB instance | TaurusInstanceRunningStatusAbnormal | Major | The instance process is faulty, or the communications between the instance and the DFV storage are abnormal. | Submit a service ticket. | Services may be affected. |
| | DB instance recovered | TaurusInstanceRunningStatusRecovered | Major | The instance is recovered. | Observe the service running status. | None |
| | Faulty node | TaurusNodeRunningStatusAbnormal | Major | The node process is faulty, or the communications between the node and the DFV storage are abnormal. | Observe the instance and service running statuses. | A read replica may be promoted to the primary node. |
| | Node recovered | TaurusNodeRunningStatusRecovered | Major | The node is recovered. | Observe the service running status. | None |
| | Read replica deletion failure | TaurusDeleteReadOnlyNodeFailed | Major | The communications between the management plane and the read replica are abnormal, or the VM fails to be deleted from IaaS. | Submit a service ticket. | Read replicas fail to be deleted. |
| | Password reset failure | TaurusResetInstancePasswordFailed | Major | The communications between the management plane and the instance are abnormal, or the instance is abnormal. | Check the instance status and try again. If the fault persists, submit a service ticket. | Passwords fail to be reset for instances. |
| | DB instance reboot failure | TaurusRestartInstanceFailed | Major | The network between the management plane and the instance is abnormal, or the instance is abnormal. | Check the instance status and try again. If the fault persists, submit a service ticket. | Instances fail to be rebooted. |
| | Restoration to new DB instance failure | TaurusRestoreToNewInstanceFailed | Major | The instance quota is insufficient, underlying resources are exhausted, or the data restoration logic is incorrect. | If the new instance fails to be created, check the instance quota, release resources, and try to restore to a new instance again. In other cases, submit a service ticket. | Backup data fails to be restored to new instances. |
| | EIP binding failure | TaurusBindEIPToInstanceFailed | Major | The binding task fails. | Submit a service ticket. | EIPs fail to be bound to instances. |
| | EIP unbinding failure | TaurusUnbindEIPFromInstanceFailed | Major | The unbinding task fails. | Submit a service ticket. | EIPs fail to be unbound from instances. |
| | Parameter modification failure | TaurusUpdateInstanceParameterFailed | Major | The network between the management plane and the instance is abnormal, or the instance is abnormal. | Check the instance status and try again. If the fault persists, submit a service ticket. | Instance parameters fail to be modified. |
| | Parameter template application failure | TaurusApplyParameterGroupToInstanceFailed | Major | The network between the management plane and instances is abnormal, or the instances are abnormal. | Check the instance status and try again. If the fault persists, submit a service ticket. | Parameter templates fail to be applied to instances. |
| | Full backup failure | TaurusBackupInstanceFailed | Major | The network between the instance and the management plane (or OBS) is disconnected, or the backup environment created for the instance is abnormal. | Submit a service ticket. | Backup jobs fail. |
| | Primary/standby failover | TaurusActiveStandbySwitched | Major | When the network, physical machine, or database of the primary node is faulty, the system promotes a read replica to primary based on the failover priority to ensure service continuity. | | During the failover, the database connection is interrupted for a short period of time. After the failover is complete, you can reconnect to the database. |
| | Database set to read-only | NodeReadonlyMode | Major | The database was set to read-only, and only query operations are supported. | Contact the database technical support team. | After the database is set to read-only, all write requests fail. |
| | Database set to read/write | NodeReadWriteMode | Major | The database was set to read/write. | None | None |
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| GaussDB(for openGauss) | Process status alarm | ProcessStatusAlarm | Major | Key processes exit, including CMS/CMA, ETCD, GTM, CN, and DN processes. | Wait until the process is automatically recovered or a primary/standby failover is automatically performed. Check whether services are recovered. If not, contact SRE engineers. | If processes on primary nodes are faulty, services are interrupted and then rolled back. If processes on standby nodes are faulty, services are not affected. |
| | Component status alarm | ComponentStatusAlarm | Major | Key components do not respond, including CMA, ETCD, GTM, CN, and DN components. | Wait until the process is automatically recovered or a primary/standby failover is automatically performed. Check whether services are recovered. If not, contact SRE engineers. | If processes on primary nodes do not respond, neither do services. If processes on standby nodes are faulty, services are not affected. |
| | Cluster status alarm | ClusterStatusAlarm | Major | The cluster status is abnormal. For example, the cluster is read-only, the majority of ETCDs are faulty, or the cluster resources are unevenly distributed. | Contact SRE engineers. | If the cluster is read-only, only read services are processed. If the majority of ETCDs are faulty, the cluster is unavailable. If resources are unevenly distributed, instance performance and reliability deteriorate. |
| | Hardware resource alarm | HardwareResourceAlarm | Major | A major hardware fault occurs in the instance, such as disk damage or a GTM network fault. | Contact SRE engineers. | Some or all services are affected. |
| | Status transition alarm | StateTransitionAlarm | Major | The following events occur in the instance: DN build failure, forcible DN promotion, primary/standby DN switchover/failover, or primary/standby GTM switchover/failover. | Wait until the fault is automatically rectified and check whether services are recovered. If not, contact SRE engineers. | Some services are interrupted. |
| | Other abnormal alarm | OtherAbnormalAlarm | Major | Disk usage threshold alarm. | Focus on service changes and scale up storage space as needed. | If the used storage space exceeds the threshold, storage space cannot be scaled up. |
| | Faulty DB instance | TaurusInstanceRunningStatusAbnormal | Major | This event is a key alarm event and is reported when an instance is faulty due to a disaster or a server failure. | Submit a service ticket. | The database service may be unavailable. |
| | DB instance recovered | TaurusInstanceRunningStatusRecovered | Major | GaussDB(for openGauss) provides an HA tool for automated or manual rectification of faults. After the fault is rectified, this event is reported. | No further action is required. | None |
| | Faulty DB node | TaurusNodeRunningStatusAbnormal | Major | This event is a key alarm event and is reported when a database node is faulty due to a disaster or a server failure. | Check whether the database service is available and submit a service ticket. | The database service may be unavailable. |
| | DB node recovered | TaurusNodeRunningStatusRecovered | Major | GaussDB(for openGauss) provides an HA tool for automated or manual rectification of faults. After the fault is rectified, this event is reported. | No further action is required. | None |
| | DB instance creation failure | GaussDBV5CreateInstanceFailed | Major | Instances fail to be created because the quota is insufficient or underlying resources are exhausted. | Release the instances that are no longer used and try to provision them again, or submit a service ticket to adjust the quota. | DB instances cannot be created. |
| | Node adding failure | GaussDBV5ExpandClusterFailed | Major | The underlying resources are insufficient. | Submit a service ticket. After the O&M personnel coordinate resources in the background, delete the node that failed to be added and add a new one. | None |
| | Storage scale-up failure | GaussDBV5EnlargeVolumeFailed | Major | The underlying resources are insufficient. | Submit a service ticket. After the O&M personnel coordinate resources in the background, scale up the storage space again. | Services may be interrupted. |
| | Reboot failure | GaussDBV5RestartInstanceFailed | Major | The network is abnormal. | Retry the reboot operation or submit a service ticket to the O&M personnel. | The database service may be unavailable. |
| | Full backup failure | GaussDBV5FullBackupFailed | Major | The backup files failed to be exported or uploaded. | Submit a service ticket to the O&M personnel. | Data cannot be backed up. |
| | Differential backup failure | GaussDBV5DifferentialBackupFailed | Major | The backup files failed to be exported or uploaded. | Submit a service ticket to the O&M personnel. | Data cannot be backed up. |
| | Backup deletion failure | GaussDBV5DeleteBackupFailed | Major | This function does not need to be implemented. | N/A | N/A |
| | EIP binding failure | GaussDBV5BindEIPFailed | Major | The EIP is bound to another resource. | Submit a service ticket to the O&M personnel. | The instance cannot be accessed from the Internet. |
| | EIP unbinding failure | GaussDBV5UnbindEIPFailed | Major | The network is faulty or the EIP is abnormal. | Unbind the IP address again or submit a service ticket to the O&M personnel. | IP addresses may be residual. |
| | Parameter template application failure | GaussDBV5ApplyParamFailed | Major | Modifying a parameter template timed out. | Modify the parameter template again. | None |
| | Parameter modification failure | GaussDBV5UpdateInstanceParamGroupFailed | Major | Modifying a parameter template timed out. | Modify the parameter template again. | None |
| | Backup and restoration failure | GaussDBV5RestoreFromBcakupFailed | Major | The underlying resources are insufficient, or backup files fail to be downloaded. | Submit a service ticket. | The database service may be unavailable if restoration fails. |
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| DDM | Failed to create a DDM instance | createDdmInstanceFailed | Major | The underlying resources are insufficient. | Release resources and create the instance again. | DDM instances cannot be created. |
| | Failed to change the class of a DDM instance | resizeFlavorFailed | Major | The underlying resources are insufficient. | Submit a service ticket to the O&M personnel to coordinate resources, then try again. | Services on some nodes are interrupted. |
| | Failed to scale out a DDM instance | enlargeNodeFailed | Major | The underlying resources are insufficient. | Submit a service ticket to the O&M personnel to coordinate resources, delete the node that failed to be added, and add a node again. | The instance fails to be scaled out. |
| | Failed to scale in a DDM instance | reduceNodeFailed | Major | The underlying resources failed to be released. | Submit a service ticket to the O&M personnel to release resources. | The instance fails to be scaled in. |
| | Failed to restart a DDM instance | restartInstanceFailed | Major | The associated DB instances are abnormal. | Check whether the associated DB instances are normal. If they are, submit a service ticket to the O&M personnel. | Services on some nodes are interrupted. |
| | Failed to create a schema | createLogicDbFailed | Major | The possible causes are as follows: | Check the following items: | Services cannot run properly. |
| | Failed to bind an EIP | bindEipFailed | Major | The EIP is abnormal. | Try again later. In case of emergency, contact the O&M personnel to rectify the fault. | The DDM instance cannot be accessed from the Internet. |
| | Failed to scale out a schema | migrateLogicDbFailed | Major | The underlying resources failed to be processed. | Submit a service ticket to the O&M personnel. | The schema cannot be scaled out. |
| | Failed to re-scale out a schema | retryMigrateLogicDbFailed | Major | The underlying resources failed to be processed. | Submit a service ticket to the O&M personnel. | The schema cannot be scaled out. |
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| CPH | Server shutdown | cphServerOsShutdown | Major | The cloud phone server was shut down. | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | Services are interrupted. |
| | Server abnormal shutdown | cphServerShutdown | Major | The cloud phone server was shut down unexpectedly. Possible causes are as follows: | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | Services are interrupted. |
| | Server reboot | cphServerOsReboot | Major | The cloud phone server was rebooted. | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | Services are interrupted. |
| | Server abnormal reboot | cphServerReboot | Major | The cloud phone server was rebooted unexpectedly. Possible causes are as follows: | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | Services are interrupted. |
| | Network disconnection | cphServerlinkDown | Major | The network where the cloud phone server was deployed was disconnected. Possible causes are as follows: | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | Services are interrupted. |
| | PCIe error | cphServerPcieError | Major | The PCIe device or mainboard on the cloud phone server was faulty. Possible causes are as follows: | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | The network or disk read/write is affected. |
| | Disk error | cphServerDiskError | Major | The disk on the cloud phone server was faulty. Possible causes are as follows: | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | Data read/write services are affected, or the BMS cannot start. |
| | Storage error | cphServerStorageError | Major | The cloud phone server could not connect to EVS disks. Possible causes are as follows: | Deploy service applications in HA mode. After the fault is rectified, check whether services recover. | Data read/write services are affected, or the BMS cannot start. |
| | GPU offline | cphServerGpuOffline | Major | A GPU of the cloud phone server was loose and disconnected. | Stop the cloud phone server and then reboot it. | Cloud phones whose GPUs are disconnected are faulty and cannot run properly even if they are restarted or reconfigured. |
| | GPU timeout | cphServerGpuTimeOut | Major | A GPU of the cloud phone server timed out. | Reboot the cloud phone server. | Cloud phones whose GPUs timed out cannot run properly and remain faulty even if they are restarted or reconfigured. |
| | Disk space full | cphServerDiskFull | Major | The disk space of the cloud phone server was used up. | Clear the application data in the cloud phones to release space. | Cloud phones become sub-healthy, are prone to failures, and may fail to start. |
| | Disk readonly | cphServerDiskReadOnly | Major | The disk of the cloud phone server became read-only. | Reboot the cloud phone server. | Cloud phones become sub-healthy, are prone to failures, and may fail to start. |
| | Cloud phone metadata damaged | cphPhoneMetaDataDamage | Major | Cloud phone metadata was damaged. | Contact Huawei O&M personnel. | The cloud phone cannot run properly even if it is restarted or reconfigured. |
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| L2CG | IP addresses conflicted | IPConflict | Major | A cloud server and an on-premises server that need to communicate use the same IP address. | Check the ARP and switch information to locate the servers that have the same IP address and change the IP address. | The communications between the on-premises and cloud servers may be abnormal. |
| Event Source | Event Name | Event ID | Event Severity |
|---|---|---|---|
| VPC | VPC deleted | deleteVpc | Major |
| | VPC modified | modifyVpc | Minor |
| | Subnet deleted | deleteSubnet | Minor |
| | Subnet modified | modifySubnet | Minor |
| | Bandwidth modified | modifyBandwidth | Minor |
| | VPN deleted | deleteVpn | Major |
| | VPN modified | modifyVpn | Minor |
| Event Source | Event Name | Event ID | Event Severity |
|---|---|---|---|
| EVS | Disk updated | updateVolume | Minor |
| | Disk expanded | extendVolume | Minor |
| | Disk deleted | deleteVolume | Major |
| Event Source | Event Name | Event ID | Event Severity |
|---|---|---|---|
| IAM | Login | login | Minor |
| | Logout | logout | Minor |
| | Password changed | changePassword | Major |
| | User created | createUser | Minor |
| | User deleted | deleteUser | Major |
| | User updated | updateUser | Minor |
| | User group created | createUserGroup | Minor |
| | User group deleted | deleteUserGroup | Major |
| | User group updated | updateUserGroup | Minor |
| | Identity provider created | createIdentityProvider | Minor |
| | Identity provider deleted | deleteIdentityProvider | Major |
| | Identity provider updated | updateIdentityProvider | Minor |
| | Metadata updated | updateMetadata | Minor |
| | Security policy updated | updateSecurityPolicies | Major |
| | Credential added | addCredential | Major |
| | Credential deleted | deleteCredential | Major |
| | Project created | createProject | Minor |
| | Project updated | updateProject | Minor |
| | Project suspended | suspendProject | Major |
| Event Source | Event Name | Event ID | Event Severity |
|---|---|---|---|
| KMS | Key disabled | disableKey | Major |
| | Key deletion scheduled | scheduleKeyDeletion | Minor |
| | Grant retired | retireGrant | Major |
| | Grant revoked | revokeGrant | Major |
| Event Source | Event Name | Event ID | Event Severity |
|---|---|---|---|
| OBS | Bucket deleted | deleteBucket | Major |
| | Bucket policy deleted | deleteBucketPolicy | Major |
| | Bucket ACL configured | setBucketAcl | Minor |
| | Bucket policy configured | setBucketPolicy | Minor |
| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| DCS | Full synchronization during online migration retry | migrationFullResync | Minor | If online migration fails, full synchronization is triggered because incremental synchronization cannot be performed. | Monitor the service volume and bandwidth usage. If the bandwidth usage is high and affects services, manually stop the migration as required. | If the data volume is large, full synchronization may cause bandwidth usage to spike. |
| | Redis master/replica switchover | masterStandbyFailover | Minor | The master node was abnormal, so a replica was promoted to master. | Check the original master node and rectify the fault. | None |
| | Memcached master/standby switchover | memcachedMasterStandbyFailover | Minor | The master node was abnormal, so the standby node was promoted to master. | Check the original master node and rectify the fault. | None |
| | Redis server exception | redisNodeStatusAbnormal | Major | The Redis server status was abnormal. | Check the Redis server status. | The instance may become unavailable. |
| | Redis server recovered | redisNodeStatusNormal | Major | The Redis server status recovered. | None | None |
| | Synchronization failure in data migration | migrateSyncDataFail | Major | Online migration failed. | Check the network and the ECS service. If the ECS service is abnormal, a migration ECS cannot be created. | Data cannot be synchronized. |
| | Memcached instance abnormal | memcachedInstanceStatusAbnormal | Major | The Memcached node status was abnormal. | Check the Memcached node status. | The instance may become unavailable. |
| | Memcached instance recovered | memcachedInstanceStatusNormal | Major | The Memcached node status recovered. | None | None |
| | Instance backup failure | instanceBackupFailure | Major | The DCS instance failed to be backed up due to an OBS access failure. | Manually back up the instance again. | None |
| | Instance node abnormal restart | instanceNodeAbnormalRestart | Major | DCS nodes restarted unexpectedly when they became faulty. | Check whether services are normal. | A master/standby switchover may occur, or access to Redis may fail. |