Updated on 2024-08-27 GMT+08:00

Events Supported by Event Monitoring

Table 1 Elastic Cloud Server (ECS)

Event Source

Event Name

Event ID

Event Severity

Description

Solution

Impact

ECS

Restart triggered due to hardware fault

startAutoRecovery

Major

ECSs on a faulty host were automatically migrated to a properly running host. The ECSs were restarted during the migration.

Wait for the event to end and check whether services are affected.

Services may be interrupted.

Restart completed due to hardware failure

endAutoRecovery

Major

The ECS was recovered after the automatic migration.

This event indicates that the ECS has recovered and is working properly.

None

Auto recovery timeout (being processed on the backend)

faultAutoRecovery

Major

Migrating the ECS to a normal host timed out.

Migrate services to other ECSs.

Services are interrupted.

GPU link fault

GPULinkFault

Critical

The GPU of the host on which the ECS is located was faulty or was recovering from a fault.

Deploy service applications in HA mode.

After the GPU fault is rectified, check whether services are restored.

Services are interrupted.

ECS deleted

deleteServer

Major

The ECS was deleted

  • on the management console.
  • by calling APIs.

Check whether the deletion was performed intentionally by a user.

Services are interrupted.

ECS restarted

rebootServer

Minor

The ECS was restarted

  • on the management console.
  • by calling APIs.

Check whether the restart was performed intentionally by a user.

  • Deploy service applications in HA mode.
  • After the ECS starts up, check whether services recover.

Services are interrupted.

ECS stopped

stopServer

Minor

The ECS was stopped

  • on the management console.
  • by calling APIs.
NOTE:

This event is reported only after Cloud Trace Service (CTS) is enabled. For details, see Cloud Trace Service User Guide.

  • Check whether the stop was performed intentionally by a user.
  • Deploy service applications in HA mode.
  • After the ECS starts up, check whether services recover.

Services are interrupted.

NIC deleted

deleteNic

Major

The ECS NIC was deleted

  • on the management console.
  • by calling APIs.
  • Check whether the deletion was performed intentionally by a user.
  • Deploy service applications in HA mode.
  • After the NIC is deleted, check whether services recover.

Services may be interrupted.

ECS resized

resizeServer

Minor

The ECS specifications were resized

  • on the management console.
  • by calling APIs.
  • Check whether the operation was performed by a user.
  • Deploy service applications in HA mode.
  • After the ECS is resized, check whether services have recovered.

Services are interrupted.

GuestOS restarted

RestartGuestOS

Minor

The guest OS was restarted.

Contact O&M personnel.

Services may be interrupted.

ECS failure due to abnormal host processes

VMFaultsByHostProcessExceptions

Critical

The processes of the host accommodating the ECS were abnormal.

Contact O&M personnel.

The ECS is faulty.

Startup failure

faultPowerOn

Major

The ECS failed to start.

Start the ECS again. If the problem persists, contact O&M personnel.

The ECS cannot start.

Host breakdown risk

hostMayCrash

Major

The host where the ECS resides may break down, and the risk cannot be eliminated through live migration.

Migrate the services running on the ECS first, and then delete or stop the ECS. Start the ECS only after O&M personnel have eliminated the risk.

The host may break down, causing service interruption.

Scheduled migration completed

instance_migrate_completed

Major

Scheduled ECS migration is completed.

Wait until the ECSs become available and check whether services are affected.

Services may be interrupted.

Scheduled migration being executed

instance_migrate_executing

Major

ECSs are being migrated as scheduled.

Wait until the event is complete and check whether services are affected.

Services may be interrupted.

Scheduled migration canceled

instance_migrate_canceled

Major

Scheduled ECS migration is canceled.

None

None

Scheduled migration failed

instance_migrate_failed

Major

ECSs failed to be migrated as scheduled.

Contact O&M personnel.

Services are interrupted.

Scheduled migration to be executed

instance_migrate_scheduled

Major

ECSs will be migrated as scheduled.

Check the impact on services during the execution window.

None

Scheduled specification modification failed

instance_resize_failed

Major

Specifications failed to be modified as scheduled.

Contact O&M personnel.

Services are interrupted.

Scheduled specification modification completed

instance_resize_completed

Major

Scheduled specification modification is completed.

None

None

Scheduled specification modification being executed

instance_resize_executing

Major

Specifications are being modified as scheduled.

Wait until the event is completed and check whether services are affected.

Services are interrupted.

Scheduled specification modification canceled

instance_resize_canceled

Major

Scheduled specification modification is canceled.

None

None

Scheduled specification modification to be executed

instance_resize_scheduled

Major

Specifications will be modified as scheduled.

Check the impact on services during the execution window.

None

Scheduled redeployment to be executed

instance_redeploy_scheduled

Major

ECSs will be redeployed on new hosts as scheduled.

Check the impact on services during the execution window.

None

Scheduled restart to be executed

instance_reboot_scheduled

Major

ECSs will be restarted as scheduled.

Check the impact on services during the execution window.

None

Scheduled stop to be executed

instance_stop_scheduled

Major

ECSs will be stopped as scheduled because they are affected by underlying hardware or system O&M.

Check the impact on services during the execution window.

None

Live migration started

liveMigrationStarted

Major

The host where the ECS is located may be faulty, so the ECS is being live migrated in advance to prevent service interruptions caused by a host breakdown.

Wait for the event to end and check whether services are affected.

Services may be interrupted for less than 1s.

Live migration completed

liveMigrationCompleted

Major

The live migration is complete, and the ECS is running properly.

Check whether services are running properly.

None

Live migration failure

liveMigrationFailed

Major

An error occurred during the live migration of an ECS.

Check whether services are running properly.

There is a low probability that services will be interrupted.

ECC uncorrectable error alarm generated on GPU SRAM

SRAMUncorrectableEccError

Major

Uncorrectable ECC errors were generated on GPU SRAM.

If services are affected, submit a service ticket.

The GPU hardware may be faulty. As a result, the GPU memory may fail and services may exit abnormally.

FPGA link fault

FPGALinkFault

Critical

The FPGA of the host on which the ECS is located was

  • faulty.
  • recovering from a fault.

Deploy service applications in HA mode.

After the FPGA fault is rectified, check whether services are restored.

Services are interrupted.

Scheduled redeployment to be authorized

instance_redeploy_inquiring

Major

Because they are affected by underlying hardware or system O&M, ECSs will be redeployed on new hosts as scheduled.

Authorize scheduled redeployment.

None

Local disk replacement canceled

localdisk_recovery_canceled

Major

Local disk failure

None

None

Local disk replacement to be executed

localdisk_recovery_scheduled

Major

Local disk failure

Check the impact on services during the execution window.

None

Xid event alarm generated on GPU

commonXidError

Major

An Xid event alarm was generated on the GPU.

If services are affected, submit a service ticket.

GPU hardware, driver, or application problems can cause Xid events, which may cause services to exit abnormally.

nvidia-smi suspended

nvidiaSmiHangEvent

Major

nvidia-smi timed out.

If services are affected, submit a service ticket.

The driver may report an error during service running.

NPU: uncorrectable ECC error

UncorrectableEccErrorCount

Major

Uncorrectable ECC errors were generated on GPU SRAM.

If services are affected, replace the NPU with another one.

Services may be interrupted.

Scheduled redeployment canceled

instance_redeploy_canceled

Major

Because they are affected by underlying hardware or system O&M, ECSs will be redeployed on new hosts as scheduled.

None

None

Scheduled redeployment being executed

instance_redeploy_executing

Major

Because they are affected by underlying hardware or system O&M, ECSs will be redeployed on new hosts as scheduled.

Wait until the event is complete and check whether services are affected.

Services are interrupted.

Scheduled redeployment completed

instance_redeploy_completed

Major

Because they are affected by underlying hardware or system O&M, ECSs will be redeployed on new hosts as scheduled.

Wait until the redeployed ECSs are available and check whether services are affected.

None

Scheduled redeployment failed

instance_redeploy_failed

Major

Because they are affected by underlying hardware or system O&M, ECSs will be redeployed on new hosts as scheduled.

Contact O&M personnel.

Services are interrupted.

Local disk replacement to be authorized

localdisk_recovery_inquiring

Major

Local disks are faulty.

Authorize local disk replacement.

Local disks are unavailable.

Local disks being replaced

localdisk_recovery_executing

Major

Local disk failure

Wait until the local disks are replaced and check whether the local disks are available.

Local disks are unavailable.

Local disks replaced

localdisk_recovery_completed

Major

Local disk failure

Wait until the services are running properly and check whether local disks are available.

None

Local disk replacement failed

localdisk_recovery_failed

Major

Local disks are faulty.

Contact O&M personnel.

Local disks are unavailable.

Once a physical host running ECSs breaks down, the ECSs are automatically migrated to a functional physical host. During the migration, the ECSs will be restarted.
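The scheduled-operation event IDs in Table 1 follow a visible three-part pattern, such as instance_migrate_failed or localdisk_recovery_inquiring. The following is a minimal Python sketch, not an official Cloud Eye API, that splits such an ID into its parts. The names ScheduledEvent, parse_scheduled_event, and needs_attention are hypothetical, and the attention policy is an assumption based only on the Solution column (authorization for "inquiring", contacting O&M personnel for "failed").

```python
from typing import NamedTuple, Optional

# Targets and phases observed in the event IDs listed in Table 1.
TARGETS = {"instance", "localdisk"}
PHASES = {"inquiring", "scheduled", "executing", "completed", "canceled", "failed"}


class ScheduledEvent(NamedTuple):
    target: str     # "instance" or "localdisk"
    operation: str  # for example "migrate", "resize", "redeploy", "recovery"
    phase: str      # one of PHASES


def parse_scheduled_event(event_id: str) -> Optional[ScheduledEvent]:
    """Split an ID such as 'instance_migrate_failed' into its three parts.

    Returns None for IDs that do not follow the pattern (for example
    'startAutoRecovery').
    """
    parts = event_id.split("_")
    if len(parts) != 3:
        return None
    target, operation, phase = parts
    if target not in TARGETS or phase not in PHASES:
        return None
    return ScheduledEvent(target, operation, phase)


def needs_attention(event: ScheduledEvent) -> bool:
    """Hypothetical policy: flag phases whose Solution requires user action."""
    return event.phase in {"inquiring", "failed"}
```

For example, parse_scheduled_event("instance_redeploy_inquiring") yields a ScheduledEvent whose phase is "inquiring", which this sketch flags because Table 1 asks the user to authorize the redeployment.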

Table 2 Elastic IP (EIP)

| Event Source | Event Name | Event ID | Event Severity |
| --- | --- | --- | --- |
| EIP | EIP released | deleteEip | Minor |

Table 3 Elastic IP and bandwidth

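| Event Source | Event Name | Event ID | Event Severity |
| --- | --- | --- | --- |
| Elastic IP and bandwidth | VPC deleted | deleteVpc | Major |
| Elastic IP and bandwidth | VPC modified | modifyVpc | Minor |
| Elastic IP and bandwidth | Subnet deleted | deleteSubnet | Minor |
| Elastic IP and bandwidth | Subnet modified | modifySubnet | Minor |
| Elastic IP and bandwidth | Bandwidth modified | modifyBandwidth | Minor |
| Elastic IP and bandwidth | VPN deleted | deleteVpn | Major |
| Elastic IP and bandwidth | VPN modified | modifyVpn | Minor |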

Table 4 Elastic Volume Service (EVS)

| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
| --- | --- | --- | --- | --- | --- | --- |
| EVS | Update disk | updateVolume | Minor | Update the name and description of an EVS disk. | No further action is required. | None |
| EVS | Expand disk | extendVolume | Minor | Expand an EVS disk. | No further action is required. | None |
| EVS | Delete disk | deleteVolume | Major | Delete an EVS disk. | No further action is required. | Deleted disks cannot be recovered. |
| EVS | QoS upper limit reached | reachQoS | Major | I/O latency increases because the QoS upper limits of the disk are frequently reached and flow control is triggered. | Change the disk type to one with a higher specification. | The current disk may fail to meet service requirements. |

Table 5 Identity and Access Management (IAM)

| Event Source | Event Name | Event ID | Event Severity |
| --- | --- | --- | --- |
| IAM | Login | login | Minor |
| IAM | Logout | logout | Minor |
| IAM | Password changed | changePassword | Major |
| IAM | User created | createUser | Minor |
| IAM | User deleted | deleteUser | Major |
| IAM | User updated | updateUser | Minor |
| IAM | User group created | createUserGroup | Minor |
| IAM | User group deleted | deleteUserGroup | Major |
| IAM | User group updated | updateUserGroup | Minor |
| IAM | Identity provider created | createIdentityProvider | Minor |
| IAM | Identity provider deleted | deleteIdentityProvider | Major |
| IAM | Identity provider updated | updateIdentityProvider | Minor |
| IAM | Metadata updated | updateMetadata | Minor |
| IAM | Security policy updated | updateSecurityPolicies | Major |
| IAM | Credential added | addCredential | Major |
| IAM | Credential deleted | deleteCredential | Major |
| IAM | Project created | createProject | Minor |
| IAM | Project updated | updateProject | Minor |
| IAM | Project suspended | suspendProject | Major |
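When reviewing IAM activity, it can help to pick out only the events that Table 5 rates as Major. The following Python sketch shows one way to do that. The record layout (a dict with an "event_id" key) and the function name major_iam_events are assumptions for illustration and do not reflect the actual Cloud Eye response format.

```python
# Severities copied from Table 5 for the IAM event IDs rated Major.
# Any ID not listed here is treated as not Major by this sketch.
MAJOR_IAM_EVENT_IDS = {
    "changePassword",
    "deleteUser",
    "deleteUserGroup",
    "deleteIdentityProvider",
    "updateSecurityPolicies",
    "addCredential",
    "deleteCredential",
    "suspendProject",
}


def major_iam_events(records):
    """Return the records whose event ID is rated Major in Table 5.

    `records` is a hypothetical list of dicts, each with an "event_id" key.
    Records without that key are skipped.
    """
    return [r for r in records if r.get("event_id") in MAJOR_IAM_EVENT_IDS]
```

A set lookup keeps the filter simple. Extending the sketch to other services would mean adding their Major event IDs from the corresponding tables.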

Table 6 Key Management Service (KMS)

| Event Source | Event Name | Event ID | Event Severity |
| --- | --- | --- | --- |
| KMS | Key disabled | disableKey | Major |
| KMS | Key deletion scheduled | scheduleKeyDeletion | Minor |
| KMS | Grant retired | retireGrant | Major |
| KMS | Grant revoked | revokeGrant | Major |

Table 7 Object Storage Service (OBS)

| Event Source | Event Name | Event ID | Event Severity |
| --- | --- | --- | --- |
| OBS | Bucket deleted | deleteBucket | Major |
| OBS | Bucket policy deleted | deleteBucketPolicy | Major |
| OBS | Bucket ACL configured | setBucketAcl | Minor |
| OBS | Bucket policy configured | setBucketPolicy | Minor |