Updated on 2024-10-18 GMT+08:00

Overview

Huawei Cloud can predict and proactively prevent hardware or software faults of hosts accommodating ECSs.

If a host failure cannot be avoided, the system generates and reports events for the affected ECSs to minimize the impact of instance unavailability or performance deterioration. These events include instance redeployment, local disk replacement, instance migration, and system maintenance. For details, see Event Type. Events are reported infrequently.

You can view event details on the ECS console, including the event type, instance name/ID, and event status. You can also check ECS event details on the Event Monitoring page of the Cloud Eye console. For details, see Viewing Event Monitoring Data.

Event Type

Table 1 describes events that can be reported by the system.

Table 1 Events

Event Type | Generated When | Impact | Handling Suggestion

Instance redeployment

Generated when: The system detects that the host accommodating the ECSs is faulty and plans to redeploy the ECSs on a new host.

Impact: During instance redeployment, the ECSs are temporarily unavailable. The system sends the event notification 24 to 72 hours before the scheduled execution time.

NOTICE:

For ECSs using local disks, all data stored on the local disks will be lost.

Handling suggestion: See the following topic to rectify the fault. After the fault is rectified, check the impact on services. If any problems occur, contact technical support.

Handling an Instance Redeployment Event

You are advised to select an off-peak time as the scheduled start time during authorization. If you do not specify a start time, the current time is used by default.
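The timing rules for this event can be sketched in a few lines of Python. This is a minimal illustration, not a Huawei Cloud API: the function names `notification_window` and `resolve_start_time` are hypothetical, and only the 24-to-72-hour notice window and the default-to-current-time rule come from the table.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Notice window stated in Table 1: 24 to 72 hours before the scheduled execution time.
EARLIEST_NOTICE = timedelta(hours=72)
LATEST_NOTICE = timedelta(hours=24)


def notification_window(scheduled: datetime) -> Tuple[datetime, datetime]:
    """Return the (earliest, latest) time at which the notification is expected."""
    return scheduled - EARLIEST_NOTICE, scheduled - LATEST_NOTICE


def resolve_start_time(requested: Optional[datetime], now: datetime) -> datetime:
    """Hypothetical helper: use the requested start time, or fall back to `now`."""
    return requested if requested is not None else now


tz = timezone(timedelta(hours=8))  # GMT+08:00, used only for this example
scheduled = datetime(2024, 10, 21, 2, 0, tzinfo=tz)
earliest, latest = notification_window(scheduled)
print(earliest.isoformat())  # 2024-10-18T02:00:00+08:00
print(latest.isoformat())    # 2024-10-20T02:00:00+08:00
```

Choosing an off-peak `requested` time, rather than relying on the default, keeps the brief unavailability away from busy hours.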

Local disk replacement

Generated when: The system detects that a disk of the host accommodating the ECSs (including bare metal ECSs) is faulty.

Impact: Local disk replacement will cause data loss on local disks.

Handling suggestion: See the following to rectify the fault. After the fault is rectified, check the impact on services. If any problems occur, contact technical support.

NOTICE:

Local disk replacement will cause data loss on local disks. If you do not need to retain data on local disks, use one of the following methods:

  • Redeployment: All local disk data will be lost.
  • Authorizing Disk Replacement: Only data on the faulty local disk will be lost.

    You are advised to select an off-peak time as the scheduled start time during authorization. If you do not specify a start time, the current time is used by default.

    Local disk replacement is generally completed within five working days after it starts.

Instance migration

Generated when: The system detects that the host accommodating the ECSs is faulty and needs to be restarted, stopped, or brought offline, and plans to migrate the ECSs.

Impact: The system first attempts a live migration of the ECSs. If an exception occurs, the HA mechanism is triggered, and the ECSs are temporarily unavailable during this period.

Handling suggestion: After the fault is rectified, check the impact on services. If any problems occur, contact technical support.

System maintenance

Generated when: The system detects hardware or software faults on the host accommodating the ECSs (including bare metal ECSs) and plans to perform maintenance on the affected instances.

Impact: During system maintenance, the host may be powered off, and the ECSs running on it become unavailable.

Handling suggestion: See the following topic to rectify the fault. After the fault is rectified, check the impact on services. If any problems occur, contact technical support.

Handling a System Maintenance Event

Ensure that services running on the instances have been stopped, and select an off-peak time as the scheduled start time during authorization. If you do not specify a start time, the current time is used by default.

The time required for system maintenance depends on the fault. System maintenance is generally completed within five working days after authorization.
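To make the "five working days" estimate for system maintenance and local disk replacement concrete, the sketch below adds working days to a start date. It assumes that working days are Monday to Friday and ignores public holidays; the calendar is not defined here, and `add_working_days` is a hypothetical helper, not part of any Huawei Cloud SDK.

```python
from datetime import date, timedelta


def add_working_days(start: date, days: int) -> date:
    """Return the date `days` working days after `start` (Mon-Fri, no holidays)."""
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # 0-4 are Monday to Friday
            remaining -= 1
    return current


# The operation starts on Friday, 2024-10-18.
print(add_working_days(date(2024, 10, 18), 5))  # 2024-10-25
```

Because the estimate is stated as "generally", the computed date is a planning aid rather than a guaranteed deadline.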

Event Status

Table 2 lists the statuses of events reported by the system. You can check the progress of events and filter events by status.

Table 2 Event statuses

Status | Description

Pending authorization: The event is waiting for you to authorize it and specify a start time. After authorization, the system completes the operations within a specified time. For details, see Handling an Event.

To be executed: The event is waiting for the system to schedule resources.

Executing: The system has scheduled resources and is rectifying the fault.

Execution succeeded: The system has completed event execution. Check the impact on services. If any problems occur, contact technical support.

Execution failed: The system failed to rectify the fault automatically.

Canceled: The event has been canceled.
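Filtering by status, as the console offers, can be mirrored in client-side code. The sketch below is a minimal illustration: the `EventStatus` values use the status names from Table 2, while the `Event` record, its field names, and the instance IDs are assumptions made for the example and do not reflect an actual API response.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, List


class EventStatus(Enum):
    PENDING_AUTHORIZATION = "Pending authorization"
    TO_BE_EXECUTED = "To be executed"
    EXECUTING = "Executing"
    EXECUTION_SUCCEEDED = "Execution succeeded"
    EXECUTION_FAILED = "Execution failed"
    CANCELED = "Canceled"


@dataclass(frozen=True)
class Event:
    # Hypothetical record; field names are illustrative only.
    event_type: str
    instance_id: str
    status: EventStatus


def filter_by_status(events: Iterable[Event], status: EventStatus) -> List[Event]:
    """Return the events whose status matches `status`, preserving order."""
    return [e for e in events if e.status is status]


events = [
    Event("Instance redeployment", "ecs-001", EventStatus.PENDING_AUTHORIZATION),
    Event("System maintenance", "ecs-002", EventStatus.EXECUTING),
    Event("Local disk replacement", "ecs-003", EventStatus.PENDING_AUTHORIZATION),
]
pending = filter_by_status(events, EventStatus.PENDING_AUTHORIZATION)
print([e.instance_id for e in pending])  # ['ecs-001', 'ecs-003']
```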

The event status changes as users and the system perform operations on the event.

Figure 1 Event statuses
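The status flow can also be sketched as a small state machine. The transition map below is inferred only from the descriptions in Table 2 and may differ from Figure 1; in particular, which statuses can lead to Canceled is an assumption. Treat it as an illustration rather than the actual state machine.

```python
from typing import Dict, Set

# Assumed transitions, inferred from Table 2; not an official state machine.
TRANSITIONS: Dict[str, Set[str]] = {
    "Pending authorization": {"To be executed", "Canceled"},
    "To be executed": {"Executing", "Canceled"},
    "Executing": {"Execution succeeded", "Execution failed"},
    "Execution succeeded": set(),
    "Execution failed": set(),
    "Canceled": set(),
}


def can_transition(current: str, target: str) -> bool:
    """Return True if the assumed map allows moving from `current` to `target`."""
    return target in TRANSITIONS.get(current, set())


def is_terminal(status: str) -> bool:
    """Return True if `status` is known and has no outgoing transitions."""
    return status in TRANSITIONS and not TRANSITIONS[status]


print(can_transition("Pending authorization", "To be executed"))  # True
print(can_transition("Executing", "Pending authorization"))       # False
print(is_terminal("Canceled"))                                     # True
```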