One-Click Monitoring

Application Scenarios

One-click monitoring lets you quickly enable or disable monitoring of common events for certain services. Table 1 describes the differences between one-click monitoring and common monitoring.

Table 1 One-click monitoring and common monitoring

One-click monitoring

  • Objective: When an event occurs, Cloud Eye triggers alarms immediately. Advantage: The configuration is simple.
  • Scope: Key events of ECS, EIP, and RDS. For the detailed events, see Supported Cloud Services and Alarm Rules.
  • Alarm object: Event monitoring
  • Trigger condition: Immediate trigger

Common monitoring (server, cloud service, custom, website, and log monitoring)

  • Objective: Cloud Eye triggers alarms based on the preset alarm policies. For example, Cloud Eye triggers an alarm if the average CPU usage is 80% or more for five consecutive times within 5 minutes. Advantage: Alarm policies are flexible and can be configured based on service requirements.
  • Scope: All services supported by Cloud Eye
  • Alarm object: Server monitoring, cloud service monitoring, custom monitoring, website monitoring, and log monitoring
  • Trigger condition: Accumulative trigger

Common monitoring (event monitoring)

  • Objective: When an event occurs, Cloud Eye triggers alarms based on the alarm policy. Advantage: The configuration is flexible. Only event alarms are supported.
  • Scope: For details about services that support event monitoring, see Events Supported by Event Monitoring.
  • Alarm object: Event monitoring
  • Trigger condition: Immediate trigger or accumulative trigger

This topic describes how to use the one-click monitoring function to monitor key events.
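The two trigger conditions in Table 1 can be illustrated with a minimal Python sketch. It is not Cloud Eye code: the class `AccumulativePolicy` and the function `immediate_trigger` are hypothetical names, and the sketch assumes that an accumulative trigger means "the last N samples all reach the threshold", as in the CPU usage example.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class AccumulativePolicy:
    """Hypothetical model of an accumulative-trigger alarm policy."""
    threshold: float          # e.g. 80.0 for "CPU usage is 80% or more"
    consecutive_periods: int  # e.g. 5 for "five consecutive times"

    def should_alarm(self, samples: Sequence[float]) -> bool:
        # Alarm only if the most recent N samples all reach the threshold.
        recent = samples[-self.consecutive_periods:]
        return (len(recent) == self.consecutive_periods
                and all(value >= self.threshold for value in recent))


def immediate_trigger(event_occurred: bool) -> bool:
    # Immediate trigger: a single matching event is enough to raise an alarm.
    return event_occurred


policy = AccumulativePolicy(threshold=80.0, consecutive_periods=5)
print(policy.should_alarm([85, 90, 82, 81, 95]))  # True: five breaches in a row
print(policy.should_alarm([85, 90, 70, 81, 95]))  # False: the streak is broken by 70
print(immediate_trigger(True))                    # True: one event is sufficient
```

The contrast is the point of the sketch: an accumulative policy needs a history of samples, while an immediate trigger needs only the event itself.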

Constraints

  • One-click monitoring sends notifications only when alarms are generated and does not send notifications when alarms are cleared.
  • Once the alarm threshold is reached, one-click monitoring will trigger alarms immediately.
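The first constraint can be expressed as a small rule. The following Python sketch is illustrative only; `AlarmTransition` and `should_notify` are hypothetical names, not part of any Cloud Eye interface.

```python
from enum import Enum


class AlarmTransition(Enum):
    """Hypothetical alarm state changes."""
    GENERATED = "generated"
    CLEARED = "cleared"


def should_notify(transition: AlarmTransition) -> bool:
    # One-click monitoring notifies when an alarm is generated,
    # but not when the alarm is cleared.
    return transition is AlarmTransition.GENERATED


history = [AlarmTransition.GENERATED, AlarmTransition.CLEARED]
print([t.value for t in history if should_notify(t)])  # ['generated']
```

In practice this means that a recovery has to be confirmed by other means, for example by checking the alarm records on the console.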

Procedure

  1. Log in to the management console.
  2. Under Management & Deployment, select Cloud Eye.
  3. In the navigation pane on the left, choose Alarm Management > One-Click Monitoring.
  4. Locate the target cloud service, and enable One-Click Monitoring.
    For details about the cloud services and alarm rules supported by one-click monitoring, see Supported Cloud Services and Alarm Rules.
    Figure 1 One-Click Monitoring
  5. Click the arrow to the left of the cloud service name to view the automatically generated alarm rules.

    The notification recipient of a one-click monitoring rule is the account contact. Alarm notifications are sent to the phone number or email address provided during registration.

    Figure 2 Viewing alarm rules

Supported Cloud Services and Alarm Rules

Table 2 ECS

alarm-StartAutoRecovery

  • Alarm policy: Elastic Cloud Server-Start auto recovery; immediate trigger
  • Event description: When the host where the ECS resides becomes faulty, the system automatically migrates the ECS to a functional host. This process causes the ECS to restart and sends a "Start auto recovery" event. After the migration is complete and a "Stop auto recovery" event is sent, the ECS is restored.
  • Procedure: "Start auto recovery" indicates that a fault has occurred and the ECS cannot be used. In this case, you need to replace the ECS or direct traffic to other ECSs.

alarm-EndAutoRecovery

  • Alarm policy: Elastic Cloud Server-Stop auto recovery; immediate trigger
  • Event description: See alarm-StartAutoRecovery.
  • Procedure: This alarm indicates that the ECS is working properly and can be used again.
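One way to act on the two ECS events is to keep a set of ECSs that are allowed to receive traffic. The following Python sketch shows the idea under stated assumptions: the event names are the ones quoted in Table 2, while the function `apply_recovery_event` and the ECS IDs are hypothetical.

```python
from typing import Set

START_EVENT = "Start auto recovery"
STOP_EVENT = "Stop auto recovery"


def apply_recovery_event(in_service: Set[str], ecs_id: str, event_name: str) -> Set[str]:
    """Return the updated set of ECSs that should receive traffic."""
    updated = set(in_service)
    if event_name == START_EVENT:
        # The ECS cannot be used: direct traffic to the other ECSs.
        updated.discard(ecs_id)
    elif event_name == STOP_EVENT:
        # The ECS is working properly and can be used again.
        updated.add(ecs_id)
    return updated


pool = {"ecs-a", "ecs-b"}  # hypothetical ECS IDs
pool = apply_recovery_event(pool, "ecs-a", START_EVENT)
print(sorted(pool))  # ['ecs-b']
pool = apply_recovery_event(pool, "ecs-a", STOP_EVENT)
print(sorted(pool))  # ['ecs-a', 'ecs-b']
```

Because one-click monitoring does not send a notification when an alarm is cleared, the "Stop auto recovery" event is what signals that the ECS can be put back into service.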

Table 3 Elastic IP and Bandwidth

alarm-BlockEIP

  • Alarm policy: Elastic IP-EIP blocked; immediate trigger
  • Event description: If the bandwidth usage exceeds 5 Gbit/s, the traffic will be discarded. This indicates that the bandwidth usage exceeds the threshold or that the system is under attack (generally a DDoS attack). An event will be received when the EIP is unblocked.
  • Procedure: Change the EIP to prevent services from being affected. In addition, check the root cause and rectify the fault.

alarm-UnblockEIP

  • Alarm policy: Elastic IP-EIP unblocked; immediate trigger
  • Event description: See alarm-BlockEIP.
  • Procedure: Use the unblocked EIP again to avoid a waste of resources.

alarm-EIPBandwidthOverflow

  • Alarm policy: Elastic IP-EIP bandwidth overflow; immediate trigger
  • Event description: If this event is reported, the data traffic exceeds the purchased bandwidth, which may decrease your network speed or cause packet loss.
  • Procedure: Check whether the EIP data traffic continues to increase and whether services are normal. Increase the bandwidth if required.
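The two bandwidth-related conditions in Table 3 can be compared in a short Python sketch. This is a simplified reading of the event descriptions, not the logic Cloud Eye or the EIP service actually runs: the function `expected_eip_alarms` is hypothetical, and only the 5 Gbit/s figure comes from the table.

```python
from typing import List

BLOCK_THRESHOLD_GBIT_S = 5.0  # from the alarm-BlockEIP event description


def expected_eip_alarms(usage_gbit_s: float, purchased_gbit_s: float) -> List[str]:
    """Return the alarms that the descriptions in Table 3 suggest for a given usage."""
    alarms = []
    if usage_gbit_s > purchased_gbit_s:
        # Traffic exceeds the purchased bandwidth: slower network or packet loss.
        alarms.append("alarm-EIPBandwidthOverflow")
    if usage_gbit_s > BLOCK_THRESHOLD_GBIT_S:
        # Usage above 5 Gbit/s: traffic is discarded and the EIP is blocked.
        alarms.append("alarm-BlockEIP")
    return alarms


print(expected_eip_alarms(0.1, 0.2))  # []
print(expected_eip_alarms(0.3, 0.2))  # ['alarm-EIPBandwidthOverflow']
print(expected_eip_alarms(6.0, 1.0))  # ['alarm-EIPBandwidthOverflow', 'alarm-BlockEIP']
```

The sketch only shows that an overflow is measured against the purchased bandwidth, whereas blocking is measured against a fixed threshold.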

Table 4 RDS

alarm-CreateInstanceFailed

  • Alarm policy: Relational Database Service-DB instance creation failure; immediate trigger
  • Event description: DB instance creation failed because of insufficient disks or quota, or because underlying resources have been used up.
  • Procedure: Check the number and quota of disks. Release resources and create the DB instance again.

alarm-FullBackupFailed

  • Alarm policy: Relational Database Service-Full backup failure; immediate trigger
  • Event description: Full backup failed. A single full backup failure does not affect the files that have been successfully backed up, but it prolongs the incremental backup time during point-in-time restore (PITR).
  • Procedure: Create a manual backup again.

alarm-ActiveStandBySwitchFailed

  • Alarm policy: Relational Database Service-Primary/standby switchover failure; immediate trigger
  • Event description: The standby DB instance does not take over services from the primary DB instance due to network or server failures. The original primary DB instance continues to provide services within a short time.
  • Procedure: Check whether the connection between the application and the database is re-established.

alarm-AbnormalReplicationStatus

  • Alarm policy: Relational Database Service-Replication status abnormal; immediate trigger
  • Event description: The replication delay between the primary and standby DB instances is too long. This usually occurs when a large amount of data is written to databases or a large transaction is performed. During off-peak hours, the replication delay gradually decreases. Another possible cause is that the network between the primary and standby DB instances is interrupted. However, the network interruption does not interrupt data reads from or writes to a single DB instance, and customers' applications are unaware of the interruption.
  • Procedure: Submit a service ticket for processing.

alarm-FaultyDBInstance

  • Alarm policy: Relational Database Service-DB instance faulty; immediate trigger
  • Event description: A single or primary DB instance is faulty due to a disaster or a server failure. This event is critical and may cause the database service to be unavailable.
  • Procedure: Check whether an automated backup policy has been configured for the DB instance and submit a service ticket for processing.

alarm-SingleToHAFailed

  • Alarm policy: Relational Database Service-Failure of changing single DB instance to primary/standby; immediate trigger
  • Event description: During or after the creation of the standby DB instance, the configuration synchronization between the primary and standby DB instances is faulty. Generally, the fault is caused by insufficient resources in the data center where the standby DB instance is located. This event does not interrupt data reads and writes of the original single DB instance, and customers' applications are unaware of this event.
  • Procedure: Submit a service ticket for processing.

alarm-ReplicationStatusRecovered

  • Alarm policy: Relational Database Service-Replication status recovered; immediate trigger
  • Event description: The replication delay between the primary and standby DB instances has been restored to the normal range, or the network connection between them has been restored.
  • Procedure: No action is required.

alarm-DBInstanceRecovered

  • Alarm policy: Relational Database Service-DB instance recovered; immediate trigger
  • Event description: RDS uses high availability tools to rebuild the standby DB instance for disaster recovery. After the recovery, this event will be reported.
  • Procedure: No action is required.
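If alarm notifications are processed by a script, the procedures in Table 4 can be kept in a simple lookup table. The following Python sketch assumes that the script receives the alarm name as a string; the names `RDS_PROCEDURES` and `procedure_for` are hypothetical, and the procedure texts are taken from Table 4.

```python
TICKET = "Submit a service ticket for processing."
NO_ACTION = "No action is required."

# Alarm name -> procedure, as listed in Table 4.
RDS_PROCEDURES = {
    "alarm-CreateInstanceFailed": "Check the number and quota of disks. "
                                  "Release resources and create the DB instance again.",
    "alarm-FullBackupFailed": "Create a manual backup again.",
    "alarm-ActiveStandBySwitchFailed": "Check whether the connection between the "
                                       "application and the database is re-established.",
    "alarm-AbnormalReplicationStatus": TICKET,
    "alarm-FaultyDBInstance": "Check whether an automated backup policy has been "
                              "configured for the DB instance and submit a service ticket.",
    "alarm-SingleToHAFailed": TICKET,
    "alarm-ReplicationStatusRecovered": NO_ACTION,
    "alarm-DBInstanceRecovered": NO_ACTION,
}


def procedure_for(alarm_name: str) -> str:
    # Fall back to a neutral message for alarm names that Table 4 does not list.
    return RDS_PROCEDURES.get(alarm_name, "No procedure listed in Table 4.")


print(procedure_for("alarm-FullBackupFailed"))     # Create a manual backup again.
print(procedure_for("alarm-DBInstanceRecovered"))  # No action is required.
```

Keeping the mapping in one place makes it easy to update when the list of supported alarm rules changes.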