Updated on 2024-05-20 GMT+08:00

Events Supported by Event Monitoring

Table 1 Events Supported by Event Monitoring for GeminiDB

| Event Source | Event Name | Event ID | Event Severity | Description | Solution | Impact |
|---|---|---|---|---|---|---|
| NoSQL | Instance creation failure | NoSQLCreateInstanceFailed | Major | The instance quota or underlying resources are insufficient. | Release the instances that are no longer used and try to provision them again, or submit a service ticket to adjust the quota. | Instances fail to be created. |
| NoSQL | Specifications change failure | NoSQLResizeInstanceFailed | Major | The underlying resources are insufficient. | Submit a service ticket to ask O&M personnel to coordinate resources, and then try again. | Services are interrupted. |
| NoSQL | Node adding failure | NoSQLAddNodesFailed | Major | The underlying resources are insufficient. | Submit a service ticket to ask O&M personnel to coordinate resources, delete the node that failed to be added, and add a new one. | None |
| NoSQL | Node deletion failure | NoSQLDeleteNodesFailed | Major | Releasing underlying resources failed. | Delete the node again. | None |
| NoSQL | Storage space scale-up failure | NoSQLScaleUpStorageFailed | Major | The underlying resources are insufficient. | Submit a service ticket to ask O&M personnel to coordinate resources, and then try again. | Services may be interrupted. |
| NoSQL | Password resetting failure | NoSQLResetPasswordFailed | Major | Resetting the password times out. | Reset the password again. | None |
| NoSQL | Parameter template change failure | NoSQLUpdateInstanceParamGroupFailed | Major | Changing a parameter template times out. | Change the parameter template again. | None |
| NoSQL | Backup policy configuration failure | NoSQLSetBackupPolicyFailed | Major | The database connection is abnormal. | Configure the backup policy again. | None |
| NoSQL | Manual backup creation failure | NoSQLCreateManualBackupFailed | Major | The backup files fail to be exported or uploaded. | Submit a service ticket to O&M personnel. | Data cannot be backed up. |
| NoSQL | Automated backup creation failure | NoSQLCreateAutomatedBackupFailed | Major | The backup files fail to be exported or uploaded. | Submit a service ticket to O&M personnel. | Data cannot be backed up. |
| NoSQL | Instance status abnormal | NoSQLFaultyDBInstance | Major | This event is a key alarm event and is reported when an instance is faulty due to a disaster or a server failure. | Submit a service ticket. | The database service may be unavailable. |
| NoSQL | Instance status recovery | NoSQLDBInstanceRecovered | Major | If a disaster occurs, NoSQL provides an HA tool to automatically or manually rectify the fault. After the fault is rectified, this event is reported. | No further action is required. | None |
| NoSQL | Node status abnormal | NoSQLFaultyDBNode | Major | This event is a key alarm event and is reported when a database node is faulty due to a disaster or a server failure. | Check whether the database service is available and submit a service ticket. | The database service may be unavailable. |
| NoSQL | Node status recovery | NoSQLDBNodeRecovered | Major | If a disaster occurs, NoSQL provides an HA tool to automatically or manually rectify the fault. After the fault is rectified, this event is reported. | No further action is required. | None |
| NoSQL | Primary/standby switchover or failover | NoSQLPrimaryStandbySwitched | Major | This event is reported when a primary/standby switchover or a failover is triggered. | No further action is required. | None |
| NoSQL | Occurrence of hotspot partitioning keys | HotKeyOccurs | Major | Hotspot data is stored in one partition because the primary key is improper, and improper application design causes frequent read and write operations on a single key. | 1. Choose a proper partition key. 2. Add a service cache so that applications read hotspot data from the cache first. (See the partition key sketch after this table.) | The service request success rate is affected, and cluster performance and stability deteriorate. |
| NoSQL | BigKey occurrence | BigKeyOccurs | Major | The primary key design is improper. There are too many records or too much data in a single partition, causing load imbalance across nodes. | 1. Choose a proper partition key. 2. Add a new partition key for hashing data. (See the partition key sketch after this table.) | As more and more data is stored in the partition, cluster stability deteriorates. |
| NoSQL | Insufficient storage space | NoSQLRiskyDataDiskUsage | Major | The storage space is insufficient. | Scale up storage space. For details, see "Scaling Up Storage Space" in the GeminiDB user guide. | The instance is set to read-only, and data cannot be written to it. |
| NoSQL | Data disk expanded and being writable | NoSQLDataDiskUsageRecovered | Major | The data disk has been expanded and is writable again. | No further action is required. | None |
| NoSQL | Index creation failure | NoSQLCreateIndexFailed | Major | The service load exceeds what the instance specifications can handle. Creating indexes then consumes additional instance resources, so responses slow down or even stall and the creation times out. | 1. Select instance specifications that match the service load. 2. Create indexes during off-peak hours. 3. Create indexes in the background (see the index sketch after this table). 4. Create only the indexes you need. | The index fails to be created or is incomplete. Delete the index and create a new one. |
| NoSQL | Write speed decrease | NoSQLStallingOccurs | Major | The write speed is close to the maximum allowed by the cluster scale and instance specifications, so the database flow control mechanism is triggered and requests may fail. | 1. Adjust the cluster scale or node specifications based on the maximum write rate of services. 2. Measure the maximum write rate of services (see the write-rate sketch after this table). | The success rate of service requests is affected. |
| NoSQL | Data write stopped | NoSQLStoppingOccurs | Major | Data is written so fast that the maximum write capability allowed by the cluster scale and instance specifications is reached, so the database flow control mechanism is triggered and requests may fail. | 1. Change the cluster scale or node specifications based on the maximum write rate of services. 2. Measure the maximum write rate of services (see the write-rate sketch after this table). | The success rate of service requests is affected. |
| NoSQL | Database restart failure | NoSQLRestartDBFailed | Major | The instance status is abnormal. | Submit a service ticket to O&M personnel. | The instance status may be abnormal. |
| NoSQL | Restoration to new instance failure | NoSQLRestoreToNewInstanceFailed | Major | The underlying resources are insufficient. | Submit a service ticket to ask O&M personnel to coordinate resources, and then add new nodes. | Data cannot be restored to a new instance. |
| NoSQL | Restoration to existing instance failure | NoSQLRestoreToExistInstanceFailed | Major | The backup file fails to be downloaded or restored. | Submit a service ticket to O&M personnel. | The current instance may be unavailable. |
| NoSQL | Backup file deletion failure | NoSQLDeleteBackupFailed | Major | The backup files fail to be deleted from OBS. | Delete the backup files again. | None |
| NoSQL | Failure to display slow query logs in plaintext | NoSQLSwitchSlowlogPlainTextFailed | Major | The DB API does not support this function. | Refer to the GeminiDB User Guide to check whether the DB API supports displaying slow query logs in plaintext, and submit a service ticket to O&M personnel. | None |
| NoSQL | EIP binding failure | NoSQLBindEipFailed | Major | The node status is abnormal, an EIP has already been bound to the node, or the EIP to be bound is invalid. | Check whether the node is normal and whether the EIP is valid. | The instance cannot be accessed from a public network. |
| NoSQL | EIP unbinding failure | NoSQLUnbindEipFailed | Major | The node status is abnormal, or the EIP has already been unbound from the node. | Check whether the node and EIP status are normal. | None |
| NoSQL | Parameter modification failure | NoSQLModifyParameterFailed | Major | The parameter value is invalid. | Check whether the parameter value is within the valid range, and submit a service ticket to O&M personnel. | None |
| NoSQL | Parameter template application failure | NoSQLApplyParameterGroupFailed | Major | The instance status is abnormal, so the parameter template cannot be applied. | Submit a service ticket to O&M personnel. | None |
| NoSQL | Enabling or disabling SSL failure | NoSQLSwitchSSLFailed | Major | Enabling or disabling SSL times out. | Try again or submit a service ticket. Do not change the connection mode in the meantime. | The SSL connection mode cannot be changed. |
| NoSQL | Too much data in a single row | LargeRowOccurs | Major | If a single row holds too much data, queries may time out and faults such as out-of-memory (OOM) errors may occur. | 1. Limit the write length of each column and row so that the key and value length of each row does not exceed the preset threshold. 2. Check whether abnormal writes or encoding are producing large rows. (See the row-size check after this table.) | If there are too many records in a single row, cluster stability will deteriorate as the data volume increases. |
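
The hotspot partitioning key (HotKeyOccurs) and big key (BigKeyOccurs) events above are both mitigated by choosing a partition key that spreads reads and writes across partitions. The following is a minimal sketch of one common pattern, a bucketed composite partition key, assuming a GeminiDB Cassandra-compatible instance accessed with the open-source DataStax Python driver; the host, port, credentials, keyspace, table, and column names are hypothetical placeholders, not values from this document.

```python
import random
from datetime import datetime

from cassandra.cluster import Cluster
from cassandra.auth import PlainTextAuthProvider

BUCKETS = 16  # sub-partitions per logical key; a tuning assumption

# Connection details are placeholders; take the real endpoint, port, and
# credentials from the instance details page.
auth = PlainTextAuthProvider(username="rwuser", password="<password>")
cluster = Cluster(["<instance-ip>"], port=8635, auth_provider=auth)
session = cluster.connect()

# Composite partition key (device_id, bucket): traffic for one device_id is
# spread across BUCKETS partitions instead of piling into a single hot or
# oversized partition. The keyspace demo_ks is assumed to already exist.
session.execute("""
    CREATE TABLE IF NOT EXISTS demo_ks.device_metrics (
        device_id  text,
        bucket     int,
        event_time timestamp,
        value      double,
        PRIMARY KEY ((device_id, bucket), event_time)
    )
""")

def write_metric(device_id, event_time, value):
    # A random bucket spreads writes; hashing another column works as well.
    bucket = random.randrange(BUCKETS)
    session.execute(
        "INSERT INTO demo_ks.device_metrics (device_id, bucket, event_time, value) "
        "VALUES (%s, %s, %s, %s)",
        (device_id, bucket, event_time, value),
    )

write_metric("device-42", datetime.utcnow(), 3.14)
```

The trade-off of this pattern is on the read side: fetching all data for one device means querying each bucket value and merging the results.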
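
For the index creation failure event, one of the listed mitigations is to build indexes in the background during off-peak hours. A minimal sketch, assuming a GeminiDB Mongo-compatible instance reached with the open-source pymongo driver; the connection URI, database, collection, and field names are placeholders, and whether the background option is honored depends on the compatible MongoDB version.

```python
from pymongo import MongoClient, ASCENDING

# Placeholder connection string; use the real instance address and credentials.
client = MongoClient("mongodb://rwuser:<password>@<instance-ip>:8635/test?authSource=admin")
orders = client["shop"]["orders"]

# background=True asks the server to build the index without blocking other
# operations on the collection; newer compatible versions build indexes in a
# non-blocking way by default and may ignore this flag.
orders.create_index([("customer_id", ASCENDING)], background=True, name="idx_customer_id")
```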
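
Both flow control events (NoSQLStallingOccurs and NoSQLStoppingOccurs) recommend measuring the maximum write rate of your services before resizing the cluster or nodes. Below is a driver-agnostic sketch of a client-side write-rate meter; the wrapped write function is whatever your application already uses (for example, the write_metric helper above), and the window length is an arbitrary assumption.

```python
import time
import threading
from collections import deque

class WriteRateMeter:
    """Counts recent writes so the peak write rate can be observed."""

    def __init__(self, window_seconds=10):
        self._window = window_seconds
        self._timestamps = deque()
        self._lock = threading.Lock()

    def record(self):
        now = time.monotonic()
        with self._lock:
            self._timestamps.append(now)
            # Drop samples that fell out of the measurement window.
            while self._timestamps and now - self._timestamps[0] > self._window:
                self._timestamps.popleft()

    def rate_per_second(self):
        with self._lock:
            return len(self._timestamps) / self._window

meter = WriteRateMeter(window_seconds=10)

def timed_write(write_fn, *args, **kwargs):
    """Call the real write function, then record it for rate accounting."""
    result = write_fn(*args, **kwargs)
    meter.record()
    return result

# Example usage (write_metric is the hypothetical helper from the sketch above):
# timed_write(write_metric, "device-42", datetime.utcnow(), 3.14)
# print(f"recent write rate: {meter.rate_per_second():.1f} ops/s")
```

Tracking the peak of rate_per_second() over a busy period gives the number to compare against the cluster scale and node specifications.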
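
For the LargeRowOccurs event, the first suggested mitigation is to cap the key and value length of each row before it is written. A minimal, driver-agnostic sketch of such a pre-write guard follows; the thresholds are illustrative assumptions chosen for the example, not documented GeminiDB limits.

```python
MAX_KEY_BYTES = 2 * 1024        # assumed cap for a row key
MAX_VALUE_BYTES = 256 * 1024    # assumed cap for a single column value

class RowTooLargeError(ValueError):
    """Raised when a row exceeds the configured size thresholds."""

def check_row_size(key, columns):
    """Validate one row before writing it.

    key: string used as the row/partition key.
    columns: mapping of column name to value.
    """
    key_bytes = len(key.encode("utf-8"))
    if key_bytes > MAX_KEY_BYTES:
        raise RowTooLargeError(f"key is {key_bytes} bytes, limit {MAX_KEY_BYTES}")
    for name, value in columns.items():
        value_bytes = len(str(value).encode("utf-8"))
        if value_bytes > MAX_VALUE_BYTES:
            raise RowTooLargeError(
                f"column {name!r} is {value_bytes} bytes, limit {MAX_VALUE_BYTES}"
            )

# Example: validate before calling the actual insert.
# check_row_size("device-42", {"payload": payload_blob})
```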