CCE Autopilot Cluster Events
CCE Autopilot can report a range of events to AOM when a cluster is running. You can add event alarms as needed to monitor the health of cluster data plane and control plane components. This helps you quickly identify and resolve problems, ensuring cluster stability and reliability.
- Data Plane Events: user operation events, such as workload, network, storage, and auto scaling events.
- Control Plane Events: master node events, which are usually caused by faults or upgrades of control plane components.
Data Plane Events
| Object | Event Name | Severity | Description |
|---|---|---|---|
| Pod | PodOOMKilling | Major | Check whether the pod exits due to OOM. This event is reported by CCE Node Problem Detector (1.18.41 or later) and Cloud Native Log Collection (1.3.2 or later). |
| Pod | FailedStart | Major | Check whether the pod has started. |
| Pod | FailedPullImage | Major | Check whether the pod has pulled an image. |
| Pod | BackOffStart | Major | Check whether the pod fails to restart. |
| Pod | FailedScheduling | Major | Check whether the pod has been scheduled. |
| Pod | BackOffPullImage | Major | Check whether the pod has pulled an image after a retry. |
| Pod | FailedCreate | Major | Check whether the pod has been created. |
| Pod | Unhealthy | Minor | Check whether the pod health check is successful. |
| Pod | FailedDelete | Minor | Check whether the workload has been deleted. |
| Pod | ErrImageNeverPull | Minor | Check whether the workload has pulled an image. |
| Pod | FailedScaleOut | Minor | Check whether replicas can be added to scale the workload. |
| Pod | FailedReconfig | Minor | Check whether the pod configuration has been updated. |
| Pod | FailedActive | Minor | Check whether the pod is activated. |
| Pod | FailedRollback | Minor | Check whether the pod is rolled back. |
| Pod | FailedUpdate | Minor | Check whether the pod is updated. |
| Pod | FailedScaleIn | Minor | Check whether the pod scale-in failed. |
| Pod | FailedRestart | Minor | Check whether the pod is restarted. |
| Deployment | SelectorOverlap | Minor | Check whether label selectors in the cluster conflict. |
| Deployment | ReplicaSetCreateError | Minor | Check whether a workload ReplicaSet can be created. |
| Deployment | DeploymentRollbackRevisionNotFound | Minor | Check whether the Deployment rollback version is available. |
| Job | TooManyActivePods | Minor | Check whether there are still active pods after the number of pods in a job reaches the preset value. |
| Job | TooManySucceededPods | Minor | Check whether there are extra running pods after the number of pods in a job reaches the preset value. |
| CronJob | FailedGet | Minor | Check whether the CronJob can be queried. |
| CronJob | FailedList | Minor | Check whether the list of pods can be obtained. |
| CronJob | UnexpectedJob | Minor | Check whether there are any unknown CronJobs. |

| Object | Event Name | Severity | Description |
|---|---|---|---|
| Service | CreatingLoadBalancerFailed | Minor | Check whether a load balancer has been created. |
| Service | DeletingLoadBalancerFailed | Minor | Check whether the load balancer has been deleted. |
| Service | UpdateLoadBalancerFailed | Minor | Check whether the load balancer has been updated. |

| Object | Event Name | Severity | Description |
|---|---|---|---|
| PV | DetachVolumeFailed | Minor | Check whether the block storage is detached from the node. |
| PV | VolumeUnknownReclaimPolicy | Minor | Check whether a volume reclaim policy is specified. |
| PV | SetUpAtVolumeFailed | Minor | Check whether the volume is mounted. |
| PV | VolumeFailedRecycle | Minor | Check whether the volume is reclaimed. |
| PV | WaitForAttachVolumeFailed | Minor | Check whether the block storage is mounted to the node. |
| PV | VolumeFailedDelete | Minor | Check whether the volume is deleted. |
| PV | MountDeviceFailed | Minor | Check whether the device is mounted. |
| PV | TearDownAtVolumeFailed | Minor | Check whether the volume is unmounted. |
| PV | UnmountDeviceFailed | Minor | Check whether the device is unmounted. |
| PV | AttachVolumeFailed | Minor | Check whether the block storage is attached to the node. |
| PVC | VolumeResizeFailed | Minor | Check whether the volume capacity is expanded. |
| PVC | ClaimLost | Minor | Check whether the PVC is normal. |
| PVC | ProvisioningFailed | Minor | Check whether the volume is created. |
| PVC | ProvisioningCleanupFailed | Minor | Check whether the volume has been cleared. |
| PVC | ClaimMisbound | Minor | Check whether the PVC is bound to an incorrect volume. |

| Object | Event Name | Severity | Description |
|---|---|---|---|
| HPA | InvalidTargetRange | Major | |
| HPA | FailedGetScale | Major | HPA failed to obtain the resource object to be scaled. |
| HPA | FailedComputeMetricsReplicas | Major | An error occurred while calculating the desired number of replicas. For example, metrics-server is unavailable, resource metric collection fails, or the CPU usage is incorrectly set. To view details, run `kubectl describe horizontalpodautoscaler <hpa-name>`. |
| HPA | FailedGetObjectMetric | Major | Failed to obtain the metrics of the specified object (such as a PVC or ConfigMap). |
| HPA | FailedGetPodsMetric | Major | Failed to obtain pod resource metrics (resource usage of a pod). |
| HPA | FailedGetResourceMetric | Major | Failed to obtain cluster resource metrics (resource usage of a cluster). |
| HPA | FailedGetContainerResourceMetric | Major | Failed to obtain the resource metrics of a container. |
| HPA | FailedGetExternalMetric | Major | Failed to obtain external metrics. |
| HPA | FailedRescale | Major | Failed to update the desired number of replicas of the resource object to be scaled. |
| HPA | SuccessfulRescale | Minor | The desired number of replicas of the resource object to be scaled is updated. |
| CronHPA | ScaleFailed | Major | CronHPA failed to update the desired number of replicas of the resource object to be scaled. |
| CronHPA | FailedGetHorizontalPodAutoscaler | Major | CronHPA failed to query the associated HPA object. (Generally, kube-apiserver cannot respond.) |
| CronHPA | FailedGetHpaScale | Major | CronHPA failed to obtain the resource object to be scaled. |
| CronHPA | UpdateHPAFailed | Major | CronHPA failed to update the associated HPA object. |
| CronHPA | UpdateHPASuccess | Minor | CronHPA successfully updates the associated HPA object. |
| CronHPA | SkipUpdateHPA | Minor | CronHPA skips updating the associated HPA object. |
| CronHPA | SkipUpdateTarget | Minor | CronHPA skips updating the number of replicas of the resource object to be scaled. |
| CronHPA | UpdateTargetSuccess | Minor | CronHPA successfully updates the number of replicas of the resource object to be scaled. |
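
When triaging data plane events outside AOM, it can help to look up the severity listed in the tables above. The following is a minimal sketch, not an official tool. It assumes that the `reason` field of a Kubernetes Event object carries the same string as the Event Name column, which may not hold for every event. The `SEVERITY` subset, the `major_events` helper, and the sample data are all hypothetical and only illustrate the idea.

```python
import json

# Severity for a few data plane events, copied from the tables above.
# Assumption: the `reason` field of a Kubernetes Event matches the
# "Event Name" column. Extend this mapping as needed.
SEVERITY = {
    "PodOOMKilling": "Major",
    "FailedScheduling": "Major",
    "Unhealthy": "Minor",
    "FailedGetScale": "Major",
    "FailedRescale": "Major",
    "SuccessfulRescale": "Minor",
}


def major_events(event_list):
    """Return (kind, name, reason) for every event listed as Major."""
    found = []
    for item in event_list.get("items", []):
        reason = item.get("reason", "")
        if SEVERITY.get(reason) == "Major":
            obj = item.get("involvedObject", {})
            found.append((obj.get("kind", ""), obj.get("name", ""), reason))
    return found


# Hypothetical sample shaped like the output of `kubectl get events -o json`.
SAMPLE = json.loads("""
{"items": [
  {"reason": "FailedScheduling",
   "involvedObject": {"kind": "Pod", "name": "web-0"}},
  {"reason": "Unhealthy",
   "involvedObject": {"kind": "Pod", "name": "web-1"}},
  {"reason": "FailedRescale",
   "involvedObject": {"kind": "HorizontalPodAutoscaler", "name": "web-hpa"}}
]}
""")

if __name__ == "__main__":
    for kind, name, reason in major_events(SAMPLE):
        print(f"{kind}/{name}: {reason}")
```

In practice, the input could come from saving `kubectl get events -o json` to a file and loading it with `json.load`. Events whose names are not in the mapping are ignored rather than guessed at.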
Control Plane Events
| Event ID | Severity | Description |
|---|---|---|
| Internal error | Major | Check whether there is an internal error in the cluster. |
| Failed to check component status or components are abnormal | Major | Check whether the statuses of cluster components can be obtained or the components malfunction. |
| Cluster status is Unavailable | Major | Check whether the cluster is available. |
| Cluster status is Error | Major | Check whether the cluster is faulty. |
| Cluster status is not updated for a long time | Major | Check whether the cluster is stuck in a state for a long period. |
| Failed to update cluster status | Major | Check whether the cluster status is updated. |
| Failed to delete the unavailable connection of the Kubernetes cluster | Major | Check whether unavailable Kubernetes connections have been deleted. |
| Failed to sync the cluster cert | Major | Check whether the cluster certificates have been synchronized. |