CCE Autopilot Cluster Events
CCE Autopilot can report a range of events in a running cluster to AOM. You can add event alarms as required to monitor the health of data plane and control plane components in the cluster. This helps you quickly identify and resolve problems and keeps the cluster stable and reliable.
- Data Plane Events: user operation events, such as workload, network, storage, and auto scaling events.
- Control Plane Events: master node events, which are usually caused by faults or upgrades of control plane components.
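Alarm rules themselves are configured in AOM. Purely as a local illustration of the selection logic, the following Python sketch filters event records by event name and severity. It assumes each record has the shape of an item returned by `kubectl get events -A -o json` (with `reason` and `involvedObject` fields) and that the event names listed in the tables appear in the `reason` field. `ALARM_RULES`, `SEVERITY_RANK`, and `select_alarms` are hypothetical names, not part of AOM or CCE.

```python
# Minimal sketch, not the AOM API: pick out events that match user-chosen alarm rules.
# Assumption: each event is a dict shaped like an item from `kubectl get events -A -o json`,
# and the event name from the tables is carried in its "reason" field.

# Hypothetical alarm rules: event name -> severity, copied from the event tables.
ALARM_RULES = {
    "FailedScheduling": "Major",
    "FailedCreate": "Major",
    "Unhealthy": "Minor",
    "ProvisioningFailed": "Minor",
}

# Order used to compare severities.
SEVERITY_RANK = {"Minor": 0, "Major": 1}


def select_alarms(events, min_severity="Major"):
    """Return (severity, kind, name, reason) for each event that matches a rule
    at or above min_severity."""
    threshold = SEVERITY_RANK[min_severity]
    alarms = []
    for event in events:
        reason = event.get("reason")
        severity = ALARM_RULES.get(reason)
        # Skip events that have no rule or fall below the requested severity.
        if severity is None or SEVERITY_RANK[severity] < threshold:
            continue
        obj = event.get("involvedObject", {})
        alarms.append((severity, obj.get("kind"), obj.get("name"), reason))
    return alarms


sample_events = [
    {"reason": "FailedScheduling", "involvedObject": {"kind": "Pod", "name": "web-0"}},
    {"reason": "Unhealthy", "involvedObject": {"kind": "Pod", "name": "web-1"}},
    {"reason": "Scheduled", "involvedObject": {"kind": "Pod", "name": "web-2"}},
]

print(select_alarms(sample_events))
# [('Major', 'Pod', 'web-0', 'FailedScheduling')]
print(select_alarms(sample_events, min_severity="Minor"))
# [('Major', 'Pod', 'web-0', 'FailedScheduling'), ('Minor', 'Pod', 'web-1', 'Unhealthy')]
```

Lowering `min_severity` to `"Minor"` widens the alarm scope, which mirrors the choice you make when deciding which events to alarm on.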
Data Plane Events
Workload events
Object | Event Name | Severity | Description |
---|---|---|---|
Pod | PodOOMKilling | Major | Check whether the pod exits due to OOM. This event is reported by CCE Node Problem Detector (1.18.41 or later) and Cloud Native Log Collection (1.3.2 or later). |
Pod | FailedStart | Major | Check whether the pod is started. |
Pod | FailedPullImage | Major | Check whether the pod has pulled an image. |
Pod | BackOffStart | Major | Check whether the pod fails to restart. |
Pod | FailedScheduling | Major | Check whether the pod is scheduled. |
Pod | BackOffPullImage | Major | Check whether the pod has pulled an image after a retry. |
Pod | FailedCreate | Major | Check whether the pod is created. |
Pod | Unhealthy | Minor | Check whether the pod passes its health check. |
Pod | FailedDelete | Minor | Check whether the workload is deleted. |
Pod | ErrImageNeverPull | Minor | Check whether the workload has pulled an image. |
Pod | FailedScaleOut | Minor | Check whether the workload replicas are scaled out. |
Pod | FailedReconfig | Minor | Check whether the pod configuration is updated. |
Pod | FailedActive | Minor | Check whether the pod is activated. |
Pod | FailedRollback | Minor | Check whether the pod is rolled back. |
Pod | FailedUpdate | Minor | Check whether the pod is updated. |
Pod | FailedScaleIn | Minor | Check whether the pod is scaled in. |
Pod | FailedRestart | Minor | Check whether the pod is restarted. |
Deployment | SelectorOverlap | Minor | Check whether label selectors in the cluster conflict. |
Deployment | ReplicaSetCreateError | Minor | Check whether a workload ReplicaSet can be created. |
Deployment | DeploymentRollbackRevisionNotFound | Minor | Check whether the Deployment rollback revision is available. |
Job | TooManyActivePods | Minor | Check whether there are still active pods after the number of pods in a job reaches the preset value. |
Job | TooManySucceededPods | Minor | Check whether there are extra running pods after the number of pods in a job reaches the preset value. |
CronJob | FailedGet | Minor | Check whether the CronJob can be obtained. |
CronJob | FailedList | Minor | Check whether the list of pods can be obtained. |
CronJob | UnexpectedJob | Minor | Check whether there are any unknown CronJobs. |
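PodOOMKilling is reported by the add-ons named in the table. If you want to cross-check a suspected OOM exit by hand, the pod's own status also records it: Kubernetes sets the container termination reason to `OOMKilled` in `state.terminated` or `lastState.terminated`. The Python sketch below scans pod data in the shape returned by `kubectl get pods -A -o json`. `find_oom_killed` is a hypothetical helper, and this is not how CCE Node Problem Detector detects the event.

```python
# Minimal sketch: list containers whose current or previous run ended with OOMKilled.
# Assumption: pod_list is the parsed JSON of `kubectl get pods -A -o json`.


def find_oom_killed(pod_list):
    """Return (namespace, pod, container) for each container terminated by OOM."""
    hits = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        statuses = pod.get("status", {}).get("containerStatuses") or []
        for status in statuses:
            # Check the current state first, then the previous (restarted) run.
            for key in ("state", "lastState"):
                terminated = (status.get(key) or {}).get("terminated") or {}
                if terminated.get("reason") == "OOMKilled":
                    hits.append((meta.get("namespace"), meta.get("name"), status.get("name")))
                    break  # report each container only once
    return hits


sample = {
    "items": [
        {
            "metadata": {"namespace": "default", "name": "api-7f9c"},
            "status": {
                "containerStatuses": [
                    {"name": "api", "state": {"running": {}},
                     "lastState": {"terminated": {"reason": "OOMKilled", "exitCode": 137}}},
                    {"name": "sidecar", "state": {"running": {}}, "lastState": {}},
                ]
            },
        }
    ]
}

print(find_oom_killed(sample))
# [('default', 'api-7f9c', 'api')]
```

To use it against a real cluster, you could save the `kubectl` output to a file and pass the result of `json.load` to `find_oom_killed`.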
Network events
Object | Event Name | Severity | Description |
---|---|---|---|
Service | CreatingLoadBalancerFailed | Minor | Check whether a load balancer is created. |
Service | DeletingLoadBalancerFailed | Minor | Check whether the load balancer is deleted. |
Service | UpdateLoadBalancerFailed | Minor | Check whether the load balancer is updated. |
Storage events
Object | Event Name | Severity | Description |
---|---|---|---|
PV | DetachVolumeFailed | Minor | Check whether the block storage is detached. |
PV | VolumeUnknownReclaimPolicy | Minor | Check whether a volume reclaim policy is specified. |
PV | SetUpAtVolumeFailed | Minor | Check whether the volume is mounted. |
PV | VolumeFailedRecycle | Minor | Check whether the volume is reclaimed. |
PV | WaitForAttachVolumeFailed | Minor | Check whether the block storage is attached to the node. |
PV | VolumeFailedDelete | Minor | Check whether the volume is deleted. |
PV | MountDeviceFailed | Minor | Check whether the device is mounted. |
PV | TearDownAtVolumeFailed | Minor | Check whether the volume is unmounted. |
PV | UnmountDeviceFailed | Minor | Check whether the device is unmounted. |
PV | AttachVolumeFailed | Minor | Check whether the block storage is attached to the node. |
PVC | VolumeResizeFailed | Minor | Check whether the volume capacity is expanded. |
PVC | ClaimLost | Minor | Check whether the PVC is normal. |
PVC | ProvisioningFailed | Minor | Check whether the volume is created. |
PVC | ProvisioningCleanupFailed | Minor | Check whether the volume has been cleaned up. |
PVC | ClaimMisbound | Minor | Check whether the PVC is bound to an incorrect volume. |
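When a bound PVC loses its PV, Kubernetes reports ClaimLost and moves the claim to the `Lost` phase (a PVC's `status.phase` is one of `Pending`, `Bound`, or `Lost`). A minimal Python sketch that lists such claims, assuming the parsed JSON of `kubectl get pvc -A -o json` as input; `find_lost_claims` is a hypothetical helper, not a CCE tool.

```python
# Minimal sketch: list PVCs whose status.phase is "Lost", i.e. whose bound PV is gone.
# Assumption: pvc_list is the parsed JSON of `kubectl get pvc -A -o json`.


def find_lost_claims(pvc_list):
    """Return (namespace, name, volumeName) for each PVC in the Lost phase."""
    lost = []
    for pvc in pvc_list.get("items", []):
        if pvc.get("status", {}).get("phase") != "Lost":
            continue
        meta = pvc.get("metadata", {})
        # spec.volumeName records the PV the claim was bound to.
        volume = pvc.get("spec", {}).get("volumeName")
        lost.append((meta.get("namespace"), meta.get("name"), volume))
    return lost


sample = {
    "items": [
        {"metadata": {"namespace": "db", "name": "data-0"},
         "spec": {"volumeName": "pv-a"}, "status": {"phase": "Bound"}},
        {"metadata": {"namespace": "db", "name": "data-1"},
         "spec": {"volumeName": "pv-b"}, "status": {"phase": "Lost"}},
    ]
}

print(find_lost_claims(sample))
# [('db', 'data-1', 'pv-b')]
```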
Auto scaling events
Object | Event Name | Severity | Description |
---|---|---|---|
HPA | InvalidTargetRange | Major |  |
HPA | FailedGetScale | Major | HPA failed to obtain the resource object to be scaled. |
HPA | FailedComputeMetricsReplicas | Major | Failed to calculate the desired number of replicas for the resource object to be scaled. For example, metrics-server is unavailable, resource metric collection fails, or the CPU usage target is incorrectly set. To view details, run `kubectl describe horizontalpodautoscaler <hpa-name>`. |
HPA | FailedGetObjectMetric | Major | Failed to obtain the metrics of the specified object (such as a PVC or ConfigMap). |
HPA | FailedGetPodsMetric | Major | Failed to obtain pod resource metrics (resource usage of a pod). |
HPA | FailedGetResourceMetric | Major | Failed to obtain cluster resource metrics (resource usage of a cluster). |
HPA | FailedGetContainerResourceMetric | Major | Failed to obtain the resource metrics of a container. |
HPA | FailedGetExternalMetric | Major | Failed to obtain external metrics. |
HPA | FailedRescale | Major | Failed to update the desired number of replicas of the resource object to be scaled. |
HPA | SuccessfulRescale | Minor | The desired number of replicas of the resource object to be scaled is updated. |
CronHPA | ScaleFailed | Major | CronHPA failed to update the desired number of replicas of the resource object to be scaled. |
CronHPA | FailedGetHorizontalPodAutoscaler | Major | CronHPA failed to query the associated HPA object. This usually occurs when kube-apiserver does not respond. |
CronHPA | FailedGetHpaScale | Major | CronHPA failed to obtain the resource object to be scaled. |
CronHPA | UpdateHPAFailed | Major | CronHPA failed to update the associated HPA object. |
CronHPA | UpdateHPASuccess | Minor | CronHPA successfully updates the associated HPA object. |
CronHPA | SkipUpdateHPA | Minor | CronHPA skips updating the associated HPA object. |
CronHPA | SkipUpdateTarget | Minor | CronHPA skips updating the number of replicas of the resource object to be scaled. |
CronHPA | UpdateTargetSuccess | Minor | CronHPA successfully updates the number of replicas of the resource object to be scaled. |
CustomedHPA | FailedSetPolicySettings | Major | Failed to parse the cooldown period of CustomedHPA. |
CustomedHPA | FailedSubmitRule | Major | CustomedHPA failed to process schedule rules or metric rules. |
CustomedHPA | FailedComputeReplicas | Major | CustomedHPA failed to trigger resource scaling based on the computed metrics. |
CustomedHPA | FailedScale | Major | CustomedHPA failed to update the desired number of replicas of the resource object to be scaled. This usually occurs when kube-apiserver does not respond. |
CustomedHPA | MetricScaleSuccess | Minor | CustomedHPA triggers resource scaling based on a metric rule. |
CustomedHPA | CronScaleSuccess | Minor | CustomedHPA triggers resource scaling based on a periodic rule. |
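FailedComputeMetricsReplicas and FailedRescale concern the replica calculation performed by the Kubernetes HPA controller. According to the upstream Kubernetes documentation, the controller computes desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue) and skips scaling when the ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below reproduces only that core formula plus clamping to the minimum and maximum replica counts. It ignores missing metrics, pod readiness, and stabilization windows, does not model CronHPA or CustomedHPA, and `desired_replicas` is a hypothetical helper.

```python
import math

# Minimal sketch of the core HPA replica formula from the Kubernetes documentation:
#   desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
# Missing metrics, pod readiness, and stabilization windows are deliberately ignored.


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return the replica count the formula yields, clamped to [min_replicas, max_replicas]."""
    if target_metric <= 0:
        # Guard against division by zero or a meaningless negative target.
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(desired, max_replicas))


print(desired_replicas(3, 90, 60))   # 5  (3 * 1.5 = 4.5, rounded up)
print(desired_replicas(4, 62, 60))   # 4  (ratio of about 1.03 is within tolerance)
print(desired_replicas(4, 30, 60))   # 2  (scale in)
print(desired_replicas(8, 120, 40))  # 10 (24 clamped to max_replicas)
```

The tolerance check is what keeps small metric fluctuations from causing constant rescaling; the clamp reflects the HPA's configured replica bounds.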
Control Plane Events
Event ID | Severity | Description |
---|---|---|
Internal error | Major | Check whether an internal error occurs in the cluster. |
Failed to check component status or components are abnormal | Major | Check whether the statuses of cluster components can be obtained or whether the components malfunction. |
Cluster status is Unavailable | Major | Check whether the cluster is available. |
Cluster status is Error | Major | Check whether the cluster is faulty. |
Cluster status is not updated for a long time | Major | Check whether the cluster is stuck in a state for a long time. |
Failed to update cluster status | Major | Check whether the cluster status is updated. |
Failed to delete the unavailable connection of the Kubernetes cluster | Major | Check whether unavailable Kubernetes connections are deleted. |
Failed to sync the cluster cert | Major | Check whether the cluster certificate is synchronized. |