CCE Events
CCE can report a range of events in a running cluster to AOM. You can add event alarms as required to monitor the health of cluster data plane and control plane components. This helps you quickly identify and resolve problems, ensuring cluster stability and reliability.
- Data Plane Events: user operation events, including workload, network, node, storage, and auto scaling events.
- Control Plane Events: master node events, which are usually caused by faults or upgrades of control plane components.
Data Plane Events
Workload events

| Category | Event Name | Severity | Description |
|---|---|---|---|
| Pod | PodOOMKilling | Major | Check whether the pod exited due to OOM. This event is reported by CCE Node Problem Detector (1.18.41 or later) and Cloud Native Logging (1.3.2 or later). |
| Pod | FailedStart | Major | Check whether the pod is started. |
| Pod | FailedPullImage | Major | Check whether the pod has pulled an image. |
| Pod | BackOffStart | Major | Check whether the pod fails to be restarted. |
| Pod | FailedScheduling | Major | Check whether the pod is scheduled. |
| Pod | BackOffPullImage | Major | Check whether the pod has pulled an image after a retry. |
| Pod | FailedCreate | Major | Check whether the pod is created. |
| Pod | Unhealthy | Minor | Check whether the pod health check is successful. |
| Pod | FailedDelete | Minor | Check whether the workload is deleted. |
| Pod | ErrImageNeverPull | Minor | Check whether the workload has pulled an image. |
| Pod | FailedScaleOut | Minor | Check whether workload replicas are scaled out. |
| Pod | FailedStandBy | Minor | Check whether the pod enters the standby state. |
| Pod | FailedReconfig | Minor | Check whether the pod configuration is updated. |
| Pod | FailedActive | Minor | Check whether the pod is activated. |
| Pod | FailedRollback | Minor | Check whether the pod is rolled back. |
| Pod | FailedUpdate | Minor | Check whether the pod is updated. |
| Pod | FailedScaleIn | Minor | Check whether a pod scale-in failed. |
| Pod | FailedRestart | Minor | Check whether the pod is restarted. |
| Deployment | SelectorOverlap | Minor | Check whether label selectors in the cluster conflict. |
| Deployment | ReplicaSetCreateError | Minor | Check whether a workload ReplicaSet can be created. |
| Deployment | DeploymentRollbackRevisionNotFound | Minor | Check whether the Deployment rollback version is available. |
| DaemonSet | SelectingAll | Minor | Check whether the workload label selector is correctly configured. |
| Job | TooManyActivePods | Minor | Check whether there are still active pods after the number of pods in a job reaches the preset value. |
| Job | TooManySucceededPods | Minor | Check whether there are extra running pods after the number of pods in a job reaches the preset value. |
| CronJob | FailedGet | Minor | Check whether CronJobs can be obtained. |
| CronJob | FailedList | Minor | Check whether pods can be obtained. |
| CronJob | UnexpectedJob | Minor | Check whether there are any unknown CronJobs. |
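
As a rough illustration of how the severities in the workload events listed above might drive alarm selection, the following Python sketch filters a list of events by minimum severity. The mapping covers only a few rows of the table. The event dictionary shape, the `events_to_alarm` helper, and the severity ranking are assumptions made for this example and are not part of CCE or AOM.

```python
# Severities copied from a subset of the workload events table.
WORKLOAD_EVENT_SEVERITY = {
    "PodOOMKilling": "Major",
    "FailedStart": "Major",
    "FailedPullImage": "Major",
    "BackOffStart": "Major",
    "FailedScheduling": "Major",
    "Unhealthy": "Minor",
    "FailedScaleOut": "Minor",
    "SelectorOverlap": "Minor",
}

# Hypothetical ordering used only by this sketch.
SEVERITY_RANK = {"Minor": 1, "Major": 2}


def events_to_alarm(events, min_severity="Major"):
    """Return events whose table severity is at least min_severity.

    Each event is a dict with a "reason" key (an assumed shape).
    Reasons missing from the mapping are skipped rather than guessed.
    """
    threshold = SEVERITY_RANK[min_severity]
    selected = []
    for event in events:
        severity = WORKLOAD_EVENT_SEVERITY.get(event["reason"])
        if severity is not None and SEVERITY_RANK[severity] >= threshold:
            selected.append({**event, "severity": severity})
    return selected


# Illustrative sample data, not real cluster output.
sample = [
    {"reason": "FailedScheduling", "object": "pod/web-0"},
    {"reason": "Unhealthy", "object": "pod/web-1"},
    {"reason": "SomethingElse", "object": "pod/web-2"},
]
major_only = events_to_alarm(sample)
```

Calling `events_to_alarm(sample, "Minor")` would also include the `Unhealthy` event, while the unknown reason is ignored in both cases.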

Network events

| Category | Event Name | Severity | Description |
|---|---|---|---|
| Service | CreatingLoadBalancerFailed | Minor | Check whether the load balancer is created. |
| Service | DeletingLoadBalancerFailed | Minor | Check whether the load balancer is deleted. |
| Service | UpdateLoadBalancerFailed | Minor | Check whether the load balancer is updated. |

Node events

| Category | Event Name | Severity | Description |
|---|---|---|---|
| Node | Rebooted | Major | Check whether the node is restarted. |
| Node | NodeNotSchedulable | Major | Check whether the node is schedulable. |
| Node | NodeNotReady | Major | Check whether the node is running normally. |
| Node | NodeCreateFailed | Major | Check whether the node is created. |
| Node | KUBELETIsDown | Minor | Check whether kubelet is running normally on the node. |
| Node | NodeHasInsufficientMemory | Minor | Check whether the available memory of the node is sufficient. |
| Node | UnregisterNetDevice | Minor | Check whether the node is associated with any unregistered network device. |
| Node | NetworkCardNotFound | Minor | Check the node ENI status. |
| Node | KUBEPROXYIsDown | Minor | Check whether kube-proxy is running normally on the node. |
| Node | NodeOutOfDisk | Minor | Check whether the node disk space is sufficient. |
| Node | TaskHung | Minor | Check whether there are any hung tasks on the node. |
| Node | CIDRNotAvailable | Minor | Check whether the node CIDR block is available. |
| Node | ConntrackFull | Minor | Check whether the connection tracking table on the node is full. |
| Node | NodeHasDiskPressure | Minor | Check whether the node disk space is sufficient. |
| Node | NodeInstallFailed | Minor | Check whether nodes are managed in the cluster. |
| Node | KernelOops | Minor | Check whether the OS kernel of the node is faulty. |
| Node | OOMKilling | Minor | |
| Node | DOCKERIsDown | Minor | Check whether the container engine of the node is running normally. |
| Node | CIDRAssignmentFailed | Minor | Check whether a CIDR block is allocated for the node. |
| Node | DockerHung | Minor | Check whether the Docker process on the node is hung. |
| Node | FilesystemIsReadOnly | Minor | Check whether the file system of the node is read-only. |
| Node | NTPIsDown | Minor | Check whether NTP is running normally on the node. |
| Node | NodeUninstallFailed | Minor | Check whether the node is uninstalled. |
| Node | AUFSUmountHung | Minor | Check whether unmounting the node disk is hung. |
| Node | CNIIsDown | Minor | Check whether the CNI add-on on the node is faulty. |
| Namespace | DeleteNodeWithNoServer | Minor | Check whether discarded nodes are cleared. |

Storage events

| Category | Event Name | Severity | Description |
|---|---|---|---|
| PV | DetachVolumeFailed | Minor | Check whether the block storage is detached. |
| PV | VolumeUnknownReclaimPolicy | Minor | Check whether a volume reclamation policy is specified. |
| PV | SetUpAtVolumeFailed | Minor | Check whether the data volume is mounted. |
| PV | VolumeFailedRecycle | Minor | Check whether the data volume is reclaimed. |
| PV | WaitForAttachVolumeFailed | Minor | Check whether block storage is attached to the node. |
| PV | VolumeFailedDelete | Minor | Check whether the data volume is deleted. |
| PV | MountDeviceFailed | Minor | Check whether the data volume is mounted. |
| PV | TearDownAtVolumeFailed | Minor | Check whether the data volume is detached. |
| PV | UnmountDeviceFailed | Minor | Check whether the device of the data volume is unmounted. |
| PV | AttachVolumeFailed | Minor | Check whether block storage is attached to the node. |
| PVC | VolumeResizeFailed | Minor | Check whether the capacity of the data volume is expanded. |
| PVC | ClaimLost | Minor | Check whether the PVC volume is normal. |
| PVC | ProvisioningFailed | Minor | Check whether the data volume is created. |
| PVC | ProvisioningCleanupFailed | Minor | Check whether the data volume is cleared. |
| PVC | ClaimMisbound | Minor | Check whether the PVC is bound to an incorrect volume. |

Auto scaling events

| Category | Event Name | Severity | Description |
|---|---|---|---|
| Autoscaler | ScaleUpTimedOut | Major | Check whether adding nodes to the node pool timed out. |
| Autoscaler | NodePoolAvailable | Major | Check whether the node pool resources are sufficient. |
| Autoscaler | ScaleDown | Major | Nodes are being deleted from the cluster. |
| Autoscaler | NotTriggerScaleUp | Major | Check whether a node scale-out is triggered. |
| Autoscaler | DeleteUnregistered | Major | Check whether unregistered nodes are deleted. |
| Autoscaler | ScaleDownEmpty | Major | Check whether idle nodes are scaled in. |
| Autoscaler | ScaleDownFailed | Major | Check whether nodes are scaled in. |
| Autoscaler | FailedToScaleUpGroup | Major | Check whether an error occurred during a node pool scale-out. |
| Autoscaler | ScaledUpGroup | Major | Check whether the node pool is scaled out. |
| Autoscaler | ScaleUpFailed | Major | Check whether the node is scaled out. |
| Autoscaler | FixNodeGroupSizeDone | Major | Check whether the number of nodes in the node pool is restored. |
| Autoscaler | NodeGroupInBackOff | Major | Check whether there are any rollback retries during node pool scaling. |
| Autoscaler | FixNodeGroupSizeError | Major | Check whether the number of nodes in the node pool is restored. |
| Autoscaler | NodePoolSoldOut | Major | Check whether the node pool resources are sufficient. |
| Autoscaler | TriggeredScaleUp | Major | Check whether a node scale-out is triggered. |
| Autoscaler | StartScaledUpGroup | Major | Check whether a node pool scale-out is started. |
| Autoscaler | DeleteUnregisteredFailed | Major | Check whether unregistered nodes are deleted. |
| HPA | InvalidTargetRange | Major | |
| HPA | FailedGetScale | Major | HPA failed to obtain the resource object to be scaled. |
| HPA | FailedComputeMetricsReplicas | Major | An error occurred while calculating the desired number of replicas. For example, metrics-server is unavailable, resource metric collection failed, or the CPU usage is incorrectly set. To view details, run `kubectl describe horizontalpodautoscaler <hpa-name>`. |
| HPA | FailedGetObjectMetric | Major | Failed to obtain the metrics of the specified object (such as PVCs and ConfigMaps). |
| HPA | FailedGetPodsMetric | Major | Failed to obtain the pod resource metric (resource usage of a pod). |
| HPA | FailedGetResourceMetric | Major | Failed to obtain the cluster resource metric (resource usage of a cluster). |
| HPA | FailedGetContainerResourceMetric | Major | Failed to obtain the resource metrics of a container. |
| HPA | FailedGetExternalMetric | Major | Failed to obtain external metrics. |
| HPA | FailedRescale | Major | Failed to update the desired number of replicas of the resource object to be scaled. |
| HPA | SuccessfulRescale | Minor | The desired number of replicas of the resource object to be scaled is updated. |
| CronHPA | ScaleFailed | Major | CronHPA failed to update the desired number of replicas of the resource object to be scaled. |
| CronHPA | FailedGetHorizontalPodAutoscaler | Major | CronHPA failed to query the associated HPA object. (Generally, kube-apiserver cannot respond.) |
| CronHPA | FailedGetHpaScale | Major | CronHPA failed to obtain the resource object to be scaled. |
| CronHPA | UpdateHPAFailed | Major | CronHPA failed to update the associated HPA object. |
| CronHPA | UpdateHPASuccess | Minor | CronHPA successfully updated the associated HPA object. |
| CronHPA | SkipUpdateHPA | Minor | CronHPA skipped updating the associated HPA object. |
| CronHPA | SkipUpdateTarget | Minor | CronHPA skipped updating the number of replicas of the resource object to be scaled. |
| CronHPA | UpdateTargetSuccess | Minor | CronHPA successfully updated the number of replicas of the resource object to be scaled. |
| CustomedHPA | FailedSetPolicySettings | Major | Failed to parse the cooldown period of CustomedHPA. |
| CustomedHPA | FailedSubmitRule | Major | CustomedHPA failed to process schedule rules or metric rules. |
| CustomedHPA | FailedComputeReplicas | Major | CustomedHPA failed to trigger resource scaling based on the compute metrics. |
| CustomedHPA | FailedScale | Major | CustomedHPA failed to update the desired number of replicas of the resource object to be scaled. (Generally, kube-apiserver cannot respond.) |
| CustomedHPA | MetricScaleSuccess | Minor | CustomedHPA triggers resource scaling based on the metric rule. |
| CustomedHPA | CronScaleSuccess | Minor | CustomedHPA triggers resource scaling based on the periodic rule. |
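
Besides `kubectl describe horizontalpodautoscaler`, events with a given name can also be listed directly with `kubectl get events` and a field selector on `reason`. The Python sketch below only assembles such a command line so that it can be reviewed or logged before being run. The helper name `kubectl_events_command` and the `default` namespace are assumptions for illustration, and the sketch does not contact a cluster.

```python
import shlex


def kubectl_events_command(reason, namespace=None):
    """Build a kubectl argument list that lists events with the given reason.

    Hypothetical helper for this example. It only assembles arguments
    and does not execute kubectl.
    """
    args = ["kubectl", "get", "events"]
    if namespace is None:
        # No namespace given: search every namespace.
        args.append("--all-namespaces")
    else:
        args += ["--namespace", namespace]
    # Events support field selectors on the "reason" field.
    args += ["--field-selector", f"reason={reason}"]
    return args


cmd = kubectl_events_command("FailedComputeMetricsReplicas", namespace="default")
# shlex.join renders the list as a single shell-safe string.
print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` on a machine where kubectl is configured for the cluster.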
Control Plane Events
| Event ID | Severity | Description |
|---|---|---|
| Internal error | Major | Check whether an internal error occurs in the cluster. |
| External dependency error | Major | Check whether an error occurs in cluster external dependencies. |
| Failed to initialize process thread | Major | Check whether a cluster initialization thread is executed. |
| Failed to update database | Major | Check whether the database for the cluster is updated. |
| Failed to create node by nodepool | Major | Check whether nodes are created in the node pool. |
| Failed to delete node by nodepool | Major | Check whether nodes are deleted from the node pool. |
| Failed to create yearly/monthly subscription node | Major | Check whether the yearly/monthly node is created in the cluster. |
| Failed to cancel the authorization of accessing the image of the master. | Major | When creating a cluster, check whether the authorization for the resource tenant to access the master node image is canceled. |
| Failed to create the virtual IP for the master | Major | When creating a cluster, check whether the virtual IP address is allocated. |
| Failed to delete the node VM | Major | Check whether the node (VM) is deleted from the cluster. |
| Failed to delete the security group of node | Major | Check whether the security group of the node is deleted from the cluster. |
| Failed to delete the security group of master | Major | Check whether the security group of the master node is deleted from the cluster. |
| Failed to delete the security group of port | Major | Check whether the ENI security group of the master node is deleted from the cluster. |
| Failed to delete the security group of eni or subeni | Major | Check whether the ENI or sub-ENI security group is deleted from the cluster. |
| Failed to detach the port of master | Major | Check whether the ENI of the master node is detached from the cluster. |
| Failed to delete the port of master | Major | Check whether the ENI of the master node is deleted from the cluster. |
| Failed to delete the master VM | Major | Check whether the master node (VM) is deleted from the cluster. |
| Failed to delete the key pair of master | Major | Check whether the key pair of the master node is deleted from the cluster. |
| Failed to delete the subnet of master | Major | Check whether the subnet of the master node is deleted from the cluster. |
| Failed to delete the VPC of master | Major | Check whether the VPC of the master node is deleted from the cluster. |
| Failed to delete certificate of cluster | Major | Check whether the certificate is deleted from the cluster. |
| Failed to delete the server group of master | Major | Check whether the server group of the master node is deleted from the cluster. |
| Failed to delete the virtual IP for the master | Major | Check whether the virtual IP address is deleted from the cluster. |
| Failed to get floating IP of the master | Major | Check whether the floating IP address of the master node is obtained. |
| Failed to get cluster flavor | Major | Check whether the cluster flavor is obtained. |
| Failed to get cluster endpoint | Major | Check whether the cluster endpoint is obtained. |
| Failed to get kubernetes connection | Major | Check whether the Kubernetes cluster connections are obtained. |
| Failed to update secret | Major | Check whether the cluster Secret is updated. |
| Operation timed out | Major | Check whether the user operation timed out. |
| Connecting to Kubernetes cluster timed out | Major | Check whether accessing the Kubernetes cluster timed out. |
| Failed to check component status or components are abnormal | Major | Check whether the statuses of cluster components can be obtained or whether the components malfunction. |
| The node is not found in kubernetes cluster | Major | Check whether the node can be found in the Kubernetes cluster. |
| The status of node is not ready in kubernetes cluster | Major | Check whether the node is running normally in the Kubernetes cluster. |
| Can't find corresponding vm of this node in ECS | Major | Check whether the node can be found on the ECS console. |
| Failed to upgrade the master | Major | Check whether the master node has been upgraded. |
| Failed to upgrade the node | Major | Check whether the node has been upgraded. |
| Failed to change flavor of the master | Major | Check whether the master node flavor has been changed. |
| Change flavor of the master timeout | Major | Check whether changing the master node flavor timed out. |
| Failed to pass verification while creating yearly/monthly subscription node | Major | Check whether creating a yearly/monthly node has passed verification. |
| Failed to install the node | Major | Check whether the node is installed in the cluster. |
| Failed to clean routes of cluster container network in VPC | Major | Check whether the routes of the cluster container network in the VPC are cleaned. |
| Cluster status is Unavailable | Major | Check whether the cluster is available. |
| Cluster status is Error | Major | Check whether the cluster is faulty. |
| Cluster status is not updated for a long time | Major | Check whether the cluster remains in the same state for a long time. |
| Failed to update master status after upgrading cluster timeout | Major | Check whether the status of the master node is updated after the cluster upgrade timed out. |
| Failed to update running jobs after upgrading cluster timeout | Major | Check whether running tasks are updated after the cluster upgrade timed out. |
| Failed to update cluster status | Major | Check whether the cluster status is updated. |
| Failed to update node status | Major | Check whether the node status is updated. |
| Failed to remove the static node from database | Major | Check whether nodes are removed from the database after managing nodes timed out. |
| Failed to update node status to abnormal after node processing timeout | Major | Check whether the node status is updated to abnormal after processing the node timed out. |
| Failed to update the cluster endpoint | Major | Check whether the cluster endpoint is updated. |
| Failed to delete the unavailable connection of the Kubernetes cluster | Major | Check whether unavailable Kubernetes connections are deleted. |
| Failed to sync the cluster cert | Major | Check whether the cluster certificate is synchronized. |