CCE Events

CCE can report a range of events in a running cluster to AOM. You can add event alarms as required to monitor the health of cluster data plane and control plane components. This helps you quickly identify and resolve problems, ensuring cluster stability and reliability.

Data Plane Events: user operation events, including workload, network, node, storage, and auto scaling events.
Control Plane Events: master node events, which are usually caused by faults or upgrades of control plane components.
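
The data plane events listed below correspond to Kubernetes event reasons, so they can also be inspected outside AOM. As a minimal sketch, assuming the events have been exported with `kubectl get events -A -o json`, the following Python snippet counts events per object kind and reason. The function name and the sample data are hypothetical and serve only as an illustration.

```python
import json
from collections import Counter


def count_event_reasons(events_json):
    """Count events per (kind, reason) pair.

    `events_json` is assumed to be the text printed by
    `kubectl get events -A -o json`, that is, a list object whose
    `items` are Kubernetes Event objects.
    """
    counts = Counter()
    for item in json.loads(events_json).get("items", []):
        kind = item.get("involvedObject", {}).get("kind", "Unknown")
        reason = item.get("reason", "Unknown")
        counts[(kind, reason)] += 1
    return counts


# Hypothetical sample, shaped like kubectl output but heavily trimmed.
SAMPLE_EVENTS = json.dumps({
    "items": [
        {"reason": "FailedScheduling",
         "involvedObject": {"kind": "Pod", "name": "web-0"}},
        {"reason": "FailedScheduling",
         "involvedObject": {"kind": "Pod", "name": "web-1"}},
        {"reason": "NodeNotReady",
         "involvedObject": {"kind": "Node", "name": "node-a"}},
    ]
})

if __name__ == "__main__":
    # Print the most frequent (kind, reason) pairs first.
    for (kind, reason), n in count_event_reasons(SAMPLE_EVENTS).most_common():
        print(f"{kind}\t{reason}\t{n}")
```

In practice, the JSON text would come from a file or from the standard output of the kubectl command rather than from an inline sample.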

Data Plane Events

**Table 1** Workload events
Category	Event Name	Severity	Description
Pod	PodOOMKilling	Major	Check whether the pod exits due to OOM. This event is reported by CCE Node Problem Detector (1.18.41 or later) and Cloud Native Logging (1.3.2 or later).
Pod	FailedStart	Major	Check whether the pod is started.
Pod	FailedPullImage	Major	Check whether the pod has pulled an image.
Pod	BackOffStart	Major	Check whether the pod fails to be restarted.
Pod	FailedScheduling	Major	Check whether the pod is scheduled.
Pod	BackOffPullImage	Major	Check whether the pod has pulled an image after a retry.
Pod	FailedCreate	Major	Check whether the pod is created.
Pod	Unhealthy	Minor	Check whether the pod health check is successful.
Pod	FailedDelete	Minor	Check whether the workload is deleted.
Pod	ErrImageNeverPull	Minor	Check whether the workload has pulled an image.
Pod	FailedScaleOut	Minor	Check whether workload copies are scaled out.
Pod	FailedStandBy	Minor	Check whether the pod enters the standby state.
Pod	FailedReconfig	Minor	Check whether the pod configuration is updated.
Pod	FailedActive	Minor	Check whether the pod is activated.
Pod	FailedRollback	Minor	Check whether the pod is rolled back.
Pod	FailedUpdate	Minor	Check whether the pod is updated.
Pod	FailedScaleIn	Minor	Check whether a pod scale-in failed.
Pod	FailedRestart	Minor	Check whether the pod is restarted.
Deployment	SelectorOverlap	Minor	Check whether label selectors in the cluster conflict.
Deployment	ReplicaSetCreateError	Minor	Check whether a workload ReplicaSet can be created.
Deployment	DeploymentRollbackRevisionNotFound	Minor	Check whether the Deployment rollback version is available.
DaemonSet	SelectingAll	Minor	Check whether the workload label selector is correctly configured.
Job	TooManyActivePods	Minor	Check whether there are still active pods after the number of pods in a job reaches the preset value.
Job	TooManySucceededPods	Minor	Check whether there are extra running pods after the number of pods in a job reaches the preset value.
CronJob	FailedGet	Minor	Check whether CronJobs can be obtained.
CronJob	FailedList	Minor	Check whether pods can be obtained.
CronJob	UnexpectedJob	Minor	Check whether there are any unknown CronJobs.
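
The severities in Table 1 can be used to order events during troubleshooting. The following sketch copies a few rows of the table into a lookup and sorts observed event names so that Major events come first. The dictionary covers only a subset of the table, and the `triage` helper is a hypothetical name, not part of CCE or AOM.

```python
# Severities copied from a few rows of Table 1; extend as needed.
WORKLOAD_EVENT_SEVERITY = {
    "PodOOMKilling": "Major",
    "FailedStart": "Major",
    "FailedPullImage": "Major",
    "BackOffStart": "Major",
    "FailedScheduling": "Major",
    "Unhealthy": "Minor",
    "FailedDelete": "Minor",
    "SelectorOverlap": "Minor",
}

# Lower rank sorts first.
_RANK = {"Major": 0, "Minor": 1}


def triage(reasons):
    """Return the unique known event names, Major first, then alphabetically.

    Names that are not in WORKLOAD_EVENT_SEVERITY are dropped.
    """
    known = {r for r in reasons if r in WORKLOAD_EVENT_SEVERITY}
    return sorted(known, key=lambda r: (_RANK[WORKLOAD_EVENT_SEVERITY[r]], r))


if __name__ == "__main__":
    observed = ["Unhealthy", "FailedStart", "Pulled", "FailedStart"]
    for name in triage(observed):
        print(WORKLOAD_EVENT_SEVERITY[name], name)
```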

**Table 2** Network events
Category	Event Name	Severity	Description
Service	CreatingLoadBalancerFailed	Minor	Check whether the load balancer is created.
Service	DeletingLoadBalancerFailed	Minor	Check whether the load balancer is deleted.
Service	UpdateLoadBalancerFailed	Minor	Check whether the load balancer is updated.

**Table 3** Node events
Category	Event Name	Severity	Description
Node	Rebooted	Major	Check whether the node is restarted.
Node	NodeNotSchedulable	Major	Check whether the node is schedulable.
Node	NodeNotReady	Major	Check whether the node is running normally.
Node	NodeCreateFailed	Major	Check whether the node is created.
Node	KUBELETIsDown	Minor	Check whether kubelet is running normally on the node.
Node	NodeHasInsufficientMemory	Minor	Check whether the available memory of the node is sufficient.
Node	UnregisterNetDevice	Minor	Check whether the node is associated with any unregistered network device.
Node	NetworkCardNotFound	Minor	Check the node ENI status.
Node	KUBEPROXYIsDown	Minor	Check whether kube-proxy is running normally on the node.
Node	NodeOutOfDisk	Minor	Check whether the node disk space is sufficient.
Node	TaskHung	Minor	Check whether there are any suspended tasks on the node.
Node	CIDRNotAvailable	Minor	Check whether the node CIDR block is available.
Node	ConntrackFull	Minor	Check whether the connection tracking table on the node is full.
Node	NodeHasDiskPressure	Minor	Check whether the node disk space is sufficient.
Node	NodeInstallFailed	Minor	Check whether nodes are managed in the cluster.
Node	KernelOops	Minor	Check whether the OS kernel of the node is faulty.
Node	OOMKilling	Minor	Check whether a process on the node is terminated due to OOM. This occurs in either of two cases: the memory used by pods on the node exceeds the limit, so the process is terminated; or the memory used by pods on the node does not exceed the limit, but the available memory of the node is insufficient.
Node	DOCKERIsDown	Minor	Check whether the container engine of the node is running normally.
Node	CIDRAssignmentFailed	Minor	Check whether a CIDR block is allocated for the node.
Node	DockerHung	Minor	Check whether the Docker process on the node is suspended.
Node	FilesystemIsReadOnly	Minor	Check whether the file system of the node is read-only.
Node	NTPIsDown	Minor	Check whether NTP is running normally on the node.
Node	NodeUninstallFailed	Minor	Check whether the node is uninstalled.
Node	AUFSUmountHung	Minor	Check whether detaching the node disk is suspended.
Node	CNIIsDown	Minor	Check whether the CNI add-on on the node is faulty.
Namespace	DeleteNodeWithNoServer	Minor	Check whether discarded nodes are cleared.

**Table 4** Storage events
Category	Event Name	Severity	Description
PV	DetachVolumeFailed	Minor	Check whether the block storage is detached.
PV	VolumeUnknownReclaimPolicy	Minor	Check whether a volume reclamation policy is specified.
PV	SetUpAtVolumeFailed	Minor	Check whether the data volume is mounted.
PV	VolumeFailedRecycle	Minor	Check whether the data volume is reclaimed.
PV	WaitForAttachVolumeFailed	Minor	Check whether block storage is attached to the node.
PV	VolumeFailedDelete	Minor	Check whether the data volume is deleted.
PV	MountDeviceFailed	Minor	Check whether the data volume is mounted.
PV	TearDownAtVolumeFailed	Minor	Check whether the data volume is detached.
PV	UnmountDeviceFailed	Minor	Check whether the device of the data volume is unmounted.
PV	AttachVolumeFailed	Minor	Check whether block storage is attached to the node.
PVC	VolumeResizeFailed	Minor	Check whether the capacity of the data volume is expanded.
PVC	ClaimLost	Minor	Check whether the PVC volume is normal.
PVC	ProvisioningFailed	Minor	Check whether the data volume is created.
PVC	ProvisioningCleanupFailed	Minor	Check whether the data volume is cleared.
PVC	ClaimMisbound	Minor	Check whether the PVC is bound to an incorrect volume.

**Table 5** Auto scaling events
Category	Event Name	Severity	Description
Autoscaler	ScaleUpTimedOut	Major	Check whether adding nodes to the node pool timed out.
Autoscaler	NodePoolAvailable	Major	Check whether the node pool resources are sufficient.
Autoscaler	ScaleDown	Major	Nodes are being deleted from the cluster.
Autoscaler	NotTriggerScaleUp	Major	Check whether a node scale-out is triggered.
Autoscaler	DeleteUnregistered	Major	Check whether unregistered nodes are deleted.
Autoscaler	ScaleDownEmpty	Major	Check whether idle nodes are scaled in.
Autoscaler	ScaleDownFailed	Major	Check whether nodes are scaled in.
Autoscaler	FailedToScaleUpGroup	Major	Check whether an error occurred during a node pool scale-out.
Autoscaler	ScaledUpGroup	Major	Check whether the node pool is scaled out.
Autoscaler	ScaleUpFailed	Major	Check whether the node is scaled out.
Autoscaler	FixNodeGroupSizeDone	Major	Check whether the number of nodes in the node pool is restored.
Autoscaler	NodeGroupInBackOff	Major	Check whether there are any rollback retries during node pool scaling.
Autoscaler	FixNodeGroupSizeError	Major	Check whether the number of nodes in the node pool is restored.
Autoscaler	NodePoolSoldOut	Major	Check whether the node pool resources are sufficient.
Autoscaler	TriggeredScaleUp	Major	Check whether a node scale-out is triggered.
Autoscaler	StartScaledUpGroup	Major	Check whether a node pool scale-out has started.
Autoscaler	DeleteUnregisteredFailed	Major	Check whether unregistered nodes are deleted.
HPA	InvalidTargetRange	Major	An invalid extendedhpa.metrics value is configured in the HPA annotations, or the metric type in the HPA spec is incorrect.
HPA	FailedGetScale	Major	HPA failed to obtain the resource object to be scaled.
HPA	FailedComputeMetricsReplicas	Major	An error occurred while calculating the number of copies to which the resource should be scaled. For example, metrics-server is unavailable, resource metric collection fails, or the CPU usage is incorrectly set. You can run the following command to view details: kubectl describe horizontalpodautoscaler <hpa-name>
HPA	FailedGetObjectMetric	Major	Failed to obtain the metrics of the specified object (such as PVCs and ConfigMaps).
HPA	FailedGetPodsMetric	Major	Failed to obtain the pod resource metric (resource usage of a pod).
HPA	FailedGetResourceMetric	Major	Failed to obtain the cluster resource metric (resource usage of a cluster).
HPA	FailedGetContainerResourceMetric	Major	Failed to obtain the resource metrics of a container.
HPA	FailedGetExternalMetric	Major	Failed to obtain external metrics.
HPA	FailedRescale	Major	Failed to update the desired number of copies of the resource object to be scaled.
HPA	SuccessfulRescale	Minor	The desired number of copies of the resource object to be scaled is updated.
CronHPA	ScaleFailed	Major	CronHPA failed to update the desired number of copies of the resource object to be scaled.
CronHPA	FailedGetHorizontalPodAutoscaler	Major	CronHPA failed to query the associated HPA object. (Generally, kube-apiserver cannot respond.)
CronHPA	FailedGetHpaScale	Major	CronHPA failed to obtain the resource object to be scaled.
CronHPA	UpdateHPAFailed	Major	CronHPA failed to update the associated HPA object.
CronHPA	UpdateHPASuccess	Minor	CronHPA successfully updates the associated HPA object.
CronHPA	SkipUpdateHPA	Minor	CronHPA skips updating the associated HPA object.
CronHPA	SkipUpdateTarget	Minor	CronHPA skips updating the number of copies of the resource object to be scaled.
CronHPA	UpdateTargetSuccess	Minor	CronHPA successfully updates the number of copies of the resource object to be scaled.
CustomedHPA	FailedSetPolicySettings	Major	Failed to parse the cooldown period of CustomedHPA.
CustomedHPA	FailedSubmitRule	Major	CustomedHPA failed to process schedule rules or metric rules.
CustomedHPA	FailedComputeReplicas	Major	CustomedHPA failed to trigger resource scaling based on the compute metrics.
CustomedHPA	FailedScale	Major	CustomedHPA failed to update the desired number of copies of the resource object to be scaled. (Generally, kube-apiserver cannot respond.)
CustomedHPA	MetricScaleSuccess	Minor	CustomedHPA triggers resource scaling based on the metric rule.
CustomedHPA	CronScaleSuccess	Minor	CustomedHPA triggers resource scaling based on the periodic rule.
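
Several of the HPA event names in Table 5 also appear as the reason of a condition in the HPA status. As a minimal sketch, assuming an `autoscaling/v2` HPA exported with `kubectl get horizontalpodautoscaler <hpa-name> -o json`, the following Python function lists the `AbleToScale` and `ScalingActive` conditions whose status is `False`. The function name, the sample object, and its message text are hypothetical.

```python
def failing_hpa_conditions(hpa):
    """Return (type, reason, message) for failing HPA conditions.

    Only AbleToScale and ScalingActive are checked, because a False
    ScalingLimited condition is the normal state and not a failure.
    """
    failing = []
    for cond in hpa.get("status", {}).get("conditions", []):
        if (cond.get("type") in ("AbleToScale", "ScalingActive")
                and cond.get("status") == "False"):
            failing.append(
                (cond["type"], cond.get("reason", ""), cond.get("message", ""))
            )
    return failing


# Hypothetical, heavily trimmed HPA object for illustration only.
SAMPLE_HPA = {
    "status": {
        "conditions": [
            {"type": "AbleToScale", "status": "True",
             "reason": "SucceededGetScale"},
            {"type": "ScalingActive", "status": "False",
             "reason": "FailedGetResourceMetric",
             "message": "example message: metrics API unavailable"},
            {"type": "ScalingLimited", "status": "False",
             "reason": "DesiredWithinRange"},
        ]
    }
}

if __name__ == "__main__":
    for ctype, reason, message in failing_hpa_conditions(SAMPLE_HPA):
        print(f"{ctype}: {reason}: {message}")
```

A `ScalingActive` condition can also be `False` for benign reasons, for example when scaling is intentionally disabled, so the reason field should be read before treating a result as a fault.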

Control Plane Events

**Table 6** Control plane events
Event ID	Severity	Description
Internal error	Major	Check whether an internal error occurs in the cluster.
External dependency error	Major	Check whether an error occurs in cluster external dependencies.
Failed to initialize process thread	Major	Check whether a cluster initialization thread is executed.
Failed to update database	Major	Check whether the database for the cluster is updated.
Failed to create node by nodepool	Major	Check whether nodes are created in the node pool.
Failed to delete node by nodepool	Major	Check whether nodes are deleted from the node pool.
Failed to create yearly/monthly subscription node	Major	Check whether the yearly/monthly node is created in the cluster.
Failed to cancel the authorization of accessing the image of the master.	Major	When creating a cluster, check whether the authorization for the resource tenant to access the master node image is canceled.
Failed to create the virtual IP for the master	Major	When creating a cluster, check whether the virtual IP address is allocated.
Failed to delete the node VM	Major	Check whether the node (VM) is deleted from the cluster.
Failed to delete the security group of node	Major	Check whether the security group of the node is deleted from the cluster.
Failed to delete the security group of master	Major	Check whether the security group of the master node is deleted from the cluster.
Failed to delete the security group of port	Major	Check whether the ENI security group of the master node is deleted from the cluster.
Failed to delete the security group of eni or subeni	Major	Check whether ENI or sub-ENI security group is deleted from the cluster.
Failed to detach the port of master	Major	Check whether the ENI of the master node is detached from the cluster.
Failed to delete the port of master	Major	Check whether the ENI of the master node is deleted from the cluster.
Failed to delete the master VM	Major	Check whether the master node (VM) is deleted from the cluster.
Failed to delete the key pair of master	Major	Check whether the key pair of the master node is deleted from the cluster.
Failed to delete the subnet of master	Major	Check whether the subnet of the master node is deleted from the cluster.
Failed to delete the VPC of master	Major	Check whether the VPC of the master node is deleted from the cluster.
Failed to delete certificate of cluster	Major	Check whether the certificate is deleted from the cluster.
Failed to delete the server group of master	Major	Check whether the ECS group of the master node is deleted from the cluster.
Failed to delete the virtual IP for the master	Major	Check whether the virtual IP address is deleted from the cluster.
Failed to get floating IP of the master	Major	Check whether the floating IP address of the master node is obtained.
Failed to get cluster flavor	Major	Check whether the cluster flavor is obtained.
Failed to get cluster endpoint	Major	Check whether the cluster endpoint is obtained.
Failed to get kubernetes connection	Major	Check whether the Kubernetes cluster connections are obtained.
Failed to update secret	Major	Check whether the cluster Secret is updated.
Operation timed out	Major	Check whether the user operation timed out.
Connecting to Kubernetes cluster timed out	Major	Check whether accessing the Kubernetes cluster timed out.
Failed to check component status or components are abnormal	Major	Check whether the statuses of cluster components can be obtained or whether the components malfunction.
The node is not found in kubernetes cluster	Major	Check whether the node can be found in the Kubernetes cluster.
The status of node is not ready in kubernetes cluster	Major	Check whether the node is running normally in the Kubernetes cluster.
Can't find corresponding vm of this node in ECS	Major	Check whether the node can be found on the ECS console.
Failed to upgrade the master	Major	Check whether the master node has been upgraded.
Failed to upgrade the node	Major	Check whether the node has been upgraded.
Failed to change flavor of the master	Major	Check whether the master node flavor has been changed.
Change flavor of the master timeout	Major	Check whether changing the master node flavor timed out.
Failed to pass verification while creating yearly/monthly subscription node	Major	Check whether the yearly/monthly node passes verification during creation.
Failed to install the node	Major	Check whether the node is installed in the cluster.
Failed to clean routes of cluster container network in VPC	Major	Check whether the routes of the cluster container network in the VPC are cleaned up.
Cluster status is Unavailable	Major	Check whether the cluster is available.
Cluster status is Error	Major	Check whether the cluster is faulty.
Cluster status is not updated for a long time	Major	Check whether the cluster remains in the same state for a long time.
Failed to update master status after upgrading cluster timeout	Major	Check whether the status of the master node is updated after the cluster upgrade timed out.
Failed to update running jobs after upgrading cluster timeout	Major	Check whether running tasks are updated after the cluster upgrade timed out.
Failed to update cluster status	Major	Check whether the cluster status is updated.
Failed to update node status	Major	Check whether the node status is updated.
Failed to remove the static node from database	Major	Check whether nodes are removed from the database after managing nodes timed out.
Failed to update node status to abnormal after node processing timeout	Major	Check whether the node status is updated to abnormal after processing the node timed out.
Failed to update the cluster endpoint	Major	Check whether the cluster endpoint is updated.
Failed to delete the unavailable connection of the Kubernetes cluster	Major	Check whether unavailable Kubernetes connections are deleted.
Failed to sync the cluster cert	Major	Check whether the cluster certificate is synchronized.