
System Check

Scenario

System Steward consists of system check and system hardening. This topic describes the system check function.

System check detects faults or exceptions on nodes in real time.

Prerequisites

The npd add-on has been installed. This add-on detects node exceptions.
The prometheus add-on has been installed. This add-on obtains the abnormal metrics reported by the npd add-on.

Procedure

Log in to the CCE console. In the navigation pane on the left, choose System Steward > System Check.
In the left pane of the System Check page, choose the node for which you want to perform a system check. The Indicator Check, Behavior Statistics, and Kubernetes Events tab pages are displayed.

Required add-ons have not been installed:

If the npd and prometheus add-ons are not installed, install them as prompted.

After the add-ons are installed, choose System Steward > System Check again to view the check information.
Figure 1 Installing add-ons required for system check

Required add-ons have been installed:

If the add-ons have been installed, you can click the Indicator Check, Behavior Statistics, and Kubernetes Events tabs to view the system check information.

Figure 2 Viewing the system check information

On the Indicator Check tab page, you can view the status of system resources, system components, abnormal behaviors, and other items, and then perform operations as prompted. Table 1 describes the check items.

**Table 1** Indicator check items
Check Item	Check Sub-item	Description
System resources	Disk	Node disk usage.
	Memory	Node memory usage.
	PID	Node PID usage.
System components	CNI	CNI component running status.
	Docker	Docker component running status.
	kubelet	kubelet component running status.
	kube-proxy	kube-proxy component running status.
	NTP	NTP component running status.
Abnormal behavior	Frequent containerd restart	Containerd restarts frequently.
	Frequent Docker restart	Docker restarts frequently.
	Frequent kubelet restart	kubelet restarts frequently.
	Frequent deregistration of network devices	Network devices, such as network adapters, are frequently deregistered.
Others	Ready	Whether the node status is Ready.
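
Several of the items in Table 1 loosely correspond to the standard Kubernetes node conditions (Ready, MemoryPressure, DiskPressure, and PIDPressure). As a minimal sketch, assuming the node object has been exported with kubectl get node <node-name> -o json, the following Python function lists the conditions that look unhealthy. The function name and the sample data are illustrative and are not part of CCE. Condition types outside the four standard ones, such as those the npd add-on may report, are assumed here to be healthy when their status is "False".

```python
# Status value that means "healthy" for the well-known node condition types.
# Ready is healthy when "True"; the pressure conditions are healthy when "False".
HEALTHY_STATUS = {
    "Ready": "True",
    "MemoryPressure": "False",
    "DiskPressure": "False",
    "PIDPressure": "False",
}


def unhealthy_conditions(node):
    """Return a sorted list of condition types whose status is not healthy.

    `node` is the dict parsed from `kubectl get node <node-name> -o json`.
    Unknown condition types are assumed (not guaranteed) to be healthy
    when their status is "False".
    """
    problems = []
    for cond in node.get("status", {}).get("conditions", []):
        expected = HEALTHY_STATUS.get(cond["type"], "False")
        if cond["status"] != expected:
            problems.append(cond["type"])
    return sorted(problems)


# Hypothetical sample data for illustration only.
sample_node = {
    "status": {
        "conditions": [
            {"type": "MemoryPressure", "status": "False"},
            {"type": "DiskPressure", "status": "True"},
            {"type": "PIDPressure", "status": "False"},
            {"type": "Ready", "status": "True"},
        ]
    }
}

print(unhealthy_conditions(sample_node))  # ['DiskPressure']
```

A status of "Unknown" is also reported as unhealthy, because it differs from the expected value for every condition type.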

Click the Behavior Statistics tab to view the behavior information and the number of behavior occurrences.
Click the Kubernetes Events tab to view the event name, event type, number of occurrences, Kubernetes events, first occurrence time, and last occurrence time of the node.

Event data will be retained for 1 hour and then automatically deleted.
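
Similar information can be read from the Kubernetes API outside the console. The sketch below assumes the events of a node were exported with kubectl get events --field-selector involvedObject.name=<node-name> -o json, and it reduces each event to the fields listed above. Mapping the console's "event name" to the event's reason field is an assumption, and the function name and sample data are illustrative.

```python
def summarize_events(event_list):
    """Reduce a Kubernetes event list to rows sorted by occurrence count.

    `event_list` is the dict parsed from `kubectl get events ... -o json`.
    """
    rows = []
    for ev in event_list.get("items", []):
        rows.append({
            "name": ev.get("reason", ""),        # assumed to match "event name"
            "type": ev.get("type", ""),          # Normal or Warning
            "count": ev.get("count") or 1,       # count may be absent or null
            "message": ev.get("message", ""),
            "first": ev.get("firstTimestamp"),
            "last": ev.get("lastTimestamp"),
        })
    # Most frequent events first.
    rows.sort(key=lambda row: row["count"], reverse=True)
    return rows


# Hypothetical sample data for illustration only.
sample_events = {
    "items": [
        {"reason": "NodeNotReady", "type": "Warning", "count": 3,
         "message": "Node status is now: NodeNotReady",
         "firstTimestamp": "2024-01-01T10:00:00Z",
         "lastTimestamp": "2024-01-01T10:20:00Z"},
        {"reason": "NodeHasDiskPressure", "type": "Warning", "count": 7,
         "message": "Node status is now: NodeHasDiskPressure",
         "firstTimestamp": "2024-01-01T10:05:00Z",
         "lastTimestamp": "2024-01-01T10:40:00Z"},
    ]
}

for row in summarize_events(sample_events):
    print(row["name"], row["type"], row["count"], row["first"], row["last"])
```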

Recovery Suggestion

If system resources are insufficient, expand the system resources of the node or raise the upper limits of the relevant kernel parameters. If the node cannot be recovered, isolate it by adding a taint so that no new pods are scheduled to the node or the pods running on the node are evicted.
You can also add a taint if a system component is abnormal or other exceptions occur.
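
As an illustration of the isolation step, the sketch below builds a kubectl taint command. In Kubernetes, the NoSchedule effect keeps new pods off the node, while NoExecute also evicts running pods that do not tolerate the taint. The helper name, the node name node-1, and the taint key node-problem are hypothetical; choose a key and value that suit your own conventions.

```python
import shlex

# Taint effects defined by Kubernetes.
VALID_EFFECTS = ("NoSchedule", "PreferNoSchedule", "NoExecute")


def taint_command(node_name, key, value, effect="NoSchedule"):
    """Build the argument list for `kubectl taint nodes <node> key=value:effect`."""
    if effect not in VALID_EFFECTS:
        raise ValueError(f"unsupported taint effect: {effect}")
    return ["kubectl", "taint", "nodes", node_name, f"{key}={value}:{effect}"]


# Hypothetical node name and taint key, for illustration only.
cmd = taint_command("node-1", "node-problem", "true", "NoExecute")
print(shlex.join(cmd))
```

The function only returns the command. To run it, pass the list to subprocess.run on a machine where kubectl is configured for the cluster.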

Reference

Adding a taint to a node: Taints and Tolerations
Safe eviction: Safely Drain a Node while Respecting the PodDisruptionBudget

The following three commands can be used to smoothly migrate services from a node to another node during node maintenance, ensuring that services are not affected:

**Table 2** Marking a node as schedulable or unschedulable
Command	Function	Usage
cordon	Mark the node as unschedulable.	kubectl cordon <node-name>
uncordon	Mark the node as schedulable.	kubectl uncordon <node-name>
drain	Mark the node as unschedulable and evict the pods on the node.	kubectl drain <node-name>
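
The commands in Table 2 are typically combined into a maintenance sequence: cordon and drain the node before maintenance, then uncordon it afterwards. The sketch below only assembles that sequence as argument lists. The helper name and the node name node-1 are hypothetical. The --ignore-daemonsets flag is an optional kubectl drain flag that is often needed when DaemonSet-managed pods run on the node; whether you need it depends on your cluster.

```python
import shlex


def maintenance_commands(node_name, ignore_daemonsets=True):
    """Return the kubectl commands to run before and after node maintenance."""
    drain = ["kubectl", "drain", node_name]
    if ignore_daemonsets:
        # Without this flag, drain refuses to proceed if DaemonSet pods exist.
        drain.append("--ignore-daemonsets")
    return {
        # cordon is listed explicitly for clarity; drain also cordons the node.
        "before": [["kubectl", "cordon", node_name], drain],
        "after": [["kubectl", "uncordon", node_name]],
    }


# Hypothetical node name, for illustration only.
plan = maintenance_commands("node-1")
for phase in ("before", "after"):
    for cmd in plan[phase]:
        print(f"{phase}: {shlex.join(cmd)}")
```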
