- What's New
- Function Overview
-
Product Bulletin
- Latest Notices
- Product Change Notices
- Cluster Version Release Notes
-
Vulnerability Notices
- Vulnerability Fixing Policies
- Notice of Container Escape Vulnerability in NVIDIA Container Toolkit (CVE-2024-0132)
- Notice of Linux Remote Code Execution Vulnerability in CUPS (CVE-2024-47076, CVE-2024-47175, CVE-2024-47176, and CVE-2024-47177)
- Notice of the NGINX Ingress Controller Vulnerability That Allows Attackers to Bypass Annotation Validation (CVE-2024-7646)
- Notice of Docker Engine Vulnerability That Allows Attackers to Bypass AuthZ (CVE-2024-41110)
- Notice of Linux Kernel Privilege Escalation Vulnerability (CVE-2024-1086)
- Notice of OpenSSH Remote Code Execution Vulnerability (CVE-2024-6387)
- Notice of runC systemd Attribute Injection Vulnerability (CVE-2024-3154)
- Notice of the Impact of runC Vulnerability (CVE-2024-21626)
- Notice on the Kubernetes Security Vulnerability (CVE-2022-3172)
- Privilege Escalation Vulnerability in Linux Kernel openvswitch Module (CVE-2022-2639)
- Notice on nginx-ingress Add-On Security Vulnerability (CVE-2021-25748)
- Notice on nginx-ingress Security Vulnerabilities (CVE-2021-25745 and CVE-2021-25746)
- Notice on the containerd Process Privilege Escalation Vulnerability (CVE-2022-24769)
- Notice on CRI-O Container Runtime Engine Arbitrary Code Execution Vulnerability (CVE-2022-0811)
- Notice on the Container Escape Vulnerability Caused by the Linux Kernel (CVE-2022-0492)
- Notice on the Non-Security Handling Vulnerability of containerd Image Volumes (CVE-2022-23648)
- Linux Kernel Integer Overflow Vulnerability (CVE-2022-0185)
- Linux Polkit Privilege Escalation Vulnerability (CVE-2021-4034)
- Notice on the Vulnerability of Kubernetes subPath Symlink Exchange (CVE-2021-25741)
- Notice of runC Vulnerability That Allows a Container Filesystem Breakout via Directory Traversal (CVE-2021-30465)
- Notice on the Docker Resource Management Vulnerability (CVE-2021-21285)
- Notice of NVIDIA GPU Driver Vulnerability (CVE-2021-1056)
- Notice on the Sudo Buffer Vulnerability (CVE-2021-3156)
- Notice on the Kubernetes Security Vulnerability (CVE-2020-8554)
- Notice of Apache containerd Security Vulnerability (CVE-2020-15257)
- Notice on the Docker Engine Input Verification Vulnerability (CVE-2020-13401)
- Notice of Kubernetes kube-apiserver Input Verification Vulnerability (CVE-2020-8559)
- Notice on the Kubernetes kubelet Resource Management Vulnerability (CVE-2020-8557)
- Notice on the Kubernetes kubelet and kube-proxy Authorization Vulnerability (CVE-2020-8558)
- Notice on Fixing Kubernetes HTTP/2 Vulnerability
- Notice on Fixing Linux Kernel SACK Vulnerabilities
- Notice on Fixing the Docker Command Injection Vulnerability (CVE-2019-5736)
- Notice on Fixing the Kubernetes Permission and Access Control Vulnerability (CVE-2018-1002105)
- Notice of Fixing the Kubernetes Dashboard Security Vulnerability (CVE-2018-18264)
-
Product Release Notes
-
Cluster Versions
- Kubernetes Version Policy
-
Kubernetes Version Release Notes
- Kubernetes 1.29 Release Notes
- Kubernetes 1.28 Release Notes
- Kubernetes 1.27 Release Notes
- Kubernetes 1.25 Release Notes
- Kubernetes 1.23 Release Notes
- Kubernetes 1.21 (EOM) Release Notes
- Kubernetes 1.19 (EOM) Release Notes
- Kubernetes 1.17 (EOM) Release Notes
- Kubernetes 1.15 (EOM) Release Notes
- Kubernetes 1.13 (EOM) Release Notes
- Kubernetes 1.11 (EOM) Release Notes
- Kubernetes 1.9 (EOM) and Earlier Versions Release Notes
- Patch Versions
- OS Images
-
Add-on Versions
- CoreDNS Release History
- CCE Container Storage (Everest) Release History
- CCE Node Problem Detector Release History
- Kubernetes Dashboard Release History
- CCE Cluster Autoscaler Release History
- NGINX Ingress Controller Release History
- Kubernetes Metrics Server Release History
- CCE Advanced HPA Release History
- CCE Cloud Bursting Engine for CCI Release History
- CCE AI Suite (NVIDIA GPU) Release History
- CCE AI Suite (Ascend NPU) Release History
- Volcano Scheduler Release History
- CCE Secrets Manager for DEW Release History
- CCE Network Metrics Exporter Release History
- NodeLocal DNSCache Release History
- Cloud Native Cluster Monitoring Release History
- Cloud Native Logging Release History
- CCE Cluster Backup & Recovery (End of Maintenance) Release History
- Kubernetes Web Terminal (End of Maintenance) Release History
- Prometheus (End of Maintenance) Release History
-
Cluster Versions
- Service Overview
- Billing
- Kubernetes Basics
- Getting Started
-
User Guide
- High-Risk Operations
-
Clusters
-
Cluster Overview
- Basic Cluster Information
-
Kubernetes Version Release Notes
- Kubernetes 1.29 Release Notes
- Kubernetes 1.28 Release Notes
- Kubernetes 1.27 Release Notes
- Kubernetes 1.25 Release Notes
- Kubernetes 1.23 Release Notes
- Kubernetes 1.21 (EOM) Release Notes
- Kubernetes 1.19 (EOM) Release Notes
- Kubernetes 1.17 (EOM) Release Notes
- Kubernetes 1.15 (EOM) Release Notes
- Kubernetes 1.13 (EOM) Release Notes
- Kubernetes 1.11 (EOM) Release Notes
- Release Notes for Kubernetes 1.9 (EOM) and Earlier Versions
- Patch Version Release Notes
- Buying a Cluster
- Connecting to a Cluster
-
Managing a Cluster
- Modifying Cluster Configurations
- Enabling Overload Control for a Cluster
- Changing Cluster Scale
- Changing the Default Security Group of a Node
- Deleting a Cluster
- Hibernating or Waking Up a Cluster
- Renewing a Yearly/Monthly Cluster
- Changing the Billing Mode of a Cluster from Pay-per-Use to Yearly/Monthly
-
Upgrading a Cluster
- Process and Method of Upgrading a Cluster
- Before You Start
- Performing Post-Upgrade Verification
- Migrating Services Across Clusters of Different Versions
-
Troubleshooting for Pre-upgrade Check Exceptions
- Pre-upgrade Check
- Node Restrictions
- Upgrade Management
- Add-ons
- Helm Charts
- SSH Connectivity of Master Nodes
- Node Pools
- Security Groups
- Arm Node Restrictions
- Residual Nodes
- Discarded Kubernetes Resources
- Compatibility Risks
- CCE Agent Versions
- Node CPU Usage
- CRDs
- Node Disks
- Node DNS
- Node Key Directory File Permissions
- kubelet
- Node Memory
- Node Clock Synchronization Server
- Node OS
- Node CPU Cores
- Node Python Commands
- ASM Version
- Node Readiness
- Node journald
- containerd.sock
- Internal Error
- Node Mount Points
- Kubernetes Node Taints
- Everest Restrictions
- cce-hpa-controller Limitations
- Enhanced CPU Policies
- Health of Worker Node Components
- Health of Master Node Components
- Memory Resource Limit of Kubernetes Components
- Discarded Kubernetes APIs
- IPv6 Support in CCE Turbo Clusters
- NetworkManager
- Node ID File
- Node Configuration Consistency
- Node Configuration File
- CoreDNS Configuration Consistency
- sudo
- Key Node Commands
- Mounting of a Sock File on a Node
- HTTPS Load Balancer Certificate Consistency
- Node Mounting
- Login Permissions of User paas on a Node
- Private IPv4 Addresses of Load Balancers
- Historical Upgrade Records
- CIDR Block of the Cluster Management Plane
- GPU Add-on
- Nodes' System Parameters
- Residual Package Version Data
- Node Commands
- Node Swap
- nginx-ingress Upgrade
- ELB Listener Access Control
- Master Node Flavor
- Subnet Quota of Master Nodes
- Node Runtime
- Node Pool Runtime
- Number of Node Images
- OpenKruise Compatibility Check
- Compatibility Check of Secret Encryption
- Compatibility Between the Ubuntu Kernel and GPU Driver
- Drainage Tasks
- Image Layers on a Node
- Cluster Rolling Upgrade
- Rotation Certificates
- Ingress and ELB Configuration Consistency
-
Cluster Overview
-
Nodes
- Node Overview
- Container Engines
- Node OSs
- Creating a Node
- Accepting Nodes for Management
-
Management Nodes
- Managing Node Labels
- Managing Node Taints
- Resetting a Node
- Removing a Node
- Synchronizing the Data of Cloud Servers
- Draining a Node
- Deleting or Unsubscribing from a Node
- Changing the Billing Mode of a Node to Yearly/Monthly
- Modifying the Auto-Renewal Configuration of a Yearly/Monthly Node
- Stopping a Node
-
Node O&M
- Node Resource Reservation Policy
- Space Allocation of a Data Disk
- Maximum Number of Pods That Can Be Created on a Node
- Differences in kubelet and Runtime Component Configurations Between CCE and the Native Community
- Migrating Nodes from Docker to containerd
- Optimizing Node System Parameters
- Configuring Node Fault Detection Policies
- Node Pools
-
Workloads
- Overview
- Creating a Workload
-
Configuring a Workload
- Configuring Time Zone Synchronization
- Configuring an Image Pull Policy
- Using Third-Party Images
- Configuring Container Specifications
- Configuring Container Lifecycle Parameters
- Configuring Container Health Check
- Configuring Environment Variables
- Configuring Workload Upgrade Policies
- Configuring Tolerance Policies
- Configuring Labels and Annotations
- Scheduling a Workload
- Logging In to a Container
- Managing Workloads
- Pod Security
- Scheduling
-
Network
- Overview
-
Container Network
- Overview
-
Cloud Native Network 2.0 Settings
- Cloud Native 2.0 Network Model
- Configuring Pod Subnets of a Cluster
- Binding a Security Group to a Workload Using a Security Group Policy
- Binding a Subnet and Security Group to a Namespace or Workload Using a Container Network Configuration
- Configuring Shared Bandwidth for a Pod with IPv6 Dual-Stack ENIs
- VPC Network Settings
- Tunnel Network Settings
- Pod Network Settings
-
Service
- Overview
- ClusterIP
- NodePort
-
LoadBalancer
- Creating a LoadBalancer Service
- Configuring LoadBalancer Services Using Annotations
- Configuring HTTP/HTTPS for a LoadBalancer Service
- Configuring SNI for a LoadBalancer Service
- Configuring HTTP/2 for a LoadBalancer Service
- Configuring Timeout for a LoadBalancer Service
- Configuring Health Check on Multiple Ports of a LoadBalancer Service
- Configuring Passthrough Networking for a LoadBalancer Service
- Setting the Pod Ready Status Through the ELB Health Check
- Headless Services
-
Ingresses
- Overview
-
LoadBalancer Ingresses
- Creating a LoadBalancer Ingress on the Console
- Creating a LoadBalancer Ingress Using kubectl
- Annotations for Configuring LoadBalancer Ingresses
-
Advanced Setting Examples of LoadBalancer Ingresses
- Configuring an HTTPS Certificate for a LoadBalancer Ingress
- Configuring SNI for a LoadBalancer Ingress
- Configuring Multiple Forwarding Policies for a LoadBalancer Ingress
- Configuring HTTP/2 for a LoadBalancer Ingress
- Configuring HTTPS Backend Services for a LoadBalancer Ingress
- Configuring Timeout for a LoadBalancer Ingress
- Configuring a Slow Start for a LoadBalancer Ingress
- Configuring a Range of Listening Ports for a LoadBalancer Ingress
- Nginx Ingresses
- DNS
- Configuring Intra-VPC Access
- Accessing the Internet from a Container
- Storage
- Observability
- Auto Scaling
- Namespaces
- ConfigMaps and Secrets
- Add-ons
- Helm Chart
- Permissions
- Settings
-
Old Console
- What Is Cloud Container Engine?
- High-Risk Operations and Solutions
- Clusters
-
Nodes
- Overview
- Buying a Node
- Accepting ECSs as Nodes into a Cluster
- Removing a Node
- Logging In to a Node
- Managing Node Labels
- Synchronizing Node Data
- Configuring Node Scheduling (Tainting)
- Resetting a Node
- Deleting a Node
- Stopping a Node
- Performing Rolling Upgrade for Nodes
- Formula for Calculating the Reserved Resources of a Node
- Creating a Linux LVM Disk Partition for Docker
- Data Disk Space Allocation
- Adding a Second Data Disk to a Node in a CCE Cluster
- Node Pools
-
Workloads
- Overview
- Creating a Deployment
- Creating a StatefulSet
- Creating a DaemonSet
- Creating a Job
- Creating a Cron Job
- Managing Pods
- GPU Scheduling
- NPU Scheduling
- Managing Workloads and Jobs
- Scaling a Workload
-
Configuring a Container
- Using a Third-Party Image
- Setting Container Specifications
- Setting Container Lifecycle Parameters
- Setting Container Startup Commands
- Setting Health Check for a Container
- Setting an Environment Variable
- Enabling ICMP Security Group Rules
- Configuring an Image Pull Policy
- Configuring Time Zone Synchronization
- DNS Configuration
- Pod Scale-in Priorities
- Configuring QoS Rate Limiting for Inter-Pod Access
- Adding Pod Annotations
- Affinity and Anti-Affinity Scheduling
- Networking
- Storage (CSI)
- Monitoring and Logs
- Namespaces
- Configuration Center
- Charts (Helm)
- Add-ons
- Auto Scaling
- Permissions Management
- Cloud Trace Service (CTS)
-
Best Practices
- Checklist for Deploying Containerized Applications in the Cloud
- Containerization
- Migration
- Disaster Recovery
-
Security
- Configuration Suggestions on CCE Cluster Security
- Configuration Suggestions on CCE Node Security
- Configuration Suggestions on CCE Container Runtime Security
- Configuration Suggestions on CCE Container Security
- Configuration Suggestions on CCE Container Image Security
- Configuration Suggestions on CCE Secret Security
- Auto Scaling
- Monitoring
- Cluster
- Networking
- Storage
- Container
- Permission
- Release
-
API Reference
- Before You Start
- API Overview
- Calling APIs
-
APIs
- API URL
-
Cluster Management
- Creating a Cluster
- Reading a Specified Cluster
- Listing Clusters in a Specified Project
- Updating a Specified Cluster
- Deleting a Cluster
- Hibernating a Cluster
- Waking Up a Cluster
- Obtaining a Cluster Certificate
- Modifying Cluster Specifications
- Querying a Job
- Binding/Unbinding Public API Server Address
- Obtaining Cluster Access Address
- Obtaining a Cluster's Logging Configurations
- Configuring Cluster Logs
- Obtaining the Partition List
- Creating a Partition
- Obtaining Partition Details
- Updating a Partition
- Node Management
- Node Pool Management
- Storage Management
- Add-on Management
-
Cluster Upgrade
- Upgrading a Cluster
- Obtaining Cluster Upgrade Task Details
- Retrying a Cluster Upgrade Task
- Suspending a Cluster Upgrade Task (Deprecated)
- Continuing to Execute a Cluster Upgrade Task (Deprecated)
- Obtaining a List of Cluster Upgrade Task Details
- Pre-upgrade Check
- Obtaining Details About a Pre-upgrade Check Task of a Cluster
- Obtaining a List of Pre-upgrade Check Tasks of a Cluster
- Post-upgrade Check
- Cluster Backup
- Obtaining a List of Cluster Backup Task Details
- Obtaining the Cluster Upgrade Information
- Obtaining a Cluster Upgrade Path
- Obtaining the Configuration of Cluster Upgrade Feature Gates
- Enabling the Cluster Upgrade Process Booting Task
- Obtaining a List of Upgrade Workflows
- Obtaining Details About a Specified Cluster Upgrade Task
- Updating the Status of a Specified Cluster Upgrade Booting Task
- Quota Management
- API Versions
- Tag Management
- Configuration Management
-
Chart Management
- Uploading a Chart
- Obtaining a Chart List
- Obtaining a Release List
- Updating a Chart
- Creating a Release
- Deleting a Chart
- Updating a Release
- Obtaining a Chart
- Deleting a Release
- Downloading a Chart
- Obtaining a Release
- Obtaining Chart Values
- Obtaining Historical Records of a Release
- Obtaining the Quota of a User Chart
- Kubernetes APIs
- Permissions and Supported Actions
-
Appendix
- Status Code
- Error Codes
- Obtaining a Project ID
- Obtaining an Account ID
- Specifying Add-ons to Be Installed During Cluster Creation
- How to Obtain Parameters in the API URI
- Creating a VPC and Subnet
- Creating a Key Pair
- Node Flavor Description
- Adding a Salt in the password Field When Creating a Node
- Maximum Number of Pods That Can Be Created on a Node
- Node OS
- Data Disk Space Allocation
- Attaching Disks to a Node
- SDK Reference
-
FAQs
- Common FAQ
- Billing
- Cluster
-
Node
- Node Creation
-
Node Running
- What Should I Do If a Cluster Is Available But Some Nodes Are Unavailable?
- How Do I Log In to a Node Using a Password and Reset the Password?
- How Do I Collect Logs of Nodes in a CCE Cluster?
- What Should I Do If the vdb Disk of a Node Is Damaged and the Node Cannot Be Recovered After Reset?
- What Should I Do If I/O Suspension Occasionally Occurs When SCSI EVS Disks Are Used?
- How Do I Fix an Abnormal Container or Node Due to No Thin Pool Disk Space?
- How Do I Rectify Failures When the NVIDIA Driver Is Used to Start Containers on GPU Nodes?
- Specification Change
- OSs
- Node Pool
-
Workload
-
Workload Exception Troubleshooting
- How Can I Find the Fault for an Abnormal Workload?
- What Should I Do If Pod Scheduling Fails?
- What Should I Do If a Pod Fails to Pull the Image?
- What Should I Do If Container Startup Fails?
- What Should I Do If a Pod Fails to Be Evicted?
- What Should I Do If a Storage Volume Cannot Be Mounted or the Mounting Times Out?
- What Should I Do If a Workload Remains in the Creating State?
- What Should I Do If a Pod Remains in the Terminating State?
- What Should I Do If a Workload Is Stopped Caused by Pod Deletion?
- What Should I Do If an Error Occurs When I Deploy a Service on the GPU Node?
- How Can I Locate Faults Using an Exit Code?
- Container Configuration
- Scheduling Policies
-
Others
- What Should I Do If a Cron Job Cannot Be Restarted After Being Stopped for a Period of Time?
- What Is a Headless Service When I Create a StatefulSet?
- What Should I Do If Error Message "Auth is empty" Is Displayed When a Private Image Is Pulled?
- What Is the Image Pull Policy for Containers in a CCE Cluster?
- What Can I Do If a Layer Is Missing During Image Pull?
-
Workload Exception Troubleshooting
-
Networking
-
Network Exception Troubleshooting
- How Do I Locate a Workload Networking Fault?
- Why Does the Browser Return Error Code 404 When I Access a Deployed Application?
- What Should I Do If a Container Fails to Access the Internet?
- What Should I Do If a Node Fails to Connect to the Internet (Public Network)?
- What Should I Do If Nginx Ingress Access in the Cluster Is Abnormal After the NGINX Ingress Controller Add-on Is Upgraded?
- What Could Cause Access Exceptions After Configuring an HTTPS Certificate for a LoadBalancer Ingress?
- Network Planning
- Security Hardening
-
Network Configuration
- How Can Container IP Addresses Survive a Container Restart?
- How Can I Check Whether an ENI Is Used by a Cluster?
- How Can I Delete a Security Group Rule Associated with a Deleted Subnet?
- How Can I Synchronize Certificates When Multiple Ingresses in Different Namespaces Share a Listener?
- How Can I Determine Which Ingress the Listener Settings Have Been Applied To?
-
Network Exception Troubleshooting
-
Storage
- How Do I Expand the Storage Capacity of a Container?
- What Are the Differences Among CCE Storage Classes in Terms of Persistent Storage and Multi-Node Mounting?
- Can I Create a CCE Node Without Adding a Data Disk to the Node?
- What Should I Do If the Host Cannot Be Found When Files Need to Be Uploaded to OBS During the Access to the CCE Service from a Public Network?
- How Can I Achieve Compatibility Between ExtendPathMode and Kubernetes client-go?
- Can CCE PVCs Detect Underlying Storage Faults?
- What Should I Do If a Yearly/Monthly EVS Disk Cannot Be Automatically Created?
- Namespace
-
Chart and Add-on
- What Should I Do If Installation of an Add-on Fails and "The release name is already exist" Is Displayed?
- How Do I Configure the Add-on Resource Quotas Based on Cluster Scale?
- How Can I Clean Up Residual Resources After the NGINX Ingress Controller Add-on in the Unknown State Is Deleted?
- Why TLS v1.0 and v1.1 Cannot Be Used After the NGINX Ingress Controller Add-on Is Upgraded?
-
API & kubectl FAQs
- How Can I Access a Cluster API Server?
- Can the Resources Created Using APIs or kubectl Be Displayed on the CCE Console?
- How Do I Download kubeconfig for Connecting to a Cluster Using kubectl?
- How Do I Rectify the Error Reported When Running the kubectl top node Command?
- Why Is "Error from server (Forbidden)" Displayed When I Use kubectl?
- DNS FAQs
- Image Repository FAQs
- Permissions
- Videos
Liveness Probe
Overview
Kubernetes applications have the self-healing capability, that is, when an application container crashes, the container can be detected and restarted automatically. However, this mechanism does not work for deadlocks. Assume that a Java program is having a memory leak. The program is unable to make any progress, while the JVM process is running. To address this issue, Kubernetes introduces liveness probes to check whether containers response normally and determine whether to restart containers. This is a good health check mechanism.
It is advised to define the liveness probe for every pod to gain a better understanding of pods' running statuses.
Supported detection mechanisms are as follows:
- HTTP GET: The kubelet sends an HTTP GET request to the container. Any 2XX or 3XX code indicates success. Any other code returned indicates failure.
- TCP Socket: The kubelet attempts to open a socket to your container on the specified port. If it can establish a connection, the container is considered healthy. If it fails to establish a connection, the container is considered a failure.
- Exec: kubelet executes a command in the target container. If the command succeeds, it returns 0, and kubelet considers the container to be alive and healthy. If the command returns a non-zero value, kubelet kills the container and restarts it.
In addition to liveness probes, readiness probes are also available for you to detect pod status. For details, see Readiness Probe.
HTTP GET
HTTP GET is the most common detection method. An HTTP GET request is sent to a container. Any 2xx or 3xx code returned indicates that the container is healthy. The following example shows how to define such a request:
apiVersion: v1 kind: Pod metadata: name: liveness-http spec: containers: - name: liveness image: nginx:alpine livenessProbe: # liveness probe httpGet: #HTTP GET definition path: / port: 80 imagePullSecrets: - name: default-secret
Create pod liveness-http.
$ kubectl create -f liveness-http.yaml pod/liveness-http created
The probe sends an HTTP Get request to port 80 of the container. If the request fails, Kubernetes restarts the container.
View details of pod liveness-http.
$ kubectl describe po liveness-http Name: liveness-http ...... Containers: liveness: ...... State: Running Started: Mon, 03 Aug 2020 03:08:55 +0000 Ready: True Restart Count: 0 Liveness: http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3 Environment: <none> Mounts: /var/run/secrets/kubernetes.io/serviceaccount from default-token-vssmw (ro) ......
The preceding output reports that the pod is Running with Restart Count being 0, which indicates that the container is normal and no restarts have been triggered. If the value of Restart Count is not 0, the container has been restarted.
TCP Socket
TCP Socket: The kubelet attempts to open a socket to your container on the specified port. If it can establish a connection, the container is considered healthy. If it fails to establish a connection, the container is considered a failure. For detailed defining method, see the following example.
apiVersion: v1 kind: Pod metadata: labels: test: liveness name: liveness-tcp spec: containers: - name: liveness image: nginx:alpine livenessProbe: # liveness probe tcpSocket: port: 80 imagePullSecrets: - name: default-secret
Exec
kubelet executes a command in the target container. If the command succeeds, it returns 0, and kubelet considers the container to be alive and healthy. The following example shows how to define the command.
apiVersion: v1 kind: Pod metadata: labels: test: liveness name: liveness-exec spec: containers: - name: liveness image: nginx:alpine args: - /bin/sh - -c - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600 livenessProbe: # liveness probe exec: # Exec definition command: - cat - /tmp/healthy imagePullSecrets: - name: default-secret
In the preceding configuration file, kubelet executes the command cat /tmp/healthy in the container. If the command succeeds and returns 0, the container is considered healthy. For the first 30 seconds, there is a /tmp/healthy file. So during the first 30 seconds, the command cat /tmp/healthy returns a success code. After 30 seconds, the /tmp/healthy file is deleted. The probe will then consider the pod to be unhealthy and restart it.
Advanced Settings of a Liveness Probe
The describe command of liveness-http returns the following information:
Liveness: http-get http://:80/ delay=0s timeout=1s period=10s #success=1 #failure=3
This is the detailed configuration of the liveness probe.
- delay=0s indicates that the probe starts immediately after the container is started.
- timeout=1 indicates that the container must respond within one second. Otherwise, the health check is recorded as failed.
- period=10s indicates that the probe checks containers every 10 seconds.
- #success=1 indicates that the operation is recorded as successful if it is successful for once.
- #failure=3 indicates that a container will be restarted after three consecutive failures.
The preceding liveness probe indicates that the probe checks containers immediately after they are started. If a container does not respond within one second, the check is recorded as failed. The health check is performed every 10 seconds. If the check fails for three consecutive times, the container is restarted.
apiVersion: v1 kind: Pod metadata: name: liveness-http spec: containers: - name: liveness image: nginx:alpine livenessProbe: httpGet: path: / port: 80 initialDelaySeconds: 10 # Liveness probes are initiated after the container has started for 10s. timeoutSeconds: 2 # The container must respond within 2s. Otherwise, it is considered as a failure. periodSeconds: 30 # The probe is performed every 30s. successThreshold: 1 # The container is considered healthy as long as the probe succeeds once. failureThreshold: 3 # The container is considered unhealthy after three consecutive failures.
Normally, the value of initialDelaySeconds must be greater than 0, because it takes a while for the application to be ready. The probe often fails if the probe is initiated before the application is ready.
In addition, you can set the value of failureThreshold to be greater than 1. In this way, the kubelet checks the container for multiple times in one probe rather than performing the probe for multiple times.
Configuring a Liveness Probe
- What to check
An effective liveness probe should check all the key parts of an application and use a dedicated URL, such as /health. When the URL is accessed, the probe is triggered and a result is returned. Note that no authentication should be involved. Otherwise, the probe keeps failing and restarting the container.
In addition, a probe must not check parts that have external dependencies. For example, if a frontend web server cannot connect to a database, the web server should not be considered unhealthy for the connection failure.
- To be lightweight
A liveness probe must not occupy too many resources or certain resources for too long. Otherwise, resource shortage may affect service running. For example, the HTTP GET method is recommended for a Java application. If the Exec method is used, the JVM startup process occupies too many resources.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.