- What's New
- Function Overview
-
Product Bulletin
- Latest Notices
- Product Change Notices
- Cluster Version Release Notes
-
Vulnerability Notices
- Vulnerability Fixing Policies
- Notice of Container Escape Vulnerability in NVIDIA Container Toolkit (CVE-2024-0132)
- Notice of Linux Remote Code Execution Vulnerability in CUPS (CVE-2024-47076, CVE-2024-47175, CVE-2024-47176, and CVE-2024-47177)
- Notice of the NGINX Ingress Controller Vulnerability That Allows Attackers to Bypass Annotation Validation (CVE-2024-7646)
- Notice of Docker Engine Vulnerability That Allows Attackers to Bypass AuthZ (CVE-2024-41110)
- Notice of Linux Kernel Privilege Escalation Vulnerability (CVE-2024-1086)
- Notice of OpenSSH Remote Code Execution Vulnerability (CVE-2024-6387)
- Notice of runC systemd Attribute Injection Vulnerability (CVE-2024-3154)
- Notice of the Impact of runC Vulnerability (CVE-2024-21626)
- Notice on the Kubernetes Security Vulnerability (CVE-2022-3172)
- Privilege Escalation Vulnerability in Linux Kernel openvswitch Module (CVE-2022-2639)
- Notice on nginx-ingress Add-On Security Vulnerability (CVE-2021-25748)
- Notice on nginx-ingress Security Vulnerabilities (CVE-2021-25745 and CVE-2021-25746)
- Notice on the containerd Process Privilege Escalation Vulnerability (CVE-2022-24769)
- Notice on CRI-O Container Runtime Engine Arbitrary Code Execution Vulnerability (CVE-2022-0811)
- Notice on the Container Escape Vulnerability Caused by the Linux Kernel (CVE-2022-0492)
- Notice on the Non-Security Handling Vulnerability of containerd Image Volumes (CVE-2022-23648)
- Linux Kernel Integer Overflow Vulnerability (CVE-2022-0185)
- Linux Polkit Privilege Escalation Vulnerability (CVE-2021-4034)
- Notice on the Vulnerability of Kubernetes subPath Symlink Exchange (CVE-2021-25741)
- Notice of runC Vulnerability That Allows a Container Filesystem Breakout via Directory Traversal (CVE-2021-30465)
- Notice on the Docker Resource Management Vulnerability (CVE-2021-21285)
- Notice of NVIDIA GPU Driver Vulnerability (CVE-2021-1056)
- Notice on the Sudo Buffer Vulnerability (CVE-2021-3156)
- Notice on the Kubernetes Security Vulnerability (CVE-2020-8554)
- Notice of Apache containerd Security Vulnerability (CVE-2020-15257)
- Notice on the Docker Engine Input Verification Vulnerability (CVE-2020-13401)
- Notice of Kubernetes kube-apiserver Input Verification Vulnerability (CVE-2020-8559)
- Notice on the Kubernetes kubelet Resource Management Vulnerability (CVE-2020-8557)
- Notice on the Kubernetes kubelet and kube-proxy Authorization Vulnerability (CVE-2020-8558)
- Notice on Fixing Kubernetes HTTP/2 Vulnerability
- Notice on Fixing Linux Kernel SACK Vulnerabilities
- Notice on Fixing the Docker Command Injection Vulnerability (CVE-2019-5736)
- Notice on Fixing the Kubernetes Permission and Access Control Vulnerability (CVE-2018-1002105)
- Notice of Fixing the Kubernetes Dashboard Security Vulnerability (CVE-2018-18264)
-
Product Release Notes
-
Cluster Versions
- Kubernetes Version Policy
-
Kubernetes Version Release Notes
- Kubernetes 1.29 Release Notes
- Kubernetes 1.28 Release Notes
- Kubernetes 1.27 Release Notes
- Kubernetes 1.25 Release Notes
- Kubernetes 1.23 Release Notes
- Kubernetes 1.21 (EOM) Release Notes
- Kubernetes 1.19 (EOM) Release Notes
- Kubernetes 1.17 (EOM) Release Notes
- Kubernetes 1.15 (EOM) Release Notes
- Kubernetes 1.13 (EOM) Release Notes
- Kubernetes 1.11 (EOM) Release Notes
- Kubernetes 1.9 (EOM) and Earlier Versions Release Notes
- Patch Versions
- OS Images
-
Add-on Versions
- CoreDNS Release History
- CCE Container Storage (Everest) Release History
- CCE Node Problem Detector Release History
- Kubernetes Dashboard Release History
- CCE Cluster Autoscaler Release History
- NGINX Ingress Controller Release History
- Kubernetes Metrics Server Release History
- CCE Advanced HPA Release History
- CCE Cloud Bursting Engine for CCI Release History
- CCE AI Suite (NVIDIA GPU) Release History
- CCE AI Suite (Ascend NPU) Release History
- Volcano Scheduler Release History
- CCE Secrets Manager for DEW Release History
- CCE Network Metrics Exporter Release History
- NodeLocal DNSCache Release History
- Cloud Native Cluster Monitoring Release History
- Cloud Native Logging Release History
- CCE Cluster Backup & Recovery (End of Maintenance) Release History
- Kubernetes Web Terminal (End of Maintenance) Release History
- Prometheus (End of Maintenance) Release History
-
Cluster Versions
- Service Overview
- Billing
- Kubernetes Basics
- Getting Started
-
User Guide
- High-Risk Operations
-
Clusters
-
Cluster Overview
- Basic Cluster Information
-
Kubernetes Version Release Notes
- Kubernetes 1.29 Release Notes
- Kubernetes 1.28 Release Notes
- Kubernetes 1.27 Release Notes
- Kubernetes 1.25 Release Notes
- Kubernetes 1.23 Release Notes
- Kubernetes 1.21 (EOM) Release Notes
- Kubernetes 1.19 (EOM) Release Notes
- Kubernetes 1.17 (EOM) Release Notes
- Kubernetes 1.15 (EOM) Release Notes
- Kubernetes 1.13 (EOM) Release Notes
- Kubernetes 1.11 (EOM) Release Notes
- Release Notes for Kubernetes 1.9 (EOM) and Earlier Versions
- Patch Version Release Notes
- Buying a Cluster
- Connecting to a Cluster
-
Managing a Cluster
- Modifying Cluster Configurations
- Enabling Overload Control for a Cluster
- Changing Cluster Scale
- Changing the Default Security Group of a Node
- Deleting a Cluster
- Hibernating or Waking Up a Cluster
- Renewing a Yearly/Monthly Cluster
- Changing the Billing Mode of a Cluster from Pay-per-Use to Yearly/Monthly
-
Upgrading a Cluster
- Process and Method of Upgrading a Cluster
- Before You Start
- Performing Post-Upgrade Verification
- Migrating Services Across Clusters of Different Versions
-
Troubleshooting for Pre-upgrade Check Exceptions
- Pre-upgrade Check
- Node Restrictions
- Upgrade Management
- Add-ons
- Helm Charts
- SSH Connectivity of Master Nodes
- Node Pools
- Security Groups
- Arm Node Restrictions
- Residual Nodes
- Discarded Kubernetes Resources
- Compatibility Risks
- CCE Agent Versions
- Node CPU Usage
- CRDs
- Node Disks
- Node DNS
- Node Key Directory File Permissions
- kubelet
- Node Memory
- Node Clock Synchronization Server
- Node OS
- Node CPU Cores
- Node Python Commands
- ASM Version
- Node Readiness
- Node journald
- containerd.sock
- Internal Error
- Node Mount Points
- Kubernetes Node Taints
- Everest Restrictions
- cce-hpa-controller Limitations
- Enhanced CPU Policies
- Health of Worker Node Components
- Health of Master Node Components
- Memory Resource Limit of Kubernetes Components
- Discarded Kubernetes APIs
- IPv6 Support in CCE Turbo Clusters
- NetworkManager
- Node ID File
- Node Configuration Consistency
- Node Configuration File
- CoreDNS Configuration Consistency
- sudo
- Key Node Commands
- Mounting of a Sock File on a Node
- HTTPS Load Balancer Certificate Consistency
- Node Mounting
- Login Permissions of User paas on a Node
- Private IPv4 Addresses of Load Balancers
- Historical Upgrade Records
- CIDR Block of the Cluster Management Plane
- GPU Add-on
- Nodes' System Parameters
- Residual Package Version Data
- Node Commands
- Node Swap
- nginx-ingress Upgrade
- ELB Listener Access Control
- Master Node Flavor
- Subnet Quota of Master Nodes
- Node Runtime
- Node Pool Runtime
- Number of Node Images
- OpenKruise Compatibility Check
- Compatibility Check of Secret Encryption
- Compatibility Between the Ubuntu Kernel and GPU Driver
- Drainage Tasks
- Image Layers on a Node
- Cluster Rolling Upgrade
- Rotation Certificates
- Ingress and ELB Configuration Consistency
-
Cluster Overview
-
Nodes
- Node Overview
- Container Engines
- Node OSs
- Creating a Node
- Accepting Nodes for Management
-
Management Nodes
- Managing Node Labels
- Managing Node Taints
- Resetting a Node
- Removing a Node
- Synchronizing the Data of Cloud Servers
- Draining a Node
- Deleting or Unsubscribing from a Node
- Changing the Billing Mode of a Node to Yearly/Monthly
- Modifying the Auto-Renewal Configuration of a Yearly/Monthly Node
- Stopping a Node
-
Node O&M
- Node Resource Reservation Policy
- Space Allocation of a Data Disk
- Maximum Number of Pods That Can Be Created on a Node
- Differences in kubelet and Runtime Component Configurations Between CCE and the Native Community
- Migrating Nodes from Docker to containerd
- Optimizing Node System Parameters
- Configuring Node Fault Detection Policies
- Node Pools
-
Workloads
- Overview
- Creating a Workload
-
Configuring a Workload
- Configuring Time Zone Synchronization
- Configuring an Image Pull Policy
- Using Third-Party Images
- Configuring Container Specifications
- Configuring Container Lifecycle Parameters
- Configuring Container Health Check
- Configuring Environment Variables
- Configuring Workload Upgrade Policies
- Configuring Tolerance Policies
- Configuring Labels and Annotations
- Scheduling a Workload
- Logging In to a Container
- Managing Workloads
- Pod Security
- Scheduling
-
Network
- Overview
-
Container Network
- Overview
-
Cloud Native Network 2.0 Settings
- Cloud Native 2.0 Network Model
- Configuring Pod Subnets of a Cluster
- Binding a Security Group to a Workload Using a Security Group Policy
- Binding a Subnet and Security Group to a Namespace or Workload Using a Container Network Configuration
- Configuring Shared Bandwidth for a Pod with IPv6 Dual-Stack ENIs
- VPC Network Settings
- Tunnel Network Settings
- Pod Network Settings
-
Service
- Overview
- ClusterIP
- NodePort
-
LoadBalancer
- Creating a LoadBalancer Service
- Configuring LoadBalancer Services Using Annotations
- Configuring HTTP/HTTPS for a LoadBalancer Service
- Configuring SNI for a LoadBalancer Service
- Configuring HTTP/2 for a LoadBalancer Service
- Configuring Timeout for a LoadBalancer Service
- Configuring Health Check on Multiple Ports of a LoadBalancer Service
- Configuring Passthrough Networking for a LoadBalancer Service
- Setting the Pod Ready Status Through the ELB Health Check
- Headless Services
-
Ingresses
- Overview
-
LoadBalancer Ingresses
- Creating a LoadBalancer Ingress on the Console
- Creating a LoadBalancer Ingress Using kubectl
- Annotations for Configuring LoadBalancer Ingresses
-
Advanced Setting Examples of LoadBalancer Ingresses
- Configuring an HTTPS Certificate for a LoadBalancer Ingress
- Configuring SNI for a LoadBalancer Ingress
- Configuring Multiple Forwarding Policies for a LoadBalancer Ingress
- Configuring HTTP/2 for a LoadBalancer Ingress
- Configuring HTTPS Backend Services for a LoadBalancer Ingress
- Configuring Timeout for a LoadBalancer Ingress
- Configuring a Slow Start for a LoadBalancer Ingress
- Configuring a Range of Listening Ports for a LoadBalancer Ingress
- Nginx Ingresses
- DNS
- Configuring Intra-VPC Access
- Accessing the Internet from a Container
- Storage
- Observability
- Auto Scaling
- Namespaces
- ConfigMaps and Secrets
- Add-ons
- Helm Chart
- Permissions
- Settings
-
Old Console
- What Is Cloud Container Engine?
- High-Risk Operations and Solutions
- Clusters
-
Nodes
- Overview
- Buying a Node
- Accepting ECSs as Nodes into a Cluster
- Removing a Node
- Logging In to a Node
- Managing Node Labels
- Synchronizing Node Data
- Configuring Node Scheduling (Tainting)
- Resetting a Node
- Deleting a Node
- Stopping a Node
- Performing Rolling Upgrade for Nodes
- Formula for Calculating the Reserved Resources of a Node
- Creating a Linux LVM Disk Partition for Docker
- Data Disk Space Allocation
- Adding a Second Data Disk to a Node in a CCE Cluster
- Node Pools
-
Workloads
- Overview
- Creating a Deployment
- Creating a StatefulSet
- Creating a DaemonSet
- Creating a Job
- Creating a Cron Job
- Managing Pods
- GPU Scheduling
- NPU Scheduling
- Managing Workloads and Jobs
- Scaling a Workload
-
Configuring a Container
- Using a Third-Party Image
- Setting Container Specifications
- Setting Container Lifecycle Parameters
- Setting Container Startup Commands
- Setting Health Check for a Container
- Setting an Environment Variable
- Enabling ICMP Security Group Rules
- Configuring an Image Pull Policy
- Configuring Time Zone Synchronization
- DNS Configuration
- Pod Scale-in Priorities
- Configuring QoS Rate Limiting for Inter-Pod Access
- Adding Pod Annotations
- Affinity and Anti-Affinity Scheduling
- Networking
- Storage (CSI)
- Monitoring and Logs
- Namespaces
- Configuration Center
- Charts (Helm)
- Add-ons
- Auto Scaling
- Permissions Management
- Cloud Trace Service (CTS)
-
Best Practices
- Checklist for Deploying Containerized Applications in the Cloud
- Containerization
- Migration
- Disaster Recovery
-
Security
- Configuration Suggestions on CCE Cluster Security
- Configuration Suggestions on CCE Node Security
- Configuration Suggestions on CCE Container Runtime Security
- Configuration Suggestions on CCE Container Security
- Configuration Suggestions on CCE Container Image Security
- Configuration Suggestions on CCE Secret Security
- Auto Scaling
- Monitoring
- Cluster
- Networking
- Storage
- Container
- Permission
- Release
-
API Reference
- Before You Start
- API Overview
- Calling APIs
-
APIs
- API URL
-
Cluster Management
- Creating a Cluster
- Reading a Specified Cluster
- Listing Clusters in a Specified Project
- Updating a Specified Cluster
- Deleting a Cluster
- Hibernating a Cluster
- Waking Up a Cluster
- Obtaining a Cluster Certificate
- Modifying Cluster Specifications
- Querying a Job
- Binding/Unbinding Public API Server Address
- Obtaining Cluster Access Address
- Obtaining a Cluster's Logging Configurations
- Configuring Cluster Logs
- Obtaining the Partition List
- Creating a Partition
- Obtaining Partition Details
- Updating a Partition
- Node Management
- Node Pool Management
- Storage Management
- Add-on Management
-
Cluster Upgrade
- Upgrading a Cluster
- Obtaining Cluster Upgrade Task Details
- Retrying a Cluster Upgrade Task
- Suspending a Cluster Upgrade Task (Deprecated)
- Continuing to Execute a Cluster Upgrade Task (Deprecated)
- Obtaining a List of Cluster Upgrade Task Details
- Pre-upgrade Check
- Obtaining Details About a Pre-upgrade Check Task of a Cluster
- Obtaining a List of Pre-upgrade Check Tasks of a Cluster
- Post-upgrade Check
- Cluster Backup
- Obtaining a List of Cluster Backup Task Details
- Obtaining the Cluster Upgrade Information
- Obtaining a Cluster Upgrade Path
- Obtaining the Configuration of Cluster Upgrade Feature Gates
- Enabling the Cluster Upgrade Process Booting Task
- Obtaining a List of Upgrade Workflows
- Obtaining Details About a Specified Cluster Upgrade Task
- Updating the Status of a Specified Cluster Upgrade Booting Task
- Quota Management
- API Versions
- Tag Management
- Configuration Management
-
Chart Management
- Uploading a Chart
- Obtaining a Chart List
- Obtaining a Release List
- Updating a Chart
- Creating a Release
- Deleting a Chart
- Updating a Release
- Obtaining a Chart
- Deleting a Release
- Downloading a Chart
- Obtaining a Release
- Obtaining Chart Values
- Obtaining Historical Records of a Release
- Obtaining the Quota of a User Chart
- Kubernetes APIs
- Permissions and Supported Actions
-
Appendix
- Status Code
- Error Codes
- Obtaining a Project ID
- Obtaining an Account ID
- Specifying Add-ons to Be Installed During Cluster Creation
- How to Obtain Parameters in the API URI
- Creating a VPC and Subnet
- Creating a Key Pair
- Node Flavor Description
- Adding a Salt in the password Field When Creating a Node
- Maximum Number of Pods That Can Be Created on a Node
- Node OS
- Data Disk Space Allocation
- Attaching Disks to a Node
- SDK Reference
-
FAQs
- Common FAQ
- Billing
- Cluster
-
Node
- Node Creation
-
Node Running
- What Should I Do If a Cluster Is Available But Some Nodes Are Unavailable?
- How Do I Log In to a Node Using a Password and Reset the Password?
- How Do I Collect Logs of Nodes in a CCE Cluster?
- What Should I Do If the vdb Disk of a Node Is Damaged and the Node Cannot Be Recovered After Reset?
- What Should I Do If I/O Suspension Occasionally Occurs When SCSI EVS Disks Are Used?
- How Do I Fix an Abnormal Container or Node Due to No Thin Pool Disk Space?
- How Do I Rectify Failures When the NVIDIA Driver Is Used to Start Containers on GPU Nodes?
- Specification Change
- OSs
- Node Pool
-
Workload
-
Workload Exception Troubleshooting
- How Can I Find the Fault for an Abnormal Workload?
- What Should I Do If Pod Scheduling Fails?
- What Should I Do If a Pod Fails to Pull the Image?
- What Should I Do If Container Startup Fails?
- What Should I Do If a Pod Fails to Be Evicted?
- What Should I Do If a Storage Volume Cannot Be Mounted or the Mounting Times Out?
- What Should I Do If a Workload Remains in the Creating State?
- What Should I Do If a Pod Remains in the Terminating State?
- What Should I Do If a Workload Is Stopped Caused by Pod Deletion?
- What Should I Do If an Error Occurs When I Deploy a Service on the GPU Node?
- How Can I Locate Faults Using an Exit Code?
- Container Configuration
- Scheduling Policies
-
Others
- What Should I Do If a Cron Job Cannot Be Restarted After Being Stopped for a Period of Time?
- What Is a Headless Service When I Create a StatefulSet?
- What Should I Do If Error Message "Auth is empty" Is Displayed When a Private Image Is Pulled?
- What Is the Image Pull Policy for Containers in a CCE Cluster?
- What Can I Do If a Layer Is Missing During Image Pull?
-
Workload Exception Troubleshooting
-
Networking
-
Network Exception Troubleshooting
- How Do I Locate a Workload Networking Fault?
- Why Does the Browser Return Error Code 404 When I Access a Deployed Application?
- What Should I Do If a Container Fails to Access the Internet?
- What Should I Do If a Node Fails to Connect to the Internet (Public Network)?
- What Should I Do If Nginx Ingress Access in the Cluster Is Abnormal After the NGINX Ingress Controller Add-on Is Upgraded?
- What Could Cause Access Exceptions After Configuring an HTTPS Certificate for a LoadBalancer Ingress?
- Network Planning
- Security Hardening
-
Network Configuration
- How Can Container IP Addresses Survive a Container Restart?
- How Can I Check Whether an ENI Is Used by a Cluster?
- How Can I Delete a Security Group Rule Associated with a Deleted Subnet?
- How Can I Synchronize Certificates When Multiple Ingresses in Different Namespaces Share a Listener?
- How Can I Determine Which Ingress the Listener Settings Have Been Applied To?
-
Network Exception Troubleshooting
-
Storage
- How Do I Expand the Storage Capacity of a Container?
- What Are the Differences Among CCE Storage Classes in Terms of Persistent Storage and Multi-Node Mounting?
- Can I Create a CCE Node Without Adding a Data Disk to the Node?
- What Should I Do If the Host Cannot Be Found When Files Need to Be Uploaded to OBS During the Access to the CCE Service from a Public Network?
- How Can I Achieve Compatibility Between ExtendPathMode and Kubernetes client-go?
- Can CCE PVCs Detect Underlying Storage Faults?
- What Should I Do If a Yearly/Monthly EVS Disk Cannot Be Automatically Created?
- Namespace
-
Chart and Add-on
- What Should I Do If Installation of an Add-on Fails and "The release name is already exist" Is Displayed?
- How Do I Configure the Add-on Resource Quotas Based on Cluster Scale?
- How Can I Clean Up Residual Resources After the NGINX Ingress Controller Add-on in the Unknown State Is Deleted?
- Why TLS v1.0 and v1.1 Cannot Be Used After the NGINX Ingress Controller Add-on Is Upgraded?
-
API & kubectl FAQs
- How Can I Access a Cluster API Server?
- Can the Resources Created Using APIs or kubectl Be Displayed on the CCE Console?
- How Do I Download kubeconfig for Connecting to a Cluster Using kubectl?
- How Do I Rectify the Error Reported When Running the kubectl top node Command?
- Why Is "Error from server (Forbidden)" Displayed When I Use kubectl?
- DNS FAQs
- Image Repository FAQs
- Permissions
- Videos
Selecting a Data Disk for the Node
When a node is created, a data disk is attached by default for a container runtime and kubelet. The data disk used by the container runtime and kubelet cannot be detached, and the default capacity is 100 GiB. To cut costs, you can adjust the disk capacity to the minimum of 20 GiB or reduce the disk capacity attached to a node to the minimum of 10 GiB.
Adjusting the size of the data disk used by the container runtime and kubelet may incur risks. You are advised to evaluate the capacity adjustment and then perform the operations described in this section.
- If the disk capacity is too small, the image pull may fail. If different images need to be frequently pulled on the node, you are not advised to reduce the data disk capacity.
- Before a cluster upgrade, the system checks whether the data disk usage exceeds 95%. If the usage is high, the cluster upgrade may be affected.
- If Device Mapper is used, the disk capacity may be insufficient. You are advised to use the OverlayFS or select a large-capacity data disk.
- For dumping logs, application logs must be stored in a separate disk to prevent insufficient storage capacity of the dockersys volume from affecting service running.
- After reducing the data disk capacity, you are advised to install the npd add-on in the cluster to detect disk usage. If the disk usage of a node is high, resolve this problem by referring to What If the Data Disk Capacity Is Insufficient?
Constraints
- Only clusters of v1.19 or later allow reducing the capacity of the data disk used by container runtimes and kubelet.
- Only the EVS disk capacity can be adjusted. (Local disks are available only when the node specification is disk-intensive or Ultra-high I/O.)
Selecting a Data Disk
When selecting a data disk, consider the following factors:
- During image pull, the system downloads the image package (the .tar package) from the image repository, and decompresses the package. Then it deletes the package but retain the image file. During the decompression of the .tar package, the package and the decompressed image file coexist. Reserve the capacity for the decompressed files.
- Mandatory add-ons (such as everest and coredns) may be deployed on nodes during cluster creation. When calculating the data disk size, reserve about 2 GiB storage capacity for them.
- Logs are generated during application running. To ensure stable application running, reserve about 1 GiB storage capacity for each pod.
For details about the calculation formulas, see OverlayFS and Device Mapper.
OverlayFS
By default, the container engine and container image storage capacity of a node using the OverlayFS storage driver occupies 90% of the data disk capacity (you are advised to retain this value). All the 90% storage capacity is used for dockersys partitioning. The calculation methods are as follows:
- Capacity for storing container engines and container images requires 90% of the data disk capacity by default.
- Capacity for dockersys volume (in the /var/lib/docker directory) requires 90% of the data disk capacity. The entire container engine and container image capacity (need 90% of the data disk capacity by default) are in the /var/lib/docker directory.
- Capacity for storing temporary kubelet and emptyDir requires 10% of the data disk capacity.
On a node using the OverlayFS, when an image is pulled, the .tar package is decompressed after being downloaded. During this process, the .tar package and the decompressed image file are stored in the dockersys volume, occupying about twice the actual image storage capacity. After the decompression is complete, the .tar package is deleted. Therefore, during image pull, after deducting the storage capacity occupied by the system add-on images, ensure that the remaining capacity of the dockersys volume is greater than twice the actual image storage capacity. To ensure that the containers can run stably, reserve certain capacity in the dockersys volume for container logs and other related files.
When selecting a data disk, consider the following formula:
Capacity of dockersys volume > Actual total image storage capacity x 2 + Total system add-on image storage capacity (about 2 GiB) + Number of containers x Available storage capacity for a single container (about 1 GiB log storage capacity for each container)
If container logs are output in the json.log format, they will occupy some capacity in the dockersys volume. If container logs are stored on persistent storage, they will not occupy capacity in the dockersys volume. Estimate the capacity of every container as required.
Example:
Assume that the node uses the OverlayFS and the data disk attached to this node is 20 GiB. According to the preceding methods, the capacity for storing container engines and images occupies 90% of the data disk capacity, and the capacity for the dockersys volume is 18 GiB (20 GiB x 90%). Additionally, mandatory add-ons may occupy about 2 GiB storage capacity during cluster creation. If you deploy a .tar package of 10 GiB, the package decompression takes 20 GiB of the dockersys volume's storage capacity. This, coupled with the storage capacity occupied by mandatory add-ons, exceeds the remaining capacity of the dockersys volume. As a result, the image pull may fail.
Device Mapper
By default, the capacity for storing container engines and container images of a node using the Device Mapper storage driver occupies 90% of the data disk capacity (you are advised to retain this value). The occupied capacity includes the dockersys volume and thinpool volume. The calculation methods are as follows:
- Capacity for storing container engines and container images requires 90% of the data disk capacity by default.
- Capacity for the dockersys volume (in the /var/lib/docker directory) requires 20% of the capacity for storing container engines and container images.
- Capacity for the thinpool volume requires 80% of the container engine and container image storage capacity.
- Capacity for storing temporary kubelet and emptyDir requires 10% of the data disk capacity.
On a node using the Device Mapper storage driver, when an image is pulled, the .tar package is temporarily stored in the dockersys volume. After the .tar package is decompressed, the image file is stored in the thinpool volume, and the package in the dockersys volume will be deleted. Therefore, during image pull, ensure that the dockersys partition space and thinpool space are sufficient, and note that the former is smaller than the latter. To ensure that the containers can run stably, reserve certain capacity in the dockersys volume for container logs and other related files.
- Capacity for dockersys volume > Temporary storage capacity of the .tar package (approximately equal to the actual total image storage capacity) + Number of containers x Storage capacity of a single container (about 1 GiB log storage capacity must be reserved for each container)
- Capacity for thinpool volume > Actual total image storage capacity + Total add-on image storage capacity (about 2 GiB)
If container logs are output in the json.log format, they will occupy some capacity in the dockersys volume. If container logs are stored on persistent storage, they will not occupy capacity in the dockersys volume. Estimate the capacity of every container as required.
Example:
Assume that the node uses the Device Mapper and the data disk attached to this node is 20 GiB. According to the preceding methods, the container engine and image storage capacity occupies 90% of the data disk capacity, and the disk usage of the dockersys volume is 3.6 GiB. Additionally, the storage capacity of the mandatory add-ons may occupy about 2 GiB of the dockersys volume during cluster creation. The remaining storage capacity is about 1.6 GiB. If you deploy a .tar image package larger than 1.6 GiB, the storage capacity of the dockersys volume is insufficient for the package to be decompressed. As a result, the image pull may fail.
What If the Data Disk Capacity Is Insufficient?
Solution 1: Clearing images
- Nodes that use containerd
- Obtain local images on the node.
crictl images -v
- Delete the images that are not required by image ID.
crictl rmi Image ID
- Obtain local images on the node.
- Nodes that use Docker
- Obtain local images on the node.
docker images
- Delete the images that are not required by image ID.
docker rmi Image ID
- Obtain local images on the node.
Do not delete system images such as the cce-pause image. Otherwise, pods may fail to be created.
Solution 2: Expanding the disk capacity
- Expand the capacity of a data disk on the EVS console.
Only the storage capacity of the EVS disk is expanded. You also need to perform the following steps to expand the capacity of the logical volume and file system.
- Log in to the CCE console and click the cluster. In the navigation pane, choose Nodes. Click More > Sync Server Data in the row containing the target node.
- Log in to the target node.
- Run the lsblk command to check the block device information of the node.
A data disk is divided depending on the container storage Rootfs:
Overlayfs: No independent thin pool is allocated. Image data is stored in dockersys.
- Check the disk and partition sizes of the device.
# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 50G 0 disk └─sda1 8:1 0 50G 0 part / sdb 8:16 0 150G 0 disk # The data disk has been expanded to 150 GiB, but 50 GiB space is not allocated. ├─vgpaas-dockersys 253:0 0 90G 0 lvm /var/lib/containerd └─vgpaas-kubernetes 253:1 0 10G 0 lvm /mnt/paas/kubernetes/kubelet
- Expand the disk capacity.
Add the new disk capacity to the dockersys logical volume used by the container engine.
- Expand the PV capacity so that LVM can identify the new EVS capacity. /dev/sdb specifies the physical volume where dockersys is located.
pvresize /dev/sdb
Information similar to the following is displayed:
Physical volume "/dev/sdb" changed 1 physical volume(s) resized or updated / 0 physical volume(s) not resized
- Expand 100% of the free capacity to the logical volume. vgpaas/dockersys specifies the logical volume used by the container engine.
lvextend -l+100%FREE -n vgpaas/dockersys
Information similar to the following is displayed:
Size of logical volume vgpaas/dockersys changed from <90.00 GiB (23039 extents) to 140.00 GiB (35840 extents). Logical volume vgpaas/dockersys successfully resized.
- Adjust the size of the file system. /dev/vgpaas/dockersys specifies the file system path of the container engine.
resize2fs /dev/vgpaas/dockersys
Information similar to the following is displayed:
Filesystem at /dev/vgpaas/dockersys is mounted on /var/lib/containerd; on-line resizing required old_desc_blocks = 12, new_desc_blocks = 18 The filesystem on /dev/vgpaas/dockersys is now 36700160 blocks long.
- Expand the PV capacity so that LVM can identify the new EVS capacity. /dev/sdb specifies the physical volume where dockersys is located.
- Check whether the capacity is expanded.
# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT sda 8:0 0 50G 0 disk └─sda1 8:1 0 50G 0 part / sdb 8:16 0 150G 0 disk ├─vgpaas-dockersys 253:0 0 140G 0 lvm /var/lib/containerd └─vgpaas-kubernetes 253:1 0 10G 0 lvm /mnt/paas/kubernetes/kubelet
Devicemapper: A thin pool is allocated to store image data.
- Check the disk and partition sizes of the device.
# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT vda 8:0 0 50G 0 disk └─vda1 8:1 0 50G 0 part / vdb 8:16 0 200G 0 disk ├─vgpaas-dockersys 253:0 0 18G 0 lvm /var/lib/docker ├─vgpaas-thinpool_tmeta 253:1 0 3G 0 lvm │ └─vgpaas-thinpool 253:3 0 67G 0 lvm # Space used by thinpool │ ... ├─vgpaas-thinpool_tdata 253:2 0 67G 0 lvm │ └─vgpaas-thinpool 253:3 0 67G 0 lvm │ ... └─vgpaas-kubernetes 253:4 0 10G 0 lvm /mnt/paas/kubernetes/kubelet
- Expand the disk capacity.
- Expand the PV capacity so that LVM can identify the new EVS capacity. /dev/vdb specifies the physical volume where thinpool is located.
pvresize /dev/vdb
Information similar to the following is displayed:
Physical volume "/dev/vdb" changed 1 physical volume(s) resized or updated / 0 physical volume(s) not resized
- Expand 100% of the free capacity to the logical volume. vgpaas/thinpool specifies the logical volume used by the container engine.
lvextend -l+100%FREE -n vgpaas/thinpool
Information similar to the following is displayed:
Size of logical volume vgpaas/thinpool changed from <67.00 GiB (23039 extents) to <167.00 GiB (48639 extents). Logical volume vgpaas/thinpool successfully resized.
- Do not need to adjust the size of the file system, because the thin pool is not mounted to any devices.
- Check whether the capacity is expanded. Run the lsblk command to check the disk and partition sizes of the device. If the new disk capacity has been added to the thin pool, the capacity is expanded.
# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT vda 8:0 0 50G 0 disk └─vda1 8:1 0 50G 0 part / vdb 8:16 0 200G 0 disk ├─vgpaas-dockersys 253:0 0 18G 0 lvm /var/lib/docker ├─vgpaas-thinpool_tmeta 253:1 0 3G 0 lvm │ └─vgpaas-thinpool 253:3 0 167G 0 lvm # Thin pool space after capacity expansion │ ... ├─vgpaas-thinpool_tdata 253:2 0 67G 0 lvm │ └─vgpaas-thinpool 253:3 0 67G 0 lvm │ ... └─vgpaas-kubernetes 253:4 0 10G 0 lvm /mnt/paas/kubernetes/kubelet
Option 2: Add the new disk capacity to the dockersys disk.
- Expand the PV capacity so that LVM can identify the new EVS capacity. /dev/vdb specifies the physical volume where dockersys is located.
pvresize /dev/vdb
Information similar to the following is displayed:
Physical volume "/dev/vdb" changed 1 physical volume(s) resized or updated / 0 physical volume(s) not resized
- Expand 100% of the free capacity to the logical volume. vgpaas/dockersys specifies the logical volume used by the container engine.
lvextend -l+100%FREE -n vgpaas/dockersys
Information similar to the following is displayed:
Size of logical volume vgpaas/dockersys changed from <18.00 GiB (4607 extents) to <118.00 GiB (30208 extents). Logical volume vgpaas/dockersys successfully resized.
- Adjust the size of the file system. /dev/vgpaas/dockersys specifies the file system path of the container engine.
resize2fs /dev/vgpaas/dockersys
Information similar to the following is displayed:
Filesystem at /dev/vgpaas/dockersys is mounted on /var/lib/docker; on-line resizing required old_desc_blocks = 3, new_desc_blocks = 15 The filesystem on /dev/vgpaas/dockersys is now 30932992 blocks long.
- Check whether the capacity is expanded. Run the lsblk command to check the disk and partition sizes of the device. If the new disk capacity has been added to the dockersys, the capacity is expanded.
# lsblk NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT vda 8:0 0 50G 0 disk └─vda1 8:1 0 50G 0 part / vdb 8:16 0 200G 0 disk ├─vgpaas-dockersys 253:0 0 118G 0 lvm /var/lib/docker # dockersys after capacity expansion ├─vgpaas-thinpool_tmeta 253:1 0 3G 0 lvm │ └─vgpaas-thinpool 253:3 0 67G 0 lvm │ ... ├─vgpaas-thinpool_tdata 253:2 0 67G 0 lvm │ └─vgpaas-thinpool 253:3 0 67G 0 lvm │ ... └─vgpaas-kubernetes 253:4 0 10G 0 lvm /mnt/paas/kubernetes/kubelet
- Expand the PV capacity so that LVM can identify the new EVS capacity. /dev/vdb specifies the physical volume where thinpool is located.
- Check the disk and partition sizes of the device.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.