- What's New
- Function Overview
-
Product Bulletin
- Latest Notices
-
Product Change Notices
- EOM of CentOS
- Billing Changes for Huawei Cloud CCE Autopilot Data Plane
- CCE Autopilot for Commercial Use on September 30, 2024, 00:00 GMT+08:00
- Reliability Hardening for Cluster Networks and Storage Functions
- Support for Docker
- Service Account Token Security Improvement
- Upgrade of Helm v2 to Helm v3
- Problems Caused by conn_reuse_mode Settings in the IPVS Forwarding Mode of CCE Clusters
- Optimized Key Authentication of the everest Add-on
-
Cluster Version Release Notes
- End of Maintenance for Clusters 1.25
- End of Maintenance for Clusters 1.23
- End of Maintenance for Clusters 1.21
- End of Maintenance for Clusters 1.19
- End of Maintenance for Clusters 1.17
- End of Maintenance for Clusters 1.15
- End of Maintenance for Clusters 1.13
- Creation of CCE Clusters 1.13 and Earlier Not Supported
- Upgrade for Kubernetes Clusters 1.9
-
Vulnerability Notices
- Vulnerability Fixing Policies
- Notice of Kubernetes Security Vulnerability (CVE-2024-10220)
- Notice of Kubernetes Security Vulnerabilities (CVE-2024-9486 and CVE-2024-9594)
- Notice of Container Escape Vulnerability in NVIDIA Container Toolkit (CVE-2024-0132)
- Notice of Linux Remote Code Execution Vulnerability in CUPS (CVE-2024-47076, CVE-2024-47175, CVE-2024-47176, and CVE-2024-47177)
- Notice of the NGINX Ingress Controller Vulnerability That Allows Attackers to Bypass Annotation Validation (CVE-2024-7646)
- Notice of Docker Engine Vulnerability That Allows Attackers to Bypass AuthZ (CVE-2024-41110)
- Notice of Linux Kernel Privilege Escalation Vulnerability (CVE-2024-1086)
- Notice of OpenSSH Remote Code Execution Vulnerability (CVE-2024-6387)
- Notice of Fluent Bit Memory Corruption Vulnerability (CVE-2024-4323)
- Notice of runC systemd Attribute Injection Vulnerability (CVE-2024-3154)
- Notice of the Impact of runC Vulnerability (CVE-2024-21626)
- Notice of Kubernetes Security Vulnerability (CVE-2022-3172)
- Notice of Privilege Escalation Vulnerability in Linux Kernel openvswitch Module (CVE-2022-2639)
- Notice of nginx-ingress Add-on Security Vulnerability (CVE-2021-25748)
- Notice of nginx-ingress Security Vulnerabilities (CVE-2021-25745 and CVE-2021-25746)
- Notice of containerd Process Privilege Escalation Vulnerability (CVE-2022-24769)
- Notice of CRI-O Container Runtime Engine Arbitrary Code Execution Vulnerability (CVE-2022-0811)
- Notice of Container Escape Vulnerability Caused by the Linux Kernel (CVE-2022-0492)
- Notice of Non-Security Handling Vulnerability of containerd Image Volumes (CVE-2022-23648)
- Notice of Linux Kernel Integer Overflow Vulnerability (CVE-2022-0185)
- Notice of Linux Polkit Privilege Escalation Vulnerability (CVE-2021-4034)
- Notice of Vulnerability of Kubernetes subPath Symlink Exchange (CVE-2021-25741)
- Notice of runC Vulnerability That Allows a Container Filesystem Breakout via Directory Traversal (CVE-2021-30465)
- Notice of Docker Resource Management Vulnerability (CVE-2021-21285)
- Notice of NVIDIA GPU Driver Vulnerability (CVE-2021-1056)
- Notice of the Sudo Buffer Vulnerability (CVE-2021-3156)
- Notice of the Kubernetes Security Vulnerability (CVE-2020-8554)
- Notice of Apache containerd Security Vulnerability (CVE-2020-15257)
- Notice of Docker Engine Input Verification Vulnerability (CVE-2020-13401)
- Notice of Kubernetes kube-apiserver Input Verification Vulnerability (CVE-2020-8559)
- Notice of Kubernetes kubelet Resource Management Vulnerability (CVE-2020-8557)
- Notice of Kubernetes kubelet and kube-proxy Authorization Vulnerability (CVE-2020-8558)
- Notice of Fixing the Kubernetes HTTP/2 Vulnerability
- Notice of Fixing the Linux Kernel SACK Vulnerabilities
- Notice of Fixing the Docker Command Injection Vulnerability (CVE-2019-5736)
- Notice of Fixing the Kubernetes Permission and Access Control Vulnerability (CVE-2018-1002105)
- Notice of Fixing the Kubernetes Dashboard Security Vulnerability (CVE-2018-18264)
-
Product Release Notes
-
Cluster Versions
- Kubernetes Version Policy
-
Kubernetes Version Release Notes
- Kubernetes 1.31 Release Notes
- Kubernetes 1.30 Release Notes
- Kubernetes 1.29 Release Notes
- Kubernetes 1.28 Release Notes
- Kubernetes 1.27 Release Notes
- Kubernetes 1.25 Release Notes
- Kubernetes 1.23 Release Notes
- Kubernetes 1.21 (EOM) Release Notes
- Kubernetes 1.19 (EOM) Release Notes
- Kubernetes 1.17 (EOM) Release Notes
- Kubernetes 1.15 (EOM) Release Notes
- Kubernetes 1.13 (EOM) Release Notes
- Kubernetes 1.11 (EOM) Release Notes
- Kubernetes 1.9 (EOM) and Earlier Versions Release Notes
- Patch Versions
- OS Images
-
Add-on Versions
- CoreDNS Release History
- CCE Container Storage (Everest) Release History
- CCE Node Problem Detector Release History
- Kubernetes Dashboard Release History
- CCE Cluster Autoscaler Release History
- NGINX Ingress Controller Release History
- Kubernetes Metrics Server Release History
- CCE Advanced HPA Release History
- CCE Cloud Bursting Engine for CCI Release History
- CCE AI Suite (NVIDIA GPU) Release History
- CCE AI Suite (Ascend NPU) Release History
- Volcano Scheduler Release History
- CCE Secrets Manager for DEW Release History
- CCE Network Metrics Exporter Release History
- NodeLocal DNSCache Release History
- Cloud Native Cluster Monitoring Release History
- Cloud Native Log Collection Release History
- Container Image Signature Verification Release History
- Grafana Release History
- OpenKruise Release History
- Gatekeeper Release History
- Vertical Pod Autoscaler Release History
- CCE Cluster Backup & Recovery (End of Maintenance) Release History
- Kubernetes Web Terminal (End of Maintenance) Release History
- Prometheus (End of Maintenance) Release History
-
Cluster Versions
- Service Overview
- Billing
- Kubernetes Basics
- Getting Started
-
User Guide
- High-Risk Operations
-
Clusters
- Basic Cluster Information
-
Cluster Version Release Notes
-
Kubernetes Version Release Notes
- Kubernetes 1.31 Release Notes
- Kubernetes 1.30 Release Notes
- Kubernetes 1.29 Release Notes
- Kubernetes 1.28 Release Notes
- Kubernetes 1.27 Release Notes
- Kubernetes 1.25 Release Notes
- Kubernetes 1.23 Release Notes
- Kubernetes 1.21 (EOM) Release Notes
- Kubernetes 1.19 (EOM) Release Notes
- Kubernetes 1.17 (EOM) Release Notes
- Kubernetes 1.15 (EOM) Release Notes
- Kubernetes 1.13 (EOM) Release Notes
- Kubernetes 1.11 (EOM) Release Notes
- Release Notes for Kubernetes 1.9 (EOM) and Earlier Versions
- Patch Version Release Notes
-
Kubernetes Version Release Notes
- Buying a Cluster
- Connecting to a Cluster
-
Managing a Cluster
- Modifying Cluster Configurations
- Enabling Overload Control for a Cluster
- Changing Cluster Scale
- Changing the Default Security Group of a Node
- Deleting a Cluster
- Hibernating or Waking Up a Pay-per-Use Cluster
- Renewing a Yearly/Monthly Cluster
- Changing the Billing Mode of a Cluster from Pay-per-Use to Yearly/Monthly
-
Upgrading a Cluster
- Process and Method of Upgrading a Cluster
- Before You Start
- Performing Post-Upgrade Verification
- Migrating Services Across Clusters of Different Versions
-
Troubleshooting for Pre-upgrade Check Exceptions
- Pre-upgrade Check
- Node Restrictions
- Upgrade Management
- Add-ons
- Helm Charts
- SSH Connectivity of Master Nodes
- Node Pools
- Security Groups
- Residual Nodes
- Discarded Kubernetes Resources
- Compatibility Risks
- CCE Agent Versions
- Node CPU Usage
- CRDs
- Node Disks
- Node DNS
- Node Key Directory File Permissions
- kubelet
- Node Memory
- Node Clock Synchronization Server
- Node OS
- Node CPU Cores
- Node Python Commands
- ASM Version
- Node Readiness
- Node journald
- containerd.sock
- Internal Error
- Node Mount Points
- Kubernetes Node Taints
- Everest Restrictions
- cce-hpa-controller Limitations
- Enhanced CPU Policies
- Health of Worker Node Components
- Health of Master Node Components
- Memory Resource Limit of Kubernetes Components
- Discarded Kubernetes APIs
- NetworkManager
- Node ID File
- Node Configuration Consistency
- Node Configuration File
- CoreDNS Configuration Consistency
- sudo
- Key Node Commands
- Mounting of a Sock File on a Node
- HTTPS Load Balancer Certificate Consistency
- Node Mounting
- Login Permissions of User paas on a Node
- Private IPv4 Addresses of Load Balancers
- Historical Upgrade Records
- CIDR Block of the Cluster Management Plane
- GPU Add-on
- Nodes' System Parameters
- Residual Package Version Data
- Node Commands
- Node Swap
- NGINX Ingress Controller
- Upgrade of Cloud Native Cluster Monitoring
- containerd Pod Restart Risks
- Key GPU Add-on Parameters
- GPU or NPU Pod Rebuild Risks
- ELB Listener Access Control
- Master Node Flavor
- Subnet Quota of Master Nodes
- Node Runtime
- Node Pool Runtime
- Number of Node Images
- OpenKruise Compatibility Check
- Compatibility Check of Secret Encryption
- Compatibility Between the Ubuntu Kernel and GPU Driver
- Drainage Tasks
- Image Layers on a Node
- Cluster Rolling Upgrade
- Rotation Certificates
- Ingress and ELB Configuration Consistency
- Network Policies of Cluster Network Components
- Cluster and Node Pool Configurations
- Time Zone of Master Nodes
-
Nodes
- Node Overview
- Container Engines
- Node OSs
- Node Specifications
- Creating a Node
- Accepting Nodes for Management
- Logging In to a Node
-
Management Nodes
- Managing Node Labels
- Managing Node Taints
- Resetting a Node
- Removing a Node
- Synchronizing the Data of Cloud Servers
- Draining a Node
- Deleting or Unsubscribing from a Node
- Changing the Billing Mode of a Node to Yearly/Monthly
- Modifying the Auto-Renewal Configuration of a Yearly/Monthly Node
- Stopping a Node
- Performing Rolling Upgrade for Nodes
-
Node O&M
- Node Resource Reservation Policy
- Space Allocation of a Data Disk
- Maximum Number of Pods That Can Be Created on a Node
- Differences in kubelet and Runtime Component Configurations Between CCE and the Native Community
- Migrating Nodes from Docker to containerd
- Optimizing Node System Parameters
- Configuring Node Fault Detection Policies
- Executing the Pre- or Post-installation Commands During Node Creation
- ECS Event Handling Suggestions
- Node Pools
-
Workloads
- Overview
- Creating a Workload
-
Configuring a Workload
- Secure Runtime and Common Runtime
- Configuring Time Zone Synchronization
- Configuring an Image Pull Policy
- Using Third-Party Images
- Configuring Container Specifications
- Configuring Container Lifecycle Parameters
- Configuring Container Health Check
- Configuring Environment Variables
- Configuring APM
- Configuring Workload Upgrade Policies
- Configuring Tolerance Policies
- Configuring Labels and Annotations
- Scheduling a Workload
- Logging In to a Container
- Managing Workloads
- Managing Custom Resources
- Pod Security
-
Scheduling
- Overview
- CPU Scheduling
- GPU Scheduling
- NPU Scheduling
- Volcano Scheduling
- Cloud Native Hybrid Deployment
-
Network
- Overview
-
Container Network
- Overview
-
Cloud Native Network 2.0 Settings
- Cloud Native Network 2.0
- Configuring a Default Container Subnet for a CCE Turbo Cluster
- Binding a Security Group to a Pod Using an Annotation
- Binding a Security Group to a Workload Using a Security Group Policy
- Binding a Subnet and Security Group to a Namespace or Workload Using a Container Network Configuration
- Configuring a Static IP Address for a Pod
- Configuring an EIP for a Pod
- Configuring a Static EIP for a Pod
- Configuring Shared Bandwidth for a Pod with IPv6 Dual-Stack ENIs
- VPC Network Settings
- Tunnel Network Settings
- Pod Network Settings
-
Service
- Overview
- ClusterIP
- NodePort
-
LoadBalancer
- Creating a LoadBalancer Service
- Configuring LoadBalancer Services Using Annotations
- Configuring HTTP/HTTPS for a LoadBalancer Service
- Configuring SNI for a LoadBalancer Service
- Configuring HTTP/2 for a LoadBalancer Service
- Configuring an HTTP/HTTPS Header for a LoadBalancer Service
- Configuring Timeout for a LoadBalancer Service
- Configuring TLS for a LoadBalancer Service
- Configuring GZIP Data Compression for a LoadBalancer Service
- Configuring a Blocklist/Trustlist Access Policy for a LoadBalancer Service
- Configuring Health Check on Multiple Ports of a LoadBalancer Service
- Configuring Passthrough Networking for a LoadBalancer Service
- Enabling a LoadBalancer Service to Obtain the Client IP Address
- Configuring a Custom EIP for a LoadBalancer Service
- Configuring a Range of Listening Ports for LoadBalancer Services
- Setting the Pod Ready Status Through the ELB Health Check
- Enabling ICMP Security Group Rules
- DNAT
- Headless Services
-
Ingresses
- Overview
- Comparison Between ELB Ingress and Nginx Ingress
-
LoadBalancer Ingresses
- Creating a LoadBalancer Ingress on the Console
- Creating a LoadBalancer Ingress Using kubectl
- Annotations for Configuring LoadBalancer Ingresses
-
Advanced Setting Examples of LoadBalancer Ingresses
- Configuring an HTTPS Certificate for a LoadBalancer Ingress
- Updating the HTTPS Certificate for a LoadBalancer Ingress
- Configuring SNI for a LoadBalancer Ingress
- Configuring Multiple Forwarding Policies for a LoadBalancer Ingress
- Configuring HTTP/2 for a LoadBalancer Ingress
- Configuring HTTPS Backend Services for a LoadBalancer Ingress
- Configuring gRPC Backend Services for a LoadBalancer Ingress
- Configuring Timeout for a LoadBalancer Ingress
- Configuring a Slow Start for a LoadBalancer Ingress
- Configuring Grayscale Release for a LoadBalancer Ingress
- Configuring a Blocklist/Trustlist Access Policy for a LoadBalancer Ingress
- Configuring a Range of Listening Ports for a LoadBalancer Ingress
- Configuring an HTTP/HTTPS Header for a LoadBalancer Ingress
- Configuring GZIP Data Compression for a LoadBalancer Ingress
- Configuring URL Redirection for a LoadBalancer Ingress
- Configuring URL Rewriting for a LoadBalancer Ingress
- Redirecting HTTP to HTTPS for a LoadBalancer Ingress
- Configuring the Priorities of Forwarding Rules for LoadBalancer Ingresses
- Configuring a Custom Header Forwarding Policy for a LoadBalancer Ingress
- Configuring a Custom EIP for a LoadBalancer Ingress
- Configuring Cross-Origin Access for LoadBalancer Ingresses
- Configuring Advanced Forwarding Rules for a LoadBalancer Ingress
- Configuring Advanced Forwarding Actions for a LoadBalancer Ingress
- Forwarding Policy Priorities of LoadBalancer Ingresses
- Configuring Multiple Ingresses to Use the Same External ELB Port
-
Nginx Ingresses
- Creating an Nginx Ingress on the Console
- Creating an Nginx Ingress Using kubectl
- Annotations for Configuring Nginx Ingresses
-
Advanced Setting Examples of Nginx Ingresses
- Configuring an HTTPS Certificate for an Nginx Ingress
- Configuring Redirection Rules for an Nginx Ingress
- Configuring URL Rewriting Rules for an Nginx Ingress
- Configuring HTTPS Backend Services for an Nginx Ingress
- Configuring gRPC Backend Services for an Nginx Ingress
- Configuring Consistent Hashing for Load Balancing of an Nginx Ingress
- Configuring Application Traffic Mirroring for an Nginx Ingress
- Configuring Cross-Origin Access for Nginx Ingresses
- Nginx Ingress Usage Suggestions
- Optimizing NGINX Ingress Controller in High-Traffic Scenarios
- Configuring an ELB Certificate for NGINX Ingress Controller
- Migrating Data from a Bring-Your-Own Nginx Ingress to a LoadBalancer Ingress
- DNS
- Cluster Network Settings
- Configuring Intra-VPC Access
- Accessing the Internet from a Container
- Storage
- Auto Scaling
-
O&M
- Overview
- Agency Permissions
- Health Center
- Monitoring Center
- Logging
- Alarm Center
- Log Auditing
- O&M FAQ
-
O&M Best Practices
- Cloud Native Cluster Monitoring Is Compatible with Self-Built Prometheus
- Monitoring Custom Metrics Using Cloud Native Cluster Monitoring
- Monitoring Custom Metrics on AOM
- Monitoring Metrics of Master Node Components Using Prometheus
- Monitoring Metrics of NGINX Ingress Controller
- Monitoring Container Network Metrics of CCE Turbo Clusters
- Cloud Native Cost Governance
- Namespaces
- ConfigMaps and Secrets
- Add-ons
- Helm Chart
- Permissions
- Settings
- Storage Management: FlexVolume (Deprecated)
-
Best Practices
- Checklist for Deploying Containerized Applications in the Cloud
- Containerization
- Migration
- DevOps
- Disaster Recovery
-
Security
- Overview
- Configuration Suggestions on CCE Cluster Security
- Configuration Suggestions on CCE Node Security
- Configuration Suggestions on CCE Container Runtime Security
- Configuration Suggestions on CCE Container Security
- Configuration Suggestions on CCE Container Image Security
- Configuration Suggestions on CCE Secret Security
- Configuration Suggestions on CCE Workload Identity Security
- Auto Scaling
- Monitoring
-
Cluster
- Suggestions on CCE Cluster Selection
- Creating an IPv4/IPv6 Dual-Stack Cluster in CCE
- Creating a Custom CCE Node Image
- Executing the Pre- or Post-installation Commands During Node Creation
- Using OBS Buckets to Implement Custom Script Injection During Node Creation
- Connecting to Multiple Clusters Using kubectl
- Selecting a Data Disk for the Node
- Implementing Cost Visualization for a CCE Cluster
- Creating a CCE Turbo Cluster Using a Shared VPC
- Protecting a CCE Cluster Against Overload
- Managing Costs for a Cluster
-
Networking
- Planning CIDR Blocks for a Cluster
- Selecting a Network Model
- Enabling Cross-VPC Network Communications Between CCE Clusters
- Implementing Network Communications Between Containers and IDCs Using VPC and Direct Connect
- Enabling a CCE Cluster to Resolve Domain Names on Both On-Premises IDCs and HUAWEI CLOUD
- Implementing Sticky Session Through Load Balancing
- Obtaining the Client Source IP Address for a Container
- Increasing the Listening Queue Length by Configuring Container Kernel Parameters
- Configuring Passthrough Networking for a LoadBalancer Service
- Accessing an External Network from a Pod
- Deploying Nginx Ingress Controllers Using a Chart
- CoreDNS Configuration Optimization
- Pre-Binding Container ENI for CCE Turbo Clusters
- Connecting a Cluster to the Peer VPC Through an Enterprise Router
- Accessing an IP Address Outside a Cluster That Uses a VPC Network Using Source Pod IP Addresses in the Cluster
-
Storage
- Expanding the Storage Space
- Mounting Object Storage Across Accounts
- Dynamically Creating an SFS Turbo Subdirectory Using StorageClass
- Changing the Storage Class Used by a Cluster of v1.15 from FlexVolume to CSI Everest
- Using Custom Storage Classes
- Scheduling EVS Disks Across AZs Using csi-disk-topology
- Automatically Collecting JVM Dump Files That Exit Unexpectedly Using SFS 3.0
- Deploying Storage Volumes in Multiple AZs
-
Container
- Recommended Configurations for Workloads
- Properly Allocating Container Computing Resources
- Upgrading Pods Without Interrupting Services
- Modifying Kernel Parameters Using a Privileged Container
- Using Init Containers to Initialize an Application
- Setting Time Zone Synchronization
- Configuration Suggestions on Container Network Bandwidth Limit
- Configuring the /etc/hosts File of a Pod Using hostAliases
- Configuring Domain Name Resolution for CCE Containers
- Using Dual-Architecture Images (x86 and Arm) in CCE
- Locating Container Faults Using the Core Dump File
- Configuring Parameters to Delay the Pod Startup in a CCE Turbo Cluster
- Automatically Updating a Workload Version Using SWR Triggers
- Permission
- Release
- Batch Computing
-
API Reference
- Before You Start
- API Overview
- Calling APIs
-
APIs
- API URL
-
Cluster Management
- Creating a Cluster
- Reading a Specified Cluster
- Listing Clusters in a Specified Project
- Updating a Specified Cluster
- Deleting a Cluster
- Hibernating a Cluster
- Waking Up a Cluster
- Obtaining a Cluster Certificate
- Revoking a Cluster Certificate of a User
- Modifying Cluster Specifications
- Querying a Job
- Binding/Unbinding Public API Server Address
- Obtaining Cluster Access Address
- Obtaining a Cluster's Logging Configurations
- Configuring Cluster Logs
- Obtaining the Partition List
- Creating a Partition
- Obtaining Partition Details
- Updating a Partition
-
Node Management
- Creating a Node
- Reading a Specified Node
- Listing All Nodes in a Cluster
- Updating a Specified Node
- Deleting a Node
- Enabling Scale-In Protection for a Node
- Disabling Scale-In Protection for a Node
- Synchronizing Nodes
- Synchronizing Nodes in Batches
- Accepting a Node
- Managing a Node in a Customized Node Pool
- Resetting a Node
- Removing a Node
- Migrating a Node
- Migrating a Node to a Custom Node Pool
- Node Pool Management
- Storage Management
- Add-on Management
-
Cluster Upgrade
- Upgrading a Cluster
- Obtaining Cluster Upgrade Task Details
- Retrying a Cluster Upgrade Task
- Suspending a Cluster Upgrade Task (Deprecated)
- Continuing to Execute a Cluster Upgrade Task (Deprecated)
- Obtaining a List of Cluster Upgrade Task Details
- Pre-upgrade Check
- Obtaining Details About a Pre-upgrade Check Task of a Cluster
- Obtaining a List of Pre-upgrade Check Tasks of a Cluster
- Post-upgrade Check
- Cluster Backup
- Obtaining a List of Cluster Backup Task Details
- Obtaining the Cluster Upgrade Information
- Obtaining a Cluster Upgrade Path
- Obtaining the Configuration of Cluster Upgrade Feature Gates
- Enabling the Cluster Upgrade Process Booting Task
- Obtaining a List of Upgrade Workflows
- Obtaining Details About a Specified Cluster Upgrade Task
- Updating the Status of a Specified Cluster Upgrade Booting Task
- Quota Management
- API Versions
- Tag Management
- Configuration Management
-
Chart Management
- Uploading a Chart
- Obtaining a Chart List
- Obtaining a Release List
- Updating a Chart
- Creating a Release
- Deleting a Chart
- Updating a Release
- Obtaining a Chart
- Deleting a Release
- Downloading a Chart
- Obtaining a Release
- Obtaining Chart Values
- Obtaining Historical Records of a Release
- Obtaining the Quota of a User Chart
-
Add-on Instance Parameters
- CoreDNS
- CCE Container Storage (Everest)
- CCE Node Problem Detector
- Kubernetes Dashboard
- CCE Cluster Autoscaler
- NGINX Ingress Controller
- Kubernetes Metrics Server
- CCE Advanced HPA
- CCE Cloud Bursting Engine for CCI
- CCE AI Suite (NVIDIA GPU)
- CCE AI Suite (Ascend NPU)
- Volcano Scheduler
- CCE Secrets Manager for DEW
- CCE Network Metrics Exporter
- NodeLocal DNSCache
- Cloud Native Cluster Monitoring
- Cloud Native Logging
- Kubernetes APIs
- Out-of-Date APIs
- Permissions and Supported Actions
-
Appendix
- Status Code
- Error Codes
- Obtaining a Project ID
- Obtaining an Account ID
- Specifying Add-ons to Be Installed During Cluster Creation
- How to Obtain Parameters in the API URI
- Creating a VPC and Subnet
- Creating a Key Pair
- Node Flavor Description
- Adding a Salt in the password Field When Creating a Node
- Maximum Number of Pods That Can Be Created on a Node
- Node OS
- Space Allocation of a Data Disk
- Attaching Disks to a Node
- SDK Reference
-
FAQs
- Common FAQ
-
Billing
- How Is CCE Billed?
- How Do I Change the Billing Mode of a CCE Cluster from Pay-per-Use to Yearly/Monthly?
- Can I Change the Billing Mode of CCE Nodes from Pay-per-Use to Yearly/Monthly?
- Which Invoice Modes Are Supported by Huawei Cloud?
- Will I Be Notified When My Balance Is Insufficient?
- Will I Be Notified When My Account Balance Changes?
- Can I Delete a Yearly/Monthly-Billed CCE Cluster Directly When It Expires?
- How Do I Unsubscribe From CCE?
-
Cluster
- Cluster Creation
-
Cluster Running
- How Do I Locate the Fault When a Cluster Is Unavailable?
- How Do I Reset or Reinstall a CCE Cluster?
- How Do I Check Whether a Cluster Is in Multi-Master Mode?
- Can I Directly Connect to the Master Node of a CCE Cluster?
- How Do I Retrieve Data After a CCE Cluster Is Deleted?
- Why Does CCE Display Node Disk Usage Inconsistently with Cloud Eye?
- How Do I Change the Name of a CCE Cluster?
- Cluster Deletion
- Cluster Upgrade
-
Node
- Node Creation
-
Node Running
- What Should I Do If a Cluster Is Available But Some Nodes Are Unavailable?
- How Do I Troubleshoot the Failure to Remotely Log In to a Node in a CCE Cluster?
- How Do I Log In to a Node Using a Password and Reset the Password?
- How Do I Collect Logs of Nodes in a CCE Cluster?
- What Can I Do If the Container Network Becomes Unavailable After yum update Is Used to Upgrade the OS?
- What Should I Do If the vdb Disk of a Node Is Damaged and the Node Cannot Be Recovered After Reset?
- Which Ports Are Used to Install kubelet on CCE Cluster Nodes?
- How Do I Configure a Pod to Use the Acceleration Capability of a GPU Node?
- What Should I Do If I/O Suspension Occasionally Occurs When SCSI EVS Disks Are Used?
- What Should I Do If Excessive Docker Audit Logs Affect the Disk I/O?
- How Do I Fix an Abnormal Container or Node Due to No Thin Pool Disk Space?
- Where Can I Get the Listening Ports of CCE Worker Nodes?
- How Do I Rectify Failures When the NVIDIA Driver Is Used to Start Containers on GPU Nodes?
- What Can I Do If the Time of CCE Nodes Is Not Synchronized with the NTP Server?
- What Should I Do If the Data Disk Usage Is High Because a Large Volume of Data Is Written Into the Log File?
- Why Does My Node Memory Usage Obtained by Running the kubelet top node Command Exceed 100%?
- What Should I Do If "Failed to reclaim image" Is Displayed in the Node Events?
- Specification Change
-
OSs
- What Can I Do If cgroup kmem Leakage Occasionally Occurs When an Application Is Repeatedly Created or Deleted on a Node Running CentOS with an Earlier Kernel Version?
- What Should I Do If There Is a Service Access Failure After a Backend Service Upgrade or a 1-Second Latency When a Service Accesses a CCE Cluster?
- Why Are Pods Evicted by kubelet Due to Abnormal cgroup Statistics?
- When Container OOM Occurs on the CentOS Node with an Earlier Kernel Version, the Ext4 File System Is Occasionally Suspended
- What Should I Do If a DNS Resolution Failure Occurs Due to a Defect in IPVS?
- What Should I Do If the Number of ARP Entries Exceeds the Upper Limit?
- What Should I Do If a VM Is Suspended Due to an EulerOS 2.9 Kernel Defect?
-
Node Pool
- What Should I Do If a Node Pool Is Abnormal?
- What Should I Do If No Node Creation Record Is Displayed When the Node Pool Is Being Expanding?
- What Should I Do If a Node Pool Scale-Out Fails?
- What Should I Do If Some Kubernetes Events Fail to Display After Nodes Were Added to or Deleted from a Node Pool in Batches?
- How Do I Modify ECS Configurations When an ECS Cannot Be Managed by a Node Pool?
-
Workload
-
Workload Exception Troubleshooting
- How Can I Find the Fault for an Abnormal Workload?
- What Should I Do If Pod Scheduling Fails?
- What Should I Do If a Pod Fails to Pull the Image?
- What Should I Do If Container Startup Fails?
- What Should I Do If a Pod Fails to Be Evicted?
- What Should I Do If a Storage Volume Cannot Be Mounted or the Mounting Times Out?
- What Should I Do If a Workload Remains in the Creating State?
- What Should I Do If a Pod Remains in the Terminating State?
- What Should I Do If a Workload Is Stopped Caused by Pod Deletion?
- What Should I Do If an Error Occurs When I Deploy a Service on the GPU Node?
- What Should I Do If a Workload Exception Occurs Due to a Storage Volume Mount Failure?
- Why Does Pod Fail to Write Data?
- Why Is Pod Creation or Deletion Suspended on a Node Where File Storage Is Mounted?
- How Can I Locate Faults Using an Exit Code?
- Container Configuration
- Monitoring Log
-
Scheduling Policies
- How Do I Evenly Distribute Multiple Pods to Each Node?
- How Do I Prevent a Container on a Node from Being Evicted?
- Why Are Pods Not Evenly Distributed on Nodes?
- How Do I Evict All Pods on a Node?
- How Do I Check Whether a Pod Is Bound with vCPUs?
- What Should I Do If Pods Cannot Be Rescheduled After the Node Is Stopped?
- How Do I Prevent a Non-GPU or Non-NPU Workload from Being Scheduled to a GPU or NPU Node?
- Why Cannot a Pod Be Scheduled to a Node?
-
Others
- What Should I Do If a Cron Job Cannot Be Restarted After Being Stopped for a Period of Time?
- What Is a Headless Service When I Create a StatefulSet?
- What Should I Do If Error Message "Auth is empty" Is Displayed When a Private Image Is Pulled?
- What Is the Image Pull Policy for Containers in a CCE Cluster?
- Why Is the Mount Point of a Docker Container in the Kunpeng Cluster Uninstalled?
- What Can I Do If a Layer Is Missing During Image Pull?
- Why the File Permission and User in the Container Are Question Marks?
-
Workload Exception Troubleshooting
-
Networking
-
Network Exception Troubleshooting
- How Do I Locate a Workload Networking Fault?
- Why the ELB Address Cannot Be used to Access Workloads in a Cluster?
- Why the Ingress Cannot Be Accessed Outside the Cluster?
- Why Does the Browser Return Error Code 404 When I Access a Deployed Application?
- What Should I Do If a Container Fails to Access the Internet?
- What Can I Do If a VPC Subnet Cannot Be Deleted?
- How Do I Restore a Faulty Container NIC?
- What Should I Do If a Node Fails to Connect to the Internet (Public Network)?
- How Do I Resolve a Conflict Between the VPC CIDR Block and the Container CIDR Block?
- What Should I Do If the Java Error "Connection reset by peer" Is Reported During Layer-4 ELB Health Check
- How Do I Locate the Service Event Indicating That No Node Is Available for Binding?
- Why Does "Dead loop on virtual device gw_11cbf51a, fix it urgently" Intermittently Occur When I Log In to a VM using VNC?
- Why Does a Panic Occasionally Occur When I Use Network Policies on a Cluster Node?
- Why Are Lots of source ip_type Logs Generated on the VNC?
- What Should I Do If Status Code 308 Is Displayed When the Nginx Ingress Controller Is Accessed Using the Internet Explorer?
- What Should I Do If Nginx Ingress Access in the Cluster Is Abnormal After the NGINX Ingress Controller Add-on Is Upgraded?
- What Should I Do If An Error Occurred During a LoadBalancer Update?
- What Could Cause Access Exceptions After Configuring an HTTPS Certificate for a LoadBalancer Ingress?
-
Network Planning
- What Is the Relationship Between Clusters, VPCs, and Subnets?
- How Do I View the VPC CIDR Block?
- How Do I Set the VPC CIDR Block and Subnet CIDR Block for a CCE Cluster?
- How Do I Set a Container CIDR Block for a CCE Cluster?
- When Should I Use Cloud Native Network 2.0?
- What Is an ENI?
- How Can I Configure a Security Group Rule in a Cluster?
- How Do I Configure the IPv6 Service CIDR Block When Creating a CCE Turbo Cluster?
- Can Multiple NICs Be Bound to a Node in a CCE Cluster?
- Security Hardening
-
Network Configuration
- How Does CCE Communicate with Other Huawei Cloud Services over an Intranet?
- How Do I Set the Port When Configuring the Workload Access Mode on CCE?
- How Can I Achieve Compatibility Between Ingress's property and Kubernetes client-go?
- How Do I Obtain the Actual Source IP Address of a Client After a Service Is Added into Istio?
- Why Cannot an Ingress Be Created After the Namespace Is Changed?
- Why Is the Backend Server Group of an ELB Automatically Deleted After a Service Is Published to the ELB?
- How Can Container IP Addresses Survive a Container Restart?
- How Can I Check Whether an ENI Is Used by a Cluster?
- How Can I Delete a Security Group Rule Associated with a Deleted Subnet?
- How Can I Synchronize Certificates When Multiple Ingresses in Different Namespaces Share a Listener?
- How Can I Determine Which Ingress the Listener Settings Have Been Applied To?
-
Network Exception Troubleshooting
-
Storage
- How Do I Expand the Storage Capacity of a Container?
- What Are the Differences Among CCE Storage Classes in Terms of Persistent Storage and Multi-Node Mounting?
- Can I Create a CCE Node Without Adding a Data Disk to the Node?
- Can EVS Volumes in a CCE Cluster Be Restored After They Are Deleted or Expired?
- What Should I Do If the Host Cannot Be Found When Files Need to Be Uploaded to OBS During the Access to the CCE Service from a Public Network?
- How Can I Achieve Compatibility Between ExtendPathMode and Kubernetes client-go?
- What Can I Do If a Storage Volume Fails to Be Created?
- Can CCE PVCs Detect Underlying Storage Faults?
- An Error Is Reported When the Owner Group and Permissions of the Mount Point of the SFS 3.0 File System in the OS Are Modified
- Why Cannot I Delete a PV or PVC Using the kubectl delete Command?
- What Should I Do If "target is busy" Is Displayed When a Pod with Cloud Storage Mounted Is Being Deleted?
- What Should I Do If a Yearly/Monthly EVS Disk Cannot Be Automatically Created?
- Namespace
-
Chart and Add-on
- What Should I Do If the nginx-ingress Add-on Fails to Be Installed on a Cluster and Remains in the Creating State?
- What Should I Do If Residual Process Resources Exist Due to an Earlier npd Add-on Version?
- What Should I Do If a Chart Release Cannot Be Deleted Because the Chart Format Is Incorrect?
- Does CCE Support nginx-ingress?
- What Should I Do If Installation of an Add-on Fails and "The release name is already exist" Is Displayed?
- What Should I Do If a Release Creation or Upgrade Fails and "rendered manifests contain a resource that already exists" Is Displayed?
- What Can I Do If the kube-prometheus-stack Add-on Instance Fails to Be Scheduled?
- What Can I Do If a Chart Fails to Be Uploaded?
- How Do I Configure the Add-on Resource Quotas Based on Cluster Scale?
- How Can I Clean Up Residual Resources After the NGINX Ingress Controller Add-on in the Unknown State Is Deleted?
- Why TLS v1.0 and v1.1 Cannot Be Used After the NGINX Ingress Controller Add-on Is Upgraded?
-
API & kubectl FAQs
- How Can I Access a Cluster API Server?
- Can the Resources Created Using APIs or kubectl Be Displayed on the CCE Console?
- How Do I Download kubeconfig for Connecting to a Cluster Using kubectl?
- How Do I Rectify the Error Reported When Running the kubectl top node Command?
- Why Is "Error from server (Forbidden)" Displayed When I Use kubectl?
-
DNS FAQs
- What Should I Do If Domain Name Resolution Fails in a CCE Cluster?
- Why Does a Container in a CCE Cluster Fail to Perform DNS Resolution?
- Why Cannot the Domain Name of the Tenant Zone Be Resolved After the Subnet DNS Configuration Is Modified?
- How Do I Optimize the Configuration If the External Domain Name Resolution Is Slow or Times Out?
- How Do I Configure a DNS Policy for a Container?
- Image Repository FAQs
- Permissions
- Related Services
- Videos
-
More Documents
-
User Guide (ME-Abu Dhabi Region)
- Service Overview
- Getting Started
- High-Risk Operations and Solutions
-
Clusters
- Cluster Overview
- Buying a Cluster
- Connecting to a Cluster
-
Upgrading a Cluster
- Upgrade Overview
- Before You Start
- Performing an In-place Upgrade
- Performing Post-Upgrade Verification
- Migrating Services Across Clusters of Different Versions
-
Troubleshooting for Pre-upgrade Check Exceptions
- Pre-upgrade Check
- Node Restrictions
- Upgrade Management
- Add-ons
- Helm Charts
- SSH Connectivity of Master Nodes
- Node Pools
- Security Groups
- Arm Node Restrictions
- To-Be-Migrated Nodes
- Discarded Kubernetes Resources
- Compatibility Risks
- Node CCE Agent Versions
- Node CPU Usage
- CRDs
- Node Disks
- Node DNS
- Node Key Directory File Permissions
- Kubelet
- Node Memory
- Node Clock Synchronization Server
- Node OS
- Node CPUs
- Node Python Commands
- ASM Version
- Node Readiness
- Node journald
- containerd.sock
- Internal Errors
- Node Mount Points
- Kubernetes Node Taints
- Everest Restrictions
- cce-hpa-controller Restrictions
- Enhanced CPU Policies
- Health of Worker Node Components
- Health of Master Node Components
- Memory Resource Limit of Kubernetes Components
- Discarded Kubernetes APIs
- IPv6 Capabilities of a CCE Turbo Cluster
- Node NetworkManager
- Node ID File
- Node Configuration Consistency
- Node Configuration File
- CoreDNS Configuration Consistency
- sudo Commands of a Node
- Key Commands of Nodes
- Mounting of a Sock File on a Node
- HTTPS Load Balancer Certificate Consistency
- Node Mounting
- Login Permissions of User paas on a Node
- Private IPv4 Addresses of Load Balancers
- Historical Upgrade Records
- CIDR Block of the Cluster Management Plane
- GPU Add-on
- Nodes' System Parameter Settings
- Residual Package Versions
- Node Commands
- Node Swap
- nginx-ingress Upgrade
- Managing a Cluster
- Nodes
- Node Pools
-
Workloads
- Overview
- Creating a Workload
-
Configuring a Container
- Configuring Time Zone Synchronization
- Configuring an Image Pull Policy
- Using Third-Party Images
- Configuring Container Specifications
- Configuring Container Lifecycle Parameters
- Configuring Container Health Check
- Configuring Environment Variables
- Configuring APM Settings for Performance Bottleneck Analysis
- Workload Upgrade Policies
- Scheduling Policies (Affinity/Anti-affinity)
- Taints and Tolerations
- Labels and Annotations
- Accessing a Container
- Managing Workloads and Jobs
- Kata Runtime and Common Runtime
- Scheduling
-
Network
- Overview
- Container Network Models
-
Service
- Overview
- ClusterIP
- NodePort
-
LoadBalancer
- Creating a LoadBalancer Service
- Using Annotations to Configure Load Balancing
- Service Using HTTP or HTTPS
- Configuring Health Check for Multiple Ports
- Setting the Pod Ready Status Through the ELB Health Check
- Configuring Timeout for a LoadBalancer Service
- Enabling Passthrough Networking for LoadBalancer Services
- Enabling ICMP Security Group Rules
- DNAT
- Headless Service
-
Ingresses
- Overview
-
ELB Ingresses
- Creating an ELB Ingress on the Console
- Using kubectl to Create an ELB Ingress
- Configuring ELB Ingresses Using Annotations
- Configuring HTTPS Certificates for ELB Ingresses
- Configuring the Server Name Indication (SNI) for ELB Ingresses
- ELB Ingresses Routing to Multiple Services
- ELB Ingresses Using HTTP/2
- Interconnecting ELB Ingresses with HTTPS Backend Services
- Configuring Timeout for an ELB Ingress
-
Nginx Ingresses
- Creating Nginx Ingresses on the Console
- Using kubectl to Create an Nginx Ingress
- Configuring Nginx Ingresses Using Annotations
- Configuring HTTPS Certificates for Nginx Ingresses
- Configuring URL Rewriting Rules for Nginx Ingresses
- Interconnecting Nginx Ingresses with HTTPS Backend Services
- Nginx Ingresses Using Consistent Hashing for Load Balancing
- DNS
- Container Network Settings
- Cluster Network Settings
- Configuring Intra-VPC Access
- Accessing Public Networks from a Container
- Storage
- Observability
- Namespaces
- ConfigMaps and Secrets
- Auto Scaling
-
Add-ons
- Overview
- CoreDNS
- CCE Container Storage (Everest)
- CCE Node Problem Detector
- Kubernetes Dashboard
- CCE Cluster Autoscaler
- Nginx Ingress Controller
- Kubernetes Metrics Server
- CCE Advanced HPA
- CCE AI Suite (NVIDIA GPU)
- CCE AI Suite (Ascend NPU)
- Volcano Scheduler
- CCE Secrets Manager for DEW
- CCE Network Metrics Exporter
- NodeLocal DNSCache
- Prometheus
- Helm Chart
- Permissions
-
Best Practices
- Checklist for Deploying Containerized Applications in the Cloud
- Containerization
- Disaster Recovery
- Security
- Auto Scaling
- Monitoring
- Cluster
- Networking
- Storage
- Container
- Permission
- Release
-
FAQs
- Common Questions
- Cluster
-
Node
- Node Creation
-
Node Running
- What Should I Do If a Cluster Is Available But Some Nodes Are Unavailable?
- How Do I Log In to a Node Using a Password and Reset the Password?
- How Do I Collect Logs of Nodes in a CCE Cluster?
- What Should I Do If the vdb Disk of a Node Is Damaged and the Node Cannot Be Recovered After Reset?
- What Should I Do If I/O Suspension Occasionally Occurs When SCSI EVS Disks Are Used?
- How Do I Fix an Abnormal Container or Node Due to No Thin Pool Disk Space?
- How Do I Rectify Failures When the NVIDIA Driver Is Used to Start Containers on GPU Nodes?
- Specification Change
- Node Pool
-
Workload
-
Workload Abnormalities
- How Do I Use Events to Fix Abnormal Workloads?
- What Should I Do If Pod Scheduling Fails?
- What Should I Do If a Pod Fails to Pull the Image?
- What Should I Do If Container Startup Fails?
- What Should I Do If a Pod Fails to Be Evicted?
- What Should I Do If a Storage Volume Cannot Be Mounted or the Mounting Times Out?
- What Should I Do If a Workload Remains in the Creating State?
- What Should I Do If Pods in the Terminating State Cannot Be Deleted?
- What Should I Do If a Workload Is Stopped Caused by Pod Deletion?
- What Should I Do If an Error Occurs When Deploying a Service on the GPU Node?
-
Container Configuration
- When Is Pre-stop Processing Used?
- How Do I Set an FQDN for Accessing a Specified Container in the Same Namespace?
- What Should I Do If Health Check Probes Occasionally Fail?
- How Do I Set the umask Value for a Container?
- What Can I Do If an Error Is Reported When a Deployed Container Is Started After the JVM Startup Heap Memory Parameter Is Specified for ENTRYPOINT in Dockerfile?
- What Is the Retry Mechanism When CCE Fails to Start a Pod?
- Scheduling Policies
-
Others
- What Should I Do If a Scheduled Task Cannot Be Restarted After Being Stopped for a Period of Time?
- What Is a Headless Service When I Create a StatefulSet?
- What Should I Do If Error Message "Auth is empty" Is Displayed When a Private Image Is Pulled?
- Why Cannot a Pod Be Scheduled to a Node?
- What Is the Image Pull Policy for Containers in a CCE Cluster?
- What Can I Do If a Layer Is Missing During Image Pull?
-
Workload Abnormalities
- Networking
-
Storage
- What Are the Differences Among CCE Storage Classes in Terms of Persistent Storage and Multi-node Mounting?
- Can I Add a Node Without a Data Disk?
- What Should I Do If the Host Cannot Be Found When Files Need to Be Uploaded to OBS During the Access to the CCE Service from a Public Network?
- How Can I Achieve Compatibility Between ExtendPathMode and Kubernetes client-go?
- Can CCE PVCs Detect Underlying Storage Faults?
- Namespace
- Chart and Add-on
-
API & kubectl FAQs
- How Can I Access a CCE Cluster?
- Can the Resources Created Using APIs or kubectl Be Displayed on the CCE Console?
- How Do I Download kubeconfig for Connecting to a Cluster Using kubectl?
- How Do I Rectify the Error Reported When Running the kubectl top node Command?
- Why Is "Error from server (Forbidden)" Displayed When I Use kubectl?
- DNS FAQs
- Image Repository FAQs
- Permissions
- Reference
-
API Reference (ME-Abu Dhabi Region)
- Before You Start
- API Overview
- Calling APIs
-
APIs
- API URL
-
Cluster Management
- Creating a Cluster
- Reading a Specified Cluster
- Listing Clusters in a Specified Project
- Updating a Specified Cluster
- Deleting a Cluster
- Hibernating a Cluster
- Waking Up a Cluster
- Obtaining a Cluster Certificate
- Querying a Job
- Binding/Unbinding Public API Server Address
- Obtaining Cluster Access Address
- Node Management
- Node Pool Management
- Storage Management
- Add-on Management
- Quota Management
- API Versions
- Kubernetes APIs
- Permissions Policies and Supported Actions
-
Appendix
- Status Code
- Error Codes
- Obtaining a Project ID
- Obtaining the Account ID
- Specifying Add-ons to Be Installed During Cluster Creation
- How to Obtain Parameters in the API URI
- Creating a VPC and Subnet
- Creating a Key Pair
- Node Flavor Description
- Adding a Salt in the password Field When Creating a Node
- Maximum Number of Pods That Can Be Created on a Node
- Node OS
- Data Disk Space Allocation
- Attaching Disks to a Node
- Change History
-
User Guide (Paris Regions)
- Service Overview
-
Product Bulletin
- Risky Operations on Cluster Nodes
- CCE Security Guide
- Cluster Node OS Patch Notes
-
Vulnerability Notice
- Notice on the Kubernetes Security Vulnerability (CVE-2022-3172)
- Privilege Escalation Vulnerability in Linux openvswitch Kernel Module (CVE-2022-2639)
- Notice on CRI-O Container Runtime Engine Arbitrary Code Execution Vulnerability (CVE-2022-0811)
- Notice on the Container Escape Vulnerability Caused by the Linux Kernel (CVE-2022-0492)
- Linux Kernel Integer Overflow Vulnerability (CVE-2022-0185)
- Kubernetes Basics
- Getting Started
- High-Risk Operations and Solutions
-
Clusters
- Cluster Overview
- Creating a Cluster
- Connecting to a Cluster
-
Upgrading a Cluster
- Upgrade Overview
- Before You Start
- Performing In-place Upgrade
- Performing Post-Upgrade Verification
- Migrating Services Across Clusters of Different Versions
-
Troubleshooting for Pre-upgrade Check Exceptions
- Pre-upgrade Check
- Node Restrictions
- Upgrade Management
- Add-ons
- Helm Charts
- SSH Connectivity of Master Nodes
- Node Pools
- Security Groups
- Arm Node Restrictions
- To-Be-Migrated Nodes
- Discarded Kubernetes Resources
- Compatibility Risks
- Node CCE Agent Versions
- Node CPU Usage
- CRDs
- Node Disks
- Node DNS
- Node Key Directory File Permissions
- Kubelet
- Node Memory
- Node Clock Synchronization Server
- Node OS
- Node CPUs
- Node Python Commands
- ASM Version
- Node Readiness
- Node journald
- containerd.sock
- Internal Errors
- Node Mount Points
- Kubernetes Node Taints
- everest Restrictions
- cce-hpa-controller Restrictions
- Enhanced CPU Policies
- Health of Worker Node Components
- Health of Master Node Components
- Memory Resource Limit of Kubernetes Components
- Discarded Kubernetes APIs
- IPv6 Capabilities of a CCE Turbo Cluster
- Node NetworkManager
- Node ID File
- Node Configuration Consistency
- Node Configuration File
- CoreDNS Configuration Consistency
- sudo Commands of a Node
- Key Commands of Nodes
- Mounting of a Sock File on a Node
- HTTPS Load Balancer Certificate Consistency
- Node Mounting
- Login Permissions of User paas on a Node
- Private IPv4 Addresses of Load Balancers
- Historical Upgrade Records
- CIDR Block of the Cluster Management Plane
- GPU Add-on
- Nodes' System Parameter Settings
- Residual Package Versions
- Node Commands
- Node Swap
- nginx-ingress Upgrade
- Managing a Cluster
- Nodes
- Node Pools
-
Workloads
- Overview
- Creating a Workload
-
Configuring a Container
- Configuring Time Zone Synchronization
- Configuring an Image Pull Policy
- Using Third-Party Images
- Setting Container Specifications
- Setting Container Lifecycle Parameters
- Setting Health Check for a Container
- Setting an Environment Variable
- Configuring the Workload Upgrade Policy
- Scheduling Policy (Affinity/Anti-affinity)
- Taints and Tolerations
- Labels and Annotations
- Accessing a Container
- Managing Workloads and Jobs
- Scheduling
-
Network
- Overview
- Container Network Models
- Service
-
Ingresses
- Overview
-
ELB Ingresses
- Creating an ELB Ingress on the Console
- Using kubectl to Create an ELB Ingress
- Configuring ELB Ingresses Using Annotations
- Configuring HTTPS Certificates for ELB Ingresses
- Configuring the Server Name Indication (SNI) for ELB Ingresses
- ELB Ingresses Routing to Multiple Services
- ELB Ingresses Using HTTP/2
- Interconnecting ELB Ingresses with HTTPS Backend Services
- Configuring Timeout for an ELB Ingress
-
Nginx Ingresses
- Creating Nginx Ingresses on the Console
- Using kubectl to Create an Nginx Ingress
- Configuring HTTPS Certificates for Nginx Ingresses
- Configuring URL Rewriting Rules for Nginx Ingresses
- Interconnecting Nginx Ingresses with HTTPS Backend Services
- Nginx Ingresses Using Consistent Hashing for Load Balancing
- Configuring Nginx Ingresses Using Annotations
- DNS
- Container Network Settings
- Cluster Network Settings
- Configuring Intra-VPC Access
- Accessing Public Networks from a Container
- Storage
- Observability
- Namespaces
- ConfigMaps and Secrets
- Auto Scaling
- Add-ons
- Helm Chart
- Permissions
-
FAQs
- Common Questions
- Billing
- Cluster
-
Node
- Node Creation
-
Node Running
- What Should I Do If a Cluster Is Available But Some Nodes Are Unavailable?
- How Do I Log In to a Node Using a Password and Reset the Password?
- How Do I Collect Logs of Nodes in a CCE Cluster?
- What Should I Do If the vdb Disk of a Node Is Damaged and the Node Cannot Be Recovered After Reset?
- What Should I Do If I/O Suspension Occasionally Occurs When SCSI EVS Disks Are Used?
- How Do I Fix an Abnormal Container or Node Due to No Thin Pool Disk Space?
- How Do I Rectify Failures When the NVIDIA Driver Is Used to Start Containers on GPU Nodes?
- Specification Change
- Node Pool
-
Workload
-
Workload Abnormalities
- How Do I Use Events to Fix Abnormal Workloads?
- What Should I Do If Pod Scheduling Fails?
- What Should I Do If a Pod Fails to Pull the Image?
- What Should I Do If Container Startup Fails?
- What Should I Do If a Pod Fails to Be Evicted?
- What Should I Do If a Storage Volume Cannot Be Mounted or the Mounting Times Out?
- What Should I Do If a Workload Remains in the Creating State?
- What Should I Do If Pods in the Terminating State Cannot Be Deleted?
- What Should I Do If a Workload Is Stopped Caused by Pod Deletion?
- What Should I Do If an Error Occurs When Deploying a Service on the GPU Node?
- What Should I Do If Sandbox-Related Errors Are Reported When the Pod Remains in the Creating State?
-
Container Configuration
- When Is Pre-stop Processing Used?
- How Do I Set an FQDN for Accessing a Specified Container in the Same Namespace?
- What Should I Do If Health Check Probes Occasionally Fail?
- How Do I Set the umask Value for a Container?
- What Can I Do If an Error Is Reported When a Deployed Container Is Started After the JVM Startup Heap Memory Parameter Is Specified for ENTRYPOINT in Dockerfile?
- What Is the Retry Mechanism When CCE Fails to Start a Pod?
- Scheduling Policies
-
Others
- What Should I Do If a Scheduled Task Cannot Be Restarted After Being Stopped for a Period of Time?
- What Is a Headless Service When I Create a StatefulSet?
- What Should I Do If Error Message "Auth is empty" Is Displayed When a Private Image Is Pulled?
- Why Cannot a Pod Be Scheduled to a Node?
- What Is the Image Pull Policy for Containers in a CCE Cluster?
- What Can I Do If a Layer Is Missing During Image Pull?
-
Workload Abnormalities
-
Networking
- Network Planning
-
Network Fault
- How Do I Locate a Workload Networking Fault?
- Why Does the Browser Return Error Code 404 When I Access a Deployed Application?
- What Should I Do If a Container Fails to Access the Internet?
- What Should I Do If a Node Fails to Connect to the Internet (Public Network)?
- What Should I Do If an Nginx Ingress Access in the Cluster Is Abnormal After the Add-on Is Upgraded?
-
Storage
- What Are the Differences Among CCE Storage Classes in Terms of Persistent Storage and Multi-node Mounting?
- Can I Add a Node Without a Data Disk?
- What Should I Do If the Host Cannot Be Found When Files Need to Be Uploaded to OBS During the Access to the CCE Service from a Public Network?
- How Can I Achieve Compatibility Between ExtendPathMode and Kubernetes client-go?
- Can CCE PVCs Detect Underlying Storage Faults?
- Namespace
- Chart and Add-on
-
API & kubectl FAQs
- How Can I Access a Cluster API Server?
- Can the Resources Created Using APIs or kubectl Be Displayed on the CCE Console?
- How Do I Download kubeconfig for Connecting to a Cluster Using kubectl?
- How Do I Rectify the Error Reported When Running the kubectl top node Command?
- Why Is "Error from server (Forbidden)" Displayed When I Use kubectl?
- DNS FAQs
- Image Repository FAQs
- Permissions
- Reference
-
Best Practices
- Checklist for Deploying Containerized Applications in the Cloud
- Containerization
- Disaster Recovery
- Security
- Auto Scaling
- Monitoring
- Cluster
- Networking
-
Storage
- Expanding the Storage Space
- Mounting an Object Storage Bucket of a Third-Party Tenant
- Dynamically Creating and Mounting Subdirectories of an SFS Turbo File System
- How Do I Change the Storage Class Used by a Cluster of v1.15 from FlexVolume to CSI Everest?
- Custom Storage Classes
- Enabling Automatic Topology for EVS Disks When Nodes Are Deployed in Different AZs (csi-disk-topology)
- Container
- Permission
- Release
- Migrating Data from CCE 1.0 to CCE 2.0
-
API Reference (Paris Regions)
- Before You Start
- API Overview
- Calling APIs
-
APIs
- API URL
-
Cluster Management
- Creating a Cluster
- Reading a Specified Cluster
- Listing Clusters in a Specified Project
- Updating a Specified Cluster
- Deleting a Cluster
- Hibernating a Cluster
- Waking Up a Cluster
- Obtaining a Cluster Certificate
- Querying a Job
- Binding/Unbinding Public API Server Address
- Obtaining Cluster Access Address
- Node Management
- Node Pool Management
- Add-on Management
- Quota Management
- API Versions
- Kubernetes APIs
- Permissions Policies and Supported Actions
-
Appendix
- Status Code
- Error Codes
- Obtaining a Project ID
- Obtaining the Account ID
- Specifying Add-ons to Be Installed During Cluster Creation
- How to Obtain Parameters in the API URI
- Creating a VPC and Subnet
- Creating a Key Pair
- Node Flavor Description
- Adding a Salt in the password Field When Creating a Node
- Maximum Number of Pods That Can Be Created on a Node
- Node OS
- Data Disk Space Allocation
- Attaching Disks to a Node
- Change History
-
User Guide (Kuala Lumpur Region)
- Service Overview
- CCE Console Upgrade
- Getting Started
- High-Risk Operations
-
Clusters
- Cluster Overview
- Buying a Cluster
- Connecting to a Cluster
- Managing a Cluster
-
Upgrading a Cluster
- Process and Method of Upgrading a Cluster
- Before You Start
- Performing Post-Upgrade Verification
- Migrating Services Across Clusters of Different Versions
-
Troubleshooting for Pre-upgrade Check Exceptions
- Pre-upgrade Check
- Node Restrictions
- Upgrade Management
- Add-ons
- Helm Charts
- SSH Connectivity of Master Nodes
- Node Pools
- Security Groups
- Arm Node Restrictions
- Residual Nodes
- Discarded Kubernetes Resources
- Compatibility Risks
- CCE Agent Versions
- Node CPU Usage
- CRDs
- Node Disks
- Node DNS
- Node Key Directory File Permissions
- kubelet
- Node Memory
- Node Clock Synchronization Server
- Node OS
- Node CPUs
- Node Python Commands
- ASM Version
- Node Readiness
- Node journald
- containerd.sock
- Internal Errors
- Node Mount Points
- Kubernetes Node Taints
- Everest Restrictions
- cce-hpa-controller Restrictions
- Enhanced CPU Policies
- Health of Worker Node Components
- Health of Master Node Components
- Memory Resource Limit of Kubernetes Components
- Discarded Kubernetes APIs
- NetworkManager
- Node ID File
- Node Configuration Consistency
- Node Configuration File
- CoreDNS Configuration Consistency
- sudo
- Key Node Commands
- Mounting of a Sock File on a Node
- HTTPS Load Balancer Certificate Consistency
- Node Mounting
- Login Permissions of User paas on a Node
- Private IPv4 Addresses of Load Balancers
- Historical Upgrade Records
- CIDR Block of the Cluster Management Plane
- GPU Add-on
- Nodes' System Parameters
- Residual Package Version Data
- Node Commands
- Node Swap
- nginx-ingress Upgrade
- Upgrade of Cloud Native Cluster Monitoring
- containerd Pod Restart Risks
- Key GPU Add-on Parameters
- GPU or NPU Pod Rebuild Risks
- ELB Listener Access Control
- Master Node Flavor
- Subnet Quota of Master Nodes
- Node Runtime
- Node Pool Runtime
- Number of Node Images
- Nodes
- Node Pools
-
Workloads
- Overview
- Creating a Workload
-
Configuring a Workload
- Configuring Time Zone Synchronization
- Configuring an Image Pull Policy
- Using Third-Party Images
- Configuring Container Specifications
- Configuring Container Lifecycle Parameters
- Configuring Container Health Check
- Configuring Environment Variables
- Configuring Workload Upgrade Policies
- Scheduling Policies (Affinity/Anti-affinity)
- Configuring Tolerance Policies
- Configuring Labels and Annotations
- Logging In to a Container
- Managing Workloads
- Managing Custom Resources
- Pod Security
- Scheduling
-
Network
- Overview
- Container Network
-
Service
- Overview
- ClusterIP
- NodePort
-
LoadBalancer
- Creating a LoadBalancer Service
- Using Annotations to Balance Load
- Configuring HTTP/HTTPS for a LoadBalancer Service
- Configuring SNI for a LoadBalancer Service
- Configuring HTTP/2 for a LoadBalancer Service
- Configuring Timeout for a LoadBalancer Service
- Configuring Health Check on Multiple Ports of a LoadBalancer Service
- Configuring Passthrough Networking for a LoadBalancer Service
- Enabling ICMP Security Group Rules
- DNAT
- Headless Services
-
Ingresses
- Overview
-
LoadBalancer Ingresses
- Creating a LoadBalancer Ingress on the Console
- Using kubectl to Create a LoadBalancer Ingress
- Configuring a LoadBalancer Ingress Using Annotations
- Configuring an HTTPS Certificate for a LoadBalancer Ingress
- Configuring SNI for a LoadBalancer Ingress
- Routing a LoadBalancer Ingress to Multiple Services
- Configuring HTTP/2 for a LoadBalancer Ingress
- Configuring HTTPS Backend Services for a LoadBalancer Ingress
- Configuring Timeout for a LoadBalancer Ingress
-
Nginx Ingresses
- Creating Nginx Ingresses on the Console
- Using kubectl to Create an Nginx Ingress
- Configuring Nginx Ingresses Using Annotations
- Configuring an HTTPS Certificate for an Nginx Ingress
- Configuring HTTPS Backend Services for an Nginx Ingress
- Configuring Consistent Hashing for Load Balancing of an Nginx Ingress
- DNS
- Configuring Intra-VPC Access
- Accessing the Internet from a Container
- Storage
- Observability
- Auto Scaling
- Namespaces
- ConfigMaps and Secrets
- Add-ons
- Helm Chart
- Permissions
-
Best Practices
- Checklist for Deploying Containerized Applications in the Cloud
- Containerization
- Disaster Recovery
- Security
- Auto Scaling
- Monitoring
- Cluster
- Networking
- Storage
- Container
- Permission
- Release
-
FAQs
- Common Questions
- Cluster
-
Node
- Node Creation
-
Node Running
- What Should I Do If a Cluster Is Available But Some Nodes Are Unavailable?
- How Do I Log In to a Node Using a Password and Reset the Password?
- How Do I Collect Logs of Nodes in a CCE Cluster?
- What Should I Do If the vdb Disk of a Node Is Damaged and the Node Cannot Be Recovered After Reset?
- What Should I Do If I/O Suspension Occasionally Occurs When SCSI EVS Disks Are Used?
- How Do I Fix an Abnormal Container or Node Due to No Thin Pool Disk Space?
- How Do I Rectify Failures When the NVIDIA Driver Is Used to Start Containers on GPU Nodes?
- Specification Change
- OSs
- Node Pool
-
Workload
-
Workload Abnormalities
- How Do I Use Events to Fix Abnormal Workloads?
- What Should I Do If Pod Scheduling Fails?
- What Should I Do If a Pod Fails to Pull the Image?
- What Should I Do If Container Startup Fails?
- What Should I Do If a Pod Fails to Be Evicted?
- What Should I Do If a Storage Volume Cannot Be Mounted or the Mounting Times Out?
- What Should I Do If a Workload Remains in the Creating State?
- What Should I Do If Pods in the Terminating State Cannot Be Deleted?
- What Should I Do If a Workload Is Stopped Caused by Pod Deletion?
- What Should I Do If an Error Occurs When Deploying a Service on the GPU Node?
- Container Configuration
- Scheduling Policies
-
Others
- What Should I Do If a Scheduled Task Cannot Be Restarted After Being Stopped for a Period of Time?
- What Is a Headless Service When I Create a StatefulSet?
- What Should I Do If Error Message "Auth is empty" Is Displayed When a Private Image Is Pulled?
- What Is the Image Pull Policy for Containers in a CCE Cluster?
- What Can I Do If a Layer Is Missing During Image Pull?
-
Workload Abnormalities
-
Networking
- Network Planning
-
Network Fault
- How Do I Locate a Workload Networking Fault?
- Why Does the Browser Return Error Code 404 When I Access a Deployed Application?
- What Should I Do If a Container Fails to Access the Internet?
- What Should I Do If a Node Fails to Connect to the Internet (Public Network)?
- What Should I Do If an Nginx Ingress Access in the Cluster Is Abnormal After the Add-on Is Upgraded?
- Security Hardening
- Network Configuration
-
Storage
- How Do I Expand the Storage Capacity of a Container?
- What Are the Differences Among CCE Storage Classes in Terms of Persistent Storage and Multi-node Mounting?
- Can I Create a CCE Node Without Adding a Data Disk to the Node?
- What Should I Do If the Host Cannot Be Found When Files Need to Be Uploaded to OBS During the Access to the CCE Service from a Public Network?
- How Can I Achieve Compatibility Between ExtendPathMode and Kubernetes client-go?
- Can CCE PVCs Detect Underlying Storage Faults?
- Namespace
-
Chart and Add-on
- What Should I Do If Installation of an Add-on Fails and "The release name is already exist" Is Displayed?
- How Do I Configure the Add-on Resource Quotas Based on Cluster Scale?
- How Can I Clean Up Residual Resources After the NGINX Ingress Controller Add-on in the Unknown State Is Deleted?
- Why TLS v1.0 and v1.1 Cannot Be Used After the NGINX Ingress Controller Add-on Is Upgraded?
-
API & kubectl FAQs
- How Can I Access a Cluster API Server?
- Can the Resources Created Using APIs or kubectl Be Displayed on the CCE Console?
- How Do I Download kubeconfig for Connecting to a Cluster Using kubectl?
- How Do I Rectify the Error Reported When Running the kubectl top node Command?
- Why Is "Error from server (Forbidden)" Displayed When I Use kubectl?
- DNS FAQs
- Image Repository FAQs
- Permissions
-
API Reference (Kuala Lumpur Region)
- Before You Start
- API Overview
- Calling APIs
-
APIs
- API URL
-
Cluster Management
- Creating a Cluster
- Reading a Specified Cluster
- Listing Clusters in a Specified Project
- Updating a Specified Cluster
- Deleting a Cluster
- Hibernating a Cluster
- Waking Up a Cluster
- Obtaining a Cluster Certificate
- Modifying Cluster Specifications
- Querying a Job
- Binding/Unbinding Public API Server Address
- Node Management
- Node Pool Management
- Storage Management
- Add-on Management
- Tag Management
- Configuration Management
-
Chart Management
- Uploading a Chart
- Obtaining a Chart List
- Obtaining a Release List
- Updating a Chart
- Creating a Release
- Deleting a Chart
- Updating a Release
- Obtaining a Chart
- Deleting a Release
- Downloading a Chart
- Obtaining a Release
- Obtaining Chart Values
- Obtaining Historical Records of a Release
- Obtaining the Quota of a User Chart
- Kubernetes APIs
- Permissions Policies and Supported Actions
-
Appendix
- Status Code
- Error Codes
- Obtaining a Project ID
- Obtaining a Domain ID
- Specifying Add-ons to Be Installed During Cluster Creation
- How to Obtain Parameters in the API URI
- Creating a VPC and Subnet
- Creating a Key Pair
- Node Flavor Description
- Adding a Salt in the password Field When Creating a Node
- Maximum Number of Pods That Can Be Created on a Node
- Node OS
- Data Disk Space Allocation
- Attaching Disks to a Node
-
User Guide (Ankara Region)
- Service Overview
- Product Bulletin
- Getting Started
- High-Risk Operations and Solutions
-
Clusters
- Cluster Overview
- Creating a Cluster
- Connecting to a Cluster
-
Upgrading a Cluster
- Upgrade Overview
- Before You Start
- Performing Post-Upgrade Verification
- Migrating Services Across Clusters of Different Versions
-
Troubleshooting for Pre-upgrade Check Exceptions
- Pre-upgrade Check
- Node Restrictions
- Upgrade Management
- Add-ons
- Helm Charts
- SSH Connectivity of Master Nodes
- Node Pools
- Security Groups
- Arm Node Restrictions
- To-Be-Migrated Nodes
- Discarded Kubernetes Resources
- Compatibility Risks
- CCE Agent Versions
- Node CPU Usage
- CRDs
- Node Disks
- Node DNS
- Node Key Directory File Permissions
- Kubelet
- Node Memory
- Node Clock Synchronization Server
- Node OS
- Node CPU Cores
- Node Python Commands
- ASM Version
- Node Readiness
- Node journald
- containerd.sock
- Internal Error
- Node Mount Points
- Kubernetes Node Taints
- Everest Restrictions
- cce-hpa-controller Limitations
- Enhanced CPU Policies
- Health of Worker Node Components
- Health of Master Node Components
- Memory Resource Limit of Kubernetes Components
- Discarded Kubernetes APIs
- Node NetworkManager
- Node ID File
- Node Configuration Consistency
- Node Configuration File
- CoreDNS Configuration Consistency
- sudo Commands of a Node
- Key Commands of Nodes
- Mounting of a Sock File on a Node
- HTTPS Load Balancer Certificate Consistency
- Node Mounting
- Login Permissions of User paas on a Node
- Private IPv4 Addresses of Load Balancers
- Historical Upgrade Records
- CIDR Block of the Cluster Management Plane
- GPU Add-on
- Nodes' System Parameters
- Residual Package Versions
- Node Commands
- Node Swap
- nginx-ingress Upgrade
- Upgrade of Cloud Native Cluster Monitoring
- containerd Pod Restart Risks
- Key GPU Add-on Parameters
- GPU or NPU Pod Rebuild Risks
- ELB Listener Access Control
- Master Node Flavor
- Subnet Quota of Master Nodes
- Node Runtime
- Node Pool Runtime
- Number of Node Images
- Managing a Cluster
- Nodes
- Node Pools
-
Workloads
- Overview
- Creating a Workload
-
Configuring a Container
- Configuring Time Zone Synchronization
- Configuring an Image Pull Policy
- Using Third-Party Images
- Configuring Container Specifications
- Configuring Container Lifecycle Parameters
- Configuring Container Health Check
- Configuring Environment Variables
- Workload Upgrade Policies
- Scheduling Policies (Affinity/Anti-affinity)
- Taints and Tolerations
- Labels and Annotations
- Accessing a Container
- Managing Workloads and Jobs
- Managing Custom Resources
- Scheduling
-
Network
- Overview
- Container Network Models
-
Service
- Overview
- ClusterIP
- NodePort
-
LoadBalancer
- Creating a LoadBalancer Service
- Using Annotations to Balance Load
- Configuring an HTTP or HTTPS Service
- Configuring SNI for a Service
- Configuring HTTP/2 for a Service
- Configuring Timeout for a Service
- Configuring Health Check on Multiple Service Ports
- Enabling Passthrough Networking for LoadBalancer Services
- Enabling ICMP Security Group Rules
- Headless Services
-
Ingresses
- Overview
-
LoadBalancer Ingresses
- Creating a LoadBalancer Ingress on the Console
- Using kubectl to Create a LoadBalancer Ingress
- Configuring a LoadBalancer Ingress Using Annotations
- Configuring an HTTPS Certificate for a LoadBalancer Ingress
- Configuring SNI for a LoadBalancer Ingress
- LoadBalancer Ingresses to Multiple Services
- Configuring HTTP/2 for a LoadBalancer Ingress
- Configuring URL Redirection for a LoadBalancer Ingress
- Configuring URL Rewriting for a LoadBalancer Ingress
- Configuring Timeout for a LoadBalancer Ingress
- Configuring a Custom Header Forwarding Policy for a LoadBalancer Ingress
-
Nginx Ingresses
- Creating Nginx Ingresses on the Console
- Using kubectl to Create an Nginx Ingress
- Configuring Nginx Ingresses Using Annotations
- Configuring HTTPS Certificates for Nginx Ingresses
- Configuring Redirection Rules for an Nginx Ingress
- Configuring URL Rewriting Rules for Nginx Ingresses
- Nginx Ingresses Using Consistent Hashing for Load Balancing
- DNS
- Container Network Settings
- Cluster Network Settings
- Configuring Intra-VPC Access
- Accessing the Internet from a Container
- Storage
- Observability
- Namespaces
- ConfigMaps and Secrets
- Auto Scaling
-
Add-ons
- Overview
- CoreDNS
- CCE Container Storage (Everest)
- CCE Node Problem Detector
- Kubernetes Dashboard
- CCE Cluster Autoscaler
- Nginx Ingress Controller
- Kubernetes Metrics Server
- CCE Advanced HPA
- CCE AI Suite (NVIDIA GPU)
- CCE AI Suite (Ascend NPU)
- Volcano Scheduler
- NodeLocal DNSCache
- Cloud Native Cluster Monitoring
- Cloud Native Logging
- Grafana
- Prometheus
- Helm Chart
- Permissions
-
FAQs
- Common Questions
- Cluster
-
Node
- Node Creation
-
Node Running
- What Should I Do If a Cluster Is Available But Some Nodes Are Unavailable?
- How Do I Log In to a Node Using a Password and Reset the Password?
- How Do I Collect Logs of Nodes in a CCE Cluster?
- What Should I Do If the vdb Disk of a Node Is Damaged and the Node Cannot Be Recovered After Reset?
- What Should I Do If I/O Suspension Occasionally Occurs When SCSI EVS Disks Are Used?
- How Do I Fix an Abnormal Container or Node Due to No Thin Pool Disk Space?
- How Do I Rectify Failures When the NVIDIA Driver Is Used to Start Containers on GPU Nodes?
- Specification Change
- OSs
- Node Pool
-
Workload
-
Workload Abnormalities
- How Do I Use Events to Fix Abnormal Workloads?
- What Should I Do If Pod Scheduling Fails?
- What Should I Do If a Pod Fails to Pull the Image?
- What Should I Do If Container Startup Fails?
- What Should I Do If a Pod Fails to Be Evicted?
- What Should I Do If a Storage Volume Cannot Be Mounted or the Mounting Times Out?
- What Should I Do If a Workload Remains in the Creating State?
- What Should I Do If Pods in the Terminating State Cannot Be Deleted?
- What Should I Do If a Workload Is Stopped Caused by Pod Deletion?
- What Should I Do If an Error Occurs When Deploying a Service on the GPU Node?
- Container Configuration
- Scheduling Policies
-
Others
- What Should I Do If a Scheduled Task Cannot Be Restarted After Being Stopped for a Period of Time?
- What Is a Headless Service When I Create a StatefulSet?
- What Should I Do If Error Message "Auth is empty" Is Displayed When a Private Image Is Pulled?
- Why Cannot a Pod Be Scheduled to a Node?
- What Is the Image Pull Policy for Containers in a CCE Cluster?
- What Can I Do If a Layer Is Missing During Image Pull?
-
Workload Abnormalities
-
Networking
- Network Planning
-
Network Fault
- How Do I Locate a Workload Networking Fault?
- Why Does the Browser Return Error Code 404 When I Access a Deployed Application?
- What Should I Do If a Container Fails to Access the Internet?
- What Should I Do If a Node Fails to Connect to the Internet (Public Network)?
- What Should I Do If an Nginx Ingress Access in the Cluster Is Abnormal After the Add-on Is Upgraded?
- Security Hardening
- Others
-
Storage
- What Are the Differences Among CCE Storage Classes in Terms of Persistent Storage and Multi-node Mounting?
- Can I Add a Node Without a Data Disk?
- What Should I Do If the Host Cannot Be Found When Files Need to Be Uploaded to OBS During the Access to the CCE Service from a Public Network?
- How Can I Achieve Compatibility Between ExtendPathMode and Kubernetes client-go?
- Can CCE PVCs Detect Underlying Storage Faults?
- Namespace
-
Chart and Add-on
- Why Does Add-on Installation Fail and Prompt "The release name is already exist"?
- How Do I Configure the Add-on Resource Quotas Based on Cluster Scale?
- What Should I Do If the Helm Chart Uploaded Before the Tenant Account Name Is Changed Is Abnormal?
- How Can I Clean Up Residual Resources After the NGINX Ingress Controller Add-on in the Unknown State Is Deleted?
- Why TLS v1.0 and v1.1 Cannot Be Used After the NGINX Ingress Controller Add-on Is Upgraded?
-
API & kubectl FAQs
- How Can I Access a Cluster API Server?
- Can the Resources Created Using APIs or kubectl Be Displayed on the CCE Console?
- How Do I Download kubeconfig for Connecting to a Cluster Using kubectl?
- How Do I Rectify the Error Reported When Running the kubectl top node Command?
- Why Is "Error from server (Forbidden)" Displayed When I Use kubectl?
- DNS FAQs
- Permissions
- Reference
-
Best Practices
- Checklist for Deploying Containerized Applications in the Cloud
- Containerization
- Disaster Recovery
- Security
- Auto Scaling
- Monitoring
- Cluster
- Networking
- Storage
- Container
- Permission
- Release
-
API Reference (Ankara Region)
- Before You Start
- API Overview
- Calling APIs
-
APIs
- API URL
-
Cluster Management
- Creating a Cluster
- Reading a Specified Cluster
- Listing Clusters in a Specified Project
- Updating a Specified Cluster
- Deleting a Cluster
- Hibernating a Cluster
- Waking Up a Cluster
- Obtaining a Cluster Certificate
- Modifying Cluster Specifications
- Querying a Job
- Binding/Unbinding Public API Server Address
- Obtaining Cluster Access Address
- Node Management
- Node Pool Management
- Storage Management
- Add-on Management
- Quota Management
- API Versions
- Tag Management
- Configuration Management
-
Chart Management
- Uploading a Chart
- Obtaining a Chart List
- Obtaining a Release List
- Updating a Chart
- Creating a Release
- Deleting a Chart
- Updating a Release
- Obtaining a Chart
- Deleting a Release
- Downloading a Chart
- Obtaining a Release
- Obtaining Chart Values
- Obtaining Historical Records of a Release
- Obtaining the Quota of a User Chart
- Kubernetes APIs
- Permissions and Supported Actions
-
Appendix
- Status Code
- Error Codes
- Obtaining a Project ID
- Obtaining an Account ID
- Specifying Add-ons to Be Installed During Cluster Creation
- How to Obtain Parameters in the API URI
- Creating a VPC and Subnet
- Creating a Key Pair
- Node Flavor Description
- Adding a Salt in the password Field When Creating a Node
- Maximum Number of Pods That Can Be Created on a Node
- Node OS
- Data Disk Space Allocation
- Attaching Disks to a Node
-
User Guide (ME-Abu Dhabi Region)
- General Reference
Copied.
Dynamic Resource Oversubscription
Many services see surges in traffic. To ensure performance and stability, resources are often requested at the maximum needed. However, the surges may ebb very shortly and resources, if not released, are wasted in non-peak hours. Especially for online jobs that request a large quantity of resources to ensure SLA, resource utilization can be as low as it gets.
Resource oversubscription is the process of using idle requested resources. Oversubscribed resources are suitable for deploying offline jobs, which focus on throughput but have low SLA requirements and can tolerate certain failures.
Hybrid deployment of online and offline jobs in a cluster can better utilize cluster resources.

Features
After dynamic resource oversubscription and elastic scaling are enabled in a node pool, oversubscribed resources change rapidly because the resource usage of high-priority applications changes in real time. To prevent frequent node scale-ins and scale-outs, do not consider oversubscribed resources when evaluating node scale-ins.
Hybrid deployment is supported, and CPU and memory resources can be oversubscribed. The key features are as follows:
- Offline jobs preferentially run on oversubscribed nodes.
If both oversubscribed and non-oversubscribed nodes exist, the former will score higher than the latter and offline jobs are preferentially scheduled to oversubscribed nodes.
- Online jobs can use only non-oversubscribed resources if scheduled to an oversubscribed node.
Offline jobs can use both oversubscribed and non-oversubscribed resources of an oversubscribed node.
- In the same scheduling period, online jobs take precedence over offline jobs.
If both online and offline jobs exist, online jobs are scheduled first. When the node resource usage exceeds the upper limit and the node requests exceed 100%, offline jobs will be evicted.
- CPU/Memory isolation is provided by kernels.
CPU isolation: Online jobs can quickly preempt CPU resources of offline jobs and suppress the CPU usage of the offline jobs.
Memory isolation: When system memory resources are used up and OOM Kill is triggered, the kernel evicts offline jobs first.
- kubelet offline jobs admission rules:
After the pod is scheduled to a node, kubelet starts the pod only when the node resources can meet the pod request (predicateAdmitHandler.Admit). kubelet starts the pod when both of the following conditions are met:
- The total request of pods to be started and online running jobs < allocatable nodes
- The total request of pods to be started and online/offline running job < allocatable nodes+oversubscribed nodes
- Resource oversubscription and hybrid deployment:
If only hybrid deployment is used, configure the label volcano.sh/colocation=true for the node and delete the node label volcano.sh/oversubscription or set its value to false.
If the label volcano.sh/colocation=true is configured for a node, hybrid deployment is enabled. If the label volcano.sh/oversubscription=true is configured, resource oversubscription is enabled. The following table lists the available feature combinations after hybrid deployment or resource oversubscription is enabled.Hybrid Deployment Enabled (volcano.sh/colocation=true)
Resource Oversubscription Enabled (volcano.sh/oversubscription=true)
Resource Oversubscription
When Offline Pod Eviction Triggered (Using Annotations to Configure Limits)
No
No
No
None
Yes
No
No
The actual resource usage of a node exceeds the upper limit.
No
Yes
Yes
The actual resource usage of a node exceeds the upper limit and the pod requests on the node exceed 100%.
Yes
Yes
Yes
The actual resource usage of a node exceeds the upper limit.
kubelet Oversubscription
- Cluster version
- v1.19: v1.19.16-r4 or later
- v1.21: v1.21.7-r0 or later
- v1.23: v1.23.5-r0 or later
- v1.25 or later
- Cluster type: CCE Standard
- Node OS: EulerOS 2.9 (kernel-4.18.0-147.5.1.6.h729.6.eulerosv2r9.x86_64) or Huawei Cloud EulerOS 2.0
- Node type: ECS
- Volcano version: 1.7.0 or later
- Before enabling oversubscription, ensure that the overcommit add-on is not enabled on Volcano.
- Modifying the label of an oversubscribed node does not affect the running pods.
- Running pods cannot be converted between online and offline services. To convert services, you need to rebuild pods.
- If the label volcano.sh/oversubscription=true is configured for a node in the cluster, the oversubscription configuration must be added to the Volcano add-on. Otherwise, the scheduling of oversold nodes will be abnormal. Ensure that you have correctly configure labels because the scheduler does not check the add-on and node configurations. For details, see Table 1.
- To disable oversubscription, perform the following operations:
- Remove the volcano.sh/oversubscription label from the oversubscribed node.
- Set over-subscription-resource to false.
- Modify the configmap of Volcano Scheduler named volcano-scheduler-configmap and remove the oversubscription add-on.
- If cpu-manager-policy is set to static core binding on a node, do not assign the QoS class of Guaranteed to offline pods. If core binding is required, change the pods to online pods. Otherwise, offline pods may occupy the CPUs of online pods, causing online pod startup failures, and offline pods fail to be started although they are successfully scheduled.
- If cpu-manager-policy is set to static core binding on a node, do not bind cores to all online pods. Otherwise, online pods occupy all CPU or memory resources, leaving a small number of oversubscribed resources.
If the label volcano.sh/oversubscription=true is configured for a node in the cluster, the oversubscription configuration must be added to the Volcano add-on. Otherwise, the scheduling of oversold nodes will be abnormal. For details about the related configuration, see Table 1.
- Configure the Volcano add-on.
- Use kubectl to access the cluster.
- Install the Volcano add-on and add the oversubscription add-on to volcano-scheduler-configmap. Ensure that the add-on configuration does not contain the overcommit add-on. If - name: overcommit exists, delete this configuration. In addition, set enablePreemptable and enableJobStarving of the gang add-on to false and configure a preemption action.
# kubectl edit cm volcano-scheduler-configmap -n kube-system apiVersion: v1 data: volcano-scheduler.conf: | actions: "allocate, backfill, preempt" # Configure a preemption action. tiers: - plugins: - name: gang enablePreemptable: false enableJobStarving: false - name: priority - name: conformance - name: oversubscription - plugins: - name: drf - name: predicates - name: nodeorder - name: binpack - plugins: - name: cce-gpu-topology-predicate - name: cce-gpu-topology-priority - name: cce-gpu
- Enable node oversubscription.
A label can be configured to use oversubscribed resources only after the oversubscription feature is enabled for a node. Related nodes can be created only in a node pool. To enable the oversubscription feature, perform the following steps:
- Create a node pool.
- Choose Manage in the Operation column of the created node pool.
- On the Manage Configurations page, enable Node oversubscription feature (over-subscription-resource) and click OK.
- Set the node oversubscription label.
The volcano.sh/oversubscription label needs to be configured for an oversubscribed node. If this label is set for a node and the value is true, the node is an oversubscribed node. Otherwise, the node is not an oversubscribed node.
kubectl label node 192.168.0.0 volcano.sh/oversubscription=true
An oversubscribed node also supports the oversubscription thresholds, as listed in Table 2. For example:
kubectl annotate node 192.168.0.0 volcano.sh/evicting-cpu-high-watermark=70
Querying the node information
# kubectl describe node 192.168.0.0 Name: 192.168.0.0 Roles: <none> Labels: ... volcano.sh/oversubscription=true Annotations: ... volcano.sh/evicting-cpu-high-watermark: 70
Table 2 Node oversubscription annotations Name
Description
volcano.sh/evicting-cpu-high-watermark
Upper limit for CPU usage. When the CPU usage of a node exceeds the specified value, offline job eviction is triggered and the node becomes unschedulable.
The default value is 80, indicating that offline job eviction is triggered when the CPU usage of a node exceeds 80%.
volcano.sh/evicting-cpu-low-watermark
Lower limit for CPU usage. After eviction is triggered, the scheduling starts again when the CPU usage of a node is lower than the specified value.
The default value is 30, indicating that scheduling starts again when the CPU usage of a node is lower than 30%.
volcano.sh/evicting-memory-high-watermark
Upper limit for memory usage. When the memory usage of a node exceeds the specified value, offline job eviction is triggered and the node becomes unschedulable.
The default value is 60, indicating that offline job eviction is triggered when the memory usage of a node exceeds 60%.
volcano.sh/evicting-memory-low-watermark
Lower limit for memory usage. After eviction is triggered, the scheduling starts again when the memory usage of a node is lower than the specified value.
The default value is 30, indicating that the scheduling starts again when the memory usage of a node is less than 30%.
volcano.sh/oversubscription-types
Oversubscribed resource type. Options:
- cpu: oversubscribed CPU
- memory: oversubscribed memory
- cpu,memory: oversubscribed CPU and memory
The default value is cpu,memory.
- Create resources at a high- and low-priorityClass, respectively.
cat <<EOF | kubectl apply -f - apiVersion: scheduling.k8s.io/v1 description: Used for high priority pods kind: PriorityClass metadata: name: production preemptionPolicy: PreemptLowerPriority value: 999999 --- apiVersion: scheduling.k8s.io/v1 description: Used for low priority pods kind: PriorityClass metadata: name: testing preemptionPolicy: PreemptLowerPriority value: -99999 EOF
- Deploy online and offline jobs and configure priorityClasses for these jobs.
The volcano.sh/qos-level label needs to be added to annotation to distinguish offline jobs. The value is an integer ranging from -7 to 7. If the value is less than 0, the job is an offline job. If the value is greater than or equal to 0, the job is a high-priority job, that is, online job. You do not need to set this label for online jobs. For both online and offline jobs, set schedulerName to volcano to enable Volcano Scheduler.
NOTE:
The priorities of online/online and offline/offline jobs are not differentiated, and the value validity is not verified. If the value of volcano.sh/qos-level of an offline job is not a negative integer ranging from -7 to 0, the job is processed as an online job.
For an offline job:
kind: Deployment apiVersion: apps/v1 spec: replicas: 4 template: metadata: annotations: metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"","path":"","port":"","names":""}]' volcano.sh/qos-level: "-1" # Offline job label spec: schedulerName: volcano # Volcano is used. priorityClassName: testing # Configure the testing priorityClass. ...
For an online job:
kind: Deployment apiVersion: apps/v1 spec: replicas: 4 template: metadata: annotations: metrics.alpha.kubernetes.io/custom-endpoints: '[{"api":"","path":"","port":"","names":""}]' spec: schedulerName: volcano # Volcano is used. priorityClassName: production # Configure the production priorityClass. ...
- Run the following command to check the number of oversubscribed resources and the resource usage:
kubectl describe node <nodeIP>
# kubectl describe node 192.168.0.0 Name: 192.168.0.0 Roles: <none> Labels: ... volcano.sh/oversubscription=true Annotations: ... volcano.sh/oversubscription-cpu: 2335 volcano.sh/oversubscription-memory: 341753856 Allocatable: cpu: 3920m memory: 6263988Ki Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) Resource Requests Limits -------- -------- ------ cpu 4950m (126%) 4950m (126%) memory 1712Mi (27%) 1712Mi (27%)
In the preceding command, CPU and memory are in the unit of mCPUs and MiB, respectively.
Deployment Example
The following uses an example to describe how to deploy online and offline jobs in hybrid mode.
- Configure a cluster with two nodes, one oversubscribed and the other non-oversubscribed.
# kubectl get node NAME STATUS ROLES AGE VERSION 192.168.0.173 Ready <none> 4h58m v1.19.16-r2-CCE22.5.1 192.168.0.3 Ready <none> 148m v1.19.16-r2-CCE22.5.1
- 192.168.0.173 is an oversubscribed node (with the volcano.sh/oversubscription=true label).
- 192.168.0.3 is a non-oversubscribed node (without the volcano.sh/oversubscription=true label).
# kubectl describe node 192.168.0.173 Name: 192.168.0.173 Roles: <none> Labels: beta.kubernetes.io/arch=amd64 ... volcano.sh/oversubscription=true
- Submit offline job creation requests. If resources are sufficient, all offline jobs will be scheduled to the oversubscribed node.
The offline job template is as follows:
apiVersion: apps/v1 kind: Deployment metadata: name: offline namespace: default spec: replicas: 2 selector: matchLabels: app: offline template: metadata: labels: app: offline annotations: volcano.sh/qos-level: "-1" # Offline job label spec: schedulerName: volcano # Volcano is used. priorityClassName: testing # Configure the testing priorityClass. containers: - name: container-1 image: nginx:latest imagePullPolicy: IfNotPresent resources: requests: cpu: 500m memory: 512Mi limits: cpu: "1" memory: 512Mi imagePullSecrets: - name: default-secret
Offline jobs are scheduled to the oversubscribed node.# kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE offline-69cdd49bf4-pmjp8 1/1 Running 0 5s 192.168.10.178 192.168.0.173 offline-69cdd49bf4-z8kxh 1/1 Running 0 5s 192.168.10.131 192.168.0.173
- Submit online job creation requests. If resources are sufficient, the online jobs will be scheduled to the non-oversubscribed node.
The online job template is as follows:
apiVersion: apps/v1 kind: Deployment metadata: name: online namespace: default spec: replicas: 2 selector: matchLabels: app: online template: metadata: labels: app: online spec: schedulerName: volcano # Volcano is used. priorityClassName: production # Configure the production priorityClass. containers: - name: container-1 image: resource_consumer:latest imagePullPolicy: IfNotPresent resources: requests: cpu: 1400m memory: 512Mi limits: cpu: "2" memory: 512Mi imagePullSecrets: - name: default-secret
Online jobs are scheduled to the non-oversubscribed node.# kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE online-ffb46f656-4mwr6 1/1 Running 0 5s 192.168.10.146 192.168.0.3 online-ffb46f656-dqdv2 1/1 Running 0 5s 192.168.10.67 192.168.0.3
- Improve the resource usage of the oversubscribed node and observe whether offline job eviction is triggered.
Deploy online jobs to the oversubscribed node (192.168.0.173).
apiVersion: apps/v1 kind: Deployment metadata: name: online namespace: default spec: replicas: 2 selector: matchLabels: app: online template: metadata: labels: app: online spec: affinity: # Submit an online job to an oversubscribed node. nodeAffinity: requiredDuringSchedulingIgnoredDuringExecution: nodeSelectorTerms: - matchExpressions: - key: kubernetes.io/hostname operator: In values: - 192.168.0.173 schedulerName: volcano # Volcano is used. priorityClassName: production # Configure the production priorityClass. containers: - name: container-1 image: resource_consumer:latest imagePullPolicy: IfNotPresent resources: requests: cpu: 700m memory: 512Mi limits: cpu: 700m memory: 512Mi imagePullSecrets: - name: default-secret
Submit the online or offline jobs to the oversubscribed node (192.168.0.173) at the same time.# kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE offline-69cdd49bf4-pmjp8 1/1 Running 0 13m 192.168.10.178 192.168.0.173 offline-69cdd49bf4-z8kxh 1/1 Running 0 13m 192.168.10.131 192.168.0.173 online-6f44bb68bd-b8z9p 1/1 Running 0 3m4s 192.168.10.18 192.168.0.173 online-6f44bb68bd-g6xk8 1/1 Running 0 3m12s 192.168.10.69 192.168.0.173
Check the oversubscribed node with IP address 192.168.0.173. It is found that resources are oversubscribed, where there are 2343 mCPUs and 3073653200 MiB of memory. Additionally, the CPU allocation rate exceeded 100%.# kubectl describe node 192.168.0.173 Name: 192.168.0.173 Roles: <none> Labels: … volcano.sh/oversubscription=true Annotations: … volcano.sh/oversubscription-cpu: 2343 volcano.sh/oversubscription-memory: 3073653200 … Allocated resources: (Total limits may be over 100 percent, i.e., overcommitted.) Resource Requests Limits -------- -------- ------ cpu 4750m (121%) 7350m (187%) memory 3760Mi (61%) 4660Mi (76%) …
Increase the CPU usage of online jobs on the node. Offline job eviction is triggered.# kubectl get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE offline-69cdd49bf4-bwdm7 1/1 Running 0 11m 192.168.10.208 192.168.0.3 offline-69cdd49bf4-pmjp8 0/1 Evicted 0 26m <none> 192.168.0.173 offline-69cdd49bf4-qpdss 1/1 Running 0 11m 192.168.10.174 192.168.0.3 offline-69cdd49bf4-z8kxh 0/1 Evicted 0 26m <none> 192.168.0.173 online-6f44bb68bd-b8z9p 1/1 Running 0 24m 192.168.10.18 192.168.0.173 online-6f44bb68bd-g6xk8 1/1 Running 0 24m 192.168.10.69 192.168.0.173
Handling Suggestions
- After kubelet of the oversubscribed node is restarted, the resource view of Volcano Scheduler is not synchronized with that of kubelet. As a result, OutOfCPU occurs in some newly scheduled jobs, which is normal. After a period of time, Volcano Scheduler can properly schedule online and offline jobs.
- After online and offline jobs are submitted, you are not advised to dynamically change the job type (adding or deleting annotation volcano.sh/qos-level: "-1") because the current kernel does not support the change of an offline job to an online job.
- CCE collects the resource usage (CPU/memory) of all pods running on a node based on the status information in the cgroups system. The resource usage may be different from the monitored resource usage, for example, the resource statistics displayed by running the top command.
- You can add oversubscribed resources (such as CPU and memory) at any time.
You can reduce the oversubscribed resource types only when the resource allocation rate does not exceed 100%.
- If an offline job is deployed on a node ahead of an online job and the online job cannot be scheduled due to insufficient resources, configure a higher priorityClass for the online job than that for the offline job.
- If there are only online jobs on a node and the eviction threshold is reached, the offline jobs that are scheduled to the current node will be evicted soon. This is normal.
Feedback
Was this page helpful?
Provide feedbackThank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.
For any further questions, feel free to contact us through the chatbot.
Chatbot