Updated on 2026-07-02 GMT+08:00

Using Huawei Cloud Cloud-Native Skills

This section is intended for developers, O&M engineers, and architects who use Cloud Container Engine (CCE) and related cloud services. It describes what Huawei Cloud cloud-native Skills are, how to use them, and which Skills are available.

Skill Overview

What Are Skills?

Skills are open capabilities that convert professional knowledge, operation processes, and best practices into reusable capability units. In AI Agents, Skills are used to extend the professional capabilities of Agents so that Agents can automatically execute complex tasks in specific domains based on predefined processes and rules. The core features of Skills are as follows:

Intent-driven: An Agent decides when to trigger a Skill by reading the Skill's description. You do not need to invoke the Skill explicitly.
Scenario orchestration: A Skill can internally connect multiple steps to automatically collect contexts, analyze them, and output conclusions.
Reusable: A Skill can run on different Agent platforms (web, CLI, and API). You do not need to adapt the Skill to each platform.
Composable: Multiple Skills can be combined based on workflows. Agents automatically select and invoke appropriate Skills based on task requirements.
Security guardrails: Risk constraints are defined within Skills. High-risk operations must be previewed and confirmed by users.

Huawei Cloud cloud-native Skills encapsulate the O&M capabilities of cloud services such as CCE, AOM, LTS, ELB, ECS, and HSS for scenarios such as fault diagnosis, observability analysis, inspection and governance, and automatic recovery. They give AI Agents professional cloud-native O&M capabilities.

Scenarios

Fault diagnosis: Pod CrashLoopBackOff, node NotReady, Ingress 502, PVC Pending, and other faults
Observability analysis: aggregation of AOM alarms, LTS logs, Kubernetes events, and pod/node metrics into diagnosis contexts
Inspection and governance: daily cluster health check, capacity trend prediction, cost optimization suggestions, and availability risk scanning
Automatic recovery: controlled changes such as scaling, cordoning or draining nodes, restarting ECSs, and fixing HSS vulnerabilities
Delivery solution: container migration planning, resource stocktaking, and dependency matrix analysis
Cluster management: CCE cluster upgrade planning, workload management, and UCS cluster management and policy governance

Security Constraints and Risk Levels

Core security constraints
- Do not output AKs/SKs in scripts, logs, or reports.
- Preview all change operations, such as deletion, scaling, draining, and rebooting, before confirming them.
- Delete temporary kubeconfig files and certificate files after using them.
- Use diagnosis, inspection, and migration planning Skills only for read-only queries and report generation.
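The cleanup constraint for temporary kubeconfig files can be enforced in code rather than left to memory. The following is a minimal sketch, assuming a script receives kubeconfig content as a string; the helper name `temporary_kubeconfig` is hypothetical and is not part of any Skill.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temporary_kubeconfig(content: str):
    """Write kubeconfig content to a private temp file and always delete it."""
    # mkstemp creates the file readable and writable only by the current user on POSIX.
    fd, path = tempfile.mkstemp(prefix="kubeconfig-", suffix=".yaml")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(content)
        yield path
    finally:
        # Runs on success and on error, so the file never lingers.
        if os.path.exists(path):
            os.remove(path)


if __name__ == "__main__":
    with temporary_kubeconfig("apiVersion: v1\nkind: Config\n") as path:
        print("kubeconfig available:", os.path.exists(path))
    print("kubeconfig removed:", not os.path.exists(path))
```

Because the deletion sits in a `finally` clause, the file is removed even when the code inside the `with` block raises an exception.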

Risk levels

Level	Risk Severity	Core Definition	Typical Operation Example	Default Behavior
R3	No risk	Read-only operations that do not change any system status	Query data, obtain statuses, and monitor metrics.	Direct execution
R2	Low risk	Operations that have minimal or no impact on service continuity and do not involve cost changes	Increase the number of pod replicas and configure HPA rules.	Forbidden by default. Automatic execution is allowed after authorization.
R1	Medium risk	Operations that have minor impacts on service continuity or involve cost changes	Restart abnormal pods, adjust pod resource quotas, and add cluster nodes.	Forbidden by default. Manual approval is required before execution.
R0	High risk	Operations that have major impacts on service continuity and may cause service interruption	Delete clusters.	Forbidden by default. Manual approval is required before execution.
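The default behaviors in the table can be read as a small decision rule. The sketch below illustrates that reading only; the names `RiskLevel`, `may_execute`, `authorized`, and `approved` are hypothetical and do not come from the Skill repository.

```python
from enum import Enum


class RiskLevel(Enum):
    R3 = "no risk"
    R2 = "low risk"
    R1 = "medium risk"
    R0 = "high risk"


def may_execute(level: RiskLevel, *, authorized: bool = False, approved: bool = False) -> bool:
    """Return True if an operation at this level may run under the table's defaults."""
    if level is RiskLevel.R3:
        return True        # read-only: direct execution
    if level is RiskLevel.R2:
        return authorized  # forbidden by default, automatic once authorized
    return approved        # R1 and R0: manual approval before execution


if __name__ == "__main__":
    print(may_execute(RiskLevel.R3))                   # True
    print(may_execute(RiskLevel.R2))                   # False
    print(may_execute(RiskLevel.R2, authorized=True))  # True
    print(may_execute(RiskLevel.R0, authorized=True))  # False
```

Note that authorization alone does not unlock R1 or R0 operations in this sketch, which mirrors the table's requirement for manual approval at those levels.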

Usage Constraints

Currently, Skills are mainly designed for CCE clusters and their associated cloud services (such as AOM, LTS, ELB, ECS, and HSS).
All change actions are in preview mode by default and are not automatically executed.

Usage Description

Working Principles

A Skill works based on the intent matching mechanism. An Agent reads the description in the header of the SKILL.md file in the Skill directory. When your question matches the description, the Agent automatically triggers the Skill. A Skill defines a complete processing workflow, a list of tools that can be invoked, and risk constraints. The Agent executes tasks step by step based on the Skill's guidance.

For example, when you ask "What should I do if a pod keeps restarting?", the Agent matches the description of pod-failure-diagnoser as follows:

---
name: pod-failure-diagnoser
description: Diagnose CCE Pod failures such as CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending, Evicted, restart storms, or workload unavailable.
---

The Agent determines that the issue matches the description and automatically triggers pod-failure-diagnoser to execute the diagnosis process.
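To make the mechanism concrete, the sketch below reads the `name` and `description` fields from the header shown above and compares the description with a question. It is an illustration under stated assumptions, not how an Agent is implemented: a real Agent relies on a language model rather than keyword overlap, and a real loader would likely use a full YAML parser. The functions `parse_front_matter` and `keyword_overlap` are hypothetical.

```python
import re

SKILL_MD = """\
---
name: pod-failure-diagnoser
description: Diagnose CCE Pod failures such as CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending, Evicted, restart storms, or workload unavailable.
---
"""


def parse_front_matter(text: str) -> dict[str, str]:
    """Read flat 'key: value' pairs between the leading '---' delimiters."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def keyword_overlap(question: str, description: str) -> set[str]:
    """Toy stand-in for intent matching: shared 7-letter word prefixes."""
    def stems(sentence: str) -> set[str]:
        words = re.findall(r"[a-z]+", sentence.lower())
        return {word[:7] for word in words if len(word) > 3}
    return stems(question) & stems(description)


if __name__ == "__main__":
    fields = parse_front_matter(SKILL_MD)
    question = "What should I do if a pod keeps restarting?"
    print(fields["name"], keyword_overlap(question, fields["description"]))
```

Truncating words to a common prefix lets "restarting" in the question meet "restart" in the description, which is the only reason this toy matcher finds a hit.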

Obtaining Skills

Huawei Cloud cloud-native Skills are provided through an open repository on GitHub.

Each Skill uses a self-contained directory structure that contains the description and auxiliary files required to run the Skill.

skill-name/
├── SKILL.md       # Skill definition file, the sole entry point
├── references/    # Reference documents
├── scripts/       # Executable scripts
├── templates/     # Template files
└── demo/          # Demonstration examples
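A quick check of a downloaded Skill directory against this layout might look like the following sketch. Treating every folder other than SKILL.md as optional is an assumption made here, and the function name `inspect_skill` is hypothetical.

```python
import tempfile
from pathlib import Path

# Folder names taken from the layout above; treating them as optional is an assumption.
OPTIONAL_DIRS = ("references", "scripts", "templates", "demo")


def inspect_skill(skill_dir: Path) -> dict:
    """Report whether SKILL.md exists and which auxiliary folders are present."""
    return {
        "name": skill_dir.name,
        "has_entry": (skill_dir / "SKILL.md").is_file(),
        "optional": [d for d in OPTIONAL_DIRS if (skill_dir / d).is_dir()],
    }


if __name__ == "__main__":
    # Build a throwaway Skill directory so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        skill = Path(tmp) / "pod-failure-diagnoser"
        (skill / "references").mkdir(parents=True)
        (skill / "SKILL.md").write_text(
            "---\nname: pod-failure-diagnoser\n---\n", encoding="utf-8"
        )
        print(inspect_skill(skill))
```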

Skill Installation

Method 1: Using npx

# Install a single Skill.
npx skills add huaweicloud/huaweicloud-skills --skill <skill-name>

# Install all Skills.
npx skills add huaweicloud/huaweicloud-skills

Method 2: Using the GitHub repository for manual installation

git clone https://github.com/huaweicloud/huaweicloud-skills.git

# Install a specified Skill.
npx skills add <path>/huaweicloud-skills/skills/<skill-name>

The loading paths and integration methods vary depending on the Agent platform. For details, see Platform Integration Example.

Authentication Configuration

Before using Skills related to Huawei Cloud products, configure authentication information based on the target cloud service.

Interactive configuration

Access Key Id: <your AK>
Secret Access Key: <your SK>

AK/SK authentication configuration using KooCLI
```
hcloud configure set --cli-access-key="<your AK>" --cli-secret-key="<your SK>" --cli-mode="AKSK"
```
- Use plaintext AK/SK authentication only in the trusted local test environment to prevent credential leakage.
- In cloud environments, follow the principle of least privilege and the instructions provided in Identity Authentication and Access Control.
- Do not write AKs/SKs into scripts, logs, reports, or code repositories.
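One way to reduce the chance of an AK/SK reaching a log or report is to mask known secret values before any text is written out. The following is a minimal sketch of that idea; the function `redact` is hypothetical, and it only catches exact occurrences of the values passed to it.

```python
def redact(text: str, secrets: list[str], mask: str = "***") -> str:
    """Replace every exact occurrence of each non-empty secret with a mask."""
    # Longest first, so a secret that contains another is masked as a whole.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:
            text = text.replace(secret, mask)
    return text


if __name__ == "__main__":
    # Placeholder values for illustration only, not real credentials.
    ak, sk = "EXAMPLEAK", "EXAMPLESK"
    line = f"request signed with ak={ak} sk={sk}"
    print(redact(line, [ak, sk]))
```

Masking complements the rules above and does not replace them: credentials that are never written into a script or repository cannot leak from it.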

Reference

Overview

Huawei Cloud cloud-native Skills are organized around cloud-native resource management and continuous O&M scenarios, covering capability domains such as resource lifecycle, observability and alarms, fault diagnosis and recovery, inspection and governance, solution and delivery, and multi-cloud and multi-cluster management.

Each Skill is provided as an independent directory, including the capability description, application scenarios, and necessary reference documents. You can select a single Skill or combine multiple Skills to complete cross-service and cross-step O&M tasks based on service requirements. The following lists available Skills by capability domain.

Lifecycle and Resource Management

Lifecycle and resource management covers CCE, CCI, and SWR. The product names are used only for grouping. Each row in the following table represents an independent Skill.

CCE

Skill	Directory Path	Function
huawei-cloud-cce-cluster-management	skills/huawei-cloud-cce-cluster-management	Manages the full lifecycle of CCE clusters, node pools, nodes, add-ons, EIPs, and kubeconfig.
cce-cluster-upgrade-planner	skills/cce/cce-cluster-upgrade-planner	Plans CCE Kubernetes version upgrades and checks the upgrade path, add-on compatibility, version differences, and upgrade window.
cce-workload-manager	skills/cce/cce-workload-manager	Manages CCE workloads and Kubernetes resources, including Deployments, StatefulSets, DaemonSets, Jobs, CronJobs, HPAs, Services, Ingresses, and ConfigMaps.

CCI

Skill	Directory Path	Function
huawei-cloud-cci-instance-management	skills/cci/huawei-cloud-cci-instance-management	Manages CCI resources, including namespaces, networks, Deployments, StatefulSets, pods, EIPPools, logs, and metrics.

SWR

Skill	Directory Path	Function
huawei-cloud-swr-image-management	skills/swr/huawei-cloud-swr-image-management	Manages SWR namespaces, repositories, tags, login credentials, and quotas.
huawei-cloud-swr-image-governance	skills/swr/huawei-cloud-swr-image-governance	Manages SWR permissions, retention policies, sharing policies, agencies, and immutability rules.
huawei-cloud-swr-image-automation	skills/swr/huawei-cloud-swr-image-automation	Manages SWR image synchronization, triggers, and automatic deployment processes.
huawei-cloud-swr-enterprise-instance	skills/swr/huawei-cloud-swr-enterprise-instance	Manages SWR Enterprise Edition instances and their namespaces, repositories, artifacts, credentials, endpoints, and domain names.

Observability and Intelligent Alarms

Skill	Directory Path	Function
observability-context-builder	skills/observability-context-builder	Aggregates AOM alarms, metrics, LTS logs, pod logs, and Kubernetes events to form diagnosis contexts.
alarm-correlation-engine	skills/alarm-correlation-engine	Performs association analysis on AOM active and historical alarms, deduplicates and merges alarms, groups alarms by severity, and checks alarm rules.
log-analyzer	skills/log-analyzer	Queries and analyzes pod standard output, CCE LogConfig application logs, and LTS logs.
kubernetes-event-analyzer	skills/kubernetes-event-analyzer	Queries and analyzes Kubernetes warning events, repetition patterns, and pod, node, and workload exceptions.
metric-analyzer	skills/metric-analyzer	Queries and analyzes CCE pod and node metrics as well as ECS, ELB, EIP, and NAT metrics to identify threshold exceptions.

Fault Diagnosis and Self-Healing

Skill	Directory Path	Function
pod-failure-diagnoser	skills/pod-failure-diagnoser	Diagnoses pod faults such as CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending, Evicted, and frequent restarts.
workload-failure-diagnoser	skills/workload-failure-diagnoser	Diagnoses Deployment, StatefulSet, and DaemonSet release failures, rolling upgrade suspension, insufficient replicas, and probe exceptions.
node-failure-diagnoser	skills/node-failure-diagnoser	Diagnoses Node NotReady, resource pressure, NPD, CNI, kubelet, and container runtime exceptions.
autoscaling-diagnoser	skills/autoscaling-diagnoser	Diagnoses faults along the HPA and Cluster Autoscaler scaling path.
network-failure-diagnoser	skills/network-failure-diagnoser	Diagnoses Service, DNS, ingress, NetworkPolicy, ELB, EIP, NAT, and VPC network faults.
storage-failure-diagnoser	skills/storage-failure-diagnoser	Diagnoses PVC, PV, EVS, SFS, OBS, mounting, capacity, and deletion protection faults.
root-cause-analyzer	skills/root-cause-analyzer	Summarizes cross-domain evidence and outputs top root causes, impact scope, confidence, and recovery handover.
change-impact-analyzer	skills/change-impact-analyzer	Analyzes the fault impacts caused by release, configuration, network, security policy, and node changes.
dependency-impact-analyzer	skills/dependency-impact-analyzer	Analyzes the fault propagation path and upstream and downstream impacts based on the Service, ingress, pod, and node topologies.
auto-remediation-runner	skills/auto-remediation-runner	Generates and executes controlled recovery actions. All high-risk changes are previewed by default and require explicit confirmation.

Inspection, Governance, and Continuous O&M

Skill	Directory Path	Function
daily-cluster-inspector	skills/daily-cluster-inspector	Performs periodic CCE health checks, quick inspections, and continuous O&M summaries.
availability-risk-scanner	skills/availability-risk-scanner	Scans for HA, AZ distribution, single replica, PDB, probe, affinity, gateway, and resource overcommitment risks.
capacity-trend-forecaster	skills/capacity-trend-forecaster	Analyzes periodic capacity trends, predicts resource bottlenecks, and simulates HPA and node scaling policies.
cost-optimization-advisor	skills/cost-optimization-advisor	Analyzes idle resources, excessive requests, low-usage nodes, and scaling policy optimization opportunities.
ops-report-generator	skills/ops-report-generator	Summarizes inspection, capacity, availability, cost, and on-call contexts to generate weekly, monthly, SLA, capacity, and stability reports.

Solution and Delivery

Skill	Directory Path	Function
cce-cci-bursting-deployer	skills/cce-cci-bursting-deployer	Configures, deploys, and verifies the auto scaling capability from CCE to CCI 2.0, including VPCEP, virtual-kubelet, and smoke testing.
container-migration-planner	skills/container-migration-planner	Inventories container platform resources and dependencies, and outputs migration batches, risks, and verification solutions. No actual migration is performed.
pressure-test	skills/pressure-test	Builds a full-link pressure test from the k6 client through ELB and nginx-ingress to the service pod, collects observability data, and outputs a performance report.

Multi-Cloud and Multi-Cluster Management

UCS-related Skills are placed in this category and are no longer included in CCE lifecycle management.

Skill	Directory Path	Function
ucs-cluster-onboarding-manager	skills/ucs/ucs-cluster-onboarding-manager	Manages UCS clusters, lifecycle, fleet groups, kubeconfig, and resource quotas.
ucs-policy-governor	skills/ucs/ucs-policy-governor	Manages UCS policy instances, policy definitions, start and stop operations, execution statuses, and fleet compliance audit.

Usage

An Agent automatically matches capabilities based on the description in the SKILL.md file of each Skill. To locate a Skill manually, find it in the tables in this document and then go to its directory to view the complete description and reference documents.

Platform Integration Example

Using Skills in OpenCode

OpenCode is an AI programming assistant for terminals. It allows you to load a Skill through the project directory or user directory.

Skill types
- Project-level Skill: Place the Skill directory in the skills/ folder in the root directory of the project.
```
my-project/
├── src/
├── skills/
│   ├── pod-failure-diagnoser/
│   │   ├── SKILL.md
│   │   ├── manifest.json
│   │   ├── skill-profile.yaml
│   │   └── references/
│   ├── node-failure-diagnoser/
│   └── ...
```
  When OpenCode is started, it automatically scans the skills/ folder in the project directory and loads all Skills. You can directly describe the issue in the dialog, and the Agent will automatically match the appropriate Skill based on the description.
- User-level Skill: Place the Skill directory in the user configuration directory. User-level Skills take effect for all projects and are suitable for common O&M Skills.
  - Windows: %USERPROFILE%\.opencode\skills\
  - Linux/macOS: ~/.opencode/skills/

Example

# Go to the project directory.
cd my-project

# Start OpenCode. Skills have been automatically loaded.
opencode

# Describe the issue in the dialog.
> My pod keeps restarting. Can you help me check?
# The Agent automatically triggers pod-failure-diagnoser.

Using Skills in OpenClaw

OpenClaw is an open-source, self-hosted gateway that connects chat applications and channels to AI Agents. You can run the gateway locally or on your own server and extend Agent capabilities through Skills.

OpenClaw can load Skills from the following directories:

Directory	Description
<workspace>/skills/	Skills in the current workspace, suitable for project-level customization
<workspace>/.agents/skills/	Project-level Skills for Agents in the current workspace
~/.agents/skills/	Skills that can be shared by multiple Agents
~/.openclaw/skills/	Skills managed by OpenClaw
skills.load.extraDirs	Skill directories that can be added through configuration

OpenClaw also loads the Skills that come with the installation. You can copy the required Skill directories to the corresponding loading directories. Example:

mkdir -p ~/.agents/skills
cp -R ./skills/pod-failure-diagnoser ~/.agents/skills/
cp -R ./skills/node-failure-diagnoser ~/.agents/skills/

Each Skill directory must contain SKILL.md. After OpenClaw loads Skills, the Agent can select an appropriate Skill based on your intent and execute tasks based on the workflow defined in the Skill.
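Before starting the gateway, it can be useful to see which Skill directories exist under the loading paths and whether any name appears more than once. The sketch below is a hypothetical helper, not an OpenClaw API: it uses the directory name as the Skill name and does not model OpenClaw's loading order, which is covered in the OpenClaw documentation.

```python
from pathlib import Path


def discover_skills(roots: list[Path]) -> dict[str, list[Path]]:
    """Map each Skill name to every directory under the roots that has SKILL.md."""
    found: dict[str, list[Path]] = {}
    for root in roots:
        if not root.is_dir():
            continue  # a loading directory that does not exist is simply skipped
        for child in sorted(root.iterdir()):
            if (child / "SKILL.md").is_file():
                found.setdefault(child.name, []).append(child)
    return found


if __name__ == "__main__":
    # The current directory stands in for <workspace> in this example.
    workspace = Path.cwd()
    roots = [
        workspace / "skills",
        workspace / ".agents" / "skills",
        Path.home() / ".agents" / "skills",
        Path.home() / ".openclaw" / "skills",
    ]
    for name, locations in discover_skills(roots).items():
        note = " (found in several directories)" if len(locations) > 1 else ""
        print(f"{name}{note}: {', '.join(str(p) for p in locations)}")
```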

For details about the positioning, Skill loading sequence, and directory description of OpenClaw, see the OpenClaw documentation and OpenClaw Skills.

Using Skills in Hermes

Hermes is a service orchestration platform for enterprise-class AI Agents. It supports Skill integration through declarative configuration.

Common Issues

When describing an issue, you can refer to the following table to quickly locate the recommended Skill.

Issue Description	Recommended Skill
Pod restarting repeatedly, Pending, or OOMKilled	pod-failure-diagnoser
Release failure, rolling upgrade suspension, and insufficient replicas	workload-failure-diagnoser
Node NotReady, resource pressure, and node vulnerabilities	node-failure-diagnoser
HPA not scaling pods, CA not scaling nodes, and auto scaling not taking effect	autoscaling-diagnoser
Ingress 502, Service unreachable, and ELB link exception	network-failure-diagnoser
PVC Pending, FailedMount, and capacity exhaustion	storage-failure-diagnoser
A large number of CCE alarms, which need to be combined for analysis	alarm-correlation-engine
Pod standard output or LTS application log query	log-analyzer
Kubernetes event trend analysis	kubernetes-event-analyzer
Query of CCE pod/node metrics and rankings by resource usage	metric-analyzer
Aggregation of logs, events, metrics, and alarms	observability-context-builder
Service unavailability, requiring comprehensive root cause analysis	root-cause-analyzer
Faults upon release, configuration, network, security policy, or node changes	change-impact-analyzer
Determining entries and upstream and downstream services affected by a service fault	dependency-impact-analyzer
Capacity expansion, restart, draining, and vulnerability fixing	auto-remediation-runner
Daily inspection or periodic health check	daily-cluster-inspector
Cost optimization and excessive request analysis	cost-optimization-advisor
Capacity trend prediction and scaling simulation	capacity-trend-forecaster
Availability risk scanning and PDB/probe check	availability-risk-scanner
Weekly, monthly, and SLA O&M reports	ops-report-generator
Container migration solution and resource stocktaking	container-migration-planner
Auto scaling configuration for scheduling CCE workloads to CCI	cce-cci-bursting-deployer
CCE cluster version upgrade planning	cce-cluster-upgrade-planner
CCE/UCS workload management	cce-workload-manager
UCS cluster management and fleet management	ucs-cluster-onboarding-manager
UCS policy governance and compliance audit	ucs-policy-governor
SWR image lifecycle management	huawei-cloud-swr-image-management
SWR image governance	huawei-cloud-swr-image-governance
SWR image automation	huawei-cloud-swr-image-automation
Pressure test solution and execution	pressure-test

Helpful Links

Document	Description	Path
CCE documentation	CCE documentation	Huawei Cloud CCE Documentation
Open Skill repository	Huawei Cloud cloud-native Skill code repository	huaweicloud/huaweicloud-skills
