Updated on 2026-06-05 GMT+08:00

Using Huawei Cloud Cloud-Native Skills

This section is intended for developers, O&M engineers, and architects who use Cloud Container Engine (CCE) and related cloud services. It describes the capability positioning, usage, and reference information of Huawei Cloud cloud-native Skills.

Skill Overview

What Are Skills?

Skills are open capabilities that convert professional knowledge, operation processes, and best practices into reusable capability units. In AI Agents, Skills are used to extend the professional capabilities of Agents so that Agents can automatically execute complex tasks in specific domains based on predefined processes and rules. The core features of Skills are as follows:

  • Intent-driven: An Agent determines when to trigger a Skill by reading the Skill's description. You do not need to specify explicitly when a Skill should run.
  • Scenario orchestration: A Skill can chain multiple steps internally to collect context, analyze it, and output conclusions.
  • Reusable: A Skill can run on different Agent platforms (web, CLI, and API). You do not need to adapt the Skill to each platform.
  • Composable: Multiple Skills can be combined based on workflows. Agents automatically select and invoke appropriate Skills based on task requirements.
  • Security guardrails: Risk constraints are defined within Skills. High-risk operations must be previewed and confirmed by users.

Huawei Cloud cloud-native Skills encapsulate the O&M capabilities of cloud services such as CCE, AOM, LTS, ELB, ECS, and HSS into scenario-based units covering fault diagnosis, observability analysis, inspection and governance, and automatic recovery. They give AI Agents professional cloud-native O&M capabilities.

Scenarios

  • Fault diagnosis: Pod CrashLoopBackOff, node NotReady, Ingress 502, PVC Pending, and other faults
  • Observability analysis: aggregation of AOM alarms, LTS logs, Kubernetes events, and pod/node metrics into diagnosis contexts
  • Inspection and governance: daily cluster health check, capacity trend prediction, cost optimization suggestions, and availability risk scanning
  • Automatic recovery: controlled changes such as scaling, cordoning or draining nodes, restarting ECSs, and fixing HSS vulnerabilities
  • Solution and delivery: container migration planning, resource stocktaking, and dependency matrix analysis
  • Cluster management: CCE cluster upgrade planning, workload management, and UCS cluster management and policy governance

Security Constraints and Risk Levels

  • Core security constraints
    • Do not output AKs/SKs in scripts, logs, or reports.
    • Preview all change operations, such as deletion, scaling, draining, and rebooting, before confirming them.
    • Delete temporary kubeconfig files and certificate files after using them.
    • Use diagnosis, inspection, and migration planning Skills only for read-only queries and report generation.
  • Risk levels

    | Risk Level | Example | Default Behavior |
    |---|---|---|
    | R0 | list/get/query/analyze | Direct execution |
    | R1 | Generating reports, solutions, and dashboards | Direct execution |
    | R2 | Restarting abnormal pods and providing suggestions after query | Preview by default, with configurable automatic execution |
    | R3 | Scaling, rollback, cordon, and uncordon | confirm=true |
    | R4 | Deleting, draining, and hibernating production clusters | confirm=true and strong risk warning |
    | R5 | Clearing data and performing irreversible cross-domain deletion | Forbidden by default |
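
The default behaviors in the risk-level table can be read as a small decision function. The following Python sketch is illustrative only: the names DEFAULT_BEHAVIOR, Decision, and decide, and the auto_execute_r2 flag, are assumptions made for this example and are not taken from the Skills repository. Only the R0 to R5 levels and the confirm flag come from the table.

```python
from dataclasses import dataclass

# Default behavior per risk level, transcribed from the risk-level table.
DEFAULT_BEHAVIOR = {
    "R0": "direct",
    "R1": "direct",
    "R2": "preview",
    "R3": "confirm",
    "R4": "confirm_with_warning",
    "R5": "forbidden",
}


@dataclass(frozen=True)
class Decision:
    action: str    # "execute", "preview", or "reject"
    warning: bool  # True when a strong risk warning should accompany the action


def decide(risk_level: str, confirm: bool = False, auto_execute_r2: bool = False) -> Decision:
    """Map a risk level and the caller's flags to an action (hypothetical helper)."""
    # Unknown levels are treated like R5 in this sketch, that is, rejected.
    behavior = DEFAULT_BEHAVIOR.get(risk_level, "forbidden")
    if behavior == "direct":
        return Decision("execute", False)
    if behavior == "preview":
        # R2 is previewed by default; automatic execution is opt-in.
        return Decision("execute" if auto_execute_r2 else "preview", False)
    if behavior == "confirm":
        return Decision("execute" if confirm else "preview", False)
    if behavior == "confirm_with_warning":
        return Decision("execute" if confirm else "preview", True)
    return Decision("reject", False)


print(decide("R3"))                # stays a preview until confirm=True is passed
print(decide("R4", confirm=True))  # executes, flagged with a strong risk warning
```

Keeping the mapping in one dictionary makes it easy to audit against the table, and rejecting unknown levels keeps the sketch fail-closed.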

Usage Constraints

  • Currently, Skills are mainly designed for CCE clusters and their associated cloud services (such as AOM, LTS, ELB, ECS, and HSS).
  • All change actions are in preview mode by default and are not automatically executed.

Usage Description

Working Principles

A Skill works based on the intent matching mechanism. An Agent reads the description in the header of the SKILL.md file in the Skill directory. When your question matches the description, the Agent automatically triggers the Skill. A Skill defines a complete processing workflow, a list of tools that can be invoked, and risk constraints. The Agent executes tasks step by step based on the Skill's guidance.

For example, when you ask "What should I do if a pod keeps restarting?", the Agent matches the description of pod-failure-diagnoser as follows:

---
name: pod-failure-diagnoser
description: Diagnose CCE Pod failures such as CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending, Evicted, restart storms, or workload unavailable.
---

The Agent determines that the issue matches the description and automatically triggers pod-failure-diagnoser to execute the diagnosis process.
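
To make the matching step concrete, the following Python sketch parses the front matter shown above and scores a question against each description by simple word overlap. This is a deliberately crude stand-in: a real Agent relies on its language model to judge the match, not on keyword counting. The helper names (parse_front_matter, tokens, pick_skill) are hypothetical, and the second description is copied from the Function column of the reference tables in this document, not from an actual SKILL.md file.

```python
import re

POD_SKILL_MD = """---
name: pod-failure-diagnoser
description: Diagnose CCE Pod failures such as CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending, Evicted, restart storms, or workload unavailable.
---
"""


def parse_front_matter(text):
    """Extract simple `key: value` pairs between the leading '---' lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def tokens(text):
    """Lowercase alphanumeric words of a string, as a set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def pick_skill(question, skills):
    """Return the name of the best-overlapping Skill, or None if nothing overlaps."""
    question_words = tokens(question)
    best_name, best_score = None, 0
    for fields in skills:
        score = len(question_words & tokens(fields.get("description", "")))
        if score > best_score:
            best_name, best_score = fields.get("name"), score
    return best_name


skills = [
    parse_front_matter(POD_SKILL_MD),
    {  # description borrowed from the reference table, for illustration only
        "name": "node-failure-diagnoser",
        "description": "Diagnoses Node NotReady, resource pressure, NPD, CNI, "
                       "kubelet, and container runtime exceptions.",
    },
]
chosen = pick_skill("What should I do if a pod keeps restarting?", skills)
print(chosen)
```

The example also shows why the description matters: the Agent only sees the name and description when deciding, so the symptoms a Skill handles should be spelled out there.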

Obtaining Skills

Huawei Cloud cloud-native Skills are provided through an open repository on GitHub.

Each Skill uses a self-contained directory structure that contains the description and auxiliary files required to run the Skill.

skill-name/
├── SKILL.md       # Skill definition file, the only entry point
├── references/    # Reference documents
├── scripts/       # Executable scripts
├── templates/     # Template files
└── demo/          # Demonstration examples
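
Before installing or copying a Skill, it can be useful to confirm that a directory follows this layout. The Python sketch below checks for the SKILL.md entry file and reports which of the auxiliary folders are present. The function name inspect_skill_dir and the shape of the returned report are assumptions for this example. The temporary directory only stands in for a real Skill.

```python
import tempfile
from pathlib import Path

# Auxiliary folders named in the directory structure above.
OPTIONAL_DIRS = ("references", "scripts", "templates", "demo")


def inspect_skill_dir(skill_dir):
    """Report whether a directory looks like a Skill (hypothetical helper)."""
    skill_dir = Path(skill_dir)
    return {
        "name": skill_dir.name,
        "has_entry": (skill_dir / "SKILL.md").is_file(),
        "optional_dirs": [d for d in OPTIONAL_DIRS if (skill_dir / d).is_dir()],
    }


# Build a throwaway Skill directory and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    skill = Path(tmp) / "pod-failure-diagnoser"
    (skill / "scripts").mkdir(parents=True)
    (skill / "SKILL.md").write_text("---\nname: pod-failure-diagnoser\n---\n")
    report = inspect_skill_dir(skill)

print(report)
```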

Skill Installation

  • Method 1: Using npx
    # Install a single Skill.
    npx skills add huaweicloud/huaweicloud-skills --skill <skill-name>
    
    # Install all Skills.
    npx skills add huaweicloud/huaweicloud-skills
  • Method 2: Using the GitHub repository for manual installation
    git clone https://github.com/huaweicloud/huaweicloud-skills.git
    
    # Install a specified Skill.
    npx skills add <path>/huaweicloud-skills/skills/<skill-name>

The loading paths and integration methods vary depending on the Agent platform. For details, see Platform Integration Example.

Authentication Configuration

Before using Skills related to Huawei Cloud products, configure authentication information based on the target cloud service.

  • Interactive configuration
    Access Key Id: <your AK>
    Secret Access Key: <your SK>
  • AccessKey authentication configuration using KooCLI
    hcloud configure set --cli-access-key="<your AK>" --cli-secret-key="<your SK>" --cli-mode="AKSK"
    • Use plaintext AK/SK authentication only in a trusted local test environment to prevent credential leakage.
    • In cloud environments, follow the principle of least privilege and the instructions provided in Identity Authentication and Access Control.
    • Do not write AKs/SKs into scripts, logs, reports, or code repositories.
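
One way to reduce the risk of AKs/SKs reaching logs or reports is to mask them before any text is written. The following Python sketch is a minimal illustration using only the standard library. The names redact and RedactFilter are hypothetical, the key values are placeholders, and the approach masks only secrets it has been told about, so it complements careful credential handling and does not replace it.

```python
import logging


def redact(text, secrets):
    """Replace every occurrence of each known secret with a fixed mask."""
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted(secrets, key=len, reverse=True):
        if secret:  # an empty string would match everywhere
            text = text.replace(secret, "****")
    return text


class RedactFilter(logging.Filter):
    """Logging filter that masks known secrets in each formatted message."""

    def __init__(self, secrets):
        super().__init__()
        self._secrets = list(secrets)

    def filter(self, record):
        # Format the message first so secrets passed as arguments are masked too.
        record.msg = redact(record.getMessage(), self._secrets)
        record.args = ()
        return True


# Placeholder values; never hard-code real credentials.
fake_ak, fake_sk = "EXAMPLEAK123", "exampleSK456"

handler = logging.StreamHandler()
handler.addFilter(RedactFilter([fake_ak, fake_sk]))
logger = logging.getLogger("skill-demo")
logger.addHandler(handler)
logger.warning("configured with ak=%s sk=%s", fake_ak, fake_sk)
```

Attaching the filter to the handler, not to a single logger, means every record that reaches that handler is masked regardless of which logger produced it.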

Reference

Overview

Huawei Cloud cloud-native Skills are organized around cloud-native resource management and continuous O&M scenarios, covering capability domains such as resource lifecycle, observability and alarms, fault diagnosis and recovery, inspection and governance, solution and delivery, and multi-cloud and multi-cluster management.

Each Skill is provided as an independent directory, including the capability description, application scenarios, and necessary reference documents. You can select a single Skill or combine multiple Skills to complete cross-service and cross-step O&M tasks based on service requirements. The following lists available Skills by capability domain.

Lifecycle and Resource Management

Lifecycle and resource management covers CCE, CCI, and SWR. The product names are used only for grouping. Each row in the following table represents an independent Skill.

  • CCE

    | Skill | Directory Path | Function |
    |---|---|---|
    | huawei-cloud-cce-cluster-management | skills/huawei-cloud-cce-cluster-management | Manages the full lifecycle of CCE clusters, node pools, nodes, add-ons, EIPs, and kubeconfig. |
    | cce-cluster-upgrade-planner | skills/cce/cce-cluster-upgrade-planner | Plans CCE Kubernetes version upgrades and checks the upgrade path, add-on compatibility, version differences, and upgrade window. |
    | cce-workload-manager | skills/cce/cce-workload-manager | Manages CCE workloads and Kubernetes resources, including Deployments, StatefulSets, DaemonSets, Jobs, CronJobs, HPAs, Services, Ingresses, and ConfigMaps. |

  • CCI

    | Skill | Directory Path | Function |
    |---|---|---|
    | huawei-cloud-cci-instance-management | skills/cci/huawei-cloud-cci-instance-management | Manages CCI resources, including namespaces, networks, Deployments, StatefulSets, pods, EIPPools, logs, and metrics. |

  • SWR

    | Skill | Directory Path | Function |
    |---|---|---|
    | huawei-cloud-swr-image-management | skills/swr/huawei-cloud-swr-image-management | Manages SWR namespaces, repositories, tags, login credentials, and quotas. |
    | huawei-cloud-swr-image-governance | skills/swr/huawei-cloud-swr-image-governance | Manages SWR permissions, retention policies, sharing policies, agencies, and immutability rules. |
    | huawei-cloud-swr-image-automation | skills/swr/huawei-cloud-swr-image-automation | Manages SWR image synchronization, triggers, and automatic deployment processes. |
    | huawei-cloud-swr-enterprise-instance | skills/swr/huawei-cloud-swr-enterprise-instance | Manages SWR Enterprise Edition instances and their namespaces, repositories, artifacts, credentials, endpoints, and domain names. |

Observability and Intelligent Alarms

| Skill | Directory Path | Function |
|---|---|---|
| observability-context-builder | skills/observability-context-builder | Aggregates AOM alarms, metrics, LTS logs, pod logs, and Kubernetes events to form diagnosis contexts. |
| alarm-correlation-engine | skills/alarm-correlation-engine | Performs association analysis on AOM active and historical alarms, deduplicates and merges alarms, groups alarms by severity, and checks alarm rules. |
| log-analyzer | skills/log-analyzer | Queries and analyzes pod standard output, CCE LogConfig application logs, and LTS logs. |
| kubernetes-event-analyzer | skills/kubernetes-event-analyzer | Queries and analyzes Kubernetes warning events, repetition patterns, and pod, node, and workload exceptions. |
| metric-analyzer | skills/metric-analyzer | Queries and analyzes CCE pod and node metrics as well as ECS, ELB, EIP, and NAT metrics to identify threshold exceptions. |

Fault Diagnosis and Self-Healing

| Skill | Directory Path | Function |
|---|---|---|
| pod-failure-diagnoser | skills/pod-failure-diagnoser | Diagnoses pod faults such as CrashLoopBackOff, ImagePullBackOff, OOMKilled, Pending, Evicted, and frequent restarts. |
| workload-failure-diagnoser | skills/workload-failure-diagnoser | Diagnoses Deployment, StatefulSet, and DaemonSet release failures, rolling upgrade suspension, insufficient replicas, and probe exceptions. |
| node-failure-diagnoser | skills/node-failure-diagnoser | Diagnoses node NotReady, resource pressure, NPD, CNI, kubelet, and container runtime exceptions. |
| autoscaling-diagnoser | skills/autoscaling-diagnoser | Diagnoses faults in the HPA and Cluster Autoscaler scaling chain. |
| network-failure-diagnoser | skills/network-failure-diagnoser | Diagnoses Service, DNS, ingress, NetworkPolicy, ELB, EIP, NAT, and VPC network faults. |
| storage-failure-diagnoser | skills/storage-failure-diagnoser | Diagnoses PVC, PV, EVS, SFS, OBS, mounting, capacity, and deletion protection faults. |
| root-cause-analyzer | skills/root-cause-analyzer | Summarizes cross-domain evidence and outputs top root causes, impact scope, confidence, and recovery handover. |
| change-impact-analyzer | skills/change-impact-analyzer | Analyzes the fault impacts caused by release, configuration, network, security policy, and node changes. |
| dependency-impact-analyzer | skills/dependency-impact-analyzer | Analyzes the fault propagation path and upstream and downstream impacts based on the Service, ingress, pod, and node topologies. |
| auto-remediation-runner | skills/auto-remediation-runner | Generates and executes controlled recovery actions. All high-risk changes are previewed by default and require explicit confirmation. |

Inspection, Governance, and Continuous O&M

| Skill | Directory Path | Function |
|---|---|---|
| daily-cluster-inspector | skills/daily-cluster-inspector | Performs periodic CCE health checks, quick inspections, and continuous O&M summaries. |
| availability-risk-scanner | skills/availability-risk-scanner | Scans for HA, AZ distribution, single replica, PDB, probe, affinity, gateway, and resource overcommitment risks. |
| capacity-trend-forecaster | skills/capacity-trend-forecaster | Analyzes periodic capacity trends, predicts resource bottlenecks, and simulates HPA and node scaling policies. |
| cost-optimization-advisor | skills/cost-optimization-advisor | Analyzes idle resources, excessive requests, low-usage nodes, and scaling policy optimization opportunities. |
| ops-report-generator | skills/ops-report-generator | Summarizes inspection, capacity, availability, cost, and on-call contexts to generate weekly, monthly, SLA, capacity, and stability reports. |

Solution and Delivery

| Skill | Directory Path | Function |
|---|---|---|
| cce-cci-bursting-deployer | skills/cce-cci-bursting-deployer | Configures, deploys, and verifies the auto scaling capability from CCE to CCI 2.0, including VPCEP, virtual-kubelet, and smoke testing. |
| container-migration-planner | skills/container-migration-planner | Inventories container platform resources and dependencies, and outputs migration batches, risks, and verification solutions. No real migration is performed. |
| Skill for full-link pressure test | skills/pressure-test | Builds a full-link pressure test from the k6 client through ELB and nginx-ingress to the service pod, collects observability data, and outputs a performance report. |

Multi-Cloud and Multi-Cluster Management

UCS-related Skills are placed in this category and are no longer included in CCE lifecycle management.

| Skill | Directory Path | Function |
|---|---|---|
| ucs-cluster-onboarding-manager | skills/ucs/ucs-cluster-onboarding-manager | Manages UCS clusters, lifecycle, fleet groups, kubeconfig, and resource quotas. |
| ucs-policy-governor | skills/ucs/ucs-policy-governor | Manages UCS policy instances, policy definitions, start and stop operations, execution statuses, and fleet compliance audit. |

Usage

An Agent automatically matches capabilities based on the description in the SKILL.md file of each Skill. To locate a Skill manually, find it in the tables in this document and then go to the corresponding directory to view its complete description and reference documents.

Platform Integration Example

Using Skills in OpenCode

OpenCode is an AI programming assistant for terminals. It allows you to load a Skill through the project directory or user directory.

  • Skill types
    • Project-level Skill: Place the Skill directory in the skills/ folder in the root directory of the project.
      my-project/
      ├── src/
      ├── skills/
      │   ├── pod-failure-diagnoser/
      │   │   ├── SKILL.md
      │   │   ├── manifest.json
      │   │   ├── skill-profile.yaml
      │   │   └── references/
      │   ├── node-failure-diagnoser/
      │   └── ...

      When OpenCode is started, it automatically scans the skills/ folder in the project directory and loads all Skills. You can directly describe the issue in the dialog, and the Agent will automatically match the appropriate Skill based on the description.

    • User-level Skill: Place the Skill directory in the user configuration directory. User-level Skills take effect for all projects and are suitable for common O&M Skills.
      • Windows: %USERPROFILE%\.opencode\skills\
      • Linux/macOS: ~/.opencode/skills/
  • Example
    # Go to the project directory.
    cd my-project
    
    # Start OpenCode. Skills have been automatically loaded.
    opencode
    
    # Describe the issue in the dialog.
    > My pod keeps restarting. Can you help me check?
    # The Agent automatically triggers pod-failure-diagnoser.

Using Skills in OpenClaw

OpenClaw is an open-source, self-hosted gateway that connects chat applications and channels to AI Agents. You can run the gateway locally or on your own server and extend Agent capabilities through Skills.

OpenClaw can load Skills from the following directories:

| Directory | Description |
|---|---|
| <workspace>/skills/ | Skills in the current workspace, suitable for project-level customization |
| <workspace>/.agents/skills/ | Project-level Skills for Agents in the current workspace |
| ~/.agents/skills/ | Skills that can be shared by multiple Agents |
| ~/.openclaw/skills/ | Skills managed by OpenClaw |
| skills.load.extraDirs | Skill directories that can be added through configuration |

OpenClaw also loads the Skills that come with the installation. You can copy the required Skill directories to the corresponding loading directories. Example:

mkdir -p ~/.agents/skills
cp -R ./skills/pod-failure-diagnoser ~/.agents/skills/
cp -R ./skills/node-failure-diagnoser ~/.agents/skills/

Each Skill directory must contain SKILL.md. After OpenClaw loads Skills, the Agent can select an appropriate Skill based on your intent and execute tasks based on the workflow defined in the Skill.
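
When a Skill does not appear to load, a short script can confirm which subdirectories of the loading directories actually contain a SKILL.md file. The Python sketch below does only that. It is not OpenClaw's loader, it makes no claim about loading order or precedence, and discover_skills is a hypothetical helper name. The temporary directory stands in for a real loading directory.

```python
import tempfile
from pathlib import Path


def discover_skills(roots):
    """Map each existing root to the Skill directories found directly inside it."""
    found = {}
    for root in roots:
        root = Path(root).expanduser()  # allows paths such as ~/.agents/skills
        if not root.is_dir():
            continue  # a missing loading directory is simply skipped
        # A subdirectory counts as a Skill only if it contains SKILL.md.
        names = sorted(p.parent.name for p in root.glob("*/SKILL.md"))
        if names:
            found[str(root)] = names
    return found


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "skills"
    (root / "pod-failure-diagnoser").mkdir(parents=True)
    (root / "pod-failure-diagnoser" / "SKILL.md").write_text(
        "---\nname: pod-failure-diagnoser\n---\n"
    )
    (root / "incomplete-skill").mkdir()  # no SKILL.md, so it is not reported
    found = discover_skills([root, Path(tmp) / "missing"])

print(found)
```

To check real locations, pass the directories from the table, for example discover_skills(["~/.agents/skills", "~/.openclaw/skills"]).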

For details about the positioning, Skill loading sequence, and directory description of OpenClaw, see the OpenClaw documentation and OpenClaw Skills.

Using Skills in Hermes

Hermes is a service orchestration platform for enterprise-class AI Agents. It supports Skill integration through declarative configuration.

Common Issues

When describing an issue, you can refer to the following table to quickly locate the recommended Skill.

| Issue Description | Recommended Skill |
|---|---|
| Pod keeps restarting, is Pending, or is OOMKilled | pod-failure-diagnoser |
| Release failure, rolling upgrade suspension, and insufficient replicas | workload-failure-diagnoser |
| Node NotReady, resource pressure, and node vulnerabilities | node-failure-diagnoser |
| HPA not scaling pods, CA not scaling nodes, and auto scaling not taking effect | autoscaling-diagnoser |
| Ingress 502, Service unreachable, and ELB link exception | network-failure-diagnoser |
| PVC Pending, FailedMount, and capacity exhaustion | storage-failure-diagnoser |
| A large number of CCE alarms that need to be analyzed together | alarm-correlation-engine |
| Pod standard output or LTS application log query | log-analyzer |
| Kubernetes event trend analysis | kubernetes-event-analyzer |
| Query of CCE pod/node metrics and rankings by resource usage | metric-analyzer |
| Aggregation of logs, events, metrics, and alarms | observability-context-builder |
| Service unavailability, requiring comprehensive root cause analysis | root-cause-analyzer |
| Faults upon release, configuration, network, security policy, or node changes | change-impact-analyzer |
| Identifying the entry points and the upstream and downstream services affected by a service fault | dependency-impact-analyzer |
| Capacity expansion, restart, draining, and vulnerability fixing | auto-remediation-runner |
| Daily inspection or periodic health check | daily-cluster-inspector |
| Cost optimization and excessive request analysis | cost-optimization-advisor |
| Capacity trend prediction and scaling simulation | capacity-trend-forecaster |
| Availability risk scanning and PDB/probe check | availability-risk-scanner |
| Weekly, monthly, and SLA O&M reports | ops-report-generator |
| Container migration solution and resource stocktaking | container-migration-planner |
| Auto scaling configuration for scheduling CCE workloads to CCI | cce-cci-bursting-deployer |
| CCE cluster version upgrade planning | cce-cluster-upgrade-planner |
| CCE/UCS workload management | cce-workload-manager |
| UCS cluster management and fleet management | ucs-cluster-onboarding-manager |
| UCS policy governance and compliance audit | ucs-policy-governor |
| SWR image lifecycle management | huawei-cloud-swr-image-management |
| SWR image governance | huawei-cloud-swr-image-governance |
| SWR image automation | huawei-cloud-swr-image-automation |
| Pressure test solution and execution | Skill for full-link pressure test |

Helpful Links

| Document | Description | Path |
|---|---|---|
| CCE documentation | CCE documentation | Huawei Cloud CCE Documentation |
| Open Skill repository | Huawei Cloud cloud-native skill code repository | huaweicloud/huaweicloud-skills |