Updated on 2025-05-22 GMT+08:00

HPC Management and Scheduling Plug-in

Product Overview

The HPC management and scheduling plug-in is a Slurm-based, end-to-end platform for using and managing cluster resources on Huawei Cloud. It delivers clusters in one click through a visual interface and integrates SFS Turbo file systems to provide high-performance shared storage. You can manage cluster users, compute resources, and service jobs on the UI. The plug-in supports rapid modeling and computing in scenarios such as structural mechanics, fluid analysis, thermal simulation, and gene sequencing.
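Because the plug-in is built on Slurm, a job configured on the UI presumably corresponds to an ordinary Slurm batch job. The sketch below renders a minimal batch script from a few parameters. The function name `render_sbatch`, the partition name `compute`, and the solver command are hypothetical and do not come from the plug-in; only the `#SBATCH` directives themselves are standard Slurm options.

```python
def render_sbatch(job_name, partition, nodes, ntasks_per_node, time_limit, command):
    """Build the text of a minimal Slurm batch script (illustrative only)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",        # partition name is an assumption
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --time={time_limit}",            # HH:MM:SS wall-clock limit
        "#SBATCH --output=%x-%j.out",              # %x = job name, %j = job ID
        "",
        f"srun {command}",                         # launch the tasks on the allocation
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example values, not taken from the plug-in.
script = render_sbatch("cfd-demo", "compute", 2, 16, "01:00:00", "./solver input.dat")
print(script)
```

Whether the plug-in generates scripts in this form is not stated in this overview. The example only shows the kind of parameters a Slurm job carries.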

Core Functions

| Function | Description |
| --- | --- |
| Partition Management | Divides resources into logical pools and isolates the resources of different teams or projects. |
| Cluster Management | Creates, destroys, and manages compute resources and monitors cluster metrics. |
| Topology Management | Defines the physical topology of a cluster (such as racks and switches) and optimizes job scheduling policies. |
| Job Management | Submits tasks based on user requirements and queries task logs, job status, completion time, and scheduled nodes. |
| Job Templates | Defines standard job configurations so that tasks can be submitted in one click. |
| Elastic Resource Supply | Configures at least one scaling policy for each partition so that compute nodes are automatically scaled in or out based on the policy. |
| Elastic Job Scheduling | Schedules jobs based on policies such as priority, first in, first out (FIFO), and backfill. |
| Quota Management | Restricts the resource usage of users or groups by QoS, account, and partition to ensure fair access and prioritized use. |
| Data Management | Mounts SFS Turbo file systems to provide high-performance shared storage. Files smaller than 1 GB can be uploaded and downloaded on the cockpit UI. |
| Tag Management | Adds tags to nodes for fine-grained management of resources in the same partition. |
| Auditing Management | Logs user operations and resource usage. |
| Cluster O&M | Allows you to view node processes, system configurations, environment variables, and downloaded logs. |
| User Management | Provides a built-in administrator account that can create and delete common users. These users are assigned different roles for accessing the cluster UIs. |
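The three scheduling policies named under Elastic Job Scheduling can be contrasted with a small model. This is a simplified sketch, not Slurm's or the plug-in's implementation: the `Job` fields, the sample jobs, and the single `blocker_wait` estimate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    submit_order: int  # lower value = submitted earlier
    priority: int      # higher value = more urgent
    nodes: int         # number of nodes requested
    runtime: int       # estimated runtime in minutes


def fifo_order(jobs):
    """First in, first out: run jobs in submission order."""
    return sorted(jobs, key=lambda j: j.submit_order)


def priority_order(jobs):
    """Highest priority first; ties fall back to submission order."""
    return sorted(jobs, key=lambda j: (-j.priority, j.submit_order))


def backfill(queue, free_nodes, blocker_wait):
    """Simplified backfill over an already ordered queue.

    Jobs start in order while they fit. Once a job does not fit, it is
    treated as a blocker that is assumed to start in `blocker_wait` minutes.
    Later jobs may start now only if they fit in the remaining free nodes
    and their estimated runtime ends before the blocker is due to start.
    """
    started = []
    blocked = False
    for job in queue:
        if job.nodes > free_nodes:
            blocked = True          # this job has to wait for more nodes
            continue
        if blocked and job.runtime > blocker_wait:
            continue                # would delay the waiting job, so skip it
        started.append(job.name)
        free_nodes -= job.nodes
    return started


# Hypothetical workload used only to exercise the functions.
jobs = [
    Job("a", submit_order=0, priority=1, nodes=2, runtime=60),
    Job("b", submit_order=1, priority=5, nodes=8, runtime=120),
    Job("c", submit_order=2, priority=3, nodes=1, runtime=30),
    Job("d", submit_order=3, priority=3, nodes=1, runtime=90),
]

print([j.name for j in fifo_order(jobs)])
print([j.name for j in priority_order(jobs)])
print(backfill(fifo_order(jobs), free_nodes=4, blocker_wait=45))
```

With four free nodes, job `b` cannot start, so the backfill pass lets the short job `c` run ahead of it while the longer job `d` keeps waiting. A production scheduler tracks reservations per node and over time; the single wait estimate here is a deliberate shortcut.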

System Architecture and Deployment Requirements

Architecture Topology

  • Management and control nodes
    • Master node: has 16 vCPUs, 32 GB of memory, and a 300 GB disk. It is responsible for cluster scheduling, user management, and audit log storage.
    • SFS Turbo: provides a shared file system. The mount path is /mnt/sfs_turbo_1.
  • Compute nodes: Pay-per-use or yearly/monthly compute nodes are created on the cockpit UI or using elastic policies.
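A node can only use the shared storage if the SFS Turbo file system is actually mounted at /mnt/sfs_turbo_1. The sketch below checks for that mount point in text that follows the Linux /proc/mounts layout (device, mount point, file system type, options). The helper name `is_mounted` and the sample mount table, including its address and options, are made up for illustration.

```python
SFS_MOUNT_POINT = "/mnt/sfs_turbo_1"  # mount path given in the architecture list


def is_mounted(mount_table, mount_point=SFS_MOUNT_POINT):
    """Return True if mount_point is a mount target in /proc/mounts-style text."""
    for line in mount_table.splitlines():
        fields = line.split()
        # The second whitespace-separated field is the mount point.
        if len(fields) >= 2 and fields[1] == mount_point:
            return True
    return False


# Hypothetical mount table; on a real node, read the text from /proc/mounts.
SAMPLE_MOUNTS = """\
/dev/vda1 / ext4 rw,relatime 0 0
192.0.2.10:/ /mnt/sfs_turbo_1 nfs rw,relatime,vers=3 0 0
"""

print(is_mounted(SAMPLE_MOUNTS))
```

On a live Linux node, `os.path.ismount("/mnt/sfs_turbo_1")` from the Python standard library answers the same question without parsing any text.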

Deployment Requirements

| Component | Configuration Requirements |
| --- | --- |
| Master node | 16 vCPUs, 32 GB of memory, a 300 GB SSD, and an associated EIP |
| SFS Turbo | On-demand capacity expansion by at least 1 TB and bandwidth of at least 1 Gbit/s |
| Compute nodes | You can create compute nodes on the cockpit UI and select specifications as required. |
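The master node requirements can be expressed as a simple pre-deployment check. The sketch below treats the listed values as minimums, which is an assumption (the requirement may be an exact specification), and the `MasterNodeSpec` class and `check_master_node` function are hypothetical names, not part of any Huawei Cloud API.

```python
from dataclasses import dataclass

# Values from the deployment requirements; treated here as minimums (assumption).
MIN_VCPUS = 16
MIN_MEMORY_GB = 32
MIN_DISK_GB = 300


@dataclass
class MasterNodeSpec:
    vcpus: int
    memory_gb: int
    disk_gb: int
    disk_is_ssd: bool
    has_eip: bool


def check_master_node(spec):
    """Return a list of unmet requirements; an empty list means the spec passes."""
    problems = []
    if spec.vcpus < MIN_VCPUS:
        problems.append(f"vCPUs: {spec.vcpus} < {MIN_VCPUS}")
    if spec.memory_gb < MIN_MEMORY_GB:
        problems.append(f"memory: {spec.memory_gb} GB < {MIN_MEMORY_GB} GB")
    if spec.disk_gb < MIN_DISK_GB:
        problems.append(f"disk: {spec.disk_gb} GB < {MIN_DISK_GB} GB")
    if not spec.disk_is_ssd:
        problems.append("disk is not an SSD")
    if not spec.has_eip:
        problems.append("no EIP associated")
    return problems


# Hypothetical candidate node that is too small and has no EIP.
candidate = MasterNodeSpec(vcpus=8, memory_gb=32, disk_gb=300, disk_is_ssd=True, has_eip=False)
for problem in check_master_node(candidate):
    print(problem)
```

Returning a list of problems rather than a single boolean makes it easy to report every unmet requirement at once.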

Deployment

For details, see Deployment.

Other Advanced Features