Updated on 2024-11-11 GMT+08:00

Usage Process

ModelArts Lite Cluster offers hosted Kubernetes clusters with pre-installed AI development and acceleration plug-ins. These elastic clusters let you access AI resources and run tasks in a cloud-native environment. You can directly manage the nodes and Kubernetes clusters within the resource pools. This document shows how to get started.

Figure 1 Resource pool architecture

Figure 1 shows the Lite Cluster architecture. Lite Cluster manages resource nodes inside a CCE cluster, so start by purchasing a CCE cluster. After you purchase a Lite Cluster resource pool on the ModelArts console, ModelArts manages the CCE cluster within the resource pool and creates compute nodes (BMSs or ECSs) based on the specifications you set. CCE then manages these nodes, and ModelArts installs the necessary plug-ins (such as npuDriver and os-node-agent) in the CCE cluster.

Once you have acquired a Lite Cluster resource pool, you can configure resources and upload data to the cloud storage service. When you need cluster resources, submit jobs using the kubectl tool or Kubernetes APIs. ModelArts also provides scaling and driver upgrades to streamline cluster resource management.
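As an illustration of submitting a job with kubectl, the following minimal sketch builds a standard Kubernetes `batch/v1` Job manifest as a Python dictionary and prints it as JSON. The function name `build_job_manifest`, the image name, the command, and the accelerator resource name are assumptions made for this example. In particular, the resource name depends on the device plug-in installed in your cluster, so replace the default with the one your nodes actually advertise.

```python
import json


def build_job_manifest(name, image, command,
                       accelerator_resource="nvidia.com/gpu",
                       accelerator_count=1):
    """Return a Kubernetes batch/v1 Job manifest as a plain dict.

    accelerator_resource is an assumption: use the extended resource
    name exposed by the device plug-in in your own cluster.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            # Do not retry failed pods automatically in this sketch.
            "backoffLimit": 0,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "command": list(command),
                            # Extended resources are requested under limits.
                            "resources": {
                                "limits": {accelerator_resource: accelerator_count}
                            },
                        }
                    ],
                }
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical image and command, for illustration only.
    manifest = build_job_manifest(
        name="demo-train",
        image="example.com/my-team/train:latest",
        command=["python", "train.py"],
    )
    print(json.dumps(manifest, indent=2))
```

Because kubectl accepts JSON manifests, you could save the output to a file such as `job.json` and submit it with `kubectl apply -f job.json`, assuming kubectl is already configured to reach the cluster.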

Figure 2 Usage process

To use Lite Cluster, follow these steps:

  1. Resource subscription: Apply for the required specifications, configure permissions, and purchase Lite Cluster resources on the ModelArts console. For details, see Enabling Lite Cluster Resources.
  2. Resource configuration: After acquiring resources, set up network, storage, and drivers. For details, see Configuring Lite Cluster Resources.
  3. Resource usage: Once configured, use cluster resources for training and inference. For details, see Using Lite Cluster Resources.
  4. Resource management: Lite Cluster provides scaling and driver upgrades. You can manage resources on the ModelArts console. For details, see Managing Lite Cluster Resources.

Table 1 Terms

Term

Description

Container

A container is a lightweight virtualization technology, rooted in Linux, that isolates processes and resources. Docker popularized containers by making them portable across machines. It simplifies packaging an application together with its libraries and other dependencies. Even an OS file system can be packaged into a simple, portable package that runs on any other machine with Docker installed.

Kubernetes

Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. You need to be familiar with Kubernetes to use Lite Cluster. For details, see Kubernetes Basics.

CCE

Cloud Container Engine (CCE) is a Kubernetes cluster hosting service for enterprises. It manages containerized applications and offers scalable, high-performance solutions for deploying and managing cloud native applications. For details, see What Is CCE?

BMS

Bare Metal Server (BMS) provides dedicated cloud servers that combine VM scalability with physical server performance. These servers are designed to meet the computing performance and data security demands of core databases, critical applications, high-performance computing (HPC), and big data.

ECS

Elastic Cloud Server (ECS) provides scalable, on-demand cloud servers for secure, flexible, and efficient application environments, ensuring reliable, uninterrupted services.

os-node-agent

The os-node-agent plug-in is installed by default on ModelArts Lite Kubernetes cluster nodes and handles node management tasks. For example:

  • Driver upgrades: The plug-in downloads driver packages and upgrades or rolls back driver versions.
  • Fault detection: The plug-in periodically checks nodes for faults.
  • Metric collection: The plug-in gathers key monitoring data, such as GPU and NPU usage, and sends it to Application Operations Management (AOM) on the tenant side.
  • Node O&M: After authorization, the plug-in runs diagnosis scripts for fault identification and demarcation.
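Alongside the plug-in's own fault detection, you can inspect node health yourself through the standard Kubernetes node conditions. The following minimal sketch is a user-side check, not a description of how os-node-agent works internally. It parses the JSON structure returned by `kubectl get nodes -o json` and lists nodes whose `Ready` condition is not `True`. The function name `not_ready_nodes` and the sample node names are assumptions made for this example.

```python
import json


def not_ready_nodes(node_list):
    """Return names of nodes whose Ready condition is not "True".

    node_list is the parsed output of `kubectl get nodes -o json`.
    """
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            names.append(node["metadata"]["name"])
    return names


if __name__ == "__main__":
    # Hand-written sample with hypothetical node names, shaped like kubectl output.
    sample = json.loads("""
    {
      "items": [
        {"metadata": {"name": "node-a"},
         "status": {"conditions": [{"type": "Ready", "status": "True"}]}},
        {"metadata": {"name": "node-b"},
         "status": {"conditions": [{"type": "Ready", "status": "False"}]}}
      ]
    }
    """)
    print(not_ready_nodes(sample))  # prints ['node-b']
```

To run this against a real cluster, you could save the output of `kubectl get nodes -o json` to a file and load it with `json.load` instead of using the sample data.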