Updated on 2024-04-30 GMT+08:00

Resource Pool

ModelArts Resource Pools

When using ModelArts for AI development, you can use either of the following resource pools:

  • Dedicated resource pool: provides more controllable resources that are not shared with other users. You create a dedicated resource pool and then select it during AI development. A dedicated resource pool can be an elastic cluster or an elastic BMS.
    • Elastic cluster: available in two types, Standard and Lite.
      • A Standard elastic cluster provides exclusive computing resources on which you can deliver instances for training jobs, model deployment, and development environments on ModelArts.
      • A Lite elastic cluster is intended for users who work directly with Kubernetes resources. It provides hosted Kubernetes clusters equipped with mainstream AI development and acceleration plug-ins. You can operate the nodes and Kubernetes clusters in the resource pool using the AI-native resources and tasks it provides.
    • Elastic BMS: provides bare metal servers (BMSs) with various xPU models. You can access an elastic BMS through an elastic IP address (EIP) and install GPU- and NPU-related drivers and software on a specified OS image. To meet the routine training needs of algorithm engineers, Scalable File Service (SFS) and Object Storage Service (OBS) can be used to store and read data.
  • Public resource pool: provides large-scale public computing clusters whose resources are allocated according to job parameter settings and isolated by job. You can use ModelArts public resource pools to run training jobs, deploy models, or run DevEnviron instances, and you are billed on a pay-per-use basis.

Differences Between Dedicated Resource Pools and Public Resource Pools

  • Dedicated resource pools provide users with dedicated computing clusters and network resources. The dedicated resource pools of different users are physically isolated, whereas public resource pools are only logically isolated. As a result, dedicated resource pools offer stronger isolation and security than public resource pools.
  • Jobs created in a dedicated resource pool are not queued as long as the pool has sufficient resources. Jobs created in a public resource pool are likely to be queued.
  • A dedicated resource pool can be connected to your network, so all running jobs in the pool can access storage and other resources in that network. For example, if you select a dedicated resource pool with network access when creating a training job, the job can access your SFS data once it has been created.
  • Dedicated resource pools allow you to customize the runtime environment of physical nodes; for example, you can upgrade GPU or Ascend drivers. Public resource pools do not support this.