ModelArts
Service Overview
Infographics
What Is ModelArts?
What Is ModelArts?
Functions
Basic Knowledge
Introduction to the AI Development Lifecycle
Basic Concepts of AI Development
Common Concepts of ModelArts
Introduction to Development Tools
Model Training
Model Deployment
Related Services
How Do I Access ModelArts?
Permissions Management
Security
Shared Responsibilities
Asset Identification and Management
Identity Authentication and Access Control
Data Protection
Auditing and Logging
Service Resilience
Fault Recovery
Update Management
Certificates
Security Boundary
Quotas
Preparations
Creating a HUAWEI ID and Enabling Huawei Cloud Services
Logging In to the ModelArts Management Console
Configuring Access Authorization (Global Configuration)
Creating an OBS Bucket
Enabling ModelArts Resources
ModelArts Resources
Pay-per-Use
DevEnviron
Introduction to DevEnviron
Application Scenarios
Managing Notebook Instances
Creating a Notebook Instance
Accessing a Notebook Instance
Searching for, Starting, Stopping, or Deleting a Notebook Instance
Changing a Notebook Instance Image
Changing the Flavor of a Notebook Instance
Selecting Storage in DevEnviron
Dynamically Mounting an OBS Parallel File System
Dynamically Expanding EVS Disk Capacity
Modifying the SSH Configuration of a Notebook Instance
Viewing the Notebook Instances of All IAM Users Under a Tenant Account
Viewing Notebook Events
Notebook Cache Directory Alarm Reporting
JupyterLab
Operation Process in JupyterLab
JupyterLab Overview and Common Operations
Code Parameterization Plug-in
Using the ModelArts SDK
Using the Git Plug-in
Visualized Model Training
Introduction to Training Job Visualization
MindInsight Visualization Jobs
TensorBoard Visualization Jobs
Uploading and Downloading Data in Notebook
Uploading Files to JupyterLab
Scenarios
Uploading Files from a Local Path to JupyterLab
Upload Scenarios and Entries
Uploading a Local File Smaller Than 100 MB to JupyterLab
Uploading a Local File Ranging from 100 MB to 5 GB to JupyterLab
Uploading a Local File Larger Than 5 GB to JupyterLab
Cloning an Open-Source Repository from GitHub
Uploading OBS Files to JupyterLab
Uploading Remote Files to JupyterLab
Downloading a File from JupyterLab to a Local Path
Local IDE
Operation Process in a Local IDE
Local IDE (PyCharm)
Connecting to a Notebook Instance Through PyCharm Toolkit
PyCharm Toolkit
Downloading and Installing PyCharm Toolkit
Connecting to a Notebook Instance Through PyCharm Toolkit
Manually Connecting to a Notebook Instance Through PyCharm
Submitting a Training Job Using PyCharm Toolkit
Submitting a Training Job (New Version)
Stopping a Training Job
Viewing Training Logs
Uploading Data to a Notebook Instance Using PyCharm
Local IDE (VS Code)
Connecting to a Notebook Instance Through VS Code
Installing VS Code
Connecting to a Notebook Instance Through VS Code Toolkit
Manually Connecting to a Notebook Instance Through VS Code
Remote Debugging in VS Code
Uploading and Downloading Files in VS Code
Local IDE (Accessed Using SSH)
ModelArts CLI Command Reference
ModelArts CLI Overview
(Optional) Installing ma-cli Locally
Autocompletion for ma-cli Commands
ma-cli Authentication
ma-cli Image Building Command
ma-cli Image Building Command
Obtaining an Image Building Template
Loading an Image Building Template
Obtaining Registered ModelArts Images
Building an Image in ModelArts Notebook
Obtaining Image Building Caches in ModelArts Notebook
Clearing Image Building Caches in ModelArts Notebook
Registering SWR Images with ModelArts Image Management
Deregistering a Registered Image from ModelArts Image Management
Debugging an SWR Image on an ECS
Using the ma-cli ma-job Command to Submit a ModelArts Training Job
ma-cli ma-job Command Overview
Obtaining ModelArts Training Jobs
Submitting a ModelArts Training Job
Obtaining ModelArts Training Job Logs
Obtaining ModelArts Training Job Events
Obtaining ModelArts AI Engines for Training
Obtaining ModelArts Resource Specifications for Training
Stopping a ModelArts Training Job
Using the ma-cli dli-job Command to Submit a DLI Spark Job
Overview
Querying DLI Spark Jobs
Submitting a DLI Spark Job
Querying DLI Spark Run Logs
Querying DLI Queues
Obtaining DLI Group Resources
Uploading Local Files or OBS Files to a DLI Group
Stopping a DLI Spark Job
Using ma-cli to Copy OBS Data
Resource Management
Resource Pool
Elastic Cluster
Comprehensive Upgrades to ModelArts Resource Pool Management Functions
Creating a Resource Pool
Viewing Details About a Resource Pool
Resizing a Resource Pool
Setting a Renewal Policy
Modifying the Expiration Policy
Migrating the Workspace
Changing the Job Types Supported by a Resource Pool
Upgrading a Resource Pool Driver
Deleting a Resource Pool
Abnormal Status of a Dedicated Resource Pool
ModelArts Network
ModelArts Nodes
Audit Logs
Key Operations Recorded by CTS
Viewing Audit Logs
Resource Monitoring
Overview
Using Grafana to View AOM Monitoring Metrics
Procedure
Installing and Configuring Grafana
Installing and Configuring Grafana on Windows
Installing and Configuring Grafana on Linux
Installing and Configuring Grafana on a Notebook Instance
Configuring a Grafana Data Source
Using Grafana to Configure Dashboards and View Metric Data
Viewing All ModelArts Monitoring Metrics on the AOM Console
Docker Containers with ModelArts
Image Management
Using a Preset Image
Preset Images in Notebook
Notebook Base Images
List of Notebook Base Images
Notebook Base Image with PyTorch (x86)
Notebook Base Image with TensorFlow (x86)
Notebook Base Image with MindSpore (x86)
Notebook Base Image with a Custom Dedicated Image (x86)
Training Base Images
Available Training Base Images
Training Base Image (PyTorch)
Training Base Image (TensorFlow)
Training Base Image (Horovod)
Training Base Image (MPI)
Starting Training with a Preset Image
PyTorch
TensorFlow
Horovod/MPI/MindSpore-GPU
Inference Base Images
Available Inference Base Images
Inference Base Images with TensorFlow (CPU/GPU)
Inference Base Images with PyTorch (CPU/GPU)
Inference Base Images with MindSpore (CPU/GPU)
Using Custom Images in Notebook Instances
Registering an Image with ModelArts
Creating a Custom Image
Saving a Notebook Instance as a Custom Image
Saving a Notebook Environment Image
Using a Custom Image to Create a Notebook Instance
Creating and Using a Custom Image in Notebook
Application Scenarios and Process
Step 1 Create a Custom Image
Step 2 Register a New Image
Step 3 Use the New Image to Create a Development Environment
Creating a Custom Image on an ECS and Using It in Notebook
Application Scenarios and Process
Step 1 Prepare a Docker Server and Configure an Environment
Step 2 Create a Custom Image
Step 3 Register a New Image
Step 4 Create and Start a Development Environment
Using a Custom Image to Train Models (Model Training)
Overview
Example: Creating a Custom Image for Training
Example: Creating a Custom Image for Training (PyTorch + CPU/GPU)
Example: Creating a Custom Image for Training (MPI + CPU/GPU)
Example: Creating a Custom Image for Training (Horovod-PyTorch and GPUs)
Example: Creating a Custom Image for Training (MindSpore and GPUs)
Example: Creating a Custom Image for Training (TensorFlow and GPUs)
Preparing a Training Image
Specifications for Custom Images Used for Training Jobs
Migrating an Image to ModelArts Training
Using a Base Image to Create a Training Image
Installing MLNX_OFED in a Container Image
Creating an Algorithm Using a Custom Image
Using a Custom Image to Create a CPU- or GPU-Based Training Job
Troubleshooting Process
Using a Custom Image to Create AI Applications for Inference Deployment
Custom Image Specifications for Creating AI Applications
Creating a Custom Image and Using It to Create an AI Application
FAQs
How Do I Access SWR and Upload Images to It?
How Do I Configure Environment Variables for an Image?
How Do I Use Docker to Start an Image Saved from a Notebook Instance?
How Do I Configure a Conda Source in a Notebook Development Environment?
Which Software Versions Are Supported by a Custom Image?
Change History
Best Practices
What's New
Function Overview
Product Bulletin
Product Bulletin
Billing
Overview
Billing Modes
Overview
Yearly/Monthly
Pay-per-Use
Billing Item (ModelArts Standard)
Data Management
Development Environment
Model Training
Model Management
Inference Deployment
Dedicated Resource Pool
Billing Item (ModelArts Studio)
Model Inference Billing Items
Billing Examples
Changing the Billing Mode
Renewal
Overview
Manual Renewal
Auto-Renewal
Bills
About Arrears
Stopping Billing
Cost Management
Billing FAQs
How Do I View the ModelArts Jobs Being Billed?
How Do I View ModelArts Expenditure Details?
How Do I Stop Billing If I Do Not Use ModelArts?
Billing FAQs About ModelArts Standard Data Management
What Should I Do to Avoid Unnecessary Billing After I Label Datasets and Exit?
How Are Training Jobs Billed?
Why Does Billing Continue After All Projects Are Deleted?
Historical Documents to Be Brought Offline
Package
Getting Started
How to Use ModelArts
Building a Handwritten Digit Recognition Model with ModelArts Standard
Practices for Beginners
ModelArts User Guide (Standard)
ModelArts Standard Usage
ModelArts Standard Preparations
Configuring Access Authorization for ModelArts Standard
Configuring Agency Authorization for ModelArts with One Click
Creating an IAM User and Granting ModelArts Permissions
Creating and Managing a Workspace
Creating an OBS Bucket for ModelArts to Store Data
ModelArts Standard Resource Management
About ModelArts Standard Resource Pools
Creating a Standard Dedicated Resource Pool
Managing Standard Dedicated Resource Pools
Viewing Details About a Standard Dedicated Resource Pool
Resizing a Standard Dedicated Resource Pool
Upgrading the Standard Dedicated Resource Pool Driver
Rectifying a Faulty Node in a Standard Dedicated Resource Pool
Modifying the Job Types Supported by a Standard Dedicated Resource Pool
Migrating Standard Dedicated Resource Pools and Networks to Other Workspaces
Configuring the Standard Dedicated Resource Pool to Access the Internet
Using TMS Tags to Manage Resources by Group
Managing Logical Subpools of a Standard Dedicated Resource Pool
Releasing Standard Dedicated Resource Pools and Deleting the Network
Managing Standard Dedicated Resource Pool Plug-ins
Overview
Node Fault Detection (ModelArts Node Agent)
ModelArts Metric Collector
AI Suite (NV GPU)
AI Suite (ModelArts Device Plugin)
Volcano Scheduler
NodeLocal DNSCache
Cloud Native Log Collection
kube-prometheus-stack
Using Workflows for Low-Code AI Development
What Is Workflow?
Managing a Workflow
Searching for a Workflow
Viewing the Running Records of a Workflow
Managing a Workflow
Retrying, Stopping, or Running a Workflow Phase
Workflow Development Command Reference
Core Concepts of Workflow Development
Configuring Workflow Parameters
Configuring the Input and Output Paths of a Workflow
Creating Workflow Phases
Creating a Dataset Phase
Creating a Dataset Labeling Phase
Creating a Dataset Import Phase
Creating a Dataset Release Phase
Creating a Training Job Phase
Creating a Model Registration Phase
Creating a Service Deployment Phase
Creating a Multi-Branch Workflow
Multi-Branch Workflow
Creating a Condition Phase to Control Branch Execution
Configuring Phase Parameters to Control Branch Execution
Configuring Multi-Branch Phase Data
Creating a Workflow
Publishing a Workflow
Publishing a Workflow to ModelArts
Publishing a Workflow to AI Gallery
Advanced Workflow Capabilities
Using Big Data Capabilities (MRS) in a Workflow
Specifying Certain Phases to Run in a Workflow
Using Notebook for AI Development and Debugging
Application Scenarios
Creating a Notebook Instance (Default Page)
Creating a Notebook Instance (New Page)
Managing Notebook Instances
Searching for a Notebook Instance
Updating a Notebook Instance
Starting, Stopping, or Deleting a Notebook Instance
Saving a Notebook Instance
Dynamically Expanding EVS Disk Capacity
Dynamically Mounting an OBS Parallel File System
Viewing Notebook Events
Notebook Cache Directory Alarm Reporting
Using a Notebook Instance for AI Development Through JupyterLab
Using JupyterLab to Develop and Debug Code Online
Common Functions of JupyterLab
Using Git to Clone the Code Repository in JupyterLab
Creating a Scheduled Job in JupyterLab
Uploading Files to JupyterLab
Uploading Files from a Local Path to JupyterLab
Cloning GitHub Open-Source Repository Files to JupyterLab
Uploading OBS Files to JupyterLab
Uploading Remote Files to JupyterLab
Downloading a File from JupyterLab to a Local PC
Using MindInsight Visualization Jobs in JupyterLab
Using TensorBoard Visualization Jobs in JupyterLab
Using Notebook Instances Remotely Through PyCharm
Connecting to a Notebook Instance Through PyCharm Toolkit
Manually Connecting to a Notebook Instance Through PyCharm
Uploading Data to a Notebook Instance Through PyCharm
Using Notebook Instances Remotely Through VS Code
Connecting to a Notebook Instance Through VS Code
Connecting to a Notebook Instance Through VS Code Toolkit
Manually Connecting to a Notebook Instance Through VS Code
Uploading and Downloading Files in VS Code
Using a Notebook Instance Remotely with SSH
ModelArts CLI Command Reference
ModelArts CLI Commands
(Optional) Installing ma-cli Locally
Autocompletion for ma-cli Commands
ma-cli Authentication
ma-cli image Commands for Building Images
ma-cli ma-job Commands for Training Jobs
ma-cli dli-job Commands for Submitting DLI Spark Jobs
Using ma-cli to Copy OBS Data
Using MoXing Commands in a Notebook Instance
MoXing Framework Functions
Using MoXing in Notebook
Introducing MoXing Framework
Mapping Between mox.file and Local APIs and Switchover
Sample Code for Common Operations
Sample Code for Advanced MoXing Usage
Preparing and Processing Data
Preparing Data
Creating a ModelArts Dataset
Importing Data to a ModelArts Dataset
Introduction to Data Importing
Importing Data from OBS
Introduction to Importing Data from OBS
Importing Data from an OBS Path to ModelArts
Specifications for Importing Data from an OBS Directory
Importing a Manifest File to ModelArts
Specifications for Importing a Manifest File
Importing Data from MRS to ModelArts
Importing Data from Local Files
Labeling ModelArts Data
Scenarios
Manual Labeling
Creating a Manual Labeling Job
Labeling Images
Labeling Text
Labeling Audio
Labeling Video
Managing Labeling Data
Auto Labeling
Creating an Auto Labeling Job
Hard Examples of an Auto Labeling Job
Auto Grouping for Labeling Jobs
Team Labeling
Using Team Labeling
Creating and Managing Teams
Creating a Team Labeling Job
Reviewing and Accepting Team Labeling Results
Managing Teams and Team Members
Managing Labeling Jobs
Publishing ModelArts Data
Analyzing ModelArts Data Characteristics
Exporting Data from a ModelArts Dataset
Exporting Data from ModelArts to OBS
Exporting Data as a New Dataset
Getting Started: Creating an Object Detection Dataset
Using ModelArts Standard to Train Models
Model Training Process
Preparing Model Training Code
Starting Training Using a Preset Image's Boot File
Developing Code for Training Using a Preset Image
Developing Code for Training Using a Custom Image
Configuring Password-free SSH Mutual Trust Between Instances for a Training Job Created Using a Custom Image
Preparing a Model Training Image
Creating an Algorithm
Creating a Production Training Job
Creating a Production Training Job (New Version)
Distributed Model Training
Overview
Creating a Single-Node Multi-PU Distributed Training Job (DataParallel)
Creating a Multiple-Node Multi-PU Distributed Training Job (DistributedDataParallel)
Example: Creating a DDP Distributed Training Job (PyTorch + GPU)
Example: Creating a DDP Distributed Training Job (PyTorch + NPU)
Enabling Dynamic Route Acceleration for Training Jobs
Incremental Model Training
Automatic Model Tuning (AutoSearch)
Overview
Creating a Training Job for Automatic Model Tuning
High Model Training Reliability
Training Job Fault Tolerance Check
Training Log Failure Analysis
Detecting Training Job Suspension
Training Job Restart Upon Suspension
Resumable Training
Enabling Unconditional Auto Restart
Configuring Supernode Affinity Group Instances
Managing Model Training Jobs
Viewing Training Job Details
Visualizing the Training Job Process
Viewing the Resource Usage of a Training Job
Viewing the Model Evaluation Result
Viewing Training Job Events
Viewing Training Job Logs
Priority of a Training Job
Using Cloud Shell to Debug a Production Training Job
Saving the Image of a Debug Training Job
Copying, Stopping, or Deleting a Training Job
Managing Environment Variables of a Training Container
Viewing Training Job Tags
Managing Training Experiments
Viewing Monitoring Metrics of a Training Job
Using ModelArts Standard to Deploy Models for Inference and Prediction
Overview
Creating a Model
Creation Methods
Importing a Meta Model from a Training Job
Importing a Meta Model from OBS
Importing a Meta Model from a Container Image
Model Creation Specifications
Model Package Structure
Specifications for Editing a Model Configuration File
Specifications for Writing a Model Inference Code File
Specifications for Using a Custom Engine to Create a Model
Examples of Custom Scripts
Deploying a Model as Real-Time Inference Jobs
Deploying and Using Real-Time Inference
Deploying a Model as a Real-Time Service
Authentication Methods for Accessing Real-Time Services
Accessing a Real-Time Service Through Token-based Authentication
Accessing a Real-Time Service Through AK/SK-based Authentication
Accessing a Real-Time Service Through App Authentication
Accessing a Real-Time Service Through Different Channels
Accessing a Real-Time Service Through a Public Network
Accessing a Real-Time Service Through a VPC Channel
Accessing a Real-Time Service Through a VPC High-Speed Channel
Accessing a Real-Time Service Using Different Protocols
Accessing a Real-Time Service Using WebSocket
Accessing a Real-Time Service Using Server-Sent Events
Deploying a Model as a Batch Inference Service
Managing ModelArts Models
Viewing ModelArts Model Details
Viewing ModelArts Model Events
Managing ModelArts Model Versions
Managing a Synchronous Real-Time Service
Viewing Details About a Real-Time Service
Viewing Events of a Real-Time Service
Managing the Lifecycle of a Real-Time Service
Modifying a Real-Time Service
Viewing Performance Metrics of a Real-Time Service on Cloud Eye
Integrating a Real-Time Service API into the Production Environment
Configuring Auto Restart upon a Real-Time Service Fault
Managing Batch Inference Jobs
Viewing Details About a Batch Service
Viewing Events of a Batch Service
Managing the Lifecycle of a Batch Service
Modifying a Batch Service
Creating a Custom Image for ModelArts Standard
Applications of Custom Images
Preset Images Supported by ModelArts
ModelArts Preset Image Updates
ModelArts Unified Images
Preset Dedicated Images in Notebook Instances
Preset Dedicated Images for Training
Preset Dedicated Images for Inference
Creating a Custom Image for a Notebook Instance
Creating a Custom Image
Creating a Custom Image on ECS and Using It
Creating a Custom Image Using Dockerfile
Creating a Custom Image Using the Image Saving Function
Creating a Custom Image for Model Training
Creating a Custom Training Image
Creating a Custom Training Image Using a Preset Image
Migrating Existing Images to ModelArts
Creating a Custom Training Image (PyTorch + Ascend)
Creating a Custom Training Image (PyTorch + CPU/GPU)
Creating a Custom Training Image (MPI + CPU/GPU)
Creating a Custom Training Image (TensorFlow + GPU)
Creating a Custom Training Image (MindSpore + Ascend)
Creating a Custom Image for Inference
Creating a Custom Image for a Model
Creating a Custom Image on ECS
Monitoring ModelArts Standard Resources
Overview
Viewing Monitoring Metrics on the ModelArts Console
Viewing All ModelArts Monitoring Metrics on the AOM Console
Using Grafana to View AOM Monitoring Metrics
Installing and Configuring Grafana
Installing and Configuring Grafana on Windows
Installing and Configuring Grafana on Linux
Installing and Configuring Grafana on a Notebook Instance
Configuring a Grafana Data Source
Configuring a Dashboard to View Metric Data
Using CTS to Audit ModelArts Standard
ModelArts Standard Key Operations Traced by CTS
Viewing ModelArts Standard Audit Logs
ModelArts Studio (MaaS) User Guide
ModelArts Studio (MaaS) Usage
Configuring ModelArts Studio (MaaS) Access Authorization
Creating an IAM User and Granting ModelArts Studio (MaaS) Permissions
Configuring ModelArts Agency Authorization for Using ModelArts Studio (MaaS)
Configuring the Missing ModelArts Studio (MaaS) Permissions
Preparing ModelArts Studio (MaaS) Resources
ModelArts Studio (MaaS) Real-Time Inference Services
Viewing a Built-in Model in ModelArts Studio (MaaS)
Deploying a Model Service in ModelArts Studio (MaaS)
Managing My Services in ModelArts Studio (MaaS)
Starting, Stopping, Periodically Starting or Stopping, or Deleting a Service in ModelArts Studio (MaaS)
Scaling Model Service Instances in ModelArts Studio (MaaS)
Modifying the QPS of a Model Service in ModelArts Studio (MaaS)
Upgrading a Model Service in ModelArts Studio (MaaS)
Calling a Model Service in ModelArts Studio (MaaS)
ModelArts Studio (MaaS) API Call Specifications
Sending a Chat Request (Chat/POST)
Obtaining the Model List (Models/GET)
Error Codes
Creating a Multi-Turn Dialogue in ModelArts Studio (MaaS)
ModelArts Studio (MaaS) Management and Statistics
Managing API Keys in ModelArts Studio (MaaS)
ModelArts User Guide (Lite Server)
Before You Start
Using Lite Server
High-Risk Operations
Mapping Between Compute Resources and Image Versions
Provisioning Lite Server Resources (Old Version)
Provisioning Lite Server Resources (New Version)
Configuring Lite Server Resources
Configuration Process
Configuring the Network
Configuring the Storage
Configuring the Software Environment
Configuring the Software Environment on the NPU Server
Using Lite Server Resources
Collecting and Uploading NPU Logs
Collecting and Uploading GPU Logs
Managing Lite Server Resources
Viewing Lite Server Details
Starting or Stopping the Lite Server
Synchronizing the Lite Server Status
Changing or Resetting the Lite Server OS
Creating a Lite Server OS
Lite Server Hot Standby Nodes
Modifying a Lite Server Name
Authorizing the Repair of Lite Server Nodes
Releasing Lite Server Resources
Lite Server Plug-in Management
Managing Lite Server AI Plug-ins
Upgrading the Ascend Driver and Firmware Version on Lite Server
Lite Server Node Fault Diagnosis
One-Click Pressure Test for Lite Server Nodes
Managing Lite Server Supernodes
Expanding and Reducing Lite Server Supernodes
Periodic Stress Test on Lite Server Supernodes
Enabling HCCL Communication Operator-Level Re-execution for Supernodes
Monitoring Lite Server Resources
Using Cloud Eye to Monitor NPU Resources of a Single Lite Server Node
Using Cloud Eye to Monitor the Health Status of Snt9B23 Supernodes
Managing CloudPond NPU Resources for Lite Server
Using CTS to Audit Lite Server Operations
ModelArts User Guide (Lite Cluster)
Before You Start
Using Lite Cluster
High-Risk Operations
Software Versions Required by Different Models
Enabling Lite Cluster Resources
Configuring Lite Cluster Resources
Configuring the Lite Cluster Environment
Configuring the Lite Cluster Network
Configuring kubectl
Configuring Lite Cluster Storage
(Optional) Configuring the Driver
(Optional) Configuring Image Pre-provisioning
Using Lite Cluster Resources
Using Snt9B for Distributed Training in a Lite Cluster Resource Pool
Performing PyTorch NPU Distributed Training in a ModelArts Lite Resource Pool Using Ranktable-based Route Planning
Using Snt9B for Inference in a Lite Cluster Resource Pool
Using Ascend FaultDiag to Diagnose Logs in the ModelArts Lite Cluster Resource Pool
Mounting an SFS Turbo File System to a Lite Cluster
Managing Lite Cluster Resources
Managing Lite Cluster Resources
Managing Lite Cluster Resource Pools
Managing Lite Cluster Node Pools
Managing Lite Cluster Nodes
Resizing a Lite Cluster Resource Pool
Upgrading the Lite Cluster Resource Pool Driver
Upgrading the Driver of a Lite Cluster Resource Pool Node
Monitoring Lite Cluster Resources
Viewing Lite Cluster Metrics on AOM
Viewing Lite Cluster Metrics Using Prometheus
Releasing Lite Cluster Resources
Lite Cluster Plug-in Management
Overview
Node Fault Detection (ModelArts Node Agent)
ModelArts Metric Collector
AI Suite (ModelArts Device Plugin)
Volcano Scheduler
Cluster Autoscaler
ModelArts User Guide (AI Gallery)
AI Gallery
Free Assets
My Gallery
Subscription & Use
Searching for and Adding an Asset to Favorites
Subscribing to Free Algorithms
Subscribing to a Workflow
Publish & Share
Publishing a Free Algorithm
Publishing a Free Model
API Reference
Before You Start
API Overview
Calling APIs
Making an API Request
Authentication
Response
Development Environment Management
Creating a Notebook Instance
Querying Notebook Instances
Querying All Notebook Instances
Querying Details of a Notebook Instance
Updating a Notebook Instance
Deleting a Notebook Instance
Saving a Running Instance as a Container Image
Querying the List of Valid Specifications Supported by Notebook Instances
Querying the List of Switchable Specifications Supported by Notebook Instances
Querying the Available Duration of a Running Notebook Instance
Prolonging a Notebook Instance
Starting a Notebook Instance
Stopping a Notebook Instance
Obtaining the Notebook Instances with OBS Storage Mounted
OBS Storage Mounting
Obtaining Details About a Notebook Instance with OBS Storage Mounted
Unmounting OBS Storage from a Notebook Instance
Querying Supported Images
Registering a Custom Image
Querying the User Image List
Obtaining Details of an Image
Deleting an Image
Training Management
Creating an Algorithm
Querying the Algorithm List
Querying Algorithm Details
Modifying an Algorithm
Deleting an Algorithm
Creating a Training Job
Querying the Details About a Training Job
Modifying the Description of a Training Job
Deleting a Training Job
Terminating a Training Job
Querying the Logs of a Specified Task in a Given Training Job (Preview)
Querying the Logs of a Specified Task in a Training Job (OBS Link)
Querying the Running Metrics of a Specified Task in a Training Job
Querying a Training Job List
Obtaining the Events of a Training Job
Obtaining the General Specifications Supported by a Training Job
Obtaining the Preset AI Frameworks Supported by a Training Job
App Authentication Management
Obtaining the App List
Creating Apps
Obtaining App Details
Deleting an App
Adding an App Code
Resetting an App Code
Deleting an App Code
Resetting an AppSecret
Obtaining the List of APIs Bound to an App
Registering an API and Authorizing the API to an App
Deleting an API
Authorizing an API to an App
Updating API Authorization
Canceling the Authorization of an API to an App
Obtaining API Authorization Relationships
Creating an API
Querying an API
Querying APIs and Apps
Checking Whether an App Exists
Service Management
Updating the Service Through the Patch Operation
Obtaining Service Monitoring
Obtaining Services
Deploying Services
Obtaining Supported Service Deployment Specifications
Obtaining Service Details
Updating Service Configurations
Deleting a Service
Updating a Single Property of a Model Service
Obtaining Service Event Logs
Obtaining Service Update Logs
Adding a Resource Tag
Deleting Resource Tags
Obtaining Inference Service Tags
Obtaining an Inference VPC Access Channel
Resource Management
Querying OS Configuration Parameters
Querying a Plug-in Template
Obtaining Nodes in a Resource Pool
Deleting Nodes in Batches
Querying a Trace List
Creating Network Resources
Obtaining Network Resources
Obtaining a Network Resource
Deleting a Network Resource
Updating a Network Resource
Querying the Real-Time Resource Usage
Creating Resource Pools
Obtaining Resource Pools
Obtaining a Resource Pool
Deleting a Resource Pool
Updating a Resource Pool
Monitoring a Resource Pool
Resource Pool Statistics
Obtaining Resource Specifications
Obtaining Jobs in a Resource Pool
Querying Dedicated Resource Pool Job Statistics
DevServer Management
Obtaining All DevServer Instances of a User
Creating a DevServer Instance
Obtaining DevServer Instance Details
Deleting DevServer Instances
Synchronizing the Status of All DevServer Instances of a User in Real Time
Starting DevServer Instances
Stopping DevServer Instances
Creating DevServer Supernode Tags
Deleting DevServer Supernode Tags
Obtaining the DevServer Supernode Tags
Reinstalling the OS Image of the DevServer Server
Changing the OS Image of the DevServer Server
Changing the OS Image of the DevServer Supernode Server
Obtaining Details About All Supernode Instances of a User
Deleting a DevServer Supernode Instance
Restarting a DevServer Instance
Starting a DevServer Supernode Server
Stopping a DevServer Supernode Server
Authorization Management
Viewing an Authorization List
Configuring Authorization
Deleting Authorization
Creating a ModelArts Agency
Workspace Management
Querying Details About a Workspace
Modifying a Workspace
Deleting a Workspace
Querying a Workspace Quota
Modifying a Workspace Quota
Querying a Workspace List
Creating a Workspace
Quota Management
Obtaining OS Quotas
Resource Tag Management
Obtaining All Tags of Resource Pools
Obtaining Tags of a Resource Pool
Node Pool Management
Obtaining Node Pools
Creating a Node Pool
Obtaining Details About a Specified Node Pool
Updating a Node Pool
Deleting a Node Pool
Obtaining Nodes in a Node Pool
Node Management
Locking Node Functions in Batches
Unlocking Node Functions in Batches
Changing the Node Specifications
AI Application Management
Obtaining the Model Runtime
Querying the AI Application List
Creating an AI Application
Obtaining Details About an AI Application
Deleting an AI Application
Application Authentication Management
Querying the API Authentication Information of an Application
Use Cases
Creating a Development Environment Instance
Using PyTorch to Create a Training Job (New-Version Training)
Managing ModelArts Authorization
Permissions Policies and Supported Actions
Introduction
Data Management Permissions
DevEnviron Permissions
Training Job Permissions
Model Management Permissions
Service Management Permissions
Appendix
Status Code
Error Codes
Obtaining a Project ID and Name
Obtaining an Account Name and ID
Obtaining a Username and ID
Historical APIs
Data Management (Old Version)
Querying the Dataset List
Creating a Dataset
Querying Details About a Dataset
Modifying a Dataset
Deleting a Dataset
Obtaining Dataset Statistics
Querying the Monitoring Data of a Dataset
Querying the Dataset Version List
Creating a Dataset Labeling Version
Querying Details About a Dataset Version
Deleting a Dataset Labeling Version
Obtaining a Sample List
Adding Samples in Batches
Deleting Samples in Batches
Obtaining Details About a Sample
Obtaining Sample Search Conditions
Obtaining a Sample List of a Team Labeling Task by Page
Obtaining Details About a Team Labeling Sample
Querying the Dataset Label List
Creating a Dataset Label
Modifying Labels in Batches
Deleting Labels in Batches
Updating a Label by Label Name
Deleting a Label and the Files that Only Contain the Label
Updating Sample Labels in Batches
Querying the Team Labeling Task List of a Dataset
Creating a Team Labeling Task
Querying Details About a Team Labeling Task
Starting a Team Labeling Task
Updating a Team Labeling Task
Deleting a Team Labeling Task
Creating a Team Labeling Acceptance Task
Querying the Report of a Team Labeling Acceptance Task
Updating Status of a Team Labeling Acceptance Task
Querying Details About Team Labeling Task Statistics
Querying Details About the Progress of a Team Labeling Task Member
Querying the Team Labeling Task List by a Team Member
Submitting Sample Review Comments of an Acceptance Task
Reviewing Team Labeling Results
Updating Labels of Team Labeling Samples in Batches
Querying the Labeling Team List
Creating a Labeling Team
Querying Details About a Labeling Team
Updating a Labeling Team
Deleting a Labeling Team
Sending an Email to a Labeling Team Member
Querying the List of All Labeling Team Members
Querying the List of Labeling Team Members
Creating a Labeling Team Member
Deleting Labeling Team Members in Batches
Querying Details About Labeling Team Members
Updating a Labeling Team Member
Deleting a Labeling Team Member
Querying the Dataset Import Task List
Creating an Import Task
Querying Details About a Dataset Import Task
Querying the Dataset Export Task List
Creating a Dataset Export Task
Querying the Status of a Dataset Export Task
Synchronizing a Dataset
Querying the Status of a Dataset Synchronization Task
Obtaining an Auto Labeling Sample List
Querying Details About an Auto Labeling Sample
Obtaining an Auto Labeling Task List by Page
Starting Intelligent Tasks
Obtaining Details About an Auto Labeling Task
Stopping an Intelligent Task
Querying the List of a Processing Task
Creating a Processing Task
Querying Details About a Processing Task
Updating a Processing Task
Deleting a Processing Task
DevEnviron (Old Version)
Creating a Development Environment Instance
Obtaining Development Environment Instances
Obtaining Details About a Development Environment Instance
Modifying the Description of a Development Environment Instance
Deleting a Development Environment Instance
Managing a Development Environment Instance
Training Management (Old Version)
Training Jobs
Creating a Training Job
Querying a Training Job List
Querying the Details About a Training Job Version
Deleting a Version of a Training Job
Obtaining Training Job Versions
Creating a Version of a Training Job
Stopping a Training Job
Modifying the Description of a Training Job
Deleting a Training Job
Obtaining the Name of a Training Job Log File
Querying a Built-in Algorithm
Querying Training Job Logs
Training Job Parameter Configuration
Creating a Training Job Configuration
Querying a List of Training Job Configurations
Modifying a Training Job Configuration
Deleting a Training Job Configuration
Querying the Details About a Training Job Configuration
Visualization Jobs
Creating a Visualization Job
Querying a Visualization Job List
Querying the Details About a Visualization Job
Modifying the Description of a Visualization Job
Deleting a Visualization Job
Stopping a Visualization Job
Restarting a Visualization Job
Resource and Engine Specifications
Querying Job Resource Specifications
Querying Job Engine Specifications
Job Statuses
SDK Reference
Before You Start
SDK Overview
Getting Started
(Optional) Installing the ModelArts SDK Locally
Session Authentication
(Optional) Session Authentication
Authentication Using the Username and Password
AK/SK-based Authentication
OBS Management
Overview of OBS Management
Transferring Files (Recommended)
Uploading a File to OBS
Uploading a Folder to OBS
Downloading a File from OBS
Downloading a Folder from OBS
Data Management
Managing Datasets
Querying a Dataset List
Creating a Dataset
Querying Details About a Dataset
Modifying a Dataset
Deleting a Dataset
Managing Dataset Versions
Obtaining a Dataset Version List
Creating a Dataset Version
Querying Details About a Dataset Version
Deleting a Dataset Version
Managing Samples
Querying a Sample List
Querying Details About a Sample
Deleting Samples in a Batch
Managing Dataset Import Tasks
Querying a Dataset Import Task List
Creating a Dataset Import Task
Querying the Status of a Dataset Import Task
Managing Export Tasks
Querying a Dataset Export Task List
Creating a Dataset Export Task
Querying the Status of a Dataset Export Task
Managing Manifest Files
Overview of Manifest Management
Parsing a Manifest File
Creating and Saving a Manifest File
Parsing a Pascal VOC File
Creating and Saving a Pascal VOC File
Managing Labeling Jobs
Creating a Labeling Job
Obtaining the Labeling Job List of a Dataset
Obtaining Details About a Labeling Job
Training Management (New Version)
Training Jobs
Creating a Training Job
Debugging a Training Job
Using the SDK to Debug a Multi-Node Distributed Training Job
Using the SDK to Debug a Single-Node Training Job
Obtaining Training Jobs
Obtaining the Details About a Training Job
Modifying the Description of a Training Job
Deleting a Training Job
Terminating a Training Job
Obtaining Training Logs
Obtaining the Runtime Metrics of a Training Job
APIs for Resources and Engine Specifications
Obtaining Resource Flavors
Obtaining Engine Types
Training Management (Old Version)
Training Jobs
Creating a Training Job
Debugging a Training Job
Querying the List of Training Jobs
Querying the Details About a Training Job
Modifying the Description of a Training Job
Obtaining the Name of a Training Job Log File
Querying Training Job Logs
Deleting a Training Job
Training Job Versions
Creating a Training Job Version
Querying the List of Training Job Versions
Querying the Details About a Training Job Version
Stopping a Training Job Version
Deleting a Training Job Version
Training Job Parameter Configuration
Creating a Training Job Configuration
Querying the List of Training Job Parameter Configuration Objects
Querying the List of Training Job Configurations
Querying the Details About a Training Job Configuration
Modifying a Training Job Configuration
Deleting a Training Job Configuration
Visualization Jobs
Creating a Visualization Job
Querying the List of Visualization Job Objects
Querying the List of Visualization Jobs
Querying the Details About a Visualization Job
Modifying the Description of a Visualization Job
Stopping a Visualization Job
Restarting a Visualization Job
Deleting a Visualization Job
Resource and Engine Specifications
Querying a Built-in Algorithm
Querying the List of Resource Flavors
Querying the List of Engine Types
Job Statuses
Model Management
Debugging a Model
Importing a Model
Obtaining Models
Obtaining Model Objects
Obtaining Details About a Model
Deleting a Model
Service Management
Service Management Overview
Deploying a Local Service for Debugging
Deploying a Real-Time Service
Obtaining Details About a Service
Testing an Inference Service
Obtaining Services
Obtaining Service Objects
Updating Service Configurations
Obtaining Service Monitoring Information
Obtaining Service Logs
Deleting a Service
Change History
FAQs
Permissions
What Do I Do If a Message Indicating Insufficient Permissions Is Displayed When I Use ModelArts?
How Do I Isolate IAM Users on a Notebook Instance?
How Do I Obtain an Access Key?
Storage
How Do I View All Files Stored in OBS on ModelArts?
Standard Workflow
How Do I Locate Workflow Running Errors?
ModelArts Standard Data Preparation
Is There a File Size Limit for Images to Be Added to a ModelArts Dataset?
How Do I Import Local Labeled Data to ModelArts?
Where Are the Data Labeling Results Stored in ModelArts?
How Do I Download Labeling Results from ModelArts to a Local PC?
Why Can't Team Members Receive Emails for a Team Labeling Task in ModelArts?
How Is Data Distributed Between Team Members During Team Labeling in ModelArts?
How Do I Merge Two Datasets in ModelArts?
Why Are Images Displayed in Different Angles Under the Same Account in ModelArts?
Do I Need to Train the Model Again Using the Data Newly Added After Auto Labeling Is Complete in ModelArts?
How Do I Split an Image Dataset into Training and Validation Sets in ModelArts?
Can I Customize Labels During Object Detection Labeling in ModelArts?
What Should I Do If I Can't Find a New Dataset Version in ModelArts?
How Do I Split a Dataset in ModelArts?
How Do I Delete Images from a Dataset in ModelArts?
ModelArts Standard Notebook
Is the Keras Engine Supported by ModelArts Notebook Instances?
How Do I Upload a File from a Notebook Instance to OBS or Download a File from OBS to a Notebook Instance in ModelArts?
Where Is Data Uploaded from a ModelArts Notebook Instance?
How Do I Copy Data from Notebook A to Notebook B in ModelArts?
How Do I Rename an OBS File on a ModelArts Notebook Instance?
How Do I Use the pandas Library to Process Data in OBS Buckets on a ModelArts Notebook Instance?
How Do I Access the OBS Bucket of Another Account from a ModelArts Notebook Instance?
What Is the Default Working Directory of JupyterLab on ModelArts Notebook Instances?
How Do I Check the CUDA Version Used by a ModelArts Notebook Instance?
How Do I Obtain the External IP Address of the Local Host from a ModelArts Notebook Instance?
Is There a Proxy for ModelArts Notebook Instances? How Do I Disable It?
How Do I Customize Engine IPython Kernel If the Built-in Engines of ModelArts Notebook Instances Do Not Meet My Requirements?
What Should I Do If It Is Unstable to Install the Remote Plug-in on a ModelArts Notebook Instance?
How Do I Connect to a Restarted ModelArts Notebook Instance?
What Should I Do If the Source Code Cannot Be Accessed When I Use VS Code to Debug Code on a ModelArts Notebook Instance?
How Do I View Remote Logs Using VS Code on a ModelArts Notebook Instance?
How Do I Open the VS Code Configuration File settings.json on a ModelArts Notebook Instance?
How Do I Set the Background Color of VS Code to Bean Green on a ModelArts Notebook Instance?
How Do I Configure the Default Plug-in Remotely Installed for VS Code on a ModelArts Notebook Instance?
How Do I Install a Local Plug-in Remotely or a Remote Plug-in Locally in ModelArts VS Code?
How Do I Use Multiple Ascend Cards for Debugging on a ModelArts Notebook Instance?
Why Are the Training Speeds Similar When Different Resource Flavors Are Used for Training on ModelArts Notebook Instances?
How Do I Perform Incremental Training When Using MoXing on a ModelArts Notebook Instance?
How Do I View the GPU Usage on a ModelArts Notebook Instance?
How Can I Print the GPU Usage in Code on a ModelArts Notebook Instance?
What Are the Relationships Among JupyterLab Directories, Terminal Files, and OBS Files on ModelArts Notebook Instances?
How Do I Use ModelArts Datasets on a ModelArts Notebook Instance?
pip and Common Commands
What Are the Sizes of the /cache Directories for Resources with Varying Specifications on ModelArts Notebook Instances?
What Is the Impact of Resource Overcommitment on ModelArts Notebook Instances?
How Do I Install External Libraries in a Notebook Instance?
How Do I Handle Unstable Internet Access Speed in ModelArts Notebook?
Can I Use GDB in a Notebook Instance?
ModelArts Standard Model Training
What Should I Do If the Model Trained in ModelArts Is Underfitting?
How Do I Obtain a Trained Model in ModelArts?
How Do I Obtain RANK_TABLE_FILE for Distributed Training in ModelArts?
How Do I Configure Input and Output Data for Model Training in ModelArts?
How Do I Improve Training Efficiency While Reducing Interaction with OBS in ModelArts?
How Do I Define Path Variables When Using MoXing to Copy Data in ModelArts?
How Do I Create a Training Job That References a Third-Party Dependency Package in ModelArts?
How Do I Install C++ Dependent Libraries During ModelArts Training?
How Do I Check Whether a Folder Copy Is Complete During Job Training in ModelArts?
How Do I Load Some Well Trained Parameters During Job Training in ModelArts?
What Should I Do If I Cannot Access the Folder Using os.system ('cd xxx') During Training in ModelArts?
How Do I Obtain the Dependency File Path from Training Code in ModelArts?
How Do I Obtain the Actual File Path in a Training Container in ModelArts?
What Are the Sizes of the /cache Directories for Resources with Varying Specifications in Training Jobs in ModelArts?
Why Do Training Jobs Have Two Hyperparameter Directories /work and /ma-user in ModelArts?
How Do I View the Resource Usage of a Training Job in ModelArts?
How Do I Download a Well Trained Model in ModelArts or Migrate It to Another Account?
What Should I Do If RuntimeError: Socket Timeout Is Displayed During Distributed Process Group Initialization using torchrun?
What Should I Do If an Error Is Reported Indicating that the .so File in the $ANACONDA_DIR/envs/$DEFAULT_CONDA_ENV_NAME/lib Directory Cannot Be Found During Training?
ModelArts Standard Inference Deployment
How Do I Import a Keras .h5 Model to ModelArts?
How Do I Edit the Installation Package Dependency Parameters in the Model Configuration File When Importing a Model to ModelArts?
How Do I Change the Default Port When I Create a Real-Time Service Using a Custom Image in ModelArts?
Does ModelArts Support Multi-Model Import?
What Are the Restrictions on the Image Size for Importing AI Applications to ModelArts?
What Are the Differences Between Real-Time Services and Batch Services in ModelArts?
Why Can't I Select Ascend Snt3 Resources When Deploying Models in ModelArts?
Can I Locally Deploy Models Trained on ModelArts?
What Is the Maximum Size of a ModelArts Real-Time Service Prediction Request Body?
How Do I Prevent Python Dependency Package Conflicts in a Custom Prediction Script When Deploying a Real-Time Service in ModelArts?
How Do I Speed Up Real-Time Service Prediction in ModelArts?
Can a New-Version AI Application Still Use the Original API in ModelArts?
What Is the Format of a Real-Time Service API in ModelArts?
How Do I Fill in the Request Header and Request Body When a ModelArts Real-Time Service Is Running?
ModelArts Standard Images
How Do I Use the Image Customized by a User Under a Different Tenant Account to Create a Notebook Instance?
How Do I Log In to SWR and Upload Images to It?
How Do I Configure Environment Variables for an Image in a Dockerfile?
How Do I Start a Container Using a Docker Image?
How Do I Configure a Conda Source on a ModelArts Notebook Instance?
What Are the Software Version Requirements for a Custom Image?
Why Is an Image Reported as Larger Than 35 GB When I'm Saving It But Its Size Is Displayed as 13 GB in SWR?
How Do I Prevent the Save Failure of a Custom Image Larger Than 35 GB?
How Do I Reduce the Size of the Target Image Created on the Local PC or ECS?
Will an Oversized Image Become Smaller If I Uninstall and Reinstall Its Packages?
What Do I Do If Error "ModelArts.6787" Is Reported When I Register an Image in ModelArts?
How Do I Set the Default Kernel?
ModelArts Standard Dedicated Resource Pools
Can I Use ECSs to Create a Dedicated Resource Pool for ModelArts?
Can I Deploy Multiple Services on One Dedicated Resource Pool Node in ModelArts?
What Are the Differences Between Public Resource Pools and Dedicated Resource Pools in ModelArts?
Why Does a Job in ModelArts Stay in the Pending State?
Why Can I View the Deleted Dedicated Resource Pools That Failed to Be Created on the ModelArts Console?
How Do I Add a VPC Peering Connection Between a Dedicated Resource Pool and an SFS in ModelArts?
ModelArts Studio (MaaS)
How Long Does It Take for an API Key to Become Valid After It Is Created in MaaS?
Can I Use a MaaS API Key Across Regions?
What Are the Format Requirements for Configuring the Model Service API URL in MaaS?
How Do I Obtain the Model Name in MaaS?
API/SDK
Can ModelArts APIs or SDKs Be Used to Download Models to a Local PC?
Does ModelArts Use the OBS API to Access OBS Files over an Intranet or the Internet?
History
How Do I Upload Data to OBS?
Which AI Frameworks Does ModelArts Support?
How Does ModelArts Use Tags to Manage Resources by Group?
How Do I View ModelArts Expenditure Details?
What Do I Do If the VS Code Window Is Not Displayed?
What Do I Do If a Remote Connection Failed After VS Code Is Opened?
What Do I Do If Error Message "Could not establish connection to xxx" Is Displayed During a Remote Connection?
What Do I Do If Error Message "Bad owner or permissions on C:\Users\Administrator/.ssh/config" or "Connection permission denied (publickey)" Is Displayed?
What Do I Do If Error Message "ssh: connect to host xxx.pem port xxxxx: Connection refused" Is Displayed?
What Do I Do If Error Message "no such identity: C:/Users/xx /test.pem: No such file or directory" Is Displayed?
What Are the Precautions for Switching Training Jobs from the Old Version to the New Version?
Troubleshooting
General Issues
OBS Errors on ModelArts
ModelArts.7211: Restricted Account
DevEnviron
Environment Configuration Faults
Disk Space Used Up
An Error Is Reported When Conda Is Used to Install Keras 2.3.1 in Notebook
Error "HTTP error 404 while getting xxx" Is Reported During Dependency Installation in a Notebook
The numba Library Has Been Installed in a Notebook Instance and Error "import numba ModuleNotFoundError: No module named 'numba'" Is Reported
Failed to Save Files in JupyterLab
"Server Connection Error" Is Displayed After the Kernelgateway Process Is Stopped
SSH Access Is Occasionally Denied, and the Error Message "Not allowed at this time" Is Displayed
Instance Faults
Failed to Create a Notebook Instance and JupyterProcessKilled Is Displayed in Events
Failed to Access a Notebook Instance
An Error Is Displayed Indicating No Space Left After the pip install Command Is Executed
Code Can Be Run But Cannot Be Saved and Error Message "save error" Is Displayed
A Request Timeout Error Is Reported When the Open Button of a Notebook Instance Is Clicked
ModelArts.6333 Error Occurs
What Can I Do If a Message Is Displayed Indicating that the Token Does Not Exist or Is Lost When I Open a Notebook Instance?
Code Running Failures
An Error Occurs When You Run Code on a Notebook Instance Because No File Is Found in /tmp
Notebook Instance Failed to Run Code
"dead kernel" Is Displayed and the Instance Breaks Down When Training Code Is Run
cudaCheckError Occurs During Training
What Do I Do If Insufficient Space Is Displayed in DevEnviron?
Notebook Instance Breaks Down When opencv.imshow Is Used
Path of a Text File Generated in the Windows OS Cannot Be Found on a Notebook Instance
What Do I Do If No Kernel Is Displayed After a Notebook File Is Created?
JupyterLab Plug-in Faults
Invalid Git Plug-in Password
Failures to Access the Development Environment Through VS Code
VS Code Window Is Not Displayed
Remote Connection Failed After VS Code Is Opened
Failed to Connect to the Development Environment Via VS Code
Error Message "Could not establish connection to xxx" Is Displayed During a Remote Connection
Connection to a Remote Development Environment Remains in the "Setting up SSH Host xxx: Downloading VS Code Server locally" State for More Than 10 Minutes
What Do I Do If the Connection to a Remote Development Environment Remains in the State of "Setting up SSH Host xxx: Copying VS Code Server to host with scp" for More Than 10 Minutes?
Connection to a Remote Development Environment Remains in the State of "ModelArts Remote Connect: Connecting to instance xxx..." for More Than 10 Minutes
Remote Connection Is in the Retry State
Error Message "The VS Code Server failed to start" Is Displayed
Error Message "Permissions for 'x:/xxx.pem' are too open" Is Displayed
Error Message "Bad owner or permissions on C:\Users\Administrator/.ssh/config" Is Displayed
Error Message "Connection permission denied (publickey)" Is Displayed
What Do I Do If Error Message "ssh: connect to host xxx.pem port xxxxx: Connection refused" Is Displayed?
What Do I Do If Error Message "ssh: connect to host ModelArts-xxx port xxx: Connection timed out" Is Displayed?
Error Message "Load key "C:/Users/xx/test1/xxx.pem": invalid format" Is Displayed
Error Message "An SSH installation couldn't be found" or "Could not establish connection to instance xxx: 'ssh' ..." Is Displayed
Error Message "no such identity: C:/Users/xx /test.pem: No such file or directory" Is Displayed
Error Message "Host key verification failed" or "Port forwarding is disabled" Is Displayed
Error Message "Failed to install the VS Code Server" or "tar: Error is not recoverable: exiting now" Is Displayed
Error Message "XHR failed" Is Displayed During VS Code's Connection to a Remote Notebook Instance
VS Code Connection Automatically Disconnected If No Operation Is Performed for a Long Time
Remote Connection Takes a Long Time After VS Code Is Automatically Upgraded
Error Message "Connection reset" Is Displayed During an SSH Connection
Notebook Instance Is Frequently Disconnected or Stuck After It Is Connected with MobaXterm Using SSH
Error Message "Missing GLIBC, Missing required dependencies" Is Displayed When VS Code Is Used to Connect to a Development Environment
Error Message Is Displayed Indicating That ms-vscode-remote.remot-sdh Is Uninstalled Due to a Reported Issue When VSCode-huawei Is Used
Instance Directory in VS Code Does Not Match That on the Cloud When VS Code Is Used to Connect to an Instance
Custom Image Faults
Faults of Custom Images on Notebook Instances
What If the Error Message "there are processes in 'D' status, please check process status using 'ps -aux' and kill all the 'D' status processes" or "Buildimge,False,Error response from daemon,Cannot pause container xxx" Is Displayed When I Save an Image?
What Do I Do If Error "container size %dG is greater than threshold %dG" Is Displayed When I Save an Image?
What Do I Do If Error "too many layers in your image" Is Displayed When I Save an Image?
What Do I Do If Error "The container size (xG) is greater than the threshold (25G)" Is Reported When I Save an Image?
Error Message "BuildImage,True,Commit successfully|PushImage,False,Task is running." Is Displayed When an Image Is Saved
No Kernel Is Displayed After a Notebook Instance Created Using a Custom Image Is Started
Some Extra Packages Are Found in the Conda Environment Built Using a Custom Image
Failed to Create a Custom Image Using ma-cli and an Error Is Displayed Indicating that the File Does Not Exist
Error Message "Unexpected error from cudaGetDeviceCount" Is Displayed When Torch Is Used
Unable to Access a Notebook Instance Created Using an Old Image
Other Faults
Failed to Open the checkpoints Folder in Notebook
Failed to Use a Purchased Dedicated Resource Pool to Create New-Version Notebook Instances
Error Message "Permission denied" Is Displayed When the tensorboard Command Is Used to Open a Log File on a Notebook Instance
Training Jobs
OBS Operation Issues
Failed to Read Files
Error Message Is Displayed Repeatedly When a TensorFlow-1.8 Job Is Connected to OBS
TensorFlow Stops Writing TensorBoard to OBS When the Size of Written Data Reaches 5 GB
Error "Unable to connect to endpoint" Occurs When a Model Is Saved
Error Message "BrokenPipeError: Broken pipe" Is Displayed When OBS Data Is Copied
Error Message "ValueError: Invalid endpoint: obs.xxxx.com" Is Displayed in Logs
Error Message "errorMessage:The specified key does not exist" Is Displayed in Logs
In-Cloud Migration Adaptation Issues
Failed to Import a Module
Error Message "No module named .*" Is Displayed in Training Job Logs
Failed to Install a Third-Party Package
Failed to Download the Code Directory
Error Message "No such file or directory" Is Printed in Training Job Logs
Failed to Find the .so File During Training
ModelArts Training Job Failed to Parse Parameters and an Error Is Displayed in the Log
Training Output Path Is Used by Another Job
Error Message "RuntimeError: std:exception" Is Displayed for a PyTorch 1.0 Engine
Error Message "retCode=0x91, [the model stream execute failed]" Is Displayed in MindSpore Logs
Error Occurred When Pandas Reads Data from an OBS File If MoXing Is Used to Adapt to an OBS Path
Error Message "Please upgrade numpy to >= xxx to use this pandas version" Is Displayed in Logs
Reinstalled CUDA Version Does Not Match the One in the Target Image
Error ModelArts.2763 Occurred During Training Job Creation
Error Message "AttributeError: module '***' has no attribute '***'" Is Displayed in Training Job Logs
System Container Exits Unexpectedly
Hard Faults Due to Space Limit
Downloading Files Timed Out or No Space Left for Reading Data
Insufficient Container Space for Copying Data
Error Message "No space left" Is Displayed When a TensorFlow Multi-node Job Downloads Data to /cache
Size of the Log File Has Reached the Limit
Error Message "write line error" Is Displayed in Logs
Error Message "No space left on device" Is Displayed in Logs
Training Job Failed Due to OOM
Insufficient Disk Space
Internet Access Issues
Error Message "Network is unreachable" Is Displayed in Logs
URL Connection Timed Out in a Running Training Job
Permission Issues
Error "stat:403 reason:Forbidden" Is Displayed in Logs When a Training Job Accesses OBS
Error Message "Permission denied" Is Displayed in Logs
GPU Issues
Error Message "No CUDA-capable device is detected" Is Displayed in Logs
Error Message "RuntimeError: connect() timed out" Is Displayed in Logs
Error Message "cuda runtime error (10) : invalid device ordinal at xxx" Is Displayed in Logs
Error Message "RuntimeError: Cannot re-initialize CUDA in forked subprocess" Is Displayed in Logs
No GPU Detected in a Training Job
Service Code Issues
Error Message "pandas.errors.ParserError: Error tokenizing data. C error: Expected .* fields" Is Displayed in Logs
Error Message "max_pool2d_with_indices_out_cuda_frame failed with error code 0" Is Displayed in Logs
Training Job Failed with Error Code 139
Debugging Training Code in a Development Environment
Error Message "'(slice(0, 13184, None), slice(None, None, None))' is an invalid key" Is Displayed in Logs
Error Message "DataFrame.dtypes for data must be int, float or bool" Is Displayed in Logs
Error Message "CUDNN_STATUS_NOT_SUPPORTED" Is Displayed in Logs
Error Message "Out of bounds nanosecond timestamp" Is Displayed in Logs
Error Message "Unexpected keyword argument passed to optimizer" Is Displayed in Logs
Error Message "no socket interface found" Is Displayed in Logs
Error Message "Runtimeerror: Dataloader worker (pid 46212) is killed by signal: Killed BP" Is Displayed in Logs
Error Message "AttributeError: 'NoneType' object has no attribute 'dtype'" Is Displayed in Logs
Error Message "No module name 'unidecode'" Is Displayed in Logs
Distributed TensorFlow Cannot Use tf.Variable
When MXNet Creates kvstore, the Program Is Blocked and No Error Is Reported
ECC Error Occurs in the Log, Causing Training Job Failure
Training Job Failed Because the Maximum Recursion Depth Is Exceeded
Training Using a Built-in Algorithm Failed Due to a bndbox Error
Training Job Status Is Reviewing Job Initialization
Training Job Process Exits Unexpectedly
Stopped Training Job Process
Training Job Suspensions
Locating Training Job Suspension
Data Replication Suspension
Suspension Before Training
Suspension During Training
Suspension in the Last Training Epoch
Running a Training Job Failed
Troubleshooting a Training Job Failure
An NCCL Error Occurs When a Training Job Fails to Be Executed
Troubleshooting Process
A Training Job Created Using a Custom Image Is Always in the Running State
Failed to Find the Boot File When a Training Job Is Created Using a Custom Image
Running a Job Failed Due to Persistently Rising Memory Usage
Training Jobs Created in a Dedicated Resource Pool
No Cloud Storage Name or Mount Path Displayed on the Page for Creating a Training Job
Storage Volume Failed to Be Mounted to the Pod During Training Job Creation
Training Performance Issues
Training Performance Deteriorated
Inference Deployment
Model Management
Failed to Create a Model
Suspended Account or Insufficient Permission to Import Models
Failed to Build an Image or Import a File During Model Creation
Failed to Obtain the Directory Structure in the Target Image When Creating a Model Through OBS
Failed to Obtain Certain Logs on the ModelArts Log Query Page
Failed to Download a pip Package When a Model Is Created Using OBS
Failed to Use a Custom Image to Create a Model
Insufficient Disk Space Is Displayed When a Service Is Deployed After a Model Is Imported
Error Occurred When a Created Model Is Deployed as a Service
Invalid Runtime Dependency Configured in an Imported Custom Image
Garbled Characters Displayed in a Model Name Returned When Model Details Are Obtained Through an API
Failed to Import a Model Due to Oversized Model or Image
A Single Model File to Be Imported Exceeds the Size Limit (5 GB)
Creating a Model Failed Due to Image Building Timeout
Service Deployment
Error Occurred When a Custom Image Model Is Deployed as a Real-Time Service
Alarm Status of a Deployed Real-Time Service
Failed to Start a Service
Failed to Pull an Image When a Service Is Deployed, Started, Upgraded, or Modified
Image Restarts Repeatedly When a Service Is Deployed, Started, Upgraded, or Modified
Container Health Check Fails When a Service Is Deployed, Started, Upgraded, or Modified
Resources Are Insufficient When a Service Is Deployed, Started, Upgraded, or Modified
Error Occurred When a CV2 Model Package Is Used to Deploy a Real-Time Service
Service Is Consistently Being Deployed
A Started Service Is Intermittently in the Alarm State
Failed to Deploy a Service and Error "No Module named XXX" Occurred
Insufficient Permission to or Unavailable Input/Output OBS Path of a Batch Service
Error "No CUDA runtime is found" Occurred When a Real-Time Service Is Deployed
What Can I Do if the Memory Is Insufficient?
ModelArts.3520 The Number of Real-Time Services Cannot Exceed 11
"pod has unbound immediate PersistentVolumeClaims" Is Displayed During Service Deployment
Service Prediction
Service Prediction Failed
Error "APIG.XXXX" Occurred in a Prediction Failure
Error ModelArts.4206 Occurred in Real-Time Service Prediction
Error ModelArts.4302 Occurred in Real-Time Service Prediction
Error ModelArts.4503 Occurred in Real-Time Service Prediction
Error MR.0105 Occurred in Real-Time Service Prediction
Method Not Allowed
Request Timed Out
Error Occurred When an API Is Called for Deploying a Model Created Using a Custom Image
Error "DL.0105" Occurred During Real-Time Inference
MoXing
Error Occurred When MoXing Is Used to Copy Data
How Do I Disable the Warmup Function of the Mox?
PyTorch Mox Logs Are Repeatedly Generated
Failed to Perform Local Fine Tuning on the Checkpoint Generated by moxing.tensorflow
Copying Data Using MoXing Is Slow and the Log Is Repeatedly Printed in a Training Job
Failed to Access a Folder Using MoXing and Read the Folder Size Using get_size
APIs or SDKs
"ERROR: Could not install packages due to an OSError" Occurred During ModelArts SDK Installation
Error Occurred During Service Deployment After the Target Path to a File Downloaded Through a ModelArts SDK Is Set to a File Name
A Training Job Created Using an API Is Abnormal
Execution of a huaweicloud.com API Times Out
Resource Pool
Failed to Create a Resource Pool
Faulty Nodes in a Standard Resource Pool
Lite Cluster
Failed to Create a Resource Pool
How Do I Locate and Rectify a Node Fault in a Cluster Resource Pool?
All Privileged Pool Data Is Displayed as 0%
A Reset Node Cannot Be Used
How Do I Automatically Restore Services When Cluster Node Faults Occur?
Videos
More Documents
Preparations (To Be Offline)
Creating a Huawei ID and Enabling Huawei Cloud Services
Logging In to the ModelArts Management Console
Configuring Access Authorization (Global Configuration)
Creating an OBS Bucket
Enabling ModelArts Resources
ModelArts Resources
Pay-Per-Use
DevEnviron
Introduction to DevEnviron
Application Scenarios
Managing Notebook Instances
Creating a Notebook Instance
Accessing a Notebook Instance
Searching for, Starting, Stopping, or Deleting a Notebook Instance
Changing a Notebook Instance Image
Changing the Flavor of a Notebook Instance
Selecting Storage in DevEnviron
Dynamically Mounting an OBS Parallel File System
Dynamically Expanding EVS Disk Capacity
Modifying the SSH Configuration for a Notebook Instance
Viewing the Notebook Instances of All IAM Users Under One Tenant Account
Viewing Notebook Events
Notebook Cache Directory Alarm Reporting
JupyterLab
Operation Process in JupyterLab
JupyterLab Overview and Common Operations
Code Parametrization Plug-in
Using ModelArts SDK
Using the Git Plug-in
Visualized Model Training
Introduction to Training Job Visualization
MindInsight Visualization Jobs
TensorBoard Visualization Jobs
Uploading and Downloading Data in Notebook
Uploading Files to JupyterLab
Scenarios
Uploading Files from a Local Path to JupyterLab
Upload Scenarios and Entries
Uploading a Local File Less Than 100 MB to JupyterLab
Uploading a Local File with a Size Ranging from 100 MB to 5 GB to JupyterLab
Uploading a Local File Larger Than 5 GB to JupyterLab
Cloning an Open-Source Repository in GitHub
Uploading OBS Files to JupyterLab
Uploading Remote Files to JupyterLab
Downloading a File from JupyterLab to a Local Path
Local IDE
Operation Process in a Local IDE
Local IDE (PyCharm)
Connecting to a Notebook Instance Through PyCharm Toolkit
PyCharm Toolkit
Downloading and Installing PyCharm Toolkit
Connecting to a Notebook Instance Through PyCharm Toolkit
Manually Connecting to a Notebook Instance Through PyCharm
Submitting a Training Job Using PyCharm Toolkit
Submitting a Training Job (New Version)
Stopping a Training Job
Viewing Training Logs
Uploading Data to a Notebook Instance Using PyCharm
Local IDE (VS Code)
Connecting to a Notebook Instance Through VS Code
Installing VS Code
Connecting to a Notebook Instance Through VS Code Toolkit
Manually Connecting to a Notebook Instance Through VS Code
Remotely Debugging in VS Code
Uploading and Downloading Files in VS Code
Local IDE (Accessed Using SSH)
ModelArts CLI Command Reference
ModelArts CLI Overview
(Optional) Installing ma-cli Locally
Autocompletion for ma-cli Commands
ma-cli Authentication
ma-cli Image Building Command
ma-cli Image Building Command Overview
Obtaining an Image Creation Template
Loading an Image Creation Template
Obtaining Registered ModelArts Images
Creating an Image in ModelArts Notebook
Obtaining Image Creation Caches in ModelArts Notebook
Clearing Image Creation Caches in ModelArts Notebook
Registering SWR Images with ModelArts Image Management
Deregistering a Registered Image from ModelArts Image Management
Debugging an SWR Image on an ECS
Using the ma-cli ma-job Command to Submit a ModelArts Training Job
ma-cli ma-job Command Overview
Obtaining ModelArts Training Jobs
Submitting a ModelArts Training Job
Obtaining ModelArts Training Job Logs
Obtaining ModelArts Training Job Events
Obtaining ModelArts AI Engines for Training
Obtaining ModelArts Resource Specifications for Training
Stopping a ModelArts Training Job
Using the ma-cli dli-job Command to Submit a DLI Spark Job
Overview
Querying DLI Spark Jobs
Submitting a DLI Spark Job
Querying DLI Spark Run Logs
Querying DLI Queues
Obtaining DLI Group Resources
Uploading Local Files or OBS Files to a DLI Group
Stopping a DLI Spark Job
Using ma-cli to Copy OBS Data
Model Development (To Be Offline)
Introduction to Model Development
Preparing Data
Preparing Algorithms
Introduction to Algorithm Preparation
Using a Preset Image (Custom Script)
Overview
Developing a Custom Script
Creating an Algorithm
Using Custom Images
Viewing Algorithm Details
Searching for an Algorithm
Deleting an Algorithm
Performing a Training
Creating a Training Job
Viewing Training Job Details
Viewing Training Job Events
Training Job Logs
Introduction to Training Job Logs
Common Logs
Viewing Training Job Logs
Locating Faults by Analyzing Training Logs
Cloud Shell
Logging In to a Training Container Using Cloud Shell
Keeping a Training Job Running
Preventing a Cloud Shell Session from Disconnecting
Viewing the Resource Usage of a Training Job
Evaluation Results
Viewing Training Tags
Viewing Fault Recovery Details
Viewing Environment Variables of a Training Container
Stopping, Rebuilding, or Searching for a Training Job
Releasing Training Job Resources
Advanced Training Operations
Automatic Recovery from a Training Fault
Training Fault Tolerance Check
Unconditional Auto Restart
Resumable Training and Incremental Training
Detecting Training Job Suspension
Priority of a Training Job
Permission to Set the Highest Job Priority
Distributed Training
Distributed Training Functions
Single-Node Multi-Card Training Using DataParallel
Multi-Node Multi-Card Training Using DistributedDataParallel
Distributed Debugging Adaptation and Code Example
Sample Code of Distributed Training
Example of Starting PyTorch DDP Training Based on a Training Job
Automatic Model Tuning (AutoSearch)
Introduction to Hyperparameter Search
Search Algorithm
Bayesian Optimization (SMAC)
TPE Algorithm
Simulated Annealing Algorithm
Creating a Hyperparameter Search Job
Image Management
Image Management
Using a Preset Image
Unified Images
Images Preset in Notebook
Notebook Base Images
Notebook Base Image List
PyTorch (x86)-powered Notebook Base Image
TensorFlow (x86)-powered Notebook Base Image
MindSpore (x86)-powered Notebook Base Image
Custom Dedicated Image (x86)-powered Notebook Base Image
Training Base Images
Available Training Base Images
Training Base Image (PyTorch)
Training Base Image (TensorFlow)
Training Base Image (Horovod)
Training Base Image (MPI)
Starting Training with a Preset Image
PyTorch
TensorFlow
Horovod/MPI/MindSpore-GPU
Inference Base Images
Available Inference Base Images
TensorFlow (CPU/GPU)-powered Inference Base Images
PyTorch (CPU/GPU)-powered Inference Base Images
MindSpore (CPU/GPU)-powered Inference Base Images
Using Custom Images in Notebook Instances
Constraints on Custom Images in Notebook Instances
Registering an Image in ModelArts
Creating a Custom Image
Saving a Notebook Instance as a Custom Image
Saving a Notebook Environment Image
Using a Custom Image to Create a Notebook Instance
Creating and Using a Custom Image in Notebook
Application Scenarios and Process
Step 1 Creating a Custom Image
Step 2 Registering a New Image
Step 3 Using a New Image to Create a Development Environment
Creating a Custom Image on an ECS and Using It in Notebook
Application Scenarios and Process
Step 1 Preparing a Docker Server and Configuring an Environment
Step 2 Creating a Custom Image
Step 3 Registering a New Image
Step 4 Creating and Starting a Development Environment
Troubleshooting for Custom Images in Notebook Instances
Using a Custom Image to Train Models (Model Training)
Overview
Example: Creating a Custom Image for Training
Example: Creating a Custom Image for Training (PyTorch + CPU/GPU)
Example: Creating a Custom Image for Training (MPI + CPU/GPU)
Example: Creating a Custom Image for Training (Horovod-PyTorch and GPUs)
Example: Creating a Custom Image for Training (MindSpore and GPUs)
Example: Creating a Custom Image for Training (TensorFlow and GPUs)
Preparing a Training Image
Specifications for Custom Images for Training Jobs
Migrating an Image to ModelArts Training
Using a Base Image to Create a Training Image
Installing MLNX_OFED in a Container Image
Creating an Algorithm Using a Custom Image
Using a Custom Image to Create a CPU- or GPU-based Training Job
Troubleshooting Process
Using a Custom Image to Create AI Applications for Inference Deployment
Custom Image Specifications for Creating AI Applications
Creating a Custom Image and Using It to Create an AI Application
FAQs
How Can I Log In to SWR and Upload Images to It?
How Do I Configure Environment Variables for an Image?
How Do I Use Docker to Start an Image Saved Using a Notebook Instance?
How Do I Configure a Conda Source in a Notebook Development Environment?
What Are Supported Software Versions for a Custom Image?
Why Does an Error Occur When I Try to Save an Image That Is Reported as Larger Than 35 GB, Even Though It Is Only Displayed as 13 GB in SWR?
How Do I Ensure That an Image Can Be Saved Correctly Without Being Too Large?
How Do I Reduce the Size of an Image Created Locally or on ECS?
Will an Image Be Smaller If I Uninstall and Repackage It or Simply Delete Existing Datasets from the Image?
What Do I Do If Error "ModelArts.6787" Is Reported When I Register an Image on ModelArts?
Modification History
Model Inference (To Be Offline)
Introduction to Inference
Managing AI Applications
Introduction to AI Application Management
Creating an AI Application
Importing a Meta Model from a Training Job
Importing a Meta Model from a Template
Importing a Meta Model from OBS
Importing a Meta Model from a Container Image
Viewing the AI Application List
Viewing Details About an AI Application
Managing AI Application Versions
Viewing Events of an AI Application
Deploying an AI Application as a Service
Deploying AI Applications as Real-Time Services
Deploying as a Real-Time Service
Viewing Service Details
Testing the Deployed Service
Accessing Real-Time Services
Accessing a Real-Time Service
Authentication Mode
Access Authenticated Using a Token
Access Authenticated Using an AK/SK
Access Authenticated Using an Application
Access Mode
Accessing a Real-Time Service (Public Network Channel)
Accessing a Real-Time Service (VPC High-Speed Channel)
Accessing a Real-Time Service Through WebSocket
Server-Sent Events
Integrating a Real-Time Service
Cloud Shell
Deploying AI Applications as Batch Services
Deploying as a Batch Service
Viewing Details About a Batch Service
Viewing the Batch Service Prediction Result
Upgrading a Service
Starting, Stopping, Deleting, or Restarting a Service
Viewing Service Events
Inference Specifications
Model Package Specifications
Introduction to Model Package Specifications
Specifications for Editing a Model Configuration File
Specifications for Writing Model Inference Code
Model Templates
Introduction to Model Templates
Templates
TensorFlow-based Image Classification Template
TensorFlow-py27 General Template
TensorFlow-py36 General Template
MXNet-py27 General Template
MXNet-py36 General Template
PyTorch-py27 General Template
PyTorch-py36 General Template
Caffe-CPU-py27 General Template
Caffe-GPU-py27 General Template
Caffe-CPU-py36 General Template
Caffe-GPU-py36 General Template
Arm-Ascend Template
Input and Output Modes
Built-in Object Detection Mode
Built-in Image Processing Mode
Built-in Predictive Analytics Mode
Undefined Mode
Examples of Custom Scripts
TensorFlow
TensorFlow 2.1
PyTorch
Caffe
XGBoost
PySpark
Scikit-learn
ModelArts Monitoring on Cloud Eye
ModelArts Metrics
Setting Alarm Rules
Viewing Monitoring Metrics
Resource Management
Resource Pool
Elastic Cluster
Comprehensive Upgrades to ModelArts Resource Pool Management Functions
Creating a Resource Pool
Viewing Details About a Resource Pool
Resizing a Resource Pool
Setting a Renewal Policy
Modifying the Expiration Policy
Migrating the Workspace
Changing Job Types Supported by a Resource Pool
Upgrading a Resource Pool Driver
Deleting a Resource Pool
Abnormal Status of a Dedicated Resource Pool
ModelArts Network
ModelArts Nodes
Audit Logs
Key Operations Recorded by CTS
Viewing Audit Logs
Monitoring Resources
Overview
Using Grafana to View AOM Monitoring Metrics
Procedure
Installing and Configuring Grafana
Installing and Configuring Grafana on Windows
Installing and Configuring Grafana on Linux
Installing and Configuring Grafana on a Notebook Instance
Configuring a Grafana Data Source
Using Grafana to Configure Dashboards and View Metric Data
Viewing All ModelArts Monitoring Metrics on the AOM Console
Data Preparation and Analytics
Introduction to Data Preparation
Getting Started
Creating a Dataset
Dataset Overview
Creating a Dataset
Modifying a Dataset
Importing Data
Introduction to Data Importing
Importing Data from OBS
Introduction to Importing Data from OBS
Importing Data from an OBS Path
Specifications for Importing Data from an OBS Directory
Importing a Manifest File
Specifications for Importing a Manifest File
Importing Data from DLI
Importing Data from MRS
Importing Data from DWS
Importing Data from Local Files
Data Analysis and Preview
Auto Grouping
Data Filtering
Data Feature Analysis
Labeling Data
Publishing Data
Introduction to Data Publishing
Publishing a Data Version
Managing Data Versions
Exporting Data
Introduction to Exporting Data
Exporting Data to a New Dataset
Exporting Data to OBS
Data Labeling (To Be Offline)
Introduction to Data Labeling
Manual Labeling
Creating a Labeling Job
Image Labeling
Image Classification
Object Detection
Image Segmentation
Text Labeling
Text Classification
Named Entity Recognition
Text Triplet
Audio Labeling
Sound Classification
Speech Labeling
Speech Paragraph Labeling
Video Labeling
Viewing Labeling Jobs
Viewing My Created Labeling Jobs
Viewing My Participated Labeling Jobs
Auto Labeling
Creating an Auto Labeling Job
Confirming Hard Examples
Team Labeling
Team Labeling Overview
Creating and Managing Teams
Managing Teams
Managing Team Members
Creating a Team Labeling Job
Logging In to ModelArts
Starting a Team Labeling Job
Reviewing Team Labeling Results
Accepting Team Labeling Results