Obtaining the Real-Time Resource Usage
Function
This API is used to obtain the real-time usage of all resource pools in the current project.
URI
GET /v2/{project_id}/metrics/runtime/pools
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID. For details, see Obtaining a Project ID and Name. |
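The request below is a minimal sketch that only builds the call, so you can inspect it before sending anything. It assumes IAM token authentication through the `X-Auth-Token` header, and it treats the endpoint host as a placeholder you must replace with the ModelArts endpoint for your region. The function name `build_pool_metrics_request` is hypothetical.

```python
import urllib.request


def build_pool_metrics_request(endpoint: str, project_id: str, token: str) -> urllib.request.Request:
    """Build (but do not send) GET /v2/{project_id}/metrics/runtime/pools.

    Assumptions: `endpoint` is the regional ModelArts endpoint (placeholder),
    and authentication uses an IAM token in the X-Auth-Token header.
    """
    # Strip a trailing slash so the path is not doubled.
    url = f"{endpoint.rstrip('/')}/v2/{project_id}/metrics/runtime/pools"
    # No request body is needed; this API takes only the project_id path parameter.
    return urllib.request.Request(url, method="GET", headers={"X-Auth-Token": token})
```

To send it, pass the request to `urllib.request.urlopen` and decode the body with `json.load`. A 200 response carries the fields described under Response Parameters.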
Request Parameters
None
Response Parameters
Status code: 200
| Parameter | Type | Description |
|---|---|---|
| apiVersion | String | Resource version. The example response below returns v2. |
| kind | String | Resource type. The example response below returns PoolMetricsList. |
| items | Array of MetricsItem objects | Metric list. |
MetricsItem

| Parameter | Type | Description |
|---|---|---|
| table | table object | Resource list. |
| metadata | ResourceMetricsMetadata object | Metadata of resource metrics. |

table

| Parameter | Type | Description |
|---|---|---|
| allocated | Allocated object | Allocated resources. |
| capacity | Capacity object | Total resource capacity. |

Allocated

| Parameter | Type | Description |
|---|---|---|
| value | Value object | Resource amount. |
| timestamp | String | UTC time, in the format of yyyy-MM-dd'T'HH:mm:ss'Z'. |
| window | String | Statistics interval, for example, 1s (1 second), 1m (1 minute), or 1h (1 hour). |

Capacity

| Parameter | Type | Description |
|---|---|---|
| value | Value object | Resource amount. |
| maxValue | Value object | Maximum number of elastic resources. |
| timestamp | String | UTC time, in the format of yyyy-MM-dd'T'HH:mm:ss'Z'. |
| window | String | Statistics interval, for example, 1s (1 second), 1m (1 minute), or 1h (1 hour). |
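The `timestamp` and `window` fields of Allocated and Capacity can be converted into Python time objects as sketched below. This assumes `window` is always a positive integer followed by one of the unit letters s, m, or h (only 1s, 1m, and 1h are documented); the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Documented unit letters and their length in seconds.
_WINDOW_UNITS = {"s": 1, "m": 60, "h": 3600}


def parse_timestamp(ts: str) -> datetime:
    """Parse a UTC time in the format yyyy-MM-dd'T'HH:mm:ss'Z' into an aware datetime."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def parse_window(window: str) -> timedelta:
    """Convert a statistics interval such as "1s", "1m", or "1h" into a timedelta.

    Assumption: the value is an integer count followed by a single unit letter.
    """
    count, unit = window[:-1], window[-1:]
    if unit not in _WINDOW_UNITS or not count.isdigit():
        raise ValueError(f"unsupported window: {window!r}")
    return timedelta(seconds=int(count) * _WINDOW_UNITS[unit])
```

Returning timezone-aware datetimes keeps the UTC meaning of the trailing `Z` explicit when the values are later compared with local times.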
Example Requests
None
Example Responses
Status code: 200
OK
```json
{
  "apiVersion" : "v2",
  "kind" : "PoolMetricsList",
  "items" : [ {
    "table" : {
      "allocated" : {
        "value" : {
          "cpu" : 5,
          "memory" : "15548Mi",
          "nvidia.com/t4" : "1073m"
        },
        "timestamp" : "2022-03-30T07:09:10Z",
        "window" : "1m"
      },
      "capacity" : {
        "value" : {
          "cpu" : 16,
          "memory" : "64Gi",
          "nvidia.com/t4" : 2
        },
        "maxValue" : {
          "cpu" : 16,
          "memory" : "64Gi",
          "nvidia.com/t4" : 2
        },
        "timestamp" : "2022-03-30T07:09:10Z",
        "window" : "1m"
      }
    },
    "metadata" : {
      "name" : "hougang-rse-pool"
    }
  } ]
}
```
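One way to use the example response is to compute, per pool, the fraction of each resource that is allocated. The sketch below assumes the quantity strings follow Kubernetes-style notation ("1073m" meaning 1.073, "15548Mi" meaning 15548 × 2^20, "64Gi" meaning 64 × 2^30), which the values in the example suggest but this page does not state; only the listed suffixes are handled. The function names are hypothetical.

```python
from decimal import Decimal

# Assumption: Kubernetes-style quantity suffixes. Only these are handled.
_SUFFIXES = {
    "m": Decimal("0.001"),
    "Ki": Decimal(2**10),
    "Mi": Decimal(2**20),
    "Gi": Decimal(2**30),
    "Ti": Decimal(2**40),
}


def parse_quantity(q) -> Decimal:
    """Convert a number or quantity string (e.g. "1073m", "64Gi") to a Decimal."""
    if isinstance(q, (int, float)):
        return Decimal(str(q))
    for suffix, factor in _SUFFIXES.items():
        if q.endswith(suffix):
            return Decimal(q[: -len(suffix)]) * factor
    return Decimal(q)


def pool_utilization(body: dict) -> dict:
    """Return {pool_name: {resource: allocated / capacity}} from a response body."""
    result = {}
    for item in body.get("items", []):
        name = item["metadata"]["name"]
        allocated = item["table"]["allocated"]["value"]
        capacity = item["table"]["capacity"]["value"]
        result[name] = {
            # Resources missing from "allocated" are treated as 0 allocated.
            res: parse_quantity(allocated.get(res, 0)) / parse_quantity(cap)
            for res, cap in capacity.items()
            if parse_quantity(cap) > 0  # skip zero capacity to avoid division by zero
        }
    return result
```

Decimal is used instead of float so that ratios such as 1.073 / 2 come out exact. For the example response, the CPU ratio is 5/16 and the T4 GPU ratio is 1.073/2.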
Status Codes
| Status Code | Description |
|---|---|
| 200 | OK |
Error Codes
See Error Codes.