- What's New
- Function Overview
- Service Overview
- Preparations
-
DevEnviron
- Introduction to DevEnviron
- Application Scenarios
-
Managing Notebook Instances
- Creating a Notebook Instance
- Accessing a Notebook Instance
- Searching for, Starting, Stopping, or Deleting a Notebook Instance
- Selecting Storage in DevEnviron
- Changing a Notebook Instance Image
- Dynamically Expanding EVS Disk Capacity
- Changing the Flavor of a Notebook Instance
- Modifying the SSH Configuration for a Notebook Instance
- Viewing the Notebook Instances of All IAM Users Under One Tenant Account
-
JupyterLab
- Operation Process in JupyterLab
- JupyterLab Overview and Common Operations
- Code Parametrization Plug-in
- Using ModelArts SDK
- Using the Git Plug-in
- Uploading and Downloading Data in Notebook
-
Local IDE
- Operation Process in a Local IDE
- Local IDE (PyCharm)
-
Local IDE (VS Code)
- Connecting to a Notebook Instance Through VS Code
- Installing VS Code
- Connecting to a Notebook Instance Through VS Code with One Click
- Connecting to a Notebook Instance Through VS Code Toolkit
- Manually Connecting to a Notebook Instance Through VS Code
- Remotely Debugging in VS Code
- Uploading and Downloading Files in VS Code
- Local IDE (Accessed Using SSH)
-
ModelArts CLI Command Reference
- ModelArts CLI Overview
- (Optional) Installing ma-cli Locally
- Autocompletion for ma-cli Commands
- ma-cli Authentication
-
ma-cli Image Building Command
- ma-cli Image Building Command
- Obtaining an Image Creation Template
- Loading an Image Creation Template
- Obtaining Registered ModelArts Images
- Creating an Image in ModelArts Notebook
- Obtaining Image Creation Caches in ModelArts Notebook
- Clearing Image Creation Caches in ModelArts Notebook
- Registering SWR Images with ModelArts Image Management
- Deregistering a Registered Image from ModelArts Image Management
- Debugging an SWR Image on an ECS
-
Using the ma-cli ma-job Command to Submit a ModelArts Training Job
- ma-cli ma-job Command Overview
- Obtaining ModelArts Training Jobs
- Submitting a ModelArts Training Job
- Obtaining ModelArts Training Job Logs
- Obtaining ModelArts Training Job Events
- Obtaining ModelArts AI Engines for Training
- Obtaining ModelArts Resource Specifications for Training
- Stopping a ModelArts Training Job
- Using ma-cli to Copy OBS Data
-
Model Development
- Introduction to Model Development
- Preparing Data
- Preparing Algorithms
-
Performing a Training
- Creating a Training Job
- Reviewing Training Job Details
- Training Job Logs
- Viewing Training Job Events
- Viewing the Resource Usage of a Training Job
- Evaluation Results
- Viewing Environment Variables of a Training Container
- Stopping, Rebuilding, or Searching for a Training Job
- CloudShell
- Releasing Training Job Resources
- Training Experiment
- Advanced Training Operations
- Visualized Model Training
- Distributed Training
-
Model Inference
- Introduction to Inference
- Managing AI Applications
-
Deploying an AI Application as a Service
- Deploying AI Applications as Real-Time Services
- Deploying AI Applications as Batch Services
- Upgrading a Service
- Starting, Stopping, Deleting, or Restarting a Service
- Viewing Service Events
- Inference Specifications
- ModelArts Monitoring on Cloud Eye
-
Docker Containers with ModelArts
- Image Management
- Using Custom Images in Notebook Instances
- Using a Custom Image to Train Models (New-Version Training)
- Using a Custom Image to Create AI applications for Inference Deployment
- FAQs
-
Resource Management
- Resource Pool
-
Elastic Cluster
- Comprehensive Upgrades to ModelArts Resource Pool Management Functions
- Creating a Resource Pool
- Viewing Details About a Resource Pool
- Resizing a Resource Pool
- Migrating the Workspace
- Changing Job Types Supported by a Resource Pool
- Upgrading a Resource Pool Driver
- Deleting a Resource Pool
- Abnormal Status of a Dedicated Resource Pool
- ModelArts Network
- Monitoring Resources
-
SDK Reference
- Before You Start
- SDK Overview
- Getting Started
- (Optional) Installing the ModelArts SDK Locally
- Session Authentication
- OBS Management
- Data Management
-
Training Management
- Training Jobs
- APIs for Resources and Engine Specifications
- Model Management
- Service Management
-
API Reference
- Before You Start
- API Overview
- Calling APIs
-
DevEnviron Management
- Querying Notebook Instances
- Creating a Notebook Instance
- Querying Details of a Notebook Instance
- Updating a Notebook Instance
- Deleting a Notebook Instance
- Saving a Running Instance as a Container Image
- Obtaining the Available Flavors
- Querying Flavors Available for a Notebook Instance
- Querying the Available Duration of a Running Notebook Instance
- Prolonging a Notebook Instance
- Starting a Notebook Instance
- Stopping a Notebook Instance
- Obtaining the Notebook Instances with OBS Storage Mounted
- OBS Storage Mounting
- Obtaining Details About a Notebook Instance with OBS Storage Mounted
- Unmounting OBS Storage from a Notebook Instance
- Querying Supported Images
- Registering a Custom Image
- Obtaining User Image Groups
- Obtaining Details of an Image
- Deleting an Image
-
Training Management
- Creating an Algorithm
- Querying the Algorithm List
- Querying Algorithm Details
- Modifying an Algorithm
- Deleting an Algorithm
- Creating a Training Job
- Querying the Details About a Training Job
- Modifying the Description of a Training Job
- Deleting a Training Job
- Terminating a Training Job
- Querying the Logs of a Specified Task in a Given Training Job (Preview)
- Querying the Logs of a Specified Task in a Training Job (OBS Link)
- Querying the Running Metrics of a Specified Task in a Training Job
- Querying a Training Job List
- Obtaining the General Specifications Supported by a Training Job
- Obtaining the Preset AI Frameworks Supported by a Training Job
- AI Application Management
- Service Management
- Resource Management
- Authorization Management
- Use Cases
- Common Parameters
-
FAQs
-
General Issues
- What Is ModelArts?
- What Are the Relationships Between ModelArts and Other Services?
- What Are the Differences Between ModelArts and DLS?
- How Do I Obtain an Access Key?
- How Do I Upload Data to OBS?
- What Do I Do If the System Displays a Message Indicating that the AK/SK Pair Is Unavailable?
- How Do I Use ModelArts to Train Models Based on Structured Data?
- What Are Regions and AZs?
- How Do I Check Whether ModelArts and an OBS Bucket Are in the Same Region?
- How Do I View All Files Stored in OBS on ModelArts?
- Where Are Datasets of ModelArts Stored in a Container?
- What Are the Functions of ModelArts Training and Inference?
- Can AI-assisted Identification of ModelArts Identify a Specific Label?
- Why Is the Job Still Queued When Resources Are Sufficient?
-
Notebook (New Version)
- Constraints
-
Data Upload or Download
- How Do I Upload a File from a Notebook Instance to OBS or Download a File from OBS to a Notebook Instance?
- How Do I Upload Local Files to a Notebook Instance?
- How Do I Import Large Files to a Notebook Instance?
- Where Will the Data Be Uploaded to?
- How Do I Download Files from a Notebook Instance to a Local Computer?
- How Do I Copy Data from Development Environment Notebook A to Notebook B?
- Data Storage
-
Environment Configurations
- How Do I Check the CUDA Version Used by a Notebook Instance?
- How Do I Enable the Terminal Function in DevEnviron of ModelArts?
- How Do I Install External Libraries in a Notebook Instance?
- How Do I Obtain the External IP Address of My Local PC?
- How Can I Resolve Abnormal Font Display on a ModelArts Notebook Accessed from iOS?
- Is There a Proxy for Notebook? How Do I Disable It?
-
Notebook Instances
- What Do I Do If I Cannot Access My Notebook Instance?
- What Should I Do When the System Displays an Error Message Indicating that No Space Left After I Run the pip install Command?
- What Do I Do If "Read timed out" Is Displayed After I Run pip install?
- What Do I Do If the Code Can Be Run But Cannot Be Saved, and the Error Message "save error" Is Displayed?
-
Code Execution
- What Do I Do If a Notebook Instance Won't Run My Code?
- Why Does the Instance Break Down When dead kernel Is Displayed During Training Code Running?
- What Do I Do If cudaCheckError Occurs During Training?
- What Should I Do If DevEnviron Prompts Insufficient Space?
- Why Does the Notebook Instance Break Down When opencv.imshow Is Used?
- Why Cannot the Path of a Text File Generated in Windows OS Be Found In a Notebook Instance?
- What Do I Do If Files Fail to Be Saved in JupyterLab?
-
Failures to Access the Development Environment Through VS Code
- What Do I Do If the VS Code Window Is Not Displayed?
- What Do I Do If a Remote Connection Failed After VS Code Is Opened?
- What Do I Do If Error Message "Could not establish connection to xxx" Is Displayed During a Remote Connection?
- What Do I Do If the Connection to a Remote Development Environment Remains in "Setting up SSH Host xxx: Downloading VS Code Server locally" State for More Than 10 Minutes?
- What Do I Do If the Connection to a Remote Development Environment Remains in the State of "Setting up SSH Host xxx: Downloading VS Code Server locally" for More Than 10 Minutes?
- What Do I Do If the Connection to a Remote Development Environment Remains in the State of "ModelArts Remote Connect: Connecting to instance xxx..." for More Than 10 Minutes?
- What Do I Do If a Remote Connection Is in the Retry State?
- What Do I Do If Error Message "The VS Code Server failed to start" Is Displayed?
- What Do I Do If Error Message "Permissions for 'x:/xxx.pem' are too open" Is Displayed?
- What Do I Do If Error Message "Bad owner or permissions on C:\Users\Administrator/.ssh/config" or "Connection permission denied (publickey)" Is Displayed?
- What Do I Do If Error Message "ssh: connect to host xxx.pem port xxxxx: Connection refused" Is Displayed?
- What Do I Do If Error Message "ssh: connect to host ModelArts-xxx port xxx: Connection timed out" Is Displayed?
- What Do I Do If Error Message "Load key "C:/Users/xx/test1/xxx.pem": invalid format" Is Displayed?
- What Do I Do If Error Message "An SSH installation couldn't be found" or "Could not establish connection to instance xxx: 'ssh' ..." Is Displayed?
- What Do I Do If Error Message "no such identity: C:/Users/xx /test.pem: No such file or directory" Is Displayed?
- What Do I Do If Error Message "Host key verification failed" or "Port forwarding is disabled" Is Displayed?
- What Do I Do If Error Message "Failed to install the VS Code Server" or "tar: Error is not recoverable: exiting now" Is Displayed?
- What Do I Do If Error Message "XHR failed" Is Displayed When a Remote Notebook Instance Is Accessed Through VS Code?
- What Do I Do for an Automatically Disconnected VS Code Connection If No Operation Is Performed for a Long Time?
- What Do I Do If It Takes a Long Time to Set Up a Remote Connection After VS Code Is Automatically Upgraded?
- What Do I Do If Error Message "Connection reset" Is Displayed During an SSH Connection?
- What Can I Do If a Notebook Instance Is Frequently Disconnected or Stuck After I Use MobaXterm to Connect to the Notebook Instance in SSH Mode?
-
Others
- How Do I Use Multiple Ascend Cards for Debugging in a Notebook Instance?
- Why Is the Training Speed Similar When Different Notebook Flavors Are Used?
- How Do I Perform Incremental Training When Using MoXing?
- How Do I View GPU Usage on the Notebook?
- How Can I Obtain GPU Usage Through Code?
- Which Real-Time Performance Indicators of an Ascend Chip Can I View?
- What Are the Relationships Between Files Stored in JupyterLab, Terminal, and OBS?
- How Do I Migrate Data from an Old-Version Notebook Instance to a New-Version One?
- How Do I Use the Datasets Created on ModelArts in a Notebook Instance?
- pip and Common Commands
- What Are Sizes of the /cache Directories for Different Notebook Specifications in DevEnviron?
-
Training Jobs
-
Functional Consulting
- What Are the Solutions to Underfitting?
- What Are the Precautions for Switching Training Jobs from the Old Version to the New Version?
- How Do I Obtain a Trained ModelArts Model?
- What Is TensorBoard Used for in Model Visualization Jobs?
- How Do I Obtain RANK_TABLE_FILE on ModelArts for Distributed Training?
- How Do I Obtain the CUDA and cuDNN Versions of a Custom Image?
- How Do I Obtain a MoXing Installation File?
- In a Multi-Node Training, the TensorFlow PS Node Functioning as a Server Will Be Continuously Suspended. How Does ModelArts Determine Whether the Training Is Complete? Which Node Is a Worker?
- How Do I Install MoXing for a Custom Image of a Training Job?
- Reading Data During Training
-
Compiling the Training Code
- How Do I Create a Training Job When a Dependency Package Is Referenced by the Model to Be Trained?
- What Is the Common File Path for Training Jobs?
- How Do I Install a Library That C++ Depends on?
- How Do I Check Whether a Folder Copy Is Complete During Job Training?
- How Do I Load Some Well Trained Parameters During Job Training?
- How Do I Obtain Training Job Parameters from the Boot File of the Training Job?
- Why Can't I Use os.system ('cd xxx') to Access the Corresponding Folder During Job Training?
- How Do I Invoke a Shell Script in a Training Job to Execute the .sh File?
- How Do I Obtain the Dependency File Path to be Used in Training Code?
- What Is the File Path If a File in the model Directory Is Referenced in a Custom Python Package?
-
Creating a Training Job
- What Can I Do If the Message "Object directory size/quantity exceeds the limit" Is Displayed When I Create a Training Job?
- What Are Precautions for Setting Training Parameters?
- What Are Sizes of the /cache Directories for Different Resource Specifications in the Training Environment?
- Is the /cache Directory of a Training Job Secure?
- Why Is a Training Job Always Queuing?
- Managing Training Job Versions
-
Viewing Job Details
- How Do I Check Resource Usage of a Training Job?
- How Do I Access the Background of a Training Job?
- Is There Any Conflict When Models of Two Training Jobs Are Saved in the Same Directory of a Container?
- Only Three Valid Digits Are Retained in a Training Output Log. Can the Value of loss Be Changed?
- Can a Trained Model Be Downloaded or Migrated to Another Account? How Do I Obtain the Download Path?
-
Functional Consulting
-
Service Deployment
-
Model Management
-
Importing Models
- How Do I Import the .h5 Model of Keras to ModelArts?
- How Do I Edit the Installation Package Dependency Parameters in a Model Configuration File When Importing a Model?
- How Do I Change the Default Port to Create a Real-Time Service Using a Custom Image?
- Does ModelArts Support Multi-Model Import?
- Restrictions on the Size of an Image for Importing an AI Application
-
Importing Models
-
Service Deployment
-
Functional Consulting
- What Types of Services Can Models Be Deployed as on ModelArts?
- What Are the Differences Between Real-Time Services and Batch Services?
- What Is the Maximum Size of a Prediction Request Body?
- How Do I Select Compute Node Specifications for Deploying a Service?
- What Is the CUDA Version for Deploying a Service on GPUs?
- Real-Time Services
-
Functional Consulting
-
Model Management
-
API/SDK
- Can ModelArts APIs or SDKs Be Used to Download Models to a Local PC?
- What Installation Environments Do ModelArts SDKs Support?
- Does ModelArts Use the OBS API to Access OBS Files over an Intranet or the Internet?
- How Do I Obtain a Job Resource Usage Curve After I Submit a Training Job by Calling an API?
- How Do I View the Old-Version Dedicated Resource Pool List Using the SDK?
-
Using PyCharm Toolkit
- What Should I Do If an Error Occurs During Toolkit Installation?
- What Should I Do If an Error Occurs When I Edit a Credential in PyCharm Toolkit?
- Why Cannot I Start Training?
- What Should I Do If Error "xxx isn't existed in train_version" Occurs When a Training Job Is Submitted?
- What Should I Do If Error "Invalid OBS path" Occurs When a Training Job Is Submitted?
- What Should I Do If an Error Occurs During Service Deployment?
- How Do I View Error Logs of PyCharm Toolkit?
- How Do I Use PyCharm ToolKit to Create Multiple Jobs for Simultaneous Training?
- What Should I Do If "Error occurs when accessing to OBS" Is Displayed When PyCharm ToolKit Is Used?
-
General Issues
-
Troubleshooting
- General Issues
-
DevEnviron
- Environment Configuration Faults
-
Instance Faults
- What Do I Do If I Cannot Access My Notebook Instance?
- What Should I Do When the System Displays an Error Message Indicating that No Space Left After I Run the pip install Command?
- What Do I Do If the Code Can Be Run But Cannot Be Saved, and the Error Message "save error" Is Displayed?
- ModelArts.6333 Error Occurs
- What Can I Do If a Message Is Displayed Indicating that the Token Does Not Exist or Is Lost When I Open a Notebook Instance?
-
Code Running Failures
- Error Occurs When Using a Notebook Instance to Run Code, Indicating That No File Is Found in /tmp
- What Do I Do If a Notebook Instance Won't Run My Code?
- Why Does the Instance Break Down When dead kernel Is Displayed During Training Code Running?
- What Do I Do If cudaCheckError Occurs During Training?
- What Do I Do If Insufficient Space Is Displayed in DevEnviron?
- Why Does the Notebook Instance Break Down When opencv.imshow Is Used?
- Why Cannot the Path of a Text File Generated in Windows OS Be Found In a Notebook Instance?
- What Do I Do If No Kernel Is Displayed After a Notebook File Is Created?
- JupyterLab Plug-in Faults
-
Image Saving Failures
- What If the Error Message "there are processes in 'D' status, please check process status using'ps -aux' and kill all the 'D' status processes" or "Buildimge,False,Error response from daemon,Cannot pause container xxx" Is Displayed When I Save an Image?
- What Do I Do If Error "container size %dG is greater than threshold %dG" Is Displayed When I Save an Image?
- What Do I Do If Error "too many layers in your image" Is Displayed When I Save an Image?
- What Do I Do If Error "The container size (xG) is greater than the threshold (25G)" Is Reported When I Save an Image?
- Other Faults
-
Training Jobs
-
OBS Operation Issues
- Failed to Correctly Read Files
- Error Message Is Displayed Repeatedly When a TensorFlow-1.8 Job Is Connected to OBS
- TensorFlow Stops Writing TensorBoard to OBS When the Size of Written Data Reaches 5 GB
- Error "Unable to connect to endpoint" Error Occurs When a Model Is Saved
- What Do I Do If Error Message "No such file or directory" Is Displayed in Training Job Logs?
- Error Message "BrokenPipeError: Broken pipe" Displayed When OBS Data Is Copied
- Error Message "ValueError: Invalid endpoint: obs.xxxx.com" Displayed in Logs
- Error Message "errorMessage:The specified key does not exist" Displayed in Logs
-
In-Cloud Migration Adaptation Issues
- Failed to Import a Module
- Error Message "No module named .*" Displayed in Training Job Logs
- Failed to Install a Third-Party Package
- Failed to Download the Code Directory
- Error Message "No such file or directory" Displayed in Training Job Logs
- Failed to Find the .so File During Training
- Failed to Parse Parameters and Log Error Occurs
- Training Output Path Is Used by Another Job
- Failed to Find the Boot File When a Training Job Is Created Using a Custom Image
- Error Message "RuntimeError: std::exception" Displayed for a PyTorch 1.0 Engine
- Error Message "retCode=0x91, [the model stream execute failed]" Displayed in MindSpore Logs
- Error Occurred When Pandas Reads Data from an OBS File If MoXing Is Used to Adapt to an OBS Path
- Error Message "Please upgrade numpy to >= xxx to use this pandas version" Displayed in Logs
- Reinstalled CUDA Version Does Not Match the One in the Target Image
- Error ModelArts.2763 Occurred During Training Job Creation
- Error Message "AttributeError: module '***' has no attribute '***'" Displayed Training Job Logs
- System Container Exits Unexpectedly
-
Memory Limit Issues
- Downloading Files Timed Out or No Space Left for Reading Data
- Insufficient Container Space for Copying Data
- Error Message "No space left" Displayed When a TensorFlow Multi-node Job Downloads Data to /cache
- Size of the Log File Has Reached the Limit
- Error Message "write line error" Displayed in Logs
- Error Message "No space left on device" Displayed in Logs
- Training Job Failed Due to OOM
- Common Issues Related to Insufficient Disk Space and Solutions
- Internet Access Issues
- Permission Issues
-
GPU Issues
- Error Message "No CUDA-capable device is detected" Displayed in Logs
- Error Message "RuntimeError: connect() timed out" Displayed in Logs
- Error Message "cuda runtime error (10) : invalid device ordinal at xxx" Displayed in Logs
- Error Message "RuntimeError: Cannot re-initialize CUDA in forked subprocess" Displayed in Logs
- No GPU Is Found for a Training Job
-
Service Code Issues
- Error Message "pandas.errors.ParserError: Error tokenizing data. C error: Expected .* fields" Displayed in Logs
- Error Message "max_pool2d_with_indices_out_cuda_frame failed with error code 0" Displayed in Logs
- Training Job Failed with Error Code 139
- Debugging Training Code in the Cloud Environment If a Training Job Failed
- Error Message "'(slice(0, 13184, None), slice(None, None, None))' is an invalid key" Displayed in Logs
- Error Message "DataFrame.dtypes for data must be int, float or bool" Displayed in Logs
- Error Message "CUDNN_STATUS_NOT_SUPPORTED" Displayed in Logs
- Error Message "Out of bounds nanosecond timestamp" Displayed in Logs
- Error Message "Unexpected keyword argument passed to optimizer" Displayed in Logs
- Error Message "no socket interface found" Displayed in Logs
- Error Message "Runtimeerror: Dataloader worker (pid 46212) is killed by signal: Killed BP" Displayed in Logs
- Error Message "AttributeError: 'NoneType' object has no attribute 'dtype'" Displayed in Logs
- Error Message "No module name 'unidecode'" Displayed in Logs
- Distributed Tensorflow Cannot Use tf.variable
- When MXNet Creates kvstore, the Program Is Blocked and No Error Is Reported
- ECC Error Occurs in the Log, Causing Training Job Failure
- Training Job Failed Because the Maximum Recursion Depth Is Exceeded
- Training Using a Built-in Algorithm Failed Due to a bndbox Error
- Training Job Status Is Reviewing Job Initialization
- Training Job Process Exits Unexpectedly
- Stopped Training Job Process
- Training Job Suspended
- Training Jobs Created in a Dedicated Resource Pool
- Training Performance Issues
-
OBS Operation Issues
-
Inference Deployment
-
AI Application Management
- Creating an AI Application Failed
- Failed to Build an Image or Import a File When an IAM user Creates an AI Application
- Obtaining the Directory Structure in the Target Image When Importing an AI Application Through OBS
- Failed to Obtain Certain Logs on the ModelArts Log Query Page
- Failed to Download a pip Package When an AI Application Is Created Using OBS
- Failed to Use a Custom Image to Create an AI application
- Insufficient Disk Space Is Displayed When a Service Is Deployed After an AI Application Is Imported
- Error Occurred When a Created AI Application Is Deployed as a Service
- Invalid Runtime Dependency Configured in an Imported Custom Image
- Garbled Characters Displayed in an AI Application Name Returned When AI Application Details Are Obtained Through an API
- The Model or Image Exceeded the Size Limit for AI Application Import
- A Single Model File Exceeded the Size Limit (5 GB) for AI Application Import
- Creating an AI Application Failed Due to Image Building Timeout
-
Service Deployment
- Error Occurred When a Custom Image Model Is Deployed as a Real-Time Service
- Alarm Status of a Deployed Real-Time Service
- Failed to Start a Service
- What Do I Do If an Image Fails to Be Pulled When a Service Is Deployed, Started, Upgraded, or Modified?
- What Do I Do If an Image Restarts Repeatedly When a Service Is Deployed, Started, Upgraded, or Modified?
- What Do I Do If a Container Health Check Fails When a Service Is Deployed, Started, Upgraded, or Modified?
- What Do I Do If Resources Are Insufficient When a Service Is Deployed, Started, Upgraded, or Modified?
- Error Occurred When a CV2 Model Package Is Used to Deploy a Real-Time Service
- Service Is Consistently Being Deployed
- A Started Service Is Intermittently in the Alarm State
- Failed to Deploy a Service and Error "No Module named XXX" Occurred
- Insufficient Permission to or Unavailable Input/Output OBS Path of a Batch Service
-
Service Prediction
- Service Prediction Failed
- Error "APIG.XXXX" Occurred in a Prediction Failure
- Error ModelArts.4206 Occurred in Real-Time Service Prediction
- Error ModelArts.4302 Occurred in Real-Time Service Prediction
- Error ModelArts.4503 Occurred in Real-Time Service Prediction
- Error MR.0105 Occurred in Real-Time Service Prediction
- Method Not Allowed
- Request Timed Out
- Error Occurred When an API Is Called for Deploying a Model Created Using a Custom Image
-
AI Application Management
-
MoXing
- Error Occurs When MoXing Is Used to Copy Data
- How Do I Disable the Warmup Function of the Mox?
- Pytorch Mox Logs Are Repeatedly Generated
- Does moxing.tensorflow Contain the Entire TensorFlow? How Do I Perform Local Fine Tune on the Generated Checkpoint?
- Copying Data Using MoXing Is Slow and the Log Is Repeatedly Printed in a Training Job
- Failed to Access a Folder Using MoXing and Read the Folder Size Using get_size
- APIs or SDKs
-
Best Practices
-
Permissions Management
- Basic Concepts
- Permission Management Mechanisms
-
Configuration Practices in Typical Scenarios
- Assigning Permissions to Individual Users for Using ModelArts
- Separately Assigning Permissions to Administrators and Developers
- Viewing the Notebook Instances of All IAM Users Under One Tenant Account
- Logging In to a Training Container Using Cloud Shell
- Prohibiting a User from Using a Public Resource Pool
- Model Development (Custom Algorithms in Training Jobs of the New Version)
- Model Inference
-
Permissions Management
- Videos
- Data Labeling
- Data Preparation and Analytics
- Data Processing
-
Tool Guide (Cloud Alliance scenario)
- PyCharm Toolkit
- Preparations
- Connecting to a Notebook Instance Through PyCharm Toolkit
- PyCharm Toolkit (Latest Version)
- OBS-based Upload and Download
-
FAQs
- What Should I Do If an Error Occurs During ToolKit Installation?
- An Error Occurs When You Edit a Credential in PyCharm Toolkit
- Why Cannot I Start Training?
- What Should I Do If Error "xxx isn't existed in train_version" Occurs When a Training Job Is Submitted
- What Should I Do If an Error Occurs When I Submit a Training Job
- What Should I Do If an Error Occurs During Service Deployment
- How Do I View Error Logs of PyCharm ToolKit?
Function
-
DevEnviron
-
During AI development, it can be challenging to set up a development environment, select an AI framework and algorithm, debug code, install software, and accelerate hardware. To resolve these issues, ModelArts provides notebook instances for simplified development.
Both the old and new versions of the ModelArts notebook are available. The new version provides optimized functions and is recommended. The following describes the functions of the new-version notebook.
Released in: EU-Dublin
-
JupyterLab
-
The in-cloud Jupyter notebook offered by ModelArts enables online interactive development and debugging. It works out of the box, so you do not need to install or configure anything.
-
-
-
Algorithm Management
-
You can upload locally developed algorithms or algorithms developed using other tools to ModelArts for unified management. You can then use an algorithm you have created or subscribed to in order to quickly create a training job on ModelArts and obtain the desired model.
Released in: EU-Dublin
-
-
Training Management
-
All the algorithms developed locally or using other tools can be uploaded to ModelArts for unified management. You can also subscribe to algorithms in AI Gallery to build models. In AI Gallery, both the built-in algorithms officially released by ModelArts and the algorithms shared by other users are available for you to subscribe to.
You can use the algorithms you have created or subscribed to in order to quickly create training jobs on ModelArts.
Released in: EU-Dublin
-
Using Subscribed Algorithms to Develop Models
-
Both officially released algorithms and custom algorithms shared by developers are available in ModelArts AI Gallery. You can use these algorithms to build models without writing any code.
-
-
Using Custom Algorithms to Develop Models
-
If the algorithms available for subscription cannot meet your service requirements, or if you want to migrate local algorithms to the cloud for training, use the training engines built into ModelArts to create algorithms. This is also known as using a custom script to create algorithms.
ModelArts offers almost all mainstream AI engines. These built-in engines are pre-loaded with some additional Python packages, such as NumPy. You can also use the requirements.txt file in the code directory to install dependency packages.
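For illustration only, the sketch below shows what the boot script of a custom algorithm using a preset engine might look like. The --data_url and --train_url parameter names, the default paths, and the metric file written at the end are assumptions made for this example; the actual input and output parameters are whatever you define when creating the algorithm.

```python
# train.py - a hypothetical boot script for a training job that uses a
# preset engine. Parameter names and paths below are illustrative only.
import argparse
import os


def main():
    parser = argparse.ArgumentParser(description="Minimal custom training script")
    # The training job passes the input and output paths you configure as
    # command-line arguments; the names here are assumptions for this sketch.
    parser.add_argument("--data_url", type=str, default="./input/data",
                        help="Directory containing the training data")
    parser.add_argument("--train_url", type=str, default="./output",
                        help="Directory where the model and logs are written")
    parser.add_argument("--epochs", type=int, default=10, help="Example hyperparameter")
    args, _ = parser.parse_known_args()

    # Packages listed in requirements.txt in the code directory are installed
    # before this script starts, so they can simply be imported here.
    import numpy as np  # pre-loaded with the built-in engines

    print(f"Training for {args.epochs} epochs on data in {args.data_url}")
    rng = np.random.default_rng(0)
    dummy_metric = float(rng.random())  # stand-in for a real training loop

    os.makedirs(args.train_url, exist_ok=True)
    with open(os.path.join(args.train_url, "metrics.txt"), "w") as f:
        f.write(f"final_metric={dummy_metric}\n")


if __name__ == "__main__":
    main()
```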
-
-
Using Custom Images to Develop Models
-
The built-in training engines and the algorithms that can be subscribed to apply to most training scenarios. In certain scenarios, ModelArts allows you to create custom images to train models. Custom images can be used in the cloud only after they are uploaded to the Software Repository for Container (SWR).
Customizing an image requires a deep understanding of containers. Use this method only if the algorithms that are available to subscribe to and the built-in training engines cannot meet your requirements.
-
-
-
Model Management
-
ModelArts allows you to deploy models as AI applications and manage these applications centrally. The models can be obtained locally or from training jobs.
ModelArts also enables you to convert models and deploy them on different devices, such as Arm devices.
Released in: EU-Dublin
-
Importing a Meta Model from OBS Through Manual Configurations
-
If a model is developed and trained using a commonly used framework, you can import it to ModelArts and use it to create an AI application for unified management.
-
-
Importing Models from Custom Images
-
For an AI engine that is not supported by ModelArts, build a model for that engine, create a custom image for the model, import the model to ModelArts, and deploy it as an AI application.
-
-
Model Package Specifications
-
When you create an AI application in AI Application Management and the meta model is imported from OBS or a container image, ensure that the model package complies with the specifications:
Edit the inference code and configuration file required for subsequent inference deployment.
Note: A model trained using a built-in algorithm already has the inference code and configuration file configured, so you do not need to configure them again.
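As a rough sketch only: custom inference code typically sits in the model package next to the configuration file and the model files, and defines a service class with pre- and post-processing hooks. The module path model_service.tfserving_model_service, the base class TfServingBaseService, and the _preprocess/_postprocess method names below follow the commonly documented TensorFlow convention but should be treated as assumptions; confirm them against the model package specifications for your AI engine.

```python
# customize_service.py - illustrative inference code placed in the model
# directory of the model package, next to the configuration file and the
# saved model files. The import path and base class are assumptions based
# on the commonly documented TensorFlow serving convention.
from model_service.tfserving_model_service import TfServingBaseService


class DemoService(TfServingBaseService):
    def _preprocess(self, data):
        # Convert the raw request payload into the inputs the model expects.
        preprocessed = {}
        for _, value in data.items():
            for name, content in value.items():
                preprocessed[name] = content
        return preprocessed

    def _postprocess(self, data):
        # Shape the model output into the JSON structure returned to the caller.
        return {"predictions": data}
```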
-
-
-
Service Deployment
-
Deploying AI models and implementing them at scale are complex tasks. ModelArts provides one-stop deployment modes that allow you to deploy trained models on devices, at the edge, and in the cloud with just one click.
-
Real-Time Services
-
Real-time inference services feature high concurrency, low latency, and elastic scaling, and support multi-model gray release and A/B testing. You can deploy a model as a web service that provides a real-time test UI and monitoring capabilities.
Released in: EU-Dublin
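Once a real-time service is running, it can also be called programmatically over HTTPS. The snippet below is a generic, hedged sketch of such a call; the endpoint URL, the token, and the request body layout are placeholders, and the real values come from the API address and input format shown on the service details page.

```python
# Illustrative call to a deployed real-time service.
# The URL, token, and input field names are placeholders; take the real
# values from the service details page (API address and authentication).
import requests

API_URL = "https://<inference-endpoint>/v1/infers/<service-id>"  # placeholder
TOKEN = "<IAM-token>"                                            # placeholder

payload = {"data": {"req_data": [{"feature_1": 1.0, "feature_2": 2.0}]}}
headers = {"Content-Type": "application/json", "X-Auth-Token": TOKEN}

response = requests.post(API_URL, json=payload, headers=headers, timeout=30)
response.raise_for_status()
print(response.json())
```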
-
-
Batch Services
-
Batch services are suitable for processing large volumes of data using distributed computing. You can use a batch service to perform inference on data in batches. The batch service automatically stops after the data is processed.
Released in: EU-Dublin
-
-
-
Resource Pools
-
When you use ModelArts for AI development, you may require CPU or GPU resources for training or inference. ModelArts provides pay-per-use public resource pools and queue-free dedicated resource pools to meet a diverse range of development requirements.
-
Public Resource Pools
-
A public resource pool provides public large-scale computing clusters, which are allocated based on job parameter settings. Resources are isolated by job. Billing of public resource pools is based on the resource specifications, duration, and instance quantity, regardless of the tasks (including training, deployment, and development) for which the pools are used. Public resource pools are available by default. You can select a public resource pool during AI development.
-
-
Dedicated Resource Pools
-
A dedicated resource pool provides exclusive compute resources, which can be used for notebook instances, training jobs, and model deployment. Dedicated resource pools deliver higher efficiency, and cannot be shared with other users. You can buy a dedicated resource pool and select it during AI development.
Released in: EU-Dublin
-
-
-
ModelArts SDK
-
The ModelArts Software Development Kit (ModelArts SDK) encapsulates the ModelArts RESTful APIs in Python to simplify application development. You can call the ModelArts SDK to easily manage datasets, start AI training, generate models, and deploy models.
In notebook instances, you can use the ModelArts SDK to manage OBS, training jobs, models, and real-time services without configuring authentication.
Released in: EU-Dublin
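As an illustration of this, the sketch below uses the SDK from inside a notebook instance, where a Session can be created without passing credentials. The OBS helper method names (upload_file, download_file) and the bucket path are assumptions for this example; see the OBS Management chapter of the SDK reference for the exact APIs.

```python
# Illustrative use of the ModelArts SDK inside a notebook instance.
# Session() needs no explicit AK/SK here because the notebook is already
# authenticated. The OBS method names below are assumptions; check the
# "OBS Management" chapter of the SDK reference for the exact signatures.
from modelarts.session import Session

session = Session()  # no authentication parameters needed in a notebook

# Hypothetical example: copy a local file to an OBS bucket and back.
session.obs.upload_file(src_local_file="./data/train.csv",
                        dst_obs_dir="obs://my-bucket/datasets/")
session.obs.download_file(src_obs_file="obs://my-bucket/datasets/train.csv",
                          dst_local_dir="./download/")
```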
-
-
Free Trial
-
Free ModelArts compute resources are available for a limited time. Experience ModelArts at minimal cost, including ExeML, DevEnviron (notebook), and the end-to-end AI development process.
-
-
What Is a VPC Peering Connection?
-
A Virtual Private Cloud (VPC) provides an isolated virtual network environment that you can configure and manage for resources such as cloud servers, cloud containers, and cloud databases. It improves the security of your cloud resources and simplifies network deployment.
In a VPC, you can define security groups, VPNs, IP address ranges, and bandwidth. This makes it easy to manage and configure internal networks and make secure, quick network changes. You can also customize access rules for Elastic Cloud Servers (ECSs) within a security group and between security groups to strengthen ECS protection.
Released in all regions except AP-Bangkok, AP-Singapore, and LA-Santiago.
-