Preparing a Model Training Image
ModelArts provides base images for deep learning engines such as TensorFlow, PyTorch, and MindSpore. The software required to run training jobs is preinstalled in these images. If a base image does not meet your service requirements, build a new image on top of it and use the new image to create training jobs.
Preset Training Images
The following table lists the preset training base images of ModelArts.
Engine | Version |
---|---|
PyTorch | pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64 |
TensorFlow | tensorflow_2.1.0-cuda_10.1-py_3.7-ubuntu_18.04-x86_64 |
Horovod | horovod_0.20.0-tensorflow_2.1.0-cuda_10.1-py_3.7-ubuntu_18.04-x86_64 |
Horovod | horovod_0.22.1-pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64 |
MPI | mindspore_1.3.0-cuda_10.1-py_3.7-ubuntu_1804-x86_64 |
Ascend-Powered-Engine | tensorflow_1.15-cann_5.1.0-py_3.7-euler_2.8.3-aarch64 |
Ascend-Powered-Engine | mindspore_1.7.0-cann_5.1.0-py_3.7-euler_2.8.3-aarch64 |
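The preset image names appear to be built from hyphen-separated key_value fields (engine, CUDA or CANN version, Python version, OS, architecture). This pattern is inferred from the names listed above and is not a documented contract. Under that assumption, a small helper can pull out one field, for example to check the CUDA version against a dependency. The function name image_field is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: read one field (for example cuda or py) out of a preset image name.
# Assumes hyphen-separated key_value fields, as the listed names suggest.

# Usage: image_field <image name> <key>
# Prints the value for <key>; returns 1 if the key is not present.
image_field() {
    local name="$1" key="$2" part
    local IFS='-'
    for part in $name; do   # unquoted on purpose: split on '-'
        case "$part" in
            "${key}_"*)
                printf '%s\n' "${part#"${key}_"}"
                return 0
                ;;
        esac
    done
    return 1
}

image_field pytorch_1.8.0-cuda_10.2-py_3.7-ubuntu_18.04-x86_64 cuda
image_field mindspore_1.7.0-cann_5.1.0-py_3.7-euler_2.8.3-aarch64 cann
```

The final field (x86_64 or aarch64) is the architecture and has no key, so this helper is not meant for reading it.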
Procedure

Scenario 1: If the preset images meet ModelArts training constraints but lack necessary code dependencies, install additional software packages.
For details, see Creating a Custom Training Image Using a Preset Image.
Scenario 2: If the local images meet code dependency requirements but not ModelArts training constraints, adapt them to ModelArts.
For details, see Migrating Existing Images to ModelArts.
Scenario 3: If neither the preset nor local images meet your needs, create an image that has the necessary code dependencies and meets ModelArts constraints. For details, see the following cases:
Creating a Custom Training Image (PyTorch + CPU/GPU)
Installing pip Dependencies in an Image
Before creating a distributed training job, install all required pip dependencies in the image. If a job uses more than 10 nodes, the system automatically deletes the pip source configuration, so a pip install command executed during training may fail and cause the training job to fail. Installing every dependency package in advance avoids this failure.
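One way to confirm that an image already contains every dependency is to compare a requirements list against the output of pip freeze captured inside the image. The following is a minimal sketch under stated assumptions: the function names and the file names requirements.txt and installed.txt are hypothetical, and only simple requirement lines (name, name==version, name>=version) are handled, not option lines such as -r or direct URLs.

```shell
#!/usr/bin/env bash
# Sketch: list packages named in a requirements file that do not appear in
# a saved `pip freeze` listing. Function and file names are hypothetical.

# Normalize a package name the way pip compares names:
# lowercase, with runs of '-', '_' and '.' collapsed to a single '-'.
normalize_name() {
    printf '%s\n' "$1" | tr '[:upper:]' '[:lower:]' | sed 's/[-_.][-_.]*/-/g'
}

# Usage: missing_packages <requirements file> <freeze file>
# Prints each missing package name; returns 1 if any are missing.
missing_packages() {
    local req_file="$1" freeze_file="$2" line name installed status=0
    # Collect normalized names from lines such as "torch==1.8.0".
    installed="$(while IFS= read -r line || [ -n "$line" ]; do
        normalize_name "${line%%[ =@]*}"
    done < "$freeze_file")"
    while IFS= read -r line || [ -n "$line" ]; do
        # Skip blank lines and comments.
        case "$line" in ''|'#'*) continue ;; esac
        name="${line%%[ <>=!~;]*}"   # drop version specifiers and markers
        name="${name%%\[*}"          # drop extras such as pkg[extra]
        name="$(normalize_name "$name")"
        if ! printf '%s\n' "$installed" | grep -qxF -- "$name"; then
            printf '%s\n' "$name"
            status=1
        fi
    done < "$req_file"
    return "$status"
}

# Example usage (run inside the image):
#   python3 -m pip freeze > installed.txt
#   missing_packages requirements.txt installed.txt
```

The check compares names only, not versions, so it catches packages that were never installed but not version mismatches.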
Install pip dependencies in either of the following ways:
- Method 1: Install dependencies in a notebook and save it as an image.
  - Run the target image in the notebook environment.
  - Run the pip install command to install all dependency packages.
  - Save the image.
  - Obtain the SWR address on the image details page for future training.
- Method 2: Install dependencies in a local container and export an image.
  - Run the container in the local environment and run the pip install command to install all dependencies.
  - Run the following command to save the container as an image:
    docker commit <Container ID> <Image name>
    Example:
    docker commit my_container my_image:v1
  - Run the following command to export the image as a .tar file:
    docker save -o <.tar file name>.tar <Image name>:<Tag>
    Example:
    docker save -o my_image_v1.tar my_image:v1
  - Upload the image to SWR for future training jobs.
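The commit and export steps of Method 2 can be parameterized so that the image name, tag, and .tar file name stay consistent. The following sketch only prints the two docker commands and executes nothing. The function name export_image_cmds and the rule for deriving the .tar file name are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print the docker commands for committing a container and
# exporting the result as a .tar file. Nothing is executed here;
# review the printed lines, then run them yourself.

# Usage: export_image_cmds <container> <image name> <tag>
export_image_cmds() {
    local container="$1" image="$2" tag="$3" tar_name
    # Build the file name from image and tag, replacing ':' and '/'
    # because they are awkward in file names.
    tar_name="$(printf '%s_%s' "$image" "$tag" | tr ':/' '__').tar"
    printf 'docker commit %s %s:%s\n' "$container" "$image" "$tag"
    printf 'docker save -o %s %s:%s\n' "$tar_name" "$image" "$tag"
}

export_image_cmds my_container my_image v1
```

With the arguments shown, the printed commands match the two examples in Method 2. To check an exported file before uploading it, docker load -i my_image_v1.tar restores the image on a machine with Docker installed.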