Specifications for Custom Images Used for Training Jobs
When you create an image from locally developed models and training scripts, ensure that the image meets the specifications defined by ModelArts.
Specifications
- Custom images cannot contain malicious code.
- Some content in the basic images cannot be changed, including all the files in /bin, /sbin, /usr, and /lib(64), some important configuration files in /etc, and the ModelArts tools in $HOME.
- Do not add files that are owned by root and have the setuid or setgid permission bit set.
- The size of a custom image cannot exceed 9.5 GB.
- To ensure that logs are displayed properly, write all logs to standard output.
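Two of the rules above can be checked or applied from Python. The following is a minimal sketch, not a ModelArts tool: the function names configure_stdout_logging, is_root_setid, and find_root_setid_files are hypothetical. It routes training logs to standard output and scans a directory tree for root-owned files with the setuid or setgid bit set.

```python
import logging
import os
import stat
import sys


def configure_stdout_logging(level=logging.INFO):
    """Route all records of the root logger to standard output."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    root = logging.getLogger()
    root.handlers = [handler]  # replace any handlers that write elsewhere
    root.setLevel(level)
    return root


def is_root_setid(mode, uid):
    """Return True if the owner is root (uid 0) and setuid or setgid is set."""
    return uid == 0 and bool(mode & (stat.S_ISUID | stat.S_ISGID))


def find_root_setid_files(top):
    """Yield regular files under `top` that are root-owned with setuid/setgid."""
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)  # do not follow symbolic links
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and is_root_setid(st.st_mode, st.st_uid):
                yield path
```

A training script could call configure_stdout_logging() once at startup. Running find_root_setid_files only on the directories that your own build steps add avoids reporting files that already ship with the basic image.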
Basic Image Package
To facilitate code download, training log output, and log file upload to OBS, ModelArts provides basic image packages for creating custom images. The basic images provided by ModelArts have the following features:
- The basic images contain some necessary tools. Create your custom image based on one of the basic images provided by ModelArts.
- ModelArts continuously updates the basic image versions. After a compatible update, you can still use the old basic images. After an incompatible update, custom images created based on the old version cannot run on ModelArts, but custom images that have already been approved can still be used.
- If a custom image fails to be approved and the audit log contains an error message indicating that the basic image does not match, create the custom image again based on a new basic image.
- Table 1 and Table 2 list components and tools contained in basic images. For details about the complete basic image content, see Dockerfile.
| Component | Description |
|---|---|
| run_train.sh | Training boot script. It downloads the code directory, runs the training commands, redirects training log output, and uploads log files to OBS after the training commands are executed. |
| Tool | Description |
|---|---|
| utils.sh | Tool script. The run_train.sh script depends on this script. It provides methods such as SK decryption, code directory download, and log file upload. |
| ip_mapper.py | Script for obtaining NIC addresses. By default, the IP address of the ib0 NIC is obtained. Training code can use the IP address of the ib0 NIC to accelerate network communications. |
| dls-downloader.py | OBS download script. The utils.sh script depends on this script. |
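To illustrate what looking up a NIC address involves, here is a minimal, Linux-only sketch of reading the IPv4 address of a named interface such as ib0. It is not the ip_mapper.py shipped in the basic image, whose implementation is not shown here. The function name get_interface_ip is hypothetical, and the approach (the SIOCGIFADDR ioctl) is one common technique rather than the one ModelArts necessarily uses.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl request: get the IPv4 address of an interface


def get_interface_ip(ifname="ib0"):
    """Return the IPv4 address of `ifname` as a string, or None if unavailable."""
    # The kernel expects a struct ifreq whose first 16 bytes hold the
    # NUL-terminated interface name (at most 15 characters).
    request = struct.pack("256s", ifname.encode()[:15])
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            reply = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, request)
        except OSError:
            return None  # no such interface, or it has no IPv4 address
    # Bytes 20..24 of the reply hold the 4-byte address inside sockaddr_in.
    return socket.inet_ntoa(reply[20:24])
```

Training code could fall back to another interface when get_interface_ip("ib0") returns None, for example on nodes without an InfiniBand NIC.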
The name format of the basic images provided by ModelArts is as follows:
- Image of the CUDA 8, 9, or 9.2 version
swr.<region>.myhuaweicloud.com/<image org>/custom-<processor type>-<cuda version>-base:<image tag>
| Parameter | Possible Value | Description |
|---|---|---|
| <region> | cn-north-1, cn-north-4 | Region where the image resides. cn-north-1 is CN North-Beijing1 and cn-north-4 is CN North-Beijing4. |
| <image org> | modelarts-job-dev-image | Organization to which the image belongs. Use modelarts-job-dev-image. |
| <processor type> | cpu, gpu | Processor type. |
| <cuda version> | cuda92, cuda9, cuda8 | CUDA version installed in the image. In versions earlier than CUDA 10, the CUDA version takes effect only when <processor type> is set to gpu. NOTE: Check the CUDA version. After the version is specified, it cannot be changed. Otherwise, the training will fail. |
| <image tag> | 1.0, 1.1, 1.2, 1.3 | Image version. |
- Image of the CUDA 8, 9, or 9.2 version. MoXing is pre-installed by default.
swr.<region>.myhuaweicloud.com/<image org>/custom-<processor type>-<cuda version>-inner-moxing-<python version>:<image tag>
| Parameter | Possible Value | Description |
|---|---|---|
| <region> | cn-north-1, cn-north-4 | Region where the image resides. cn-north-1 is CN North-Beijing1 and cn-north-4 is CN North-Beijing4. |
| <image org> | modelarts-job-dev-image | Organization to which the image belongs. Use modelarts-job-dev-image. |
| <processor type> | cpu, gpu | Processor type. |
| <cuda version> | cuda92, cuda9, cuda8 | CUDA version installed in the image. In versions earlier than CUDA 10, the CUDA version takes effect only when <processor type> is set to gpu. NOTE: Check the CUDA version. After the version is specified, it cannot be changed. Otherwise, the training will fail. |
| <python version> | cp27, cp36 | Python environment. |
| <image tag> | 1.3 | Image version. |
- The image of CUDA 10.0, 10.1, or 10.2 uses Ubuntu 18.04 as the basic image. MoXing is pre-installed by default.
swr.<region>.myhuaweicloud.com/<image org>/custom-base-<cuda version>-<python version>-<os>-<arch>:<image tag>
| Parameter | Possible Value | Description |
|---|---|---|
| <region> | cn-north-1, cn-north-4 | Region where the image resides. cn-north-1 is CN North-Beijing1 and cn-north-4 is CN North-Beijing4. |
| <image org> | modelarts-job-dev-image | Organization to which the image belongs. Use modelarts-job-dev-image. |
| <cuda version> | cuda10.0, cuda10.1, cuda10.2 | CUDA version installed in the image. NOTE: Check the CUDA version. After the version is specified, it cannot be changed. Otherwise, the training will fail. |
| <python version> | cp36 | Python 3.6 environment. |
| <os> | ubuntu18.04 | Operating system. |
| <arch> | x86 | Architecture. |
| <image tag> | 1.1 | Image version. |
For example, in the CN North-Beijing1 region, ModelArts supports the following basic images. Select the image that meets your requirements.
Versions earlier than CUDA 10:
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-cpu-base:1.3
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-gpu-cuda92-base:1.3
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-gpu-cuda9-base:1.3
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-gpu-cuda8-base:1.3
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-gpu-cuda9-inner-moxing-cp36:1.3
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-gpu-cuda8-inner-moxing-cp27:1.3
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-gpu-cuda9-inner-moxing-cp27:1.3
- ...
CUDA 10 and later versions:
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-base-cuda10.0-cp36-ubuntu18.04-x86:1.1
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-base-cuda10.1-cp36-ubuntu18.04-x86:1.1
- swr.cn-north-1.myhuaweicloud.com/modelarts-job-dev-image/custom-base-cuda10.2-cp36-ubuntu18.04-x86:1.1
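The three name formats can be assembled programmatically. The sketch below uses hypothetical helper names (legacy_base_image, moxing_image, cuda10_image) and does not validate values against the tables. Omitting the CUDA segment for cpu images is an assumption inferred from the custom-cpu-base:1.3 example above, not a documented rule.

```python
REGISTRY_TEMPLATE = "swr.{region}.myhuaweicloud.com/{org}/{name}:{tag}"
DEFAULT_ORG = "modelarts-job-dev-image"


def _image(region, name, tag, org=DEFAULT_ORG):
    """Join the registry host, organization, image name, and tag."""
    return REGISTRY_TEMPLATE.format(region=region, org=org, name=name, tag=tag)


def legacy_base_image(region, processor, cuda=None, tag="1.3"):
    """custom-<processor type>-<cuda version>-base, for CUDA versions before 10."""
    parts = ["custom", processor]
    if processor == "gpu":
        if cuda is None:
            raise ValueError("a CUDA version is required for gpu images")
        parts.append(cuda)
    # Assumption: cpu images carry no CUDA segment (see custom-cpu-base:1.3).
    parts.append("base")
    return _image(region, "-".join(parts), tag)


def moxing_image(region, processor, cuda, python, tag="1.3"):
    """custom-<processor type>-<cuda version>-inner-moxing-<python version>."""
    name = "custom-{}-{}-inner-moxing-{}".format(processor, cuda, python)
    return _image(region, name, tag)


def cuda10_image(region, cuda, python="cp36", os_name="ubuntu18.04",
                 arch="x86", tag="1.1"):
    """custom-base-<cuda version>-<python version>-<os>-<arch>, for CUDA 10.x."""
    name = "custom-base-{}-{}-{}-{}".format(cuda, python, os_name, arch)
    return _image(region, name, tag)
```

For instance, cuda10_image("cn-north-1", "cuda10.1") produces the second CUDA 10 entry in the list above, which can then be used as the base image reference when building a custom image.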