What Are the Precautions for Switching Training Jobs from the Old Version to the New Version?
The new version differs from the old version in the following aspects:
- Differences in Training Job Creation
- Differences in Training Code Adaptation
- Differences in Built-in Training Engines
Differences in Training Job Creation
- In the old version, you could create a training job using Algorithm Management, Frequently-used, or Custom.
- In the new version, you can create a training job using Custom algorithm or My algorithm.
The new version reorganizes the algorithms to help you find them more easily. Existing training jobs are not affected.
- Algorithms saved in Algorithm Management in the old version appear under My algorithm in the new version.
- The Frequently-used option in the old version corresponds to the Custom algorithm option in the new version. When creating jobs in the new version, set Boot Mode to Preset image.
- The Custom option in the old version corresponds to the Custom algorithm option in the new version. When creating jobs in the new version, set Boot Mode to Custom image.
Differences in Training Code Adaptation
In the old version, you must configure the data input and output as follows:
```python
import argparse
import moxing as mox

# Parse CLI parameters.
parser = argparse.ArgumentParser(description='MindSpore Lenet Example')
parser.add_argument('--data_url', type=str, default="./Data",
                    help='path where the dataset is saved')
parser.add_argument('--train_url', type=str, default="./Model",
                    help='path where the trained ckpt file is saved')
args = parser.parse_args()
...
# Download data to the local container. In the code, local_data_path
# specifies the training input path.
mox.file.copy_parallel(args.data_url, local_data_path)
...
# Upload the local container data to the OBS path.
mox.file.copy_parallel(local_output_path, args.train_url)
```
In the new version, you only need to configure the training input and output. In the code, args.data_url and args.train_url are used directly as local paths. For details, see Developing a Custom Script.
```python
import argparse

# Parse CLI parameters.
parser = argparse.ArgumentParser(description='MindSpore Lenet Example')
parser.add_argument('--data_url', type=str, default="./Data",
                    help='path where the dataset is saved')
parser.add_argument('--train_url', type=str, default="./Model",
                    help='path where the trained ckpt file is saved')
args = parser.parse_args()
...
# No download code is needed. Use args.data_url and args.train_url
# directly as the local paths for training input and output.
# mox.file.copy_parallel(args.data_url, local_data_path)
...
# mox.file.copy_parallel(local_output_path, args.train_url)
```
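The new-version adaptation can be sketched end to end. The following is a minimal, self-contained sketch, assuming the platform mounts the training input at the path passed as --data_url and uploads whatever the script writes under --train_url after the job ends; the file name model.ckpt and the dummy checkpoint content are hypothetical placeholders, not real training output.

```python
import argparse
import os

def main():
    # Same CLI contract as the snippets above: data_url and train_url
    # are plain local directory paths in the new version.
    parser = argparse.ArgumentParser(description='MindSpore Lenet Example')
    parser.add_argument('--data_url', type=str, default="./Data",
                        help='path where the dataset is saved')
    parser.add_argument('--train_url', type=str, default="./Model",
                        help='path where the trained ckpt file is saved')
    args, _ = parser.parse_known_args()

    # Read training data directly from the mounted input path.
    dataset_files = os.listdir(args.data_url)
    print(f'Found {len(dataset_files)} files in {args.data_url}')

    # ... training happens here ...

    # Write the checkpoint directly to the output path; the platform
    # handles the upload to OBS, so no mox.file.copy_parallel is needed.
    os.makedirs(args.train_url, exist_ok=True)
    ckpt_path = os.path.join(args.train_url, 'model.ckpt')
    with open(ckpt_path, 'w') as f:
        f.write('dummy checkpoint')
    print(f'Saved checkpoint to {ckpt_path}')

if __name__ == '__main__':
    main()
```

parse_known_args is used instead of parse_args so the script tolerates any extra flags the platform may append to the command line.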
Differences in Built-in Training Engines
- In the new version, MoXing 2.0.0 or later is installed by default for built-in training engines.
- In the new version, Python 3.7 or later is used for built-in training engines.
- In the new images, the default home directory has been changed from /home/work to /home/ma-user. Check whether the training code hard-codes /home/work.
- Built-in training engines are different between the old and new versions. Commonly used built-in training engines have been upgraded in the new version.
To use a training engine in the old version, switch to the old version. Table 1 lists the differences between the built-in training engines in the old and new versions.
Table 1 Differences between the built-in training engines in the old and new versions

| Runtime Environment | Built-in Training Engine and Version | Old Version | New Version |
|---|---|---|---|
| TensorFlow | TensorFlow-1.8.0 | √ | x |
| TensorFlow | TensorFlow-1.13.1 | √ | Coming soon |
| TensorFlow | TensorFlow-2.1.0 | √ | √ |
| MXNet | MXNet-1.2.1 | √ | x |
| Caffe | Caffe-1.0.0 | √ | x |
| Spark MLlib | Spark-2.3.2 | √ | x |
| Ray | Ray-0.7.4 | √ | x |
| XGBoost with scikit-learn | XGBoost-0.80-Sklearn-0.18.1 | √ | x |
| PyTorch | PyTorch-1.0.0 | √ | x |
| PyTorch | PyTorch-1.3.0 | √ | x |
| PyTorch | PyTorch-1.4.0 | √ | x |
| PyTorch | PyTorch-1.8.0 | x | √ |
| MPI | MindSpore-1.3.0 | x | √ |
| Horovod | Horovod_0.20.0-TensorFlow_2.1.0 | x | √ |
| Horovod | horovod_0.22.1-pytorch_1.8.0 | x | √ |
| MindSpore-GPU | MindSpore-1.1.0 | √ | x |
| MindSpore-GPU | MindSpore-1.2.0 | √ | x |
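Because the new images change the default home directory and require Python 3.7 or later, a quick sanity check at the start of the training script can surface stale assumptions before training begins. This is a minimal sketch, not an official ModelArts utility; it only warns rather than failing the job.

```python
import os
import sys

def check_environment():
    """Collect warnings about assumptions that break on new-version images."""
    warnings = []

    # New-version built-in engines run on Python 3.7 or later.
    if sys.version_info < (3, 7):
        warnings.append(
            f'Python {sys.version_info.major}.{sys.version_info.minor} detected; '
            'new-version built-in engines use Python 3.7 or later.')

    # The default home directory changed from /home/work to /home/ma-user.
    if os.path.expanduser('~') == '/home/work':
        warnings.append(
            'Home directory is /home/work; new images use /home/ma-user. '
            'Check the training code for hard-coded /home/work paths.')

    return warnings

for warning in check_environment():
    print('WARNING:', warning)
```

Running this at container start-up makes the job logs show immediately whether the image matches what the training code expects.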