Model Training Process

AI modeling involves two stages:

Development: Prepare and configure the environment, then debug the deep learning training code. ModelArts DevEnviron is recommended for code debugging.
Experiment: Optimize the datasets and hyperparameters, and obtain an ideal model through multiple rounds of experiments. The ModelArts training platform is recommended for training.

Across the two stages, code is designed, developed, and tested in repeated cycles. Once the code becomes stable in the development stage, modeling moves to the experiment stage, where hyperparameters are tuned continuously to iterate on the model. If the experiment stage shows that training performance can be improved further, modeling returns to the development stage so that the code can be optimized.

Figure 1 Model development process
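
The experiment stage described above, running multiple rounds with different hyperparameters and keeping the best result, can be sketched in plain Python. This is an illustrative sketch only, not a ModelArts API: `train_and_evaluate`, `run_experiments`, the toy scoring formula, and the hyperparameter names are all hypothetical stand-ins for a real training run.

```python
import itertools


def train_and_evaluate(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for one training round; returns a validation score."""
    # Toy score that is highest at learning_rate=0.01 and batch_size=64.
    penalty = abs(learning_rate - 0.01) * 10 + abs(batch_size - 64) / 640
    return 1.0 - penalty


def run_experiments(grid: dict) -> tuple:
    """Run one round per hyperparameter combination; return (best_params, best_score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64, 128]}
    print(run_experiments(grid))
```

In practice, each call to `train_and_evaluate` would correspond to one training job, and a round that exposes a code-level bottleneck would send you back to the development stage.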

ModelArts provides a model training feature that lets you view training results and tune model parameters based on those results. You can select resource pools with different instance flavors for model training.

To train a model on ModelArts Standard, follow these steps:

Figure 2 ModelArts Standard model training process

**Table 1** ModelArts Standard model training process
Task	Subtask	Description
Making preparations	Preparing training code	Model training requires training code, a training framework, and training data. The training code contains the boot file or boot command of a training job and its dependency packages. To use a preset image to create a training job, develop training code by referring to Developing Code for Training Using a Preset Image. To use a custom image to create a training job, develop training code by referring to Developing Code for Training Using a Custom Image.
	Preparing a training image	There are multiple training image sources. For details, see Preparing a Model Training Image. ModelArts Standard offers mainstream preset images for model training, ready for immediate use. If the preset images do not meet your needs, create custom images.
	Preparing training data	Before training, prepare the necessary data, which can be datasets or predictive models. If your training data needs no further processing, upload it to OBS. When you create the training job, enter the OBS bucket path directly as the input parameter path. If your training dataset is unlabeled or unpreprocessed, import it to ModelArts data management for processing. When you create the training job, select that dataset in data management as the input parameter.
Creating a debug training job	Creating a debug training job	Before model training, debug your code. ModelArts offers multiple methods for creating a debug training job. You can use JupyterLab in the cloud without installing or configuring an environment. Alternatively, after enabling remote SSH, you can access a training job from a local IDE for debugging, which lets you keep your usual coding habits. Debugged code can be reused for production training at no extra cost. ModelArts supports the local IDE PyCharm. For details, see Using PyCharm Toolkit to Create and Debug a Training Job.
Creating an algorithm	Creating an algorithm	Before creating a production training job, create an algorithm or subscribe to an algorithm from AI Gallery.
Creating a production training job	Using basic training features	You can create a training job on the ModelArts Standard console. There are multiple algorithm types and training frameworks for you to select. For details, see Table 2. ModelArts Standard also allows you to create training jobs using APIs. For details, see Using PyTorch to Create a Training Job (New-Version Training).
Creating a production training job	Using advanced training features	ModelArts Standard supports the following advanced training features: incremental learning, distributed training, training acceleration, and high training reliability.
Viewing training results and logs	Viewing training job details	You can view a training job's parameter settings and events on the job details page at any time, whether the job is running or has completed.
Viewing training results and logs	Viewing training job logs	Training logs record the execution of a training job and any errors that occur while it runs. You can view these logs to identify and troubleshoot the issues that cause a job to fail.
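
As a rough illustration of the "Preparing training code" and "Viewing training job logs" rows above, the following sketch shows one possible shape for a boot file: it takes an input path and an output path as parameters, writes progress to standard output so that it ends up in the job logs, and saves a result. The argument names `--data_dir`, `--output_dir`, and `--epochs`, the file name `model.txt`, and the placeholder training loop are assumptions made for this example. They are not parameters defined by ModelArts, so check the referenced guides for what your training job actually passes.

```python
import argparse
import logging
import sys
from pathlib import Path


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Sketch of a training boot file")
    # Hypothetical argument names; replace them with the ones your job passes.
    parser.add_argument("--data_dir", type=Path, required=True)
    parser.add_argument("--output_dir", type=Path, required=True)
    parser.add_argument("--epochs", type=int, default=1)
    return parser.parse_args(argv)


def main(argv=None) -> int:
    args = parse_args(argv)
    # Log to stdout so that messages show up in the training job logs.
    logging.basicConfig(
        stream=sys.stdout,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(message)s",
    )
    log = logging.getLogger("train")

    if not args.data_dir.is_dir():
        log.error("Input directory not found: %s", args.data_dir)
        return 1  # A non-zero code signals failure to the caller.

    files = sorted(p for p in args.data_dir.iterdir() if p.is_file())
    log.info("Found %d input file(s)", len(files))

    for epoch in range(1, args.epochs + 1):
        # Placeholder for the real training step.
        log.info("epoch %d/%d done", epoch, args.epochs)

    args.output_dir.mkdir(parents=True, exist_ok=True)
    (args.output_dir / "model.txt").write_text(f"trained on {len(files)} file(s)\n")
    log.info("Saved result to %s", args.output_dir)
    return 0
```

To run the sketch as a script, call `sys.exit(main())` under an `if __name__ == "__main__":` guard. Returning an exit code from `main` instead of exiting inside it keeps the function easy to call while debugging.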

**Table 2** Methods of creating a training job
Creation Method	Description
Using a preset image to create a training job	If you have developed an algorithm locally using a mainstream image, you can select the corresponding preset image and create a training job to build a model.
Using a custom image to create a training job	To use a non-mainstream image, create a custom algorithm image and then use it to create a training job.
Using an existing algorithm to create a training job	In Algorithm Management, you can manage your created algorithms and those subscribed from AI Gallery. This allows you to quickly create training jobs and build models using these algorithms.
Using a subscribed algorithm to create a training job	You can subscribe to algorithms in AI Gallery to quickly create training jobs and build models.
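
The choice among the four methods in Table 2 can be written down as a small decision helper. This is only a sketch for illustration: the function `choose_creation_method`, its parameters, and the order in which the conditions are checked are assumptions of this example, not rules defined by ModelArts.

```python
from enum import Enum


class CreationMethod(Enum):
    PRESET_IMAGE = "Using a preset image to create a training job"
    CUSTOM_IMAGE = "Using a custom image to create a training job"
    EXISTING_ALGORITHM = "Using an existing algorithm to create a training job"
    SUBSCRIBED_ALGORITHM = "Using a subscribed algorithm to create a training job"


def choose_creation_method(
    has_own_code: bool,
    preset_image_fits: bool = False,
    algorithm_in_management: bool = False,
) -> CreationMethod:
    """Pick a creation method from Table 2 (hypothetical decision order)."""
    if algorithm_in_management:
        # The algorithm is already available in Algorithm Management.
        return CreationMethod.EXISTING_ALGORITHM
    if not has_own_code:
        # No code of your own: subscribe to an algorithm in AI Gallery.
        return CreationMethod.SUBSCRIBED_ALGORITHM
    if preset_image_fits:
        return CreationMethod.PRESET_IMAGE
    return CreationMethod.CUSTOM_IMAGE


if __name__ == "__main__":
    print(choose_creation_method(has_own_code=True, preset_image_fits=True).value)
```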