Model Training Workflow

The process of developing an AI model is referred to as modeling, which typically involves two stages:

Development: This involves preparing and configuring the environment and debugging code to ensure it is ready for deep learning training. It is recommended that you debug code within the ModelArts development environment.
Experimentation: This stage focuses on fine-tuning datasets and adjusting hyperparameters. Through multiple rounds of experimentation, you can train a model that meets your performance goals. It is recommended that you conduct these experiments using ModelArts training jobs.

These two stages are iterative rather than strictly sequential, and you can move between them as needed. For example, once the code stabilizes in the development stage, the workflow enters the experimentation stage, where you iterate on the model by continuously tuning hyperparameters. Conversely, if you identify a potential optimization for training performance during the experimentation stage, you can return to the development stage to optimize your code.
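The experimentation stage described above amounts to repeated training rounds with different hyperparameters until a performance goal is met. The following is a minimal local sketch of that loop, not ModelArts code: `train_and_evaluate` is a hypothetical stand-in that returns a made-up score, and the hyperparameter names and target value are assumptions chosen for illustration.

```python
from itertools import product


def train_and_evaluate(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for one training round; returns a toy score."""
    # Toy formula that peaks at learning_rate=0.01 and batch_size=64.
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 64) / 1000


def run_experiments(grid: dict, target: float):
    """Try every combination in the grid and report the best one."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    # The last element tells you whether the performance goal was reached.
    return best_params, best_score, best_score >= target


grid = {"learning_rate": [0.1, 0.01, 0.001], "batch_size": [32, 64]}
best_params, best_score, goal_met = run_experiments(grid, target=0.95)
print(best_params, goal_met)
```

In a real project, each call to the stand-in function would correspond to one training job. If no combination reaches the target, that is a signal to return to the development stage.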

Figure 1 Model development workflow

ModelArts provides model training capabilities, allowing you to monitor training progress and continuously tune model parameters. You can also select resource pools of different specifications based on your data requirements for model training.

Follow the guidance below to train models on ModelArts.

**Table 1** Model training workflow
Task	Subtask	Description
Making preparations	Prepare training code.	The essential elements for model training include training code, a training framework (image), and training data. Training code consists of the boot file or command and training dependency packages. When using a preset image to create a training job, develop training code by referring to Developing Code for Training Using a Preset Image. When using a custom image to create a training job, develop training code by referring to Developing Code for Training Using a Custom Image.
	Prepare a training image.	Model training supports multiple image sources. For details, see Preparing a Model Training Image. ModelArts offers mainstream preset images for model training, ready for immediate use. If the preset images do not meet your needs, create a custom image.
	Prepare training data.	In addition to training datasets, training data can also include predictive models. Prepare your data before creating a training job. If the data is ready for use without further processing, upload it directly to an OBS bucket. Specify the OBS path as the input parameter when creating the training job. If your dataset is unlabeled or requires further preprocessing, import it into the ModelArts Data Management module. Select the dataset from this module as the input parameter when creating the training job.
Creating an algorithm	Create an algorithm.	Before creating a production training job, you must prepare your own algorithm or subscribe to an algorithm from AI Gallery.
Creating a production training job	Use basic training features.	You can create a training job on the ModelArts console. Multiple creation methods are available depending on the algorithm type and training framework. For details, see Table 2. ModelArts also allows you to create training jobs using APIs. For details, see Using PyTorch to Create a Training Job (New-Version Training).
Creating a production training job	Use advanced training features.	ModelArts supports the following advanced training features: incremental learning, distributed training, training acceleration, and high training reliability.
Viewing training results and logs	View training job details.	During or after a training job, you can view parameter settings, job events, and other details on the training job details page.
Viewing training results and logs	View training job logs.	Training logs record the execution process and exception information. You can use these logs to locate issues that occurred during job execution.
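As an illustration of the "Prepare training code" row in Table 1, a boot file usually accepts the data location and hyperparameters as command-line arguments so that the same code can be reused across training jobs. The sketch below uses only the Python standard library. The argument names (`--data_url`, `--train_url`, `--learning_rate`, `--epochs`) and the `obs://example-bucket/...` paths are assumptions for illustration, so check the linked guides for the parameters your job actually receives.

```python
import argparse


def parse_args(argv=None):
    """Parse the inputs a training boot file might receive (names are assumed)."""
    parser = argparse.ArgumentParser(description="Hypothetical training boot file")
    parser.add_argument("--data_url", required=True, help="Location of the training data")
    parser.add_argument("--train_url", required=True, help="Location for training output")
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=10)
    return parser.parse_args(argv)


# Example invocation with placeholder paths; a real job passes these in.
args = parse_args([
    "--data_url", "obs://example-bucket/data/",
    "--train_url", "obs://example-bucket/output/",
    "--learning_rate", "0.001",
])
print(f"Training for {args.epochs} epochs at lr={args.learning_rate}")
```

Keeping hyperparameters as arguments rather than hard-coded values means each experiment round can change them without touching the code.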

**Table 2** Training job creation methods
Creation Method	Use Case
Preset images	Use this method if you have developed your algorithm locally using a mainstream framework.
Custom images	Use this method if your algorithm relies on a non-mainstream framework. You can create an image using your algorithm and use the image to create training jobs.
Existing algorithms	Use this method if you want to use algorithms already managed in the Algorithm Management module, including those you created yourself or those subscribed to from AI Gallery.
AI Gallery algorithms	Use this method if you want to leverage ready-to-use algorithms. You can subscribe to algorithms from AI Gallery to quickly create training jobs.
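The use cases in Table 2 can be read as a simple decision procedure. The helper below is a hypothetical sketch of that reading, not part of any ModelArts SDK. The function name, its parameters, and the order in which the checks run are assumptions.

```python
def choose_creation_method(
    in_algorithm_management: bool,
    from_ai_gallery: bool,
    mainstream_framework: bool,
) -> str:
    """Map a situation to a creation method from Table 2 (ordering is assumed)."""
    if in_algorithm_management:
        # Algorithm already managed in the Algorithm Management module.
        return "Existing algorithms"
    if from_ai_gallery:
        # Ready-to-use algorithm subscribed to from AI Gallery.
        return "AI Gallery algorithms"
    if mainstream_framework:
        # Locally developed algorithm on a mainstream framework.
        return "Preset images"
    # Algorithm relies on a non-mainstream framework.
    return "Custom images"


print(choose_creation_method(False, False, True))
```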
