Updated on 2024-10-29 GMT+08:00

Model Training Process

AI modeling involves two stages:

  • Development: Prepare and configure the environment, and debug the deep learning training code. ModelArts DevEnviron is recommended for code debugging.
  • Experiment: Optimize the datasets and hyperparameters, and obtain an ideal model through multiple rounds of experiments. The ModelArts training platform is recommended for training.

Across the two stages, code is designed, developed, and tested in repeated cycles. Once the code is stable in the development stage, modeling moves to the experiment stage, where hyperparameters are tuned continuously to iterate on the model. If the experiment stage shows that training performance can still be improved, modeling returns to the development stage to optimize the code.

Figure 1 Model development process
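
The experiment stage described above can be sketched as a loop over hyperparameter combinations. The snippet below is only an illustration and does not call any ModelArts API: train_and_evaluate is a hypothetical stand-in that returns a toy score, and the hyperparameter names and values are assumptions chosen for the example.

```python
from itertools import product


def train_and_evaluate(learning_rate, batch_size):
    """Hypothetical stand-in for one training run; returns a validation score.

    A toy formula replaces real training so the sketch runs anywhere.
    """
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 64) / 1000


def run_experiments(learning_rates, batch_sizes):
    """Try every hyperparameter combination and keep the best-scoring one."""
    history = []
    for lr, bs in product(learning_rates, batch_sizes):
        score = train_and_evaluate(lr, bs)
        history.append({"learning_rate": lr, "batch_size": bs, "score": score})
    best = max(history, key=lambda run: run["score"])
    return best, history


best, history = run_experiments([0.1, 0.01, 0.001], [32, 64, 128])
print("rounds:", len(history))
print("best:", best)
```

In practice each round would be a separate training job, and a poor best score would be the signal to return to the development stage and revise the code.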

ModelArts provides model training, which lets you view training results and tune model parameters based on them. You can select resource pools with different instance flavors for model training.

To train a model on ModelArts Standard, follow these steps:

Figure 2 ModelArts Standard model training process
Table 1 ModelArts Standard model training process

Task

Subtask

Description

Making preparations

Preparing training code

Model training requires training code, a training framework, and training data.

Training code contains the boot file or boot command of a training job and its dependency packages.

Preparing a training image

There are multiple training image sources. For details, see Preparing a Model Training Image.
  • ModelArts Standard offers mainstream preset images for model training, ready for immediate use.
  • If the preset images do not meet your needs, create custom images.

Preparing training data

Before training, prepare the necessary data, which can be datasets or predictive models.
  • If your training data needs no further processing, upload it to OBS. When you create the training job, enter the OBS bucket path directly as the input parameter path.
  • If your training dataset is unlabeled or not yet preprocessed, import it to ModelArts data management for processing. When you create the training job, choose the dataset in data management as the input parameter.

Creating a debug training job

Creating a debug training job

Before model training, debug your code. ModelArts offers multiple methods for creating a debug training job.

  • With ModelArts, you can easily access JupyterLab in the cloud without worrying about environment installation or configuration.
  • After enabling remote SSH, you can access a training job from a local IDE for debugging, so you can keep your usual coding habits. Debugged code can be used for production training at zero cost. ModelArts supports PyCharm as the local IDE. For details, see Using PyCharm Toolkit to Create and Debug a Training Job.

Creating an algorithm

Creating an algorithm

Before creating a production training job, create an algorithm or subscribe to an algorithm from AI Gallery.

Creating a production training job

Using basic training features

Using advanced training features

ModelArts Standard supports the following advanced training features:

  • Incremental learning
  • Distributed training
  • Training acceleration
  • High training reliability

Viewing training results and logs

Viewing training job details

You can view a training job's parameter settings and events on the job details page at any time, whether the job is running or has completed.

Viewing training job logs

Training logs track the execution and any errors that occur during training job runs. You can view these logs to identify and troubleshoot issues that cause jobs to fail.
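
As a rough illustration of using training logs for troubleshooting, the sketch below scans log text for lines that look like errors and returns them with surrounding context. It assumes the log is available as plain text and that failures are marked with the keywords ERROR, CRITICAL, or Traceback. The sample log is made up, and the real log format depends on your training code and framework.

```python
import re

# Assumed failure markers; adjust to whatever your training code actually logs.
ERROR_PATTERN = re.compile(r"\b(ERROR|CRITICAL|Traceback)\b")


def find_error_lines(log_text, context=1):
    """Return (line_number, surrounding_lines) for each line matching the pattern."""
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if ERROR_PATTERN.search(line):
            start = max(0, index - context)
            end = min(len(lines), index + context + 1)
            hits.append((index + 1, lines[start:end]))
    return hits


# Made-up log content, used only to exercise the function.
sample_log = """\
INFO start training
INFO epoch 1 loss=0.93
ERROR failed to open input file
INFO job exiting
"""

for line_number, snippet in find_error_lines(sample_log):
    print(f"line {line_number}:")
    for text in snippet:
        print("   ", text)
```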

Table 2 Methods of creating a training job

| Creation Method | Description |
|---|---|
| Using a preset image to create a training job | If you have used some mainstream images to develop algorithms locally, you can select a mainstream image and create a training job to build a model. |
| Using a custom image to create a training job | To use a non-mainstream image, create a custom algorithm image and then use it to create a training job. |
| Using an existing algorithm to create a training job | In Algorithm Management, you can manage your created algorithms and those subscribed from AI Gallery. This allows you to quickly create training jobs and build models using these algorithms. |
| Using a subscribed algorithm to create a training job | You can subscribe to algorithms in AI Gallery to quickly create training jobs and build models. |
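
When you supply your own training code, the job starts from the boot file mentioned in Table 1 and receives its input and output locations as parameters. The following is a minimal sketch of such a boot file, not a ModelArts-defined template: the parameter names --data_url, --train_url, and --epochs, the summary.json file, and the placeholder loop are all assumptions made for illustration, and the paths are treated as ordinary local directories.

```python
import argparse
import json
import os


def parse_args(argv=None):
    """Parse the job parameters; the parameter names here are assumed, not prescribed."""
    parser = argparse.ArgumentParser(description="Sketch of a training boot file")
    parser.add_argument("--data_url", required=True, help="directory holding the training data")
    parser.add_argument("--train_url", required=True, help="directory for training output")
    parser.add_argument("--epochs", type=int, default=1, help="number of training epochs")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    os.makedirs(args.train_url, exist_ok=True)

    # List the input files; a real job would load and preprocess them here.
    files = sorted(os.listdir(args.data_url)) if os.path.isdir(args.data_url) else []

    # Placeholder for the real training loop.
    for epoch in range(1, args.epochs + 1):
        print(f"epoch {epoch}/{args.epochs}: {len(files)} input file(s)")

    # Write a small result file so the output location contains something to inspect.
    summary = {"epochs": args.epochs, "input_files": len(files)}
    with open(os.path.join(args.train_url, "summary.json"), "w") as handle:
        json.dump(summary, handle)
    return summary

# In a real boot file, call main() under an `if __name__ == "__main__":` guard.
```

Run locally during debugging, this could be invoked as main(["--data_url", "./data", "--train_url", "./output", "--epochs", "2"]), with both paths replaced by your own directories.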