Updated on 2026-05-30 GMT+08:00

Model Training Workflow

The process of developing an AI model is referred to as modeling, which typically involves two stages:

  • Development: Prepare and configure the environment, then debug the code until it is ready for deep learning training. Debugging in the ModelArts development environment is recommended.
  • Experimentation: Refine datasets and adjust hyperparameters. Through multiple rounds of experimentation, you can train a model that meets your performance goals. Running these experiments as ModelArts training jobs is recommended.

The two stages alternate rather than run strictly in sequence. For example, once the code stabilizes in the development stage, the workflow enters the experimentation stage, where you iterate on the model by continuously tuning hyperparameters. Conversely, if you identify a potential training performance optimization during the experimentation stage, you can return to the development stage to optimize your code.
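
The experimentation stage can be pictured as a loop over hyperparameter combinations. The following is a minimal sketch of that idea, not ModelArts API usage: `train_and_evaluate` is a hypothetical stand-in for one training run (a toy formula here), and the hyperparameter names and values are illustrative assumptions.

```python
from itertools import product


def train_and_evaluate(learning_rate: float, batch_size: int) -> float:
    """Hypothetical stand-in for one training run; returns a validation score.

    A toy formula replaces real training so that the sketch runs anywhere.
    """
    return 1.0 - abs(learning_rate - 0.01) * 10 - abs(batch_size - 64) / 1000


def run_experiments(search_space: dict) -> tuple:
    """Try every hyperparameter combination and keep the best-scoring one."""
    names = list(search_space)
    best_params, best_score = None, float("-inf")
    for values in product(*(search_space[name] for name in names)):
        params = dict(zip(names, values))
        score = train_and_evaluate(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Illustrative search space (assumed values, not recommendations).
    space = {"learning_rate": [0.001, 0.01, 0.1], "batch_size": [32, 64, 128]}
    params, score = run_experiments(space)
    print(params, score)
```

In practice, each call to `train_and_evaluate` would correspond to one training run, and an exhaustive grid like this is only one of several possible search strategies.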

Figure 1 Model development workflow

ModelArts provides model training capabilities, allowing you to monitor training progress and continuously tune model parameters. You can also select resource pools of different specifications based on your data requirements for model training.

Follow the guidance below to train models on ModelArts.

Table 1 Model training workflow

| Task | Subtask | Description |
| --- | --- | --- |
| Making preparations | Prepare training code. | The essential elements for model training include training code, a training framework (image), and training data. Training code consists of the boot file or command and training dependency packages. |
| | Prepare a training image. | Model training supports multiple image sources. For details, see Preparing a Model Training Image. • ModelArts offers mainstream preset images for model training, ready for immediate use. • If the preset images do not meet your needs, create a custom image. |
| | Prepare training data. | In addition to training datasets, training data can also include predictive models. Prepare your data before creating a training job. • If the data is ready for use without further processing, upload it directly to an OBS bucket. Specify the OBS path as the input parameter when creating the training job. • If your dataset is unlabeled or requires further preprocessing, import it into the ModelArts Data Management module. Select the dataset from this module as the input parameter when creating the training job. |
| Creating an algorithm | Create an algorithm. | Before creating a production training job, you must prepare your own algorithm or subscribe to an algorithm from AI Gallery. |
| Creating a production training job | Use basic training features. | |
| | Use advanced training features. | ModelArts supports the following advanced training features: • Incremental learning • Distributed training • Training acceleration • High training reliability |
| Viewing training results and logs | View training job details. | During or after a training job, you can view parameter settings, job events, and other details on the training job details page. |
| | View training job logs. | Training logs record the execution process and exception information. You can use these logs to locate issues that occurred during job execution. |
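
The preparation steps in Table 1 mention a boot file and an input path specified when the training job is created. One common pattern is for the boot file to read such values from command-line arguments. The sketch below illustrates that pattern only: the argument names (`--data_dir`, `--output_dir`, `--learning_rate`, `--epochs`) are assumptions chosen for illustration, not names defined by ModelArts, and the training step is a placeholder that merely counts input files.

```python
import argparse
import json
import logging
from pathlib import Path

logger = logging.getLogger("boot_file_sketch")


def parse_args(argv=None):
    """Parse illustrative arguments; all names here are assumptions."""
    parser = argparse.ArgumentParser(description="Illustrative boot file")
    parser.add_argument("--data_dir", required=True, help="Training data location")
    parser.add_argument("--output_dir", required=True, help="Where to write results")
    parser.add_argument("--learning_rate", type=float, default=0.01)
    parser.add_argument("--epochs", type=int, default=1)
    return parser.parse_args(argv)


def main(argv=None):
    logging.basicConfig(level=logging.INFO)
    args = parse_args(argv)

    data_dir = Path(args.data_dir)
    if not data_dir.is_dir():
        # Fail early with a clear message so the problem is easy to find in logs.
        raise FileNotFoundError(f"Training data directory not found: {data_dir}")

    output_dir = Path(args.output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Placeholder for a real training loop: count the input files only.
    files_seen = sum(1 for p in data_dir.iterdir() if p.is_file())
    summary = {
        "files_seen": files_seen,
        "learning_rate": args.learning_rate,
        "epochs": args.epochs,
    }

    # Record the settings used so that a run can be traced afterwards.
    logger.info("Placeholder training finished: %s", summary)
    (output_dir / "summary.json").write_text(json.dumps(summary))
    return summary
```

For a local trial, the function can be called as `main(["--data_dir", "./data", "--output_dir", "./out"])` with paths of your choosing. A real boot file would usually also end with an `if __name__ == "__main__": main()` guard so that it can be started as a script.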

Table 2 Training job creation methods

| Creation Method | Use Case |
| --- | --- |
| Preset images | Use this method if you have developed your algorithm locally using a mainstream framework. |
| Custom images | Use this method if your algorithm relies on a non-mainstream framework. You can create an image using your algorithm and use the image to create training jobs. |
| Existing algorithms | Use this method if you want to use algorithms already managed in the Algorithm Management module, including those you created yourself or those subscribed to from AI Gallery. |
| AI Gallery algorithms | Use this method if you want to leverage ready-to-use algorithms. You can subscribe to algorithms from AI Gallery to quickly create training jobs. |