Updated on 2025-07-28 GMT+08:00

Creating a DeepSeek Model Training Job

Pre-training

To create a DeepSeek model pre-training job, perform the following steps:

  1. Log in to ModelArts Studio Large Model Development Platform. In the My Spaces area, click the required workspace.
    Figure 1 My Spaces
  2. In the navigation pane, choose Model Development > Model Training. Click Create Training Job in the upper right corner.
  3. On the Create Training Job page, set training parameters by referring to Table 1.
    Table 1 Parameters for pre-training the DeepSeek model

    Training configuration

    • Select Model: You can modify the following information:
      • Sources: Select Model Square.
      • Type: Select NLP, then select the base model and version (DeepSeek-V3-32K or DeepSeek-R1-32K) used for training.
    • Type: Select Pre-training.
    • Advanced Settings > Checkpoints: During a model training job, checkpoints store the model weights and training status.
      • Close: Checkpoints are not saved, and training cannot be resumed from a checkpoint.
      • Automatic: All checkpoints generated during training are saved automatically.
      • Custom: A specified number of checkpoints are saved based on your settings.

    Training Parameters

    • epochs: Number of passes over the training dataset during model training.
    • learning_rate: Determines the step size at each training iteration. If the learning rate is too high, the model may fail to converge; if it is too low, convergence can take a long time.
    • batch_size: Number of samples used in one training iteration. A larger batch size produces more stable gradients but consumes more GPU memory, which can cause out-of-memory (OOM) errors on constrained hardware and extend the training time.
    • sequence_length: Maximum length of a single training record. Data that exceeds this length is truncated during training.
    • warmup: Fraction of the training process spent in the warm-up phase. During this phase, the learning rate starts low so that the model stabilizes gradually; the model is then trained at the preset higher learning rate, which accelerates convergence and improves performance.
    • lr_decay_ratio: Controls learning-rate decay so that the model converges more stably. The minimum learning rate is not lower than the product of learning_rate and this parameter. If the learning rate should not decay, set this parameter to 1. (A sketch of how warmup and lr_decay_ratio shape the schedule follows these steps.)
    • weight_decay: A regularization method that shrinks model parameters to prevent overfitting and improve the model's generalization capability.
    • checkpoint_save_strategy: Set to either save_checkpoint_steps or save_checkpoint_epoch. Specifies whether checkpoint files are saved based on the number of training steps or the number of epochs.
    • save_checkpoint_steps or save_checkpoint_epoch: save_checkpoint_steps specifies the number of training steps between checkpoint saves; save_checkpoint_epoch specifies the number of epochs between checkpoint saves.

    Training data configuration

    • Training set: Select the training dataset.

    Resource Configuration

    • Billing Mode: Billing mode of the current training job.
    • Training Unit: Select the number of training units required for training the model. The minimum number of training units required for the current job is displayed.
    • Single-instance training units: Select the number of training units for a single instance.
    • Number of instances: Select the number of instances.

    Subscription reminder

    • Subscription reminder: When enabled, the system sends SMS or email notifications when the job status changes.

    Publish model

    • Enable automatic publishing: When enabled, the model produced by training is automatically published as a workspace asset so that it can be compressed, deployed, evaluated, and shared with other workspaces.

    Basic Information

    • Name: Name of the training job.
    • Description: Description of the training job.

  4. Click Create Now.
  5. After the pre-training job is created, the Model Training page is displayed. You can view the job status at any time.
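
The warmup and lr_decay_ratio parameters jointly shape the learning-rate schedule. The sketch below shows one common scheme, linear warmup followed by cosine decay down to learning_rate × lr_decay_ratio; the platform's exact scheduler is not documented here, so the curve shape is an assumption for illustration.

    # Minimal Python sketch, assuming linear warmup and cosine decay;
    # the platform's real scheduler may differ.
    import math

    def lr_at_step(step, total_steps, learning_rate, warmup, lr_decay_ratio):
        """Learning rate at a given 0-based training step."""
        warmup_steps = max(1, int(total_steps * warmup))
        min_lr = learning_rate * lr_decay_ratio  # floor set by lr_decay_ratio
        if step < warmup_steps:
            # Warm-up phase: ramp linearly from near zero to the preset rate.
            return learning_rate * (step + 1) / warmup_steps
        # Decay phase: cosine from learning_rate down to min_lr.
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return min_lr + 0.5 * (learning_rate - min_lr) * (1 + math.cos(math.pi * progress))

    # With lr_decay_ratio=1, min_lr equals learning_rate, so the rate never
    # decays, matching the note in the table above.
    print(lr_at_step(50, 1000, 1e-5, 0.1, 0.1))   # mid-warmup: ~5.1e-06
    print(lr_at_step(999, 1000, 1e-5, 0.1, 0.1))  # near the end: ~1e-06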

Full Fine-Tuning

To create a DeepSeek model full fine-tuning job, perform the following steps:

  1. Log in to ModelArts Studio Large Model Development Platform. In the My Spaces area, click the required workspace.
    Figure 2 My Spaces
  2. In the navigation pane, choose Model Development > Model Training. Click Create Training Job in the upper right corner.
  3. On the Create Training Job page, set training parameters by referring to Table 2.
    Table 2 Parameters for full fine-tuning the DeepSeek model

    Training configuration

    • Select Model: You can modify the following information:
      • Sources: Select Model Square.
      • Type: Select NLP, then select the base model and version (DeepSeek-V3-32K or DeepSeek-R1-32K) used for training.
    • Type: Select Supervised fine-tuning.
    • Training Objective: Select Full fine-tuning. With this objective, supervised fine-tuning updates all parameters of the model.
    • Advanced Settings > Checkpoints: During a model training job, checkpoints store the model weights and training status.
      • Close: Checkpoints are not saved, and training cannot be resumed from a checkpoint.
      • Automatic: All checkpoints generated during training are saved automatically.
      • Custom: A specified number of checkpoints are saved based on your settings.

    Training Parameters

    • epochs: Number of passes over the training dataset during model training.
    • learning_rate: Determines the step size at each training iteration. If the learning rate is too high, the model may fail to converge; if it is too low, convergence can take a long time.
    • batch_size: Number of samples used in one training iteration. A larger batch size produces more stable gradients but consumes more GPU memory, which can cause out-of-memory (OOM) errors on constrained hardware and extend the training time.
    • sequence_length: Maximum length of a single training record. Data that exceeds this length is truncated during training.
    • warmup: Fraction of the training process spent in the warm-up phase. During this phase, the learning rate starts low so that the model stabilizes gradually; the model is then trained at the preset higher learning rate, which accelerates convergence and improves performance.
    • lr_decay_ratio: Controls learning-rate decay so that the model converges more stably. The minimum learning rate is not lower than the product of learning_rate and this parameter. If the learning rate should not decay, set this parameter to 1.
    • weight_decay: A regularization method that shrinks model parameters to prevent overfitting and improve the model's generalization capability.
    • checkpoint_save_strategy: Set to either save_checkpoint_steps or save_checkpoint_epoch. Specifies whether checkpoint files are saved based on the number of training steps or the number of epochs. (A sketch of the two strategies follows these steps.)
    • save_checkpoint_steps or save_checkpoint_epoch: save_checkpoint_steps specifies the number of training steps between checkpoint saves; save_checkpoint_epoch specifies the number of epochs between checkpoint saves.

    Data configuration

    • Training set: Select the training dataset.

    Resource Configuration

    • Billing Mode: Billing mode of the current training job.
    • Training Unit: Select the number of training units required for training the model. The minimum number of training units required for the current job is displayed.
    • Single-instance training units: Select the number of training units for a single instance.
    • Number of instances: Select the number of instances.

    Subscription reminder

    • Subscription reminder: When enabled, the system sends SMS or email notifications when the job status changes.

    Publish model

    • Enable automatic publishing: If this function is disabled, you must manually publish the model to the model asset library after training is complete. If it is enabled, configure the visibility, model name, and description.

    Basic Information

    • Name: Name of the training job.
    • Description: Description of the training job.

    The default values of training parameters vary depending on the model. The default values displayed on the frontend page prevail.

  4. Click Create Now.
  5. After a fine-tuning job is created, the Model Training page is displayed. You can view the job status at any time.
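
The checkpoint_save_strategy parameter chooses between step-based and epoch-based saving. Below is a minimal Python sketch of that decision logic; the function and loop are illustrative assumptions, not the platform's API.

    def should_save(strategy, step, steps_per_epoch,
                    save_checkpoint_steps=None, save_checkpoint_epoch=None):
        """Return True when a checkpoint should be written after this 1-based step."""
        if strategy == "save_checkpoint_steps":
            # Save every save_checkpoint_steps training steps.
            return step % save_checkpoint_steps == 0
        if strategy == "save_checkpoint_epoch":
            # Save at the end of every save_checkpoint_epoch-th epoch.
            at_epoch_end = step % steps_per_epoch == 0
            completed_epochs = step // steps_per_epoch
            return at_epoch_end and completed_epochs % save_checkpoint_epoch == 0
        return False

    # Example: 1000 steps per epoch, saving every 2 epochs.
    saves = [s for s in range(1, 6001)
             if should_save("save_checkpoint_epoch", s, 1000, save_checkpoint_epoch=2)]
    print(saves)  # [2000, 4000, 6000]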

LoRA Fine-Tuning

To create a DeepSeek model LoRA fine-tuning job, perform the following steps:

  1. Log in to ModelArts Studio Large Model Development Platform. In the My Spaces area, click the required workspace.
    Figure 3 My Spaces
  2. In the navigation pane, choose Model Development > Model Training. Click Create Training Job in the upper right corner.
  3. On the Create Training Job page, set training parameters by referring to Table 3.
    Table 3 Parameters for DeepSeek model LoRA fine-tuning

    Training configuration

    • Select Model: You can modify the following information:
      • Sources: Select Model Square.
      • Type: Select NLP, then select the base model and version (DeepSeek-V3-32K or DeepSeek-R1-32K) used for training.
    • Type: Select Supervised fine-tuning.
    • Training Objective: Select LoRA fine-tuning. During supervised fine-tuning with this objective, the pre-trained model's weight parameters are frozen; low-rank decomposition is applied to the weight matrices in the self-attention modules while the original weights are retained, and only the low-rank parameters are updated during training.

    Training Parameters

    • epochs: Number of passes over the training dataset during model training.
    • learning_rate: Determines the step size at each training iteration. If the learning rate is too high, the model may fail to converge; if it is too low, convergence can take a long time.
    • batch_size: Number of samples used in one training iteration. A larger batch size produces more stable gradients but consumes more GPU memory, which can cause out-of-memory (OOM) errors on constrained hardware and extend the training time.
    • sequence_length: Maximum length of a single training record. Data that exceeds this length is truncated during training.
    • warmup: Fraction of the training process spent in the warm-up phase. During this phase, the learning rate starts low so that the model stabilizes gradually; the model is then trained at the preset higher learning rate, which accelerates convergence and improves performance.
    • lr_decay_ratio: Controls learning-rate decay so that the model converges more stably. The minimum learning rate is not lower than the product of learning_rate and this parameter. If the learning rate should not decay, set this parameter to 1.
    • weight_decay: Adds a penalty term proportional to the size of the model weights to the loss function, encouraging the model to keep weights small and preventing overfitting or overly complex models.
    • Rank of the LoRA matrix: The rank measures the complexity and information capacity of the LoRA matrices. A larger value enhances the model's representation capability but increases the training duration; a smaller value reduces the number of trainable parameters and the overfitting risk. (A sketch of the low-rank decomposition follows these steps.)

    Data configuration

    • Training set: Select the training dataset.

    Resource Configuration

    • Billing Mode: Billing mode of the current training job.
    • Training Unit: Select the number of training units required for training the model. The minimum number of training units required for the current job is displayed.
    • Single-instance training units: Select the number of training units for a single instance.
    • Number of instances: Select the number of instances.

    Subscription reminder

    • Subscription reminder: When enabled, the system sends SMS or email notifications when the job status changes.

    Publish model

    • Enable automatic publishing: If this function is disabled, you must manually publish the model to the model asset library after training is complete. If it is enabled, configure the visibility, model name, and description.

    Basic Information

    • Name: Name of the training job.
    • Description: Description of the training job.

    The default values of training parameters vary depending on the model. The default values displayed on the frontend page prevail.

  4. Click Create Now.
  5. After a fine-tuning job is created, the Model Training page is displayed. You can view the job status at any time.
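
To make the low-rank idea concrete, here is a minimal numpy sketch of a LoRA-style update. The hidden size and rank are illustrative assumptions, and real implementations apply the decomposition per attention projection matrix.

    import numpy as np

    d, r = 4096, 8                    # hidden size and LoRA rank (assumed values)
    W = np.random.randn(d, d)         # frozen pretrained weight: never updated
    A = np.random.randn(r, d) * 0.01  # trainable low-rank factor
    B = np.zeros((d, r))              # trainable; zero init so W + B @ A == W at start

    def forward(x):
        # Effective weight is W + B @ A; during training only A and B get gradients.
        return x @ (W + B @ A).T

    y = forward(np.random.randn(2, d))  # (2, 4096) output, identical to the frozen model
    # A larger rank r adds capacity but also trainable parameters and training time:
    print(d * d, "->", 2 * d * r)       # 16777216 -> 65536 trainable parameters at r=8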

QLoRA Fine-Tuning

To create a DeepSeek model QLoRA fine-tuning job, perform the following steps:

  1. Log in to ModelArts Studio Large Model Development Platform. In the My Spaces area, click the required workspace.
    Figure 4 My Spaces
  2. In the navigation pane, choose Model Development > Model Training. Click Create Training Job in the upper right corner.
  3. On the Create Training Job page, set training parameters by referring to Table 4.
    Table 4 Parameters for DeepSeek model QLoRA fine-tuning

    Training configuration

    • Select Model: You can modify the following information:
      • Sources: Select Model Square.
      • Type: Select NLP, then select the base model and version (DeepSeek-V3-32K or DeepSeek-R1-32K) used for training.
    • Type: Select Supervised fine-tuning.
    • Training Objective: Select QLoRA fine-tuning. QLoRA is an efficient fine-tuning approach for LLMs: by combining quantization with low-rank adaptation, it significantly reduces GPU memory usage while maintaining model performance, making it suitable for resource-constrained environments. (A rough memory comparison follows these steps.)

    Training Parameters

    • epochs: Number of passes over the training dataset during model training.
    • learning_rate: Determines the step size at each training iteration. If the learning rate is too high, the model may fail to converge; if it is too low, convergence can take a long time.
    • batch_size: Number of samples used in one training iteration. A larger batch size produces more stable gradients but consumes more GPU memory, which can cause out-of-memory (OOM) errors on constrained hardware and extend the training time.
    • sequence_length: Maximum length of a single training record. Data that exceeds this length is truncated during training.
    • warmup: Fraction of the training process spent in the warm-up phase. During this phase, the learning rate starts low so that the model stabilizes gradually; the model is then trained at the preset higher learning rate, which accelerates convergence and improves performance.
    • lr_decay_ratio: Controls learning-rate decay so that the model converges more stably. The minimum learning rate is not lower than the product of learning_rate and this parameter. If the learning rate should not decay, set this parameter to 1.
    • weight_decay: A regularization method that shrinks model parameters to prevent overfitting and improve the model's generalization capability.
    • Rank of the LoRA matrix: The rank measures the complexity and information capacity of the LoRA matrices. A larger value enhances the model's representation capability but increases the training duration; a smaller value reduces the number of trainable parameters and the overfitting risk.

    Data configuration

    • Training set: Select the training dataset.

    Resource Configuration

    • Billing Mode: Billing mode of the current training job.
    • Training Unit: Select the number of training units required for training the model. The minimum number of training units required for the current job is displayed.
    • Single-instance training units: Select the number of training units for a single instance.
    • Number of instances: Select the number of instances.

    Subscription reminder

    • Subscription reminder: When enabled, the system sends SMS or email notifications when the job status changes.

    Publish model

    • Enable automatic publishing: If this function is disabled, you must manually publish the model to the model asset library after training is complete. If it is enabled, configure the visibility, model name, and description.

    Basic Information

    • Name: Name of the training job.
    • Description: Description of the training job.

    The default values of training parameters vary depending on the model. The default values displayed on the frontend page prevail.

  4. Click Create Now.
  5. After a fine-tuning job is created, the Model Training page is displayed. You can view the job status at any time.
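
The memory savings that quantization brings can be estimated with simple arithmetic. The sketch below compares fp16 and 4-bit weight storage for an assumed 7B-parameter model; the numbers are illustrative, not platform measurements.

    params = 7e9                      # assumed 7B-parameter model
    bytes_fp16, bytes_4bit = 2, 0.5   # bytes per weight at fp16 vs 4-bit

    full_weights = params * bytes_fp16   # weight storage at full fine-tuning precision
    qlora_weights = params * bytes_4bit  # frozen, 4-bit quantized weights

    print(f"fp16 weights:  {full_weights / 2**30:.1f} GiB")   # ~13.0 GiB
    print(f"4-bit weights: {qlora_weights / 2**30:.1f} GiB")  # ~3.3 GiB
    # Full fine-tuning also keeps gradients and optimizer state for every parameter;
    # QLoRA trains only the small LoRA adapters, so that overhead is comparatively tiny.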