Updated on 2025-11-04 GMT+08:00

LLaMA-Factory

This section describes the YAML configuration file and parameters for training. You can choose parameters as required.

YAML File Configuration

Modify the YAML file.
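Before the parameter-by-parameter table, a minimal sketch of a full-parameter SFT configuration may help orient you. It is assembled from the parameters in Table 1; the top-level layout and all paths are assumptions, so adapt them to the YAML file shipped with your package.

# Illustrative sketch only: assembled from the parameters in Table 1.
# The top-level layout and all paths are assumptions; adjust to your environment.
af_output_dir: /home/ma-user/xxx/saves/qwen2-7b/sft/
backend_config:
  training:
    stage: sft                      # only sft is supported
    do_train: true
    finetuning_type: full
    model_name_or_path: /home/ma-user/xxx/model/Qwen2-72B
    template: qwen
    dataset: alpaca_en_demo
    dataset_dir: /home/ma-user/AscendFactory/third-party/LLaMA-Factory/data
    cutoff_len: 4096
    max_samples: 50000
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 8
    num_train_epochs: 5
    learning_rate: 2.0e-5
    bf16: true
    logging_steps: 2
    save_steps: 5000
    plot_loss: true
    overwrite_output_dir: true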

Table 1 Model training script parameters

Each entry below gives the parameter name, an example value, and a description.

backend_config.training.dataset

  • Instruction supervision fine-tuning: alpaca_en_demo
  • Multimodal dataset (image): mllm_demo,identity

(Mandatory) Dataset name registered in the dataset_info.json file. If you use custom data, configure the dataset_info.json file by referring to README_zh.md and store the dataset in the same directory as the dataset_info.json file.
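For custom data, a minimal dataset_info.json entry might look like the sketch below. The dataset name my_dataset, the file name my_dataset.json, and the column mapping are placeholders invented for illustration; README_zh.md describes the authoritative format.

{
  "my_dataset": {
    "file_name": "my_dataset.json",
    "columns": {
      "prompt": "instruction",
      "query": "input",
      "response": "output"
    }
  }
}

You would then set backend_config.training.dataset to my_dataset and store my_dataset.json in the same directory as dataset_info.json.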

backend_config.training.dataset_dir

/home/ma-user/AscendFactory/third-party/LLaMA-Factory/data

(Mandatory) Specify this parameter as a hyperparameter in the ascendfactory-cli train XXX command.

Dataset included in the LLaMA-Factory code package: ${INSTALL_DIR}/third-party/LLaMA-Factory/data, where ${INSTALL_DIR} depends on the setting in install.sh.

Custom data: AscendFactory/data

backend_config.training.model_name_or_path

/home/ma-user/xxx/model/Qwen2-72B

(Mandatory) Specify this parameter as a hyperparameter in the ascendfactory-cli train XXX command.

Path for storing the tokenizer and Hugging Face weight files. Change it based on the actual situation.

backend_config.training.adapter_name_or_path

/home/ma-user/xxx/sft_lora/

Path to the unmerged LoRA weights generated after LoRA training is complete. Pass in the LoRA fine-tuned model's weight file during incremental training.

af_output_dir

/home/ma-user/xxx/saves/qwen2-7b/sft_lora/

(Mandatory) Specify this parameter as a hyperparameter in the ascendfactory-cli train XXX command.

Specifies the output directory. The model parameters and log files generated during training are stored in this directory.

backend_config.training.train_from_scratch

false

Indicates whether the model is trained from scratch. The default value is false.

  • true: The model is trained from scratch, and the weights are not loaded.
  • false: The model is incrementally trained from the position where the weights are loaded.

backend_config.training.do_train

true

Specifies whether to run the training step of the script. If set to true, model training is performed; if set to false, training is skipped.

backend_config.training.cutoff_len

4096

Maximum input length (in tokens) for text processing. Change the value as required.

backend_config.training.packing

true

(Optional) Packs samples to a static length: short or incomplete samples are packed together up to the maximum text length (cutoff_len). When dynamic data lengths are used, remove this parameter.

backend_config.training.deepspeed

-

(Optional) ZeRO optimization strategy; see the example configuration after this list. The options are as follows:

  • ds_config_zero0.json
  • ds_config_zero1.json
  • ds_config_zero2.json
  • ds_config_zero3.json
  • ds_config_zero2_offload.json
  • ds_config_zero3_offload.json
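The exact contents of the shipped JSON files are not reproduced here, but a typical DeepSpeed ZeRO-2 configuration of this kind looks roughly as follows. Treat it as an illustration only; the packaged ds_config_zero*.json files are authoritative.

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "bf16": { "enabled": "auto" },
  "zero_optimization": {
    "stage": 2,
    "overlap_comm": true,
    "contiguous_gradients": true
  }
}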

backend_config.training.stage

sft

Specifies the current training phase. Currently, only sft (supervised fine-tuning) is supported.

backend_config.training.finetuning_type

full

Specifies the fine-tuning strategy. The value can be full or lora.

If set to full, all model parameters are fine-tuned; if set to lora, only the LoRA adapter weights are trained.

backend_config.training.lora_target

all

Target modules to which LoRA is applied. The default value is all.

backend_config.training.template

qwen

Specifies the conversation template. If this parameter is set to qwen, the Qwen template is used for training.

backend_config.training.max_samples

50000

Specifies the maximum number of samples used during training. If set, only that many samples are used and the rest are ignored, which helps control the scale and compute requirements of a training run.

backend_config.training.num_train_epochs

5

Number of training epochs. An epoch is one pass through all training samples. Change it based on the actual situation.

backend_config.training.max_steps

5000

Number of training steps.

Set either parameter. If both are set, only max_steps takes effect, as shown in the excerpt below.
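A quick illustration of the either/or rule, with the top-level layout assumed as elsewhere in this section:

backend_config:
  training:
    num_train_epochs: 5
    # max_steps: 5000   # if both are set, max_steps takes effect and epochs are ignored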

backend_config.training.overwrite_cache

true

Specifies whether to overwrite the cache. If this parameter is set to true, the cache is overwritten during training. This is usually used when the dataset changes or the cache needs to be regenerated.

backend_config.training.preprocessing_num_workers

16

Number of worker threads for preprocessing data. As the number of threads increases, the preprocessing speed increases, but the memory usage also increases.

backend_config.training.per_device_train_batch_size

1

Batch size for training on each device.

backend_config.training.gradient_accumulation_steps

8

Number of gradient accumulation steps. Gradients are accumulated over this many micro-batches before each optimizer update, which raises the effective batch size without increasing memory consumption. For example, per_device_train_batch_size = 1 with gradient_accumulation_steps = 8 on 8 devices yields an effective global batch size of 1 × 8 × 8 = 64.

backend_config.training.logging_steps

2

Interval, in steps, at which logs are output during training. Logs include information such as the training progress, learning rate, and loss value. You are advised to set this parameter.

backend_config.training.save_steps

5000

Interval, in steps, at which the model is saved during training. Saved models can be used for subsequent training.

  • If the parameter value is greater than or equal to max_steps, only the final model (trained for max_steps steps) is saved.
  • If the parameter value is less than max_steps, a model version is saved every save_steps steps.

Number of saved model versions = max_steps/save_steps + 1. For example, with max_steps = 5000 and save_steps = 1000, 5000/1000 + 1 = 6 versions are saved.

backend_config.training.save_total_limit

0

Limits the number of saved weight versions.

  • It has no effect if left unset or set to a value of 0 or less.
  • The value must be less than or equal to max_steps/save_steps + 1.
  • If the value is greater than 1, at most save_total_limit model versions are kept.

backend_config.training.plot_loss

true

Specifies whether to draw the loss curve. If this parameter is set to true, the loss curve is saved as an image after the training is complete.

backend_config.training.overwrite_output_dir

true

Specifies whether to overwrite the output directory. The default value is true. If set to true, the output directory is cleared at the start of each training run. If set to false, the most recently saved training weights are loaded after an interruption and training resumes from them. Set resume_from_checkpoint to enable fast fault recovery.

backend_config.training.resume_from_checkpoint

{output_dir}/checkpoint-xx

Resumable training: Resumes an unexpectedly stopped training job by loading weights from the checkpoint-xx folder. This setting lets you continue training from where it left off. This parameter takes precedence over overwrite_output_dir.
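A hedged excerpt for resumable training, with the top-level layout assumed as elsewhere in this section; checkpoint-xx is a placeholder for an actual checkpoint folder produced at a save_steps interval:

backend_config:
  training:
    overwrite_output_dir: false
    resume_from_checkpoint: "{output_dir}/checkpoint-xx"   # placeholder; point to a real checkpoint folder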

backend_config.training.bf16/fp16

true

(Optional) Training precision format, which defaults to bf16. Set bf16: true or fp16: true as required.

backend_config.training.learning_rate

2.0e-5

Specifies the learning rate.

backend_config.training.disable_gradient_checkpointing

true

Specifies whether to disable gradient checkpointing. By default, gradient checkpointing is enabled: instead of storing all intermediate activations during the forward pass, the model recomputes them during backpropagation. This reduces memory usage, particularly for large models, but may slow training. true: gradient checkpointing is disabled.

backend_config.training.include_tokens_per_second

backend_config.training.include_num_input_tokens_seen

true

Record the number of tokens processed per second and the total number of input tokens seen during training. These metrics are used to measure performance.

backend_config.lora_merge.export_dir

${af_output_dir}/lora_merged

Specifies where to save the combined weights after merging LoRA weights with the original model during fine-tuning.

backend_config.training.recompute_layers_ratio

Float value range: [0,1]

Specifies the fraction of layers that use recomputation. If sufficient device memory is available, lowering the ratio recomputes fewer layers, using the spare memory to improve performance.

This parameter has no effect if disable_gradient_checkpointing is set to true.
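Finally, a hedged sketch tying together the LoRA-related parameters above (finetuning_type, lora_target, adapter_name_or_path, and backend_config.lora_merge.export_dir). The top-level layout and all paths are assumptions; adapt them to your environment.

af_output_dir: /home/ma-user/xxx/saves/qwen2-7b/sft_lora/
backend_config:
  training:
    stage: sft
    do_train: true
    finetuning_type: lora
    lora_target: all
    model_name_or_path: /home/ma-user/xxx/model/Qwen2-72B
    # For incremental training on earlier LoRA output (assumed path):
    # adapter_name_or_path: /home/ma-user/xxx/sft_lora/
  lora_merge:
    export_dir: ${af_output_dir}/lora_merged   # merged weights are written here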