LLaMA-Factory
This section describes the YAML configuration file and parameters for training. You can choose parameters as required.
YAML File Configuration
Modify the YAML file.

| Parameter | Example Value | Description |
|---|---|---|
| backend_config.training.dataset |  | (Mandatory) Dataset name registered in the dataset_info.json file. If you use custom data, configure the dataset_info.json file by referring to README_zh.md and store the dataset in the same directory as the dataset_info.json file. |
| backend_config.training.dataset_dir | /home/ma-user/AscendFactory/third-party/LLaMA-Factory/data | (Mandatory) Specify this parameter as a hyperparameter in the ascendfactory-cli train XXX command. Dataset included in the LLaMA-Factory code package: ${INSTALL_DIR}/third-party/LLaMA-Factory/data, where ${INSTALL_DIR} depends on the setting in install.sh. Custom data: AscendFactory/data |
| backend_config.training.model_name_or_path | /home/ma-user/xxx/model/Qwen2-72B | (Mandatory) Specify this parameter as a hyperparameter in the ascendfactory-cli train XXX command. Path for storing the tokenizer and Hugging Face weight files. Change it based on the actual situation. |
| backend_config.training.adapter_name_or_path | /home/ma-user/xxx/sft_lora/ | Unmerged LoRA weights generated after LoRA training is complete. Pass the LoRA fine-tuned model's weight file when performing incremental training. |
| af_output_dir | /home/ma-user/xxx/saves/qwen2-7b/sft_lora/ | (Mandatory) Specify this parameter as a hyperparameter in the ascendfactory-cli train XXX command. Output directory. The model parameters and log files generated during training are stored in this directory. |
| backend_config.training.train_from_scratch | false | Whether to train the model from scratch. The default value is false. |
| backend_config.training.do_train | true | Whether to run the training step of the script. If set to true, model training is performed; if set to false, it is not. |
| backend_config.training.cutoff_len | 4096 | Maximum text processing length. Change the value as required. |
| backend_config.training.packing | true | (Optional) With static data length, incomplete samples are padded up to the maximum text processing length. With dynamic data length, remove this parameter. |
| backend_config.training.deepspeed | - | (Optional) ZeRO optimization strategy. |
| backend_config.training.stage | sft | Current training phase. Currently, only sft (supervised fine-tuning) is supported. |
| backend_config.training.finetuning_type | full | Fine-tuning strategy. The value can be full or lora. If set to full, the entire model is fine-tuned. |
| backend_config.training.lora_target | all | Target modules that adopt LoRA. The default value is all. |
| backend_config.training.template | qwen | Template to use. If set to qwen, the Qwen template is used for training. |
| backend_config.training.max_samples | 50000 | Maximum number of samples used during training. If set, only the specified number of samples is used and the remaining samples are ignored. This can be used to control the scale and compute requirements of training. |
| backend_config.training.num_train_epochs / backend_config.training.max_steps | 5 / 5000 | Number of training epochs (an epoch is one full pass over all training samples) or number of training steps. Set either one, based on the actual situation. If both are set, only max_steps takes effect. |
| backend_config.training.overwrite_cache | true | Whether to overwrite the cache. If set to true, the cache is overwritten during training. This is usually used when the dataset changes or the cache needs to be regenerated. |
| backend_config.training.preprocessing_num_workers | 16 | Number of worker threads for preprocessing data. More threads speed up preprocessing but increase memory usage. |
| backend_config.training.per_device_train_batch_size | 1 | Training batch size on each device. |
| backend_config.training.gradient_accumulation_steps | 8 | Number of gradient accumulation steps. This increases the effective batch size without increasing memory consumption. |
| backend_config.training.logging_steps | 2 | Interval, in steps, at which logs are output during training. Logs include information such as the training progress, learning rate, and loss value. You are advised to set this parameter. |
| backend_config.training.save_steps | 5000 | Interval, in steps, at which the model is saved during training. Saved models can be used for subsequent training. Number of saved model versions = max_steps/save_steps + 1. |
| backend_config.training.save_total_limit | 0 | Maximum number of saved weight versions to keep. |
| backend_config.training.plot_loss | true | Whether to draw the loss curve. If set to true, the loss curve is saved as an image after training is complete. |
| backend_config.training.overwrite_output_dir | true | Whether to overwrite the output directory. The default value is true. If set to true, the output directory is cleared at the beginning of each training run. If set to false, the most recent training weights saved before an interruption are loaded and training resumes. Set resume_from_checkpoint to enable fast fault recovery. |
| backend_config.training.resume_from_checkpoint | {output_dir}/checkpoint-xx | Resumable training: resumes an unexpectedly stopped training job by loading weights from the checkpoint-xx folder, so training continues from where it left off. This parameter takes precedence over overwrite_output_dir. |
| backend_config.training.bf16/fp16 | true | (Optional) Precision format. The default is bf16. |
| backend_config.training.learning_rate | 2.0e-5 | Learning rate. |
| backend_config.training.disable_gradient_checkpointing | true | Whether to disable gradient checkpointing. Gradient checkpointing is enabled by default; it saves the model's intermediate state during training for later restoration, which reduces memory usage (particularly with large models) but may impact performance. If set to true, gradient checkpointing is disabled. |
| backend_config.training.include_tokens_per_second / backend_config.training.include_num_input_tokens_seen | true | Report the number of tokens processed per second and the number of input tokens seen during training. These parameters are used to measure performance. |
| backend_config.lora_merge.export_dir | ${af_output_dir}/lora_merged | Directory for saving the combined weights after merging LoRA weights with the original model during fine-tuning. |
| backend_config.training.recompute_layers_ratio | Float in the range [0, 1] | Proportion of layers that use recomputation. If video memory is sufficient, adjusting this value helps make use of spare memory for better performance. This parameter has no effect if disable_gradient_checkpointing is set to true. |
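The dot-separated parameter names in the table suggest a nested YAML layout. The sketch below shows how a full-parameter SFT run might be configured, assuming that backend_config.training.* keys nest under backend_config and training sections; the nesting, the hypothetical dataset name my_dataset, and the paths (copied from the example values above) are placeholders to adapt to your environment, not an authoritative file layout.

```yaml
# Hypothetical layout: the dot-separated parameter names in the table are
# assumed to map to nested YAML keys; verify against the sample YAML file
# shipped with your code package.
backend_config:
  training:
    # Data
    dataset: my_dataset                 # placeholder; name registered in dataset_info.json
    dataset_dir: /home/ma-user/AscendFactory/third-party/LLaMA-Factory/data
    template: qwen
    cutoff_len: 4096
    max_samples: 50000
    packing: true
    overwrite_cache: true
    preprocessing_num_workers: 16
    # Model
    model_name_or_path: /home/ma-user/xxx/model/Qwen2-72B
    # Training
    stage: sft
    do_train: true
    finetuning_type: full
    bf16: true                          # or fp16: true
    learning_rate: 2.0e-5
    num_train_epochs: 5                 # or max_steps; if both are set, only max_steps takes effect
    per_device_train_batch_size: 1
    gradient_accumulation_steps: 8
    # Logging and checkpointing
    logging_steps: 2
    save_steps: 5000
    save_total_limit: 0
    plot_loss: true
    overwrite_output_dir: true
# Output directory, also passed as a hyperparameter to ascendfactory-cli train
af_output_dir: /home/ma-user/xxx/saves/qwen2-7b/sft_lora/
```

With the example values above (max_steps = 5000, save_steps = 5000), the checkpoint formula gives 5000/5000 + 1 = 2 saved model versions. For LoRA fine-tuning, the table indicates you would instead set finetuning_type to lora, set lora_target (for example, all), and configure backend_config.lora_merge.export_dir as the directory for the merged weights.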