YAML Configuration File Parameter Description

This section describes the demo_yaml configuration file and its parameters. Choose the parameters you need based on your actual requirements.

Table 1 Model training script parameters
Parameter	Example Value	Description
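model_name_or_path	/home/ma-user/ws/tokenizers/Qwen2-72B	Must be modified. Absolute or relative path of the directory from which the tokenizer and Hugging Face weights are loaded. Set it according to your actual directory plan.
do_train	true	Controls whether the script runs the training step. If set to true, the model is trained; if set to false, it is not.
cutoff_len	4096	Maximum length used in text processing (4096 here). Adjust it to your requirements.
packing	true	Optional. With static data length, data shorter than the maximum text-processing length is filled up to that length. With dynamic data length, remove this parameter.
deepspeed	examples/deepspeed/ds_z3_config.json	Optional. Relative or absolute path of the DeepSpeed configuration file. DeepSpeed is an open-source library for accelerating deep learning training. It provides features such as mixed-precision training and ZeRO memory optimization to improve training efficiency and performance.
stage	sft	Current training stage. Options: [pt, sft, rm, ppo, dpo]. pt is pre-training, sft is supervised instruction fine-tuning, rm is reward model training, ppo is PPO training, and dpo is DPO training.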
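finetuning_type	full	Fine-tuning strategy. Options: [full, lora]. If set to "full", the entire model is fine-tuned, meaning all model parameters are updated to fit the new task. If set to "lora", LoRA fine-tuning is used (see the lora_yaml sample template).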
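dataset	identity,alpaca_en_demo	Optional. Names of the datasets registered in dataset_info.json. To use a custom dataset, configure dataset_info.json as described in "Preparing Data (Optional)" and store the dataset in the same directory as dataset_info.json.
dataset_dir	/home/ma-user/ws/LLaMAFactory/LLaMA-Factory/data	Optional. Absolute path of the directory that contains the dataset_info.json configuration file. Add this parameter to the YAML configuration file when using a custom dataset.
template	qwen	Must be modified. Specifies the template. If set to "qwen", the Qwen template is used for training. To choose a template, see the template column in Table 1.
max_samples	50000	Maximum number of samples used for training. If set, training uses only the specified number of samples and ignores the rest, which helps limit the scale and compute cost of training.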
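overwrite_cache	true	Whether to overwrite the cache. If set to true, the cache is overwritten during training. This is typically used when the dataset has changed or the cache needs to be regenerated.
preprocessing_num_workers	16	Number of workers used for data preprocessing. More workers speed up preprocessing but also increase memory usage.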
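per_device_train_batch_size	1	Training batch size per device.
gradient_accumulation_steps	8	Must be modified. Number of gradient accumulation steps. Accumulation raises the effective batch size without a matching increase in memory consumption. See Table 1.
output_dir	/home/ma-user/ws/tokenizers/Qwen2-72B/sft	Must be modified. Output directory. Model parameters and log files generated during training are saved in this directory.
logging_steps	2	Number of training steps between log outputs. Logs include training progress, learning rate, loss, and similar information. Setting this parameter is recommended.
save_steps	5000	Number of training steps between model saves. Saved models can be used for subsequent training or inference.
plot_loss	true	Whether to plot the loss curve. If set to true, the loss curve is saved as an image after training finishes.
overwrite_output_dir	true	Whether to overwrite the output directory. If set to true, existing contents of the output directory are overwritten by the new training results.
num_train_epochs	5	Number of training epochs. Modify as needed. One epoch is one full pass over all training samples.
fp16/bf16	true	Use a mixed-precision format to reduce memory usage and compute requirements. Enable only one of the two.
learning_rate	2.0e-5	Learning rate.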
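disable_gradient_checkpointing	true	Disables recomputation (gradient checkpointing), which is enabled by default. Gradient checkpointing discards intermediate activations during the forward pass and recomputes them during the backward pass. This reduces memory usage, especially when training large models, at the cost of training speed. true disables recomputation.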
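include_tokens_per_second, include_num_input_tokens_seen	true	Report the number of tokens processed per second and the number of input tokens seen so far during training, which makes performance calculation easier.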
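
The relationship between per_device_train_batch_size and gradient_accumulation_steps can be sketched as follows. This is a minimal illustration that assumes plain data-parallel training. The function name and the device count of 8 are hypothetical and are not taken from Table 1.

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int) -> int:
    """Number of samples that contribute to one optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# Table 1 example values: per_device_train_batch_size=1, gradient_accumulation_steps=8.
# num_devices=8 is an assumed cluster size used only for illustration.
print(effective_batch_size(1, 8, 8))  # prints 64
```

Doubling gradient_accumulation_steps doubles the effective batch size while the per-device batch held in memory stays the same.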

sft_yaml sample template

### model
model_name_or_path: /home/ma-user/ws/tokenizers/Qwen2-72B
### method
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: identity,alpaca_en_demo
dataset_dir: /home/ma-user/ws/llm_train/LLaMAFactory/LLaMA-Factory/data 
template: qwen
cutoff_len: 4096
packing: true
max_samples: 100000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: /home/ma-user/ws/tokenizers/Qwen2-72B/sft
logging_steps: 2
save_steps: 5000
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 2.0e-5
num_train_epochs: 10.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
flash_attn: sdpa
ddp_timeout: 180000000
include_tokens_per_second: true
include_num_input_tokens_seen: true
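
Before launching a run, a configuration like the one above can be sanity-checked against the "must be modified" rows of Table 1. The sketch below is illustrative only: parse_flat_yaml and check_config are hypothetical helpers, and the parser handles just the flat `key: value` subset of YAML used by these templates (it would mis-handle values that contain `#`). In practice a real YAML parser such as PyYAML's yaml.safe_load is the better choice.

```python
REQUIRED_KEYS = ("model_name_or_path", "template", "output_dir",
                 "gradient_accumulation_steps")


def parse_flat_yaml(text):
    """Parse flat `key: value` lines; everything after '#' is treated as a comment."""
    config = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drops section markers such as "### model"
        if not line:
            continue
        key, _, value = line.partition(":")
        config[key.strip()] = value.strip()
    return config


def check_config(config):
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = [f"missing required key: {k}" for k in REQUIRED_KEYS if k not in config]
    if config.get("fp16") == "true" and config.get("bf16") == "true":
        problems.append("fp16 and bf16 are both true; enable only one")
    return problems


SAMPLE = """\
### model
model_name_or_path: /home/ma-user/ws/tokenizers/Qwen2-72B
### dataset
template: qwen
### output
output_dir: /home/ma-user/ws/tokenizers/Qwen2-72B/sft
### train
gradient_accumulation_steps: 8
bf16: true
"""

print(check_config(parse_flat_yaml(SAMPLE)))  # prints []
```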

lora_yaml sample template

### model
model_name_or_path: /home/ma-user/ws/tokenizers/Qwen2-72B
### method
stage: sft
do_train: true
finetuning_type: lora
lora_target: all
deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: identity,alpaca_en_demo
dataset_dir: /home/ma-user/ws/llm_train/LLaMAFactory/LLaMA-Factory/data 
template: qwen
cutoff_len: 4096
packing: true
max_samples: 50000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: /home/ma-user/ws/tokenizers/Qwen2-72B/lora
logging_steps: 2
save_steps: 5000
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 2.0e-5
num_train_epochs: 10.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
flash_attn: sdpa
ddp_timeout: 180000000
include_tokens_per_second: true
include_num_input_tokens_seen: true
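
The LoRA template differs from the SFT template in only a few keys. The sketch below makes that difference explicit. config_delta is a hypothetical helper, and the two dictionaries hold only a subset of each template, copied from the sample templates in this section.

```python
def config_delta(base, other):
    """Return {key: (base_value, other_value)} for every key whose value differs."""
    keys = sorted(set(base) | set(other))
    return {k: (base.get(k), other.get(k)) for k in keys if base.get(k) != other.get(k)}


# Subset of the sft_yaml sample template.
sft = {
    "stage": "sft",
    "finetuning_type": "full",
    "max_samples": 100000,
    "output_dir": "/home/ma-user/ws/tokenizers/Qwen2-72B/sft",
    "learning_rate": 2.0e-5,
}

# Same subset of the lora_yaml sample template; lora_target exists only here.
lora = {
    "stage": "sft",
    "finetuning_type": "lora",
    "lora_target": "all",
    "max_samples": 50000,
    "output_dir": "/home/ma-user/ws/tokenizers/Qwen2-72B/lora",
    "learning_rate": 2.0e-5,
}

for key, (before, after) in config_delta(sft, lora).items():
    print(f"{key}: {before} -> {after}")
```

Keys that are missing from one side show up with the value None, which is how lora_target appears in the output.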

dpo_yaml sample template

### model
model_name_or_path: /home/ma-user/ws/tokenizers/Qwen2-72B
### method
stage: dpo
do_train: true
finetuning_type: lora
lora_target: all
pref_beta: 0.1
pref_loss: sigmoid
deepspeed: examples/deepspeed/ds_z3_config.json
### dataset
dataset: dpo_en_demo
dataset_dir: /home/ma-user/ws/llm_train/LLaMAFactory/LLaMA-Factory/data 
template: qwen
cutoff_len: 4096
packing: true
max_samples: 50000
overwrite_cache: true
preprocessing_num_workers: 16
### output
output_dir: /home/ma-user/ws/tokenizers/Qwen2-72B/dpo 
logging_steps: 2
save_steps: 5000
plot_loss: true
overwrite_output_dir: true
### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 5.0e-6
num_train_epochs: 3.0
lr_scheduler_type: cosine
warmup_ratio: 0.1
bf16: true
flash_attn: sdpa
ddp_timeout: 180000000
include_tokens_per_second: true
include_num_input_tokens_seen: true
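
To give a feel for what pref_beta and pref_loss control, the following sketch computes the standard sigmoid DPO loss for a single preference pair. It assumes that pref_loss: sigmoid denotes the standard DPO objective and that pref_beta is its beta coefficient. The function name and the log-probability values are hypothetical and chosen only for illustration.

```python
import math


def dpo_sigmoid_loss(policy_chosen_logp, policy_rejected_logp,
                     ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """-log(sigmoid(beta * margin)) for one (chosen, rejected) pair.

    margin is how much more the policy prefers the chosen answer over the
    rejected one, relative to the frozen reference model.
    """
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    z = beta * margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); this form avoids overflow.
    return max(-z, 0.0) + math.log1p(math.exp(-abs(z)))


# Hypothetical sequence log-probabilities: the policy has moved toward the
# chosen answer and away from the rejected one, so the loss is below log(2).
loss = dpo_sigmoid_loss(-12.0, -15.0, -13.0, -14.0, beta=0.1)
print(round(loss, 4), round(math.log(2), 4))
```

When the policy and the reference model agree, the margin is zero and the loss equals log(2). A larger beta makes the loss react more strongly to the same margin.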

ds_z1_config.json sample template

{
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "zero_allow_untested_optimizer": true,
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "bf16": {
    "enabled": "auto"
  },
  "zero_optimization": {
    "stage": 1,
    "allgather_partitions": true,
    "allgather_bucket_size": 5e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 5e8,
    "contiguous_gradients": true,
    "round_robin_gradients": true
  }
}
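
The "stage" field under zero_optimization selects how much of the model state is partitioned across devices. The sketch below estimates per-device memory for model states under the usual ZeRO accounting for mixed-precision Adam: 2 bytes per parameter for weights, 2 for gradients, and 12 for optimizer states. It ignores activations, buffers, and fragmentation. The function name, the rough parameter count of 72e9, and the device count of 8 are assumptions for illustration.

```python
def zero_model_state_bytes(num_params, stage, num_devices):
    """Approximate per-device bytes for weights, gradients, and optimizer states."""
    weights = 2.0 * num_params   # 16-bit weights
    grads = 2.0 * num_params     # 16-bit gradients
    optim = 12.0 * num_params    # fp32 master weights plus two Adam moments
    if stage >= 1:
        optim /= num_devices     # stage 1 partitions optimizer states
    if stage >= 2:
        grads /= num_devices     # stage 2 also partitions gradients
    if stage >= 3:
        weights /= num_devices   # stage 3 also partitions the weights
    return weights + grads + optim


# Rough parameter count and device count, both assumed for illustration.
for stage in (0, 1, 2, 3):
    gib = zero_model_state_bytes(72e9, stage, 8) / 2**30
    print(f"stage {stage}: about {gib:.0f} GiB per device")
```

Under these assumptions stage 1 still keeps a full copy of weights and gradients on every device, which is one reason the YAML templates in this section point to ds_z3_config.json for a 72B model.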
