Training Launch Script Description and Parameter Configuration
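This code package bundles training scripts for a range of models (including llama2, llama3, Qwen, Qwen1.5, and others), all of which can be launched with one command through a unified training script. The training script checks whether the preprocessed data and the converted model weights already exist. If either is missing, it runs the corresponding script to complete data preprocessing and weight conversion automatically.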
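The "check, then run only if missing" behavior described above can be sketched as follows. This is a minimal illustration, not the logic of the actual training script: the function names `is_step_done` and `run_if_missing` are hypothetical, and so is the assumption that a step counts as complete when its output directory exists and is non-empty.

```python
from pathlib import Path
from typing import Callable


def is_step_done(output_dir: Path) -> bool:
    """Assumption: a step is done when its output directory exists and is non-empty."""
    return output_dir.is_dir() and any(output_dir.iterdir())


def run_if_missing(output_dir: Path, step: Callable[[], None]) -> bool:
    """Run `step` only when its output is missing. Return True if the step ran."""
    if is_step_done(output_dir):
        return False
    step()
    return True
```

Calling `run_if_missing` once for the preprocessed-data directory and once for the converted-weights directory would give the skip-if-present behavior. A real script would probably check for specific output files rather than any non-empty directory.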
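To preprocess a custom dataset or convert weights yourself, edit the Python commands in 1_preprocess_data.sh and 2_convert_mg_hf.sh, then run the scripts in the Notebook environment. To transfer data between the Notebook environment and OBS, create an .ipynb file in the Notebook and adapt the following code: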

    import moxing as mox

    # OBS path where the data is stored
    obs_data_dir = "obs://<bucket_name>/data"
    # Notebook path where the data is stored
    local_data_dir = "/home/ma-user/work/data"

    # Copy data from OBS to the Notebook
    mox.file.copy_parallel(obs_data_dir, local_data_dir)
    # Copy data from the Notebook to OBS
    mox.file.copy_parallel(local_data_dir, obs_data_dir)

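Recommended Model Parameters and NPU Card Counts
Table 1 lists the recommended training parameters and compute specifications for each model. In the "Specification and node count" column, 1*node & 4*Ascend means a single machine with 4 cards, and so on.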
| No. | Supported model | Supported model size | Training strategy | Sequence length (SEQ_LEN) | Parallel parameters (TP = tensor model parallel size, PP = pipeline model parallel size) | Micro batch size (MBS) | Specification and node count |
|---|---|---|---|---|---|---|---|
| 1 | llama2 | llama2-7b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=1, PP=4 | 2 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=2, PP=4 | 1 | 1*node & 8*Ascend |
| | | | lora | 8192 | TP=2, PP=4 | 2 | 1*node & 8*Ascend |
| 2 | llama2 | llama2-13b | pretrain/sft | 4096 | TP=8, PP=1 | 4 | 1*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=1 | 4 | 1*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=1 | 2 | 1*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=1 | 2 | 1*node & 8*Ascend |
| 3 | llama2 | llama2-70b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=4 | 2 | 4*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| 4 | llama3 | llama3-8b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=4, PP=1 | 4 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=4, PP=1 | 1 | 1*node & 4*Ascend |
| | | | lora | 8192 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| 5 | llama3 | llama3-70b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=4 | 2 | 4*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| 6 | Qwen | qwen-7b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=4, PP=1 | 4 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=4, PP=1 | 1 | 1*node & 4*Ascend |
| | | | lora | 8192 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| 7 | Qwen | qwen-14b | pretrain/sft | 4096 | TP=4, PP=2 | 2 | 1*node & 8*Ascend |
| | | | lora | 4096 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=4, PP=2 | 1 | 1*node & 8*Ascend |
| | | | lora | 8192 | TP=4, PP=2 | 2 | 1*node & 8*Ascend |
| 8 | Qwen | qwen-72b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=4 | 2 | 4*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| 9 | Qwen1.5 | qwen1.5-7b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=1, PP=4 | 2 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=4, PP=1 | 1 | 1*node & 4*Ascend |
| | | | lora | 8192 | TP=1, PP=4 | 1 | 1*node & 4*Ascend |
| 10 | Qwen1.5 | qwen1.5-14b | pretrain/sft | 4096 | TP=8, PP=1 | 4 | 1*node & 8*Ascend |
| | | | lora | 4096 | TP=4, PP=1 | 4 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=1 | 2 | 1*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=1 | 2 | 1*node & 8*Ascend |
| 11 | Qwen1.5 | qwen1.5-32b | pretrain/sft | 4096 | TP=8, PP=2 | 2 | 2*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=2 | 4 | 2*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=2 | 1 | 2*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=2 | 2 | 2*node & 8*Ascend |
| 12 | Qwen1.5 | qwen1.5-72b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=4 | 2 | 4*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| 13 | Yi | yi-6b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=1, PP=4 | 2 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=2, PP=2 | 1 | 1*node & 4*Ascend |
| | | | lora | 8192 | TP=1, PP=4 | 1 | 1*node & 4*Ascend |
| 14 | Yi | yi-34b | pretrain/sft | 4096 | TP=4, PP=4 | 1 | 2*node & 8*Ascend |
| | | | lora | 4096 | TP=4, PP=4 | 2 | 2*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=4 | 2 | 4*node & 8*Ascend |
| 15 | ChatGLMv3 | glm3-6b | pretrain/sft | 4096 | TP=1, PP=2 | 1 | 1*node & 2*Ascend |
| | | | lora | 4096 | TP=1, PP=2 | 2 | 1*node & 2*Ascend |
| | | | pretrain/sft | 8192 | TP=1, PP=4 | 1 | 1*node & 4*Ascend |
| | | | lora | 8192 | TP=1, PP=2 | 1 | 1*node & 2*Ascend |
| 16 | Baichuan2 | baichuan2-13b | pretrain/sft | 4096 | TP=8, PP=1 | 2 | 1*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=1 | 4 | 1*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=2 | 1 | 1*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=1 | 1 | 2*node & 8*Ascend |
| 17 | Qwen2 | qwen2-0.5b | pretrain/sft | 4096 | TP=1, PP=1 | 2 | 1*node & 1*Ascend |
| | | | lora | 4096 | TP=1, PP=1 | 2 | 1*node & 1*Ascend |
| | | | pretrain/sft | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| | | | lora | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| 18 | Qwen2 | qwen2-1.5b | pretrain/sft | 4096 | TP=1, PP=1 | 2 | 1*node & 1*Ascend |
| | | | lora | 4096 | TP=1, PP=1 | 2 | 1*node & 1*Ascend |
| | | | pretrain/sft | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| | | | lora | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| 19 | Qwen2 | qwen2-7b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=4, PP=2 | 1 | 1*node & 8*Ascend |
| | | | lora | 8192 | TP=4, PP=2 | 2 | 1*node & 8*Ascend |
| 20 | Qwen2 | qwen2-72b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=4 | 2 | 4*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=8 | 1 | 8*node & 8*Ascend |
| 21 | GLMv4 | glm4-9b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=1, PP=2 | 1 | 1*node & 2*Ascend |
| | | | pretrain/sft | 8192 | TP=2, PP=2 | 1 | 1*node & 4*Ascend |
| | | | lora | 8192 | TP=2, PP=1 | 1 | 1*node & 2*Ascend |
| 22 | mistral | mistral-7b | pretrain/sft | 4096 | TP=1, PP=4 | 1 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=1, PP=4 | 2 | 1*node & 4*Ascend |
| 23 | mixtral | mixtral-8x7b | pretrain/sft | 4096 | TP=2, PP=8 | 1 | 2*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=2, PP=8 | 1 | 2*node & 8*Ascend |
| 24 | llama3.1 | llama3.1-8b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=4, PP=1 | 4 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=4, PP=1 | 1 | 1*node & 4*Ascend |
| | | | lora | 8192 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| 25 | llama3.1 | llama3.1-70b | pretrain/sft | 4096 | TP=8, PP=4 | 1 | 4*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=2 | 4 | 2*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=2 | 2 | 2*node & 8*Ascend |
| 26 | Qwen2.5 | qwen2.5-0.5b | pretrain/sft | 4096 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| | | | lora | 4096 | TP=1, PP=1 | 2 | 1*node & 1*Ascend |
| | | | pretrain/sft | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| | | | lora | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| 27 | Qwen2.5 | qwen2.5-7b | pretrain/sft | 4096 | TP=4, PP=1 | 2 | 1*node & 4*Ascend |
| | | | lora | 4096 | TP=4, PP=1 | 4 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=4, PP=2 | 1 | 1*node & 8*Ascend |
| | | | lora | 8192 | TP=4, PP=2 | 2 | 1*node & 8*Ascend |
| 28 | Qwen2.5 | qwen2.5-14b | pretrain/sft | 4096 | TP=8, PP=1 | 4 | 1*node & 8*Ascend |
| | | | lora | 4096 | TP=4, PP=1 | 4 | 1*node & 4*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=1 | 2 | 1*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=1 | 2 | 1*node & 8*Ascend |
| 29 | Qwen2.5 | qwen2.5-32b | pretrain/sft | 4096 | TP=8, PP=2 | 2 | 2*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=2 | 4 | 2*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=2 | 1 | 2*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=2 | 2 | 2*node & 8*Ascend |
| 30 | Qwen2.5 | qwen2.5-72b | pretrain/sft | 4096 | TP=8, PP=4 | 2 | 4*node & 8*Ascend |
| | | | lora | 4096 | TP=8, PP=4 | 4 | 4*node & 8*Ascend |
| | | | pretrain/sft | 8192 | TP=8, PP=8 | 1 | 8*node & 8*Ascend |
| | | | lora | 8192 | TP=8, PP=4 | 2 | 4*node & 8*Ascend |
| 31 | llama3.2 | llama3.2-1b | pretrain/sft | 4096 | TP=1, PP=1 | 2 | 1*node & 1*Ascend |
| | | | lora | 4096 | TP=1, PP=1 | 2 | 1*node & 1*Ascend |
| | | | pretrain/sft | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| | | | lora | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
| 32 | llama3.2 | llama3.2-3b | pretrain/sft | 4096 | TP=1, PP=2 | 2 | 1*node & 2*Ascend |
| | | | lora | 4096 | TP=1, PP=1 | 2 | 1*node & 1*Ascend |
| | | | pretrain/sft | 8192 | TP=1, PP=2 | 1 | 1*node & 2*Ascend |
| | | | lora | 8192 | TP=1, PP=1 | 1 | 1*node & 1*Ascend |
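
The specification notation and the parallel settings in the table can be related with a small calculation. The sketch below parses a specification string into a total card count and derives the data-parallel size as cards / (TP × PP). That relationship is an assumption based on how Megatron-style frameworks usually split devices. The source does not state it, and the helper names `total_devices` and `data_parallel_size` are hypothetical.

```python
import re


def total_devices(spec: str) -> int:
    """Parse a spec such as '1*node & 4*Ascend' into nodes * cards per node."""
    match = re.fullmatch(r"\s*(\d+)\*(?:node|节点)\s*&\s*(\d+)\*Ascend\s*", spec)
    if match is None:
        raise ValueError(f"unrecognized spec: {spec!r}")
    nodes, cards_per_node = int(match.group(1)), int(match.group(2))
    return nodes * cards_per_node


def data_parallel_size(spec: str, tp: int, pp: int) -> int:
    """Assumed relation: DP = total cards / (TP * PP); the division must be exact."""
    devices = total_devices(spec)
    model_parallel = tp * pp
    if devices % model_parallel != 0:
        raise ValueError(
            f"{devices} cards cannot be split evenly across TP*PP={model_parallel}"
        )
    return devices // model_parallel


# llama2-7b, pretrain/sft, SEQ_LEN=4096: TP=1, PP=4 on 1*node & 4*Ascend
print(data_parallel_size("1*node & 4*Ascend", tp=1, pp=4))  # 1
```

Under this assumption, doubling the node count while keeping TP and PP fixed would double the data-parallel size, which is one way to scale beyond the listed specification.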