Training launch script overview and parameter configuration
This code package bundles training scripts for multiple models (including llama2, llama3, Qwen, Qwen1.5, and others), all of which can be launched through a single unified training script. The training script checks whether data preprocessing and model weight conversion have already been completed; if not, it automatically runs the corresponding scripts to perform the data preprocessing and weight conversion.
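As a rough sketch of this check-then-run logic (the script names 1_preprocess_data.sh and 2_convert_mg_hf.sh come from this package; the helper functions and directory arguments below are hypothetical illustrations, not the actual launcher code):

```python
import os
import subprocess

def pending_steps(data_dir: str, weights_dir: str) -> list:
    """Return the preparation scripts that still need to run.

    A directory that is missing or empty is treated as 'not yet prepared'.
    """
    steps = []
    if not (os.path.isdir(data_dir) and os.listdir(data_dir)):
        steps.append("1_preprocess_data.sh")   # data preprocessing
    if not (os.path.isdir(weights_dir) and os.listdir(weights_dir)):
        steps.append("2_convert_mg_hf.sh")     # weight conversion
    return steps

def prepare(data_dir: str, weights_dir: str) -> None:
    # Run only the steps whose outputs are missing, then training can start.
    for script in pending_steps(data_dir, weights_dir):
        subprocess.run(["bash", script], check=True)
```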
To customize dataset preprocessing or weight conversion, edit the specific Python commands in 1_preprocess_data.sh and 2_convert_mg_hf.sh and run them in the Notebook environment. To transfer data between the Notebook environment and OBS, create an .ipynb file in the Notebook and run the following code:
```python
import moxing as mox

# OBS path where the data is stored
obs_data_dir = "obs://<bucket_name>/data"
# Notebook path where the data is stored
local_data_dir = "/home/ma-user/work/data"

# Download data from OBS to the Notebook
mox.file.copy_parallel(obs_data_dir, local_data_dir)
# Upload data from the Notebook to OBS
mox.file.copy_parallel(local_data_dir, obs_data_dir)
```
Recommended parameters and NPU card counts per model
Table 1 lists the recommended training parameters and required compute specifications for each model. In the "Spec & nodes" column, 1*node & 4*Ascend means a single node with 4 NPUs, and so on.
| No. | Model | Model size | Sequence length | Parallelism (TP = tensor model parallel size, PP = pipeline model parallel size) | Spec & nodes |
|---|---|---|---|---|---|
| 1 | llama2 | llama2-7b | SEQ_LEN=4096 | TP=1, PP=4 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=4 | 1*node & 8*Ascend |
| 2 | llama2 | llama2-13b | SEQ_LEN=4096 | TP=8, PP=1 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=1 | 1*node & 8*Ascend |
| 3 | llama2 | llama2-70b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 4 | llama3 | llama3-8b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 8*Ascend |
| 5 | llama3 | llama3-70b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 6 | Qwen | qwen-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 8*Ascend |
| 7 | Qwen | qwen-14b | SEQ_LEN=4096 | TP=8, PP=1 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=1 | 1*node & 8*Ascend |
| 8 | Qwen | qwen-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 9 | Qwen1.5 | qwen1.5-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 8*Ascend |
| 10 | Qwen1.5 | qwen1.5-14b | SEQ_LEN=4096 | TP=8, PP=1 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=1 | 1*node & 8*Ascend |
| 11 | Qwen1.5 | qwen1.5-32b | SEQ_LEN=4096 | TP=8, PP=2 | 2*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=4 | 4*node & 8*Ascend |
| 12 | Qwen1.5 | qwen1.5-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 13 | Yi | yi-6b | SEQ_LEN=4096 | TP=1, PP=4 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=4 | 1*node & 8*Ascend |
| 14 | Yi | yi-34b | SEQ_LEN=4096 | TP=4, PP=4 | 2*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=4 | 4*node & 8*Ascend |
| 15 | ChatGLMv3 | glm3-6b | SEQ_LEN=4096 | TP=1, PP=4 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=4 | 1*node & 8*Ascend |
| 16 | Baichuan2 | baichuan2-13b | SEQ_LEN=4096 | TP=8, PP=1 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=1 | 1*node & 8*Ascend |
| 17 | Qwen2 | qwen2-0.5b | SEQ_LEN=4096 | TP=2, PP=1 | 1*node & 4*Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=1 | 1*node & 4*Ascend |
| 18 | Qwen2 | qwen2-1.5b | SEQ_LEN=4096 | TP=2, PP=1 | 1*node & 4*Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=1 | 1*node & 4*Ascend |
| 19 | Qwen2 | qwen2-7b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 8*Ascend |
| 20 | Qwen2 | qwen2-72b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |
| 21 | GLMv4 | glm4-9b | SEQ_LEN=4096 | TP=2, PP=4 | 1*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=4 | 1*node & 8*Ascend |
| 22 | mistral | mistral-7b | SEQ_LEN=4096 | TP=1, PP=4 | 1*node & 8*Ascend |
| 23 | mixtral | mixtral-8x7b | SEQ_LEN=4096 | TP=2, PP=8 | 2*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=2, PP=8 | 2*node & 8*Ascend |
| 24 | llama3.1 | llama3.1-8b | SEQ_LEN=4096 | TP=4, PP=1 | 1*node & 4*Ascend |
| | | | SEQ_LEN=8192 | TP=4, PP=1 | 1*node & 4*Ascend |
| 25 | llama3.1 | llama3.1-70b | SEQ_LEN=4096 | TP=8, PP=4 | 4*node & 8*Ascend |
| | | | SEQ_LEN=8192 | TP=8, PP=8 | 8*node & 8*Ascend |