Updated: 2024-12-09 GMT+08:00

Training Launch Script Description and Parameter Configuration

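This code package integrates training scripts for several models (including llama2, llama3, Qwen, Qwen1.5, and others), all of which can be launched with one command through a unified training script. The training script checks whether the preprocessed data and the converted model weights already exist. If they do not, it runs the corresponding scripts to complete data preprocessing and weight conversion automatically.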
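The check-then-run behavior described above can be sketched as follows. This is a minimal illustration, not the package's actual logic: the function name `ensure_prepared`, the "non-empty output directory means the step is done" test, and the paths and commands passed in are all assumptions made for the example.

```python
import os
import subprocess


def ensure_prepared(data_dir, weights_dir, preprocess_cmd, convert_cmd,
                    run=subprocess.run):
    """Run preprocessing / weight conversion only if their outputs are missing.

    Assumption: a step counts as finished when its output directory exists
    and is not empty. The real training script may use a different check.
    Returns the list of steps that were executed.
    """
    executed = []

    def is_ready(path):
        # A missing or empty directory means the step has not been completed.
        return os.path.isdir(path) and len(os.listdir(path)) > 0

    if not is_ready(data_dir):
        run(preprocess_cmd, check=True)   # e.g. ["sh", "1_preprocess_data.sh"]
        executed.append("preprocess")
    if not is_ready(weights_dir):
        run(convert_cmd, check=True)      # e.g. ["sh", "2_convert_mg_hf.sh"]
        executed.append("convert")
    return executed
```

The `run` parameter defaults to `subprocess.run` and can be replaced with a stub, which makes it possible to try the logic without launching the real scripts.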

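To preprocess a custom dataset or convert weights yourself, edit the python commands in 1_preprocess_data.sh and 2_convert_mg_hf.sh, then run the scripts in the Notebook environment. To transfer data between the Notebook environment and OBS, create an .ipynb file in the Notebook and run the following code.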

import moxing as mox

# Path where the data is stored in OBS
obs_data_dir = "obs://<bucket_name>/data"
# Path where the data is stored in the Notebook
local_data_dir = "/home/ma-user/work/data"

# Download data from OBS to the Notebook
mox.file.copy_parallel(obs_data_dir, local_data_dir)
# Upload data from the Notebook to OBS
mox.file.copy_parallel(local_data_dir, obs_data_dir)

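Recommended Parameters and NPU Card Counts per Model

Table 1 lists the recommended training parameters and compute specifications for each model. In the "Specification and node count" column, 1*node & 4*Ascend means a single machine with 4 cards, and so on.

Table 1 Recommended parameters and NPU card counts for each model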

| No. | Supported model | Model (parameter size) | Text sequence length | TP (tensor model parallel size) | PP (pipeline model parallel size) | Specification and node count |
|---|---|---|---|---|---|---|
| 1 | llama2 | llama2-7b | SEQ_LEN=4096 | 1 | 4 | 1*node & 8*Ascend |
| 1 | llama2 | llama2-7b | SEQ_LEN=8192 | 2 | 4 | 1*node & 8*Ascend |
| 2 | llama2 | llama2-13b | SEQ_LEN=4096 | 8 | 1 | 1*node & 8*Ascend |
| 2 | llama2 | llama2-13b | SEQ_LEN=8192 | 8 | 1 | 1*node & 8*Ascend |
| 3 | llama2 | llama2-70b | SEQ_LEN=4096 | 8 | 4 | 4*node & 8*Ascend |
| 3 | llama2 | llama2-70b | SEQ_LEN=8192 | 8 | 8 | 8*node & 8*Ascend |
| 4 | llama3 | llama3-8b | SEQ_LEN=4096 | 4 | 1 | 1*node & 8*Ascend |
| 4 | llama3 | llama3-8b | SEQ_LEN=8192 | 4 | 1 | 1*node & 8*Ascend |
| 5 | llama3 | llama3-70b | SEQ_LEN=4096 | 8 | 4 | 4*node & 8*Ascend |
| 5 | llama3 | llama3-70b | SEQ_LEN=8192 | 8 | 8 | 8*node & 8*Ascend |
| 6 | Qwen | qwen-7b | SEQ_LEN=4096 | 4 | 1 | 1*node & 8*Ascend |
| 6 | Qwen | qwen-7b | SEQ_LEN=8192 | 4 | 1 | 1*node & 8*Ascend |
| 7 | Qwen | qwen-14b | SEQ_LEN=4096 | 8 | 1 | 1*node & 8*Ascend |
| 7 | Qwen | qwen-14b | SEQ_LEN=8192 | 8 | 1 | 1*node & 8*Ascend |
| 8 | Qwen | qwen-72b | SEQ_LEN=4096 | 8 | 4 | 4*node & 8*Ascend |
| 8 | Qwen | qwen-72b | SEQ_LEN=8192 | 8 | 8 | 8*node & 8*Ascend |
| 9 | Qwen1.5 | qwen1.5-7b | SEQ_LEN=4096 | 4 | 1 | 1*node & 8*Ascend |
| 9 | Qwen1.5 | qwen1.5-7b | SEQ_LEN=8192 | 4 | 1 | 1*node & 8*Ascend |
| 10 | Qwen1.5 | qwen1.5-14b | SEQ_LEN=4096 | 8 | 1 | 1*node & 8*Ascend |
| 10 | Qwen1.5 | qwen1.5-14b | SEQ_LEN=8192 | 8 | 1 | 1*node & 8*Ascend |
| 11 | Qwen1.5 | qwen1.5-32b | SEQ_LEN=4096 | 8 | 2 | 2*node & 8*Ascend |
| 11 | Qwen1.5 | qwen1.5-32b | SEQ_LEN=8192 | 8 | 4 | 4*node & 8*Ascend |
| 12 | Qwen1.5 | qwen1.5-72b | SEQ_LEN=4096 | 8 | 4 | 4*node & 8*Ascend |
| 12 | Qwen1.5 | qwen1.5-72b | SEQ_LEN=8192 | 8 | 8 | 8*node & 8*Ascend |
| 13 | Yi | yi-6b | SEQ_LEN=4096 | 1 | 4 | 1*node & 8*Ascend |
| 13 | Yi | yi-6b | SEQ_LEN=8192 | 2 | 4 | 1*node & 8*Ascend |
| 14 | Yi | yi-34b | SEQ_LEN=4096 | 4 | 4 | 2*node & 8*Ascend |
| 14 | Yi | yi-34b | SEQ_LEN=8192 | 8 | 4 | 4*node & 8*Ascend |
| 15 | ChatGLMv3 | glm3-6b | SEQ_LEN=4096 | 1 | 4 | 1*node & 8*Ascend |
| 15 | ChatGLMv3 | glm3-6b | SEQ_LEN=8192 | 2 | 4 | 1*node & 8*Ascend |
| 16 | Baichuan2 | baichuan2-13b | SEQ_LEN=4096 | 8 | 1 | 1*node & 8*Ascend |
| 16 | Baichuan2 | baichuan2-13b | SEQ_LEN=8192 | 8 | 1 | 1*node & 8*Ascend |
| 17 | Qwen2 | qwen2-0.5b | SEQ_LEN=4096 | 2 | 1 | 1*node & 4*Ascend |
| 17 | Qwen2 | qwen2-0.5b | SEQ_LEN=8192 | 2 | 1 | 1*node & 4*Ascend |
| 18 | Qwen2 | qwen2-1.5b | SEQ_LEN=4096 | 2 | 1 | 1*node & 4*Ascend |
| 18 | Qwen2 | qwen2-1.5b | SEQ_LEN=8192 | 2 | 1 | 1*node & 4*Ascend |
| 19 | Qwen2 | qwen2-7b | SEQ_LEN=4096 | 4 | 1 | 1*node & 8*Ascend |
| 19 | Qwen2 | qwen2-7b | SEQ_LEN=8192 | 4 | 1 | 1*node & 8*Ascend |
| 20 | Qwen2 | qwen2-72b | SEQ_LEN=4096 | 8 | 4 | 4*node & 8*Ascend |
| 20 | Qwen2 | qwen2-72b | SEQ_LEN=8192 | 8 | 8 | 8*node & 8*Ascend |
| 21 | GLMv4 | glm4-9b | SEQ_LEN=4096 | 2 | 4 | 1*node & 8*Ascend |
| 21 | GLMv4 | glm4-9b | SEQ_LEN=8192 | 2 | 4 | 1*node & 8*Ascend |
| 22 | mistral | mistral-7b | SEQ_LEN=4096 | 1 | 4 | 1*node & 8*Ascend |
| 23 | mixtral | mixtral-8x7b | SEQ_LEN=4096 | 2 | 8 | 2*node & 8*Ascend |
| 23 | mixtral | mixtral-8x7b | SEQ_LEN=8192 | 2 | 8 | 2*node & 8*Ascend |
| 24 | llama3.1 | llama3.1-8b | SEQ_LEN=4096 | 4 | 1 | 1*node & 4*Ascend |
| 24 | llama3.1 | llama3.1-8b | SEQ_LEN=8192 | 4 | 1 | 1*node & 4*Ascend |
| 25 | llama3.1 | llama3.1-70b | SEQ_LEN=4096 | 8 | 4 | 4*node & 8*Ascend |
| 25 | llama3.1 | llama3.1-70b | SEQ_LEN=8192 | 8 | 8 | 8*node & 8*Ascend |
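When adapting a row of Table 1 to a different cluster size, it can help to check that the parallel settings still fit the available cards. The sketch below assumes the usual Megatron-style relationship, in which the total card count equals TP × PP × DP (data parallel size); the table does not state this rule itself, and the function name `data_parallel_size` is hypothetical.

```python
def data_parallel_size(nodes, cards_per_node, tp, pp):
    """Return the data parallel size implied by a TP/PP setting.

    Assumption: total cards = TP * PP * DP, so the total card count must be
    divisible by TP * PP. Raises ValueError when the setting does not fit.
    """
    total_cards = nodes * cards_per_node
    model_parallel = tp * pp
    if total_cards % model_parallel != 0:
        raise ValueError(
            f"{total_cards} cards cannot be split evenly by TP*PP={model_parallel}"
        )
    return total_cards // model_parallel


# Rows taken from Table 1: (model, SEQ_LEN, nodes, cards per node, TP, PP)
rows = [
    ("llama2-7b", 4096, 1, 8, 1, 4),
    ("llama2-70b", 8192, 8, 8, 8, 8),
    ("llama3.1-8b", 4096, 1, 4, 4, 1),
]
for model, seq_len, nodes, cards, tp, pp in rows:
    dp = data_parallel_size(nodes, cards, tp, pp)
    print(f"{model} SEQ_LEN={seq_len}: DP={dp}")
```

Under this assumption, a setting such as TP=8, PP=4 needs at least 32 cards, which matches the 4*node & 8*Ascend specification listed for the 70b-class models at SEQ_LEN=4096.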
