NPU card counts by model
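Table 1 lists the recommended training parameters and compute specifications for each model. In the specs and node count column, 1*node & 4*Ascend means a single machine with 4 cards, and so on.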
| Supported model | Parameter count | Text sequence length | Training type | Zero parallelism | Specs and node count |
|---|---|---|---|---|---|
| llama3 | 70B | cutoff_len=4096 | lora | per_device_train_batch_size=1 | 2*node & 8*Ascend |
| llama3 | 70B | cutoff_len=4096 | sft | per_device_train_batch_size=1 | 8*node & 8*Ascend |
| llama3 | 70B | cutoff_len=8192 | lora | per_device_train_batch_size=1 | 2*node & 8*Ascend |
| llama3 | 70B | cutoff_len=8192 | sft | per_device_train_batch_size=1 | 8*node & 8*Ascend |
| llama3 | 8B | cutoff_len=4096/8192 | lora | per_device_train_batch_size=1 | 1*node & 1*Ascend |
| llama3 | 8B | cutoff_len=4096/8192 | sft | per_device_train_batch_size=1 | 1*node & 4*Ascend |
| Qwen2 | 72B | cutoff_len=4096 | lora | per_device_train_batch_size=1 | 2*node & 8*Ascend |
| Qwen2 | 72B | cutoff_len=4096 | sft | per_device_train_batch_size=1 | 4*node & 8*Ascend |
| Qwen2 | 72B | cutoff_len=8192 | lora | per_device_train_batch_size=1 | 2*node & 8*Ascend |
| Qwen2 | 72B | cutoff_len=8192 | sft | per_device_train_batch_size=1 | 8*node & 8*Ascend |
| Qwen2 | 7B | cutoff_len=4096 | lora/sft | per_device_train_batch_size=1 | 1*node & 4*Ascend |
| Qwen2 | 7B | cutoff_len=8192 | lora/sft | per_device_train_batch_size=1 | 1*node & 8*Ascend |
| Qwen2 | 0.5B/1.5B | cutoff_len=4096/8192 | lora/sft | per_device_train_batch_size=1 | 1*node & 1*Ascend |
| Qwen1.5 | 0.5B/1.8B | cutoff_len=4096/8192 | lora/sft | per_device_train_batch_size=1 | 1*node & 1*Ascend |
| Qwen1.5 | 4B | cutoff_len=4096/8192 | sft | per_device_train_batch_size=1 | 1*node & 4*Ascend |
| Qwen1.5 | 4B | cutoff_len=4096/8192 | lora | per_device_train_batch_size=1 | 1*node & 1*Ascend |
| Qwen1.5 | 7B | cutoff_len=4096/8192 | lora | per_device_train_batch_size=1 | 1*node & 1*Ascend |
| Qwen1.5 | 7B | cutoff_len=4096/8192 | sft | per_device_train_batch_size=1 | 1*node & 8*Ascend |
| Qwen1.5 | 14B | cutoff_len=4096/8192 | sft | per_device_train_batch_size=1 | 1*node & 8*Ascend |
| Qwen1.5 | 14B | cutoff_len=4096/8192 | lora | per_device_train_batch_size=1 | 1*node & 1*Ascend |
| falcon2 | 11B | cutoff_len=4096/8192 | sft | per_device_train_batch_size=1 | 1*node & 8*Ascend |
| falcon2 | 11B | cutoff_len=4096/8192 | lora | per_device_train_batch_size=1 | 1*node & 1*Ascend |
| Yi | 6B | cutoff_len=4096/8192 | sft | per_device_train_batch_size=1 | 1*node & 4*Ascend |
| Yi | 6B | cutoff_len=4096/8192 | lora | per_device_train_batch_size=1 | 1*node & 1*Ascend |
| Yi | 34B | cutoff_len=4096 | sft | per_device_train_batch_size=1 | 2*node & 8*Ascend |
| Yi | 34B | cutoff_len=4096 | lora | per_device_train_batch_size=1 | 1*node & 2*Ascend |
| Yi | 34B | cutoff_len=8192 | sft | per_device_train_batch_size=1 | 2*node & 8*Ascend |
| Yi | 34B | cutoff_len=8192 | lora | per_device_train_batch_size=1 | 1*node & 4*Ascend |
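
The spec notation in the table can be turned into a total card count with a small helper. The following is a minimal sketch, not part of the original documentation: the names `Spec`, `parse_spec`, `TABLE`, and `required_cards` are hypothetical, only a few rows of the table are transcribed, and it assumes that a multi-node entry such as 2*node & 8*Ascend means 8 cards on each of 2 nodes (the document only spells out the single-node case).

```python
import re
from typing import NamedTuple


class Spec(NamedTuple):
    """A parsed 'N*node & M*Ascend' entry (hypothetical helper type)."""
    nodes: int
    cards_per_node: int

    @property
    def total_cards(self) -> int:
        # Assumption: M is the card count per node, so the total is N * M.
        return self.nodes * self.cards_per_node


# Accepts both the English form and the original "节点" spelling.
_SPEC_RE = re.compile(r"^\s*(\d+)\*(?:node|节点)\s*&\s*(\d+)\*Ascend\s*$")


def parse_spec(text: str) -> Spec:
    """Parse a spec string such as '1*node & 4*Ascend'."""
    match = _SPEC_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized spec: {text!r}")
    return Spec(int(match.group(1)), int(match.group(2)))


# A few rows transcribed from the table:
# (model, parameter count, cutoff_len, training type) -> spec string
TABLE = {
    ("llama3", "70B", 4096, "lora"): "2*node & 8*Ascend",
    ("llama3", "70B", 4096, "sft"): "8*node & 8*Ascend",
    ("Qwen2", "7B", 4096, "lora"): "1*node & 4*Ascend",
    ("Qwen2", "7B", 8192, "lora"): "1*node & 8*Ascend",
    ("Qwen1.5", "4B", 4096, "sft"): "1*node & 4*Ascend",
}


def required_cards(model: str, size: str, cutoff_len: int, training_type: str) -> int:
    """Look up a transcribed row and return its total card count."""
    spec = parse_spec(TABLE[(model, size, cutoff_len, training_type)])
    return spec.total_cards


if __name__ == "__main__":
    print(parse_spec("1*node & 4*Ascend"))
    print(required_cards("llama3", "70B", 4096, "sft"))
```

Keeping the spec as the original string and parsing it on demand means the dictionary stays a direct copy of the table, which makes it easier to check against the documentation when rows change.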