Training Features Supported by Each Model
The training features that the AscendFactory solution supports vary by model and by training framework, as summarized in this section.
Pre-training and fine-tuning support (MindSpeed-LLM, LlamaFactory, and MindSpeed-MM):

| Type | Series | Model | MindSpeed-LLM: Pre-training and full-parameter fine-tuning | MindSpeed-LLM: LoRA fine-tuning | MindSpeed-LLM: Multi-sample pack | MindSpeed-LLM: Flash Attention | MindSpeed-LLM: SPTD parallelism (SP, PP, TP, DP) | MindSpeed-LLM: Long sequence parallelism (Ring Attention, Ulysses, hybrid long sequence) | MindSpeed-LLM: Mixture of Experts (MoE) parallelism (expert parallelism and communication rearrangement optimization) | MindSpeed-LLM: Dynamic sentence length | LlamaFactory: Training methods (PT: pre-training) | LlamaFactory: ZeRO parallelism (ZeRO-1, ZeRO-2, and ZeRO-3) | LlamaFactory: Flash Attention | MindSpeed-MM: Pre-training and full-parameter fine-tuning | MindSpeed-MM: SPTD parallelism (SP, PP, TP, DP) | MindSpeed-MM: Distributed optimizer | MindSpeed-MM: Recomputation |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LLM | DeepSeek | DeepSeek-R1-671B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| LLM | DeepSeek | DeepSeek-V3-671B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| LLM | DeepSeek | DeepSeek-V2-Lite 16B | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| LLM | Qwen2 | Qwen2-0.5B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen2 | Qwen2-1.5B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| LLM | Qwen2 | Qwen2-7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen2 | Qwen2-72B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen2.5 | Qwen2.5-0.5B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen2.5 | Qwen2.5-1.5B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| LLM | Qwen2.5 | Qwen2.5-7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen2.5 | Qwen2.5-14B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT, DPO | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen2.5 | Qwen2.5-32B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen2.5 | Qwen2.5-72B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT, DPO | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen3 | Qwen3-0.6B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen3 | Qwen3-1.7B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen3 | Qwen3-4B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen3 | Qwen3-8B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen3 | Qwen3-14B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen3 | Qwen3-32B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen3 | Qwen3-30B-A3B | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Qwen3 | Qwen3-235B-A22B | ✅ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Llama | Llama3.1-8B/70B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Llama | Llama3.2-1B/3B | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | GLM | glm-4-9b-chat | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | PT, SFT | ✅ | ✅ | N/A | N/A | N/A | N/A |
| LLM | Mixtral | Mixtral-8x7B-Instruct-v0.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| Multimodal model | Qwen2 VL | Qwen2-VL-2B | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | PT, SFT | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multimodal model | Qwen2 VL | Qwen2-VL-7B | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | PT, SFT | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multimodal model | Qwen2 VL | Qwen2-VL-72B | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | PT, SFT | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multimodal model | Qwen2.5 VL | Qwen2.5-VL-3B | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | PT, SFT | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multimodal model | Qwen2.5 VL | Qwen2.5-VL-7B | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | PT, SFT, DPO | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ |
| Multimodal model | Qwen2.5 VL | Qwen2.5-VL-32B | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | PT, SFT | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multimodal model | Qwen2.5 VL | Qwen2.5-VL-72B | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | PT, SFT | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |
| Multimodal model | Gemma | Gemma3-27b | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | PT, SFT | ✅ | ✅ | ❌ | ❌ | ❌ | ❌ |

Reinforcement learning support (VeRL and MindSpeed-RL):

| Type | Series | Model | VeRL: Training methods | VeRL: sglang version | VeRL: vllm version | VeRL: Training backend | MindSpeed-RL: Training methods | MindSpeed-RL: vllm version | MindSpeed-RL: Training backend (Megatron) | MindSpeed-RL: Long sequence parallelism |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LLM | DeepSeek | DeepSeek-R1-671B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | DeepSeek | DeepSeek-V3-671B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | DeepSeek | DeepSeek-V2-Lite 16B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen2 | Qwen2-0.5B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen2 | Qwen2-1.5B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen2 | Qwen2-7B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen2 | Qwen2-72B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen2.5 | Qwen2.5-0.5B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen2.5 | Qwen2.5-1.5B | ❌ | ❌ | ❌ | ❌ | GRPO | 0.9.1 | ✅ | ✅ |
| LLM | Qwen2.5 | Qwen2.5-7B | ❌ | ❌ | ❌ | ❌ | GRPO | 0.9.1 | ✅ | ✅ |
| LLM | Qwen2.5 | Qwen2.5-14B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen2.5 | Qwen2.5-32B | GRPO, DAPO, PPO | ❌ | 0.9.1 | FSDP | GRPO | 0.9.1 | ✅ | ✅ |
| LLM | Qwen2.5 | Qwen2.5-72B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen3 | Qwen3-0.6B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen3 | Qwen3-1.7B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen3 | Qwen3-4B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen3 | Qwen3-8B | GRPO | ❌ | 0.9.1 | FSDP | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen3 | Qwen3-14B | GRPO, DAPO, PPO | ❌ | 0.9.1 | FSDP | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen3 | Qwen3-32B | GRPO, DAPO, PPO | ❌ | 0.9.1 | FSDP | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen3 | Qwen3-30B-A3B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Qwen3 | Qwen3-235B-A22B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Llama | Llama3.1-8B/70B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Llama | Llama3.2-1B/3B | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | GLM | glm-4-9b-chat | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| LLM | Mixtral | Mixtral-8x7B-Instruct-v0.1 | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ |
| Multimodal model | Qwen2 VL | Qwen2-VL-2B | ❌ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| Multimodal model | Qwen2 VL | Qwen2-VL-7B | ❌ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| Multimodal model | Qwen2 VL | Qwen2-VL-72B | ❌ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
| Multimodal model | Qwen2.5 VL | Qwen2.5-VL-3B | GRPO | ❌ | 0.9.1 | FSDP | N/A | N/A | N/A | N/A |
| Multimodal model | Qwen2.5 VL | Qwen2.5-VL-7B | GRPO, DAPO, PPO | ❌ | 0.9.1 | FSDP | N/A | N/A | N/A | N/A |
| Multimodal model | Qwen2.5 VL | Qwen2.5-VL-32B | GRPO, DAPO, PPO | ❌ | 0.9.1 | FSDP | N/A | N/A | N/A | N/A |
| Multimodal model | Qwen2.5 VL | Qwen2.5-VL-72B | GRPO | ❌ | 0.9.1 | FSDP | N/A | N/A | N/A | N/A |
| Multimodal model | Gemma | Gemma3-27b | ❌ | ❌ | ❌ | ❌ | N/A | N/A | N/A | N/A |
- "✅" means the feature is supported, "❌" means it is not supported, and "N/A" means the model does not work with the framework at all. Multimodal models, for example, do not support the MindSpeed-LLM training framework.
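Before submitting a job, it can help to validate the model/framework/method combination against the matrix above. The sketch below is purely illustrative and not part of AscendFactory; the `SUPPORT_MATRIX` dict and `check_support` helper are hypothetical names, and only a few rows from the tables are encoded.

```python
# Illustrative only: encode a few rows of the support matrix so a launcher
# script can fail fast on unsupported model/framework/method combinations.
# Values are lists of supported training methods, or None where the tables
# above show "N/A" (model does not work with the framework at all).
SUPPORT_MATRIX = {
    ("Qwen2.5-7B", "MindSpeed-RL"): ["GRPO"],
    ("Qwen2.5-32B", "VeRL"): ["GRPO", "DAPO", "PPO"],
    ("Qwen2.5-14B", "LlamaFactory"): ["PT", "SFT", "DPO"],
    ("Qwen2-VL-7B", "MindSpeed-LLM"): None,  # multimodal: N/A
}

def check_support(model: str, framework: str, method: str) -> bool:
    """Return True only if the (model, framework) pair supports the method."""
    methods = SUPPORT_MATRIX.get((model, framework))
    return methods is not None and method in methods

print(check_support("Qwen2.5-32B", "VeRL", "DAPO"))   # True
print(check_support("Qwen2-VL-7B", "MindSpeed-LLM", "PT"))  # False (N/A)
```

A real launcher would load the full matrix from a config file rather than hard-coding it, but the lookup logic stays the same.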