Help Center/ ModelArts/ Best Practices/ LLM Training/ Adapting Mainstream Open-Source Models to AscendFactory NPU Training Based on Lite Server/ Training Features Supported by Each Model

Updated on 2025-11-04 GMT+08:00

View PDF

Training Features Supported by Each Model

The AscendFactory solution supports various training features for each model, as outlined in this section.

Type

Series

Model

Pre-Training and Fine-Tuning

Reinforcement Learning

MindSpeed-LLM

LlamaFactory

MindSpeed-MM

VeRL

MindSpeed-RL

Pre-training and full-parameter fine-tuning

LoRA fine-tuning

Multi-sample pack

Flash

SPTD parallelism

Long sequence parallelism

Mixture of Expert (MoE) parallelism

Dynamic sentence length

Training methods

ZeRO parallelism

Flash

Pre-training and full-parameter fine-tuning

SPTD parallelism

Distributed optimizer

Recomputation

Training methods

sglang

vllm

Training backend

Training methods

vllm

Training backend Megatron

Long sequence parallelism

Fine-tuning

Attention

(SP, PP, TP, DP)

(Ring Attention, Ulysses, hybrid long sequence)

(Expert parallelism and communication rearrangement optimization)

(PT: pre-training)

(ZeRO-1, ZeRO-2, and ZeRO-3)

Attention

(SP, PP, TP, DP)

Version

LLM

DeepSeek

DeepSeek-R1-671B

✅

❌

N/A

❌

DeepSeek-V3-671B

✅

❌

N/A

❌

DeepSeek-V2-Lite 16B

✅

❌

✅

❌

N/A

❌

Qwen2

Qwen2-0.5B

✅

PT, SFT

✅

N/A

❌

Qwen2-1.5B

✅

❌

N/A

❌

Qwen2-7B

✅

PT, SFT

✅

N/A

❌

Qwen2-72B

✅

PT, SFT

✅

N/A

❌

Qwen2.5

Qwen2.5-0.5B

✅

PT, SFT

✅

N/A

❌

Qwen2.5-1.5B

❌

N/A

❌

GRPO

0.9.1

✅

Qwen2.5-7B

✅

PT, SFT

✅

N/A

❌

GRPO

0.9.1

✅

Qwen2.5-14B

✅

PT, SFT, DPO

✅

N/A

❌

Qwen2.5-32B

✅

PT, SFT

✅

N/A

GRPO, DAPO, PPO

❌

0.9.1

FSDP

GRPO

0.9.1

✅

Qwen2.5-72B

✅

PT, SFT, DPO

✅

N/A

❌

Qwen3

Qwen3-0.6B

✅

PT, SFT

✅

N/A

❌

Qwen3-1.7B

✅

PT, SFT

✅

N/A

❌

Qwen3-4B

✅

PT, SFT

✅

N/A

❌

Qwen3-8B

✅

PT, SFT

✅

N/A

GRPO

❌

0.9.1

FSDP

❌

Qwen3-14B

✅

PT, SFT

✅

N/A

GRPO, DAPO, PPO

❌

0.9.1

FSDP

❌

Qwen3-32B

✅

PT, SFT

✅

N/A

GRPO, DAPO, PPO

❌

0.9.1

FSDP

❌

Qwen3-30B-A3B

✅

❌

✅

PT, SFT

✅

N/A

❌

Qwen3-235b-A22B

✅

❌

✅

PT, SFT

✅

N/A

❌

Llama

Llama3.1 -8B/70B

✅

PT, SFT

✅

N/A

❌

Llama3.2-1B/3B

✅

PT, SFT

✅

N/A

❌

GLM

glm-4-9b-chat

✅

PT, SFT

✅

N/A

❌

Mixtral

Mixtral-8x7B-Instruct-v0.1

✅

❌

N/A

❌

Multimodal model

Qwen2 VL

Qwen2-VL-2B

N/A

PT, SFT

✅

❌

N/A

Qwen2-VL-7B

N/A

PT, SFT

✅

❌

N/A

Qwen2-VL-72B

N/A

PT, SFT

✅

❌

N/A

Qwen2.5 VL

Qwen2.5-VL-3B

N/A

PT, SFT

✅

GRPO

❌

0.9.1

FSDP

N/A

Qwen2.5-VL-7B

N/A

PT, SFT, DPO

✅

GRPO, DAPO, PPO

❌

0.9.1

FSDP

N/A

Qwen2.5-VL-32B

N/A

PT, SFT

✅

❌

GRPO, DAPO, PPO

❌

0.9.1

FSDP

N/A

Qwen2.5-VL-72B

N/A

PT, SFT

✅

❌

GRPO

❌

0.9.1

FSDP

N/A

Gemma

Gemma3-27b

N/A

PT, SFT

✅

❌

N/A

"N/A" means the model does not work with the framework. Multimodal models, for example, do not support the MindSpeed-LLM training framework.

Parent topic: Adapting Mainstream Open-Source Models to AscendFactory NPU Training Based on Lite Server

Previous topic: Supported Models

Next topic: Minimum Number of PUs and Sequence Length Supported by Each Model

Feedback

Was this page helpful?

Helpful Not helpful

Provide feedback

Thank you very much for your feedback. We will continue working to improve the documentation.See the reply and handling status in My Cloud VOC.

The system is busy. Please try again later.

Which of the following issues have you encountered?

Content is inconsistent with the product UI

Unclear descriptions

Lack of examples or code

Incorrect steps

Can't find what I need

Lack of best practices

Feedback (optional)

0/500

Select at least one type of issue, and enter your comments or suggestions.

Enter a maximum of 500 characters.

Submit Cancel

For any further questions, feel free to contact us through the chatbot.

Chatbot