Updated on 2025-11-04 GMT+08:00

Supported Models

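Supported models fall into two categories: large language models (LLMs) and multimodal models. For each model, Table 1 (LLMs) and Table 2 (multimodal models) list the supported training scenarios, the training framework, the required version, and the download address of the open-source weights. A model with several supported framework and scenario combinations occupies several rows.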

Table 1 Supported LLMs and their weight download addresses

| Series | Model | Training Scenario | Training Framework | Version | Open-Source Weight File Download Address |
|---|---|---|---|---|---|
| DeepSeek | DeepSeek-R1-671B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main |
| | DeepSeek-V3-671B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/deepseek-ai/DeepSeek-V3-Base/tree/main |
| | DeepSeek-V2-Lite 16B | Pre-training and full-parameter fine-tuning | MindSpeed-LLM | 6.5.906 or later | https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite |
| Qwen2 | Qwen2-0.5B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2-0.5B-Instruct |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | |
| | Qwen2-1.5B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2-1.5B-Instruct |
| | Qwen2-7B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2-7B-Instruct |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | |
| | Qwen2-72B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2-72B-Instruct |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | |
| Qwen2.5 | Qwen2.5-0.5B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct |
| | | Pre-training and fine-tuning | LLaMA-Factory | | |
| | Qwen2.5-1.5B | Reinforcement learning | MindSpeed-RL | 6.5.906 or later | https://huggingface.co/Qwen/Qwen2.5-1.5B |
| | Qwen2.5-7B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2.5-7B |
| | | Pre-training and fine-tuning | LLaMA-Factory | | |
| | | Reinforcement learning | MindSpeed-RL | 6.5.906 or later | |
| | Qwen2.5-14B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2.5-14B-Instruct |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | |
| | | Reinforcement learning | LLaMA-Factory | 6.5.907 or later | |
| | Qwen2.5-32B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2.5-32B |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | |
| | | Reinforcement learning | MindSpeed-RL | 6.5.906 or later | |
| | | Reinforcement learning | VeRL | 6.5.907 or later | |
| | Qwen2.5-72B | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2.5-72B-Instruct |
| | | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | |
| | | Reinforcement learning | LLaMA-Factory | 6.5.907 or later | |
| Qwen3 | Qwen3-0.6B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.905 or later | https://huggingface.co/Qwen/Qwen3-0.6B |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | |
| | Qwen3-1.7B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.905 or later | https://huggingface.co/Qwen/Qwen3-1.7B |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | |
| | Qwen3-4B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.905 or later | https://huggingface.co/Qwen/Qwen3-4B |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | |
| | | Reinforcement learning | VeRL | 6.5.907 or later | |
| | Qwen3-8B | Reinforcement learning | VeRL | 6.5.906 or later | https://huggingface.co/Qwen/Qwen3-8B |
| | | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.905 or later | |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | |
| | Qwen3-14B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.905 or later | https://huggingface.co/Qwen/Qwen3-14B |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | |
| | Qwen3-32B | Reinforcement learning | VeRL | 6.5.906 or later | https://huggingface.co/Qwen/Qwen3-32B |
| | | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.905 or later | |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | |
| | Qwen3-30B-A3B | Pre-training and full-parameter fine-tuning | MindSpeed-LLM | 6.5.905 or later | https://huggingface.co/Qwen/Qwen3-30B-A3B |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | |
| | Qwen3-235B-A22B | Pre-training and full-parameter fine-tuning | MindSpeed-LLM | 6.5.905 or later | https://huggingface.co/Qwen/Qwen3-235B-A22B |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | |
| Llama | Llama3.1-8B/70B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | 8B: https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct ; 70B: https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | |
| | Llama3.2-1B/3B | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | 1B: https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct ; 3B: https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | |
| GLM | glm-4-9b-chat | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/THUDM/glm-4-9b-chat |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | |
| Mistral AI | Mixtral-8x7B-Instruct-v0.1 | Pre-training and fine-tuning | MindSpeed-LLM | 6.5.902 or later | https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1 |
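Every download address in the tables points to a Hugging Face model repository. The following is a minimal sketch, assuming the `huggingface_hub` Python package is installed, that turns one of these URLs into a repository ID and revision and then fetches the weights with `huggingface_hub.snapshot_download`. The helper names `parse_hf_url` and `download_weights` and the target directory are illustrative only and are not part of any training framework listed here. Some repositories, such as those under `meta-llama` and `google`, are gated, so you may need to accept the model license on Hugging Face and log in (for example with `huggingface-cli login`) before the download succeeds.

```python
from urllib.parse import urlparse


def parse_hf_url(url: str) -> tuple[str, str]:
    """Split a Hugging Face model URL into (repo_id, revision).

    Handles both URL forms used in the tables, for example
    https://huggingface.co/Qwen/Qwen3-8B and
    https://huggingface.co/deepseek-ai/DeepSeek-R1/tree/main.
    The revision defaults to "main" when the URL does not name one.
    """
    parsed = urlparse(url)
    if parsed.netloc != "huggingface.co":
        raise ValueError(f"not a Hugging Face URL: {url}")
    parts = [p for p in parsed.path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"URL does not name an <org>/<model> repository: {url}")
    repo_id = f"{parts[0]}/{parts[1]}"
    # A ".../tree/<revision>" suffix selects a branch, tag, or commit.
    if len(parts) >= 4 and parts[2] == "tree":
        revision = parts[3]
    else:
        revision = "main"
    return repo_id, revision


def download_weights(url: str, local_dir: str) -> str:
    """Download every file of the repository named by `url` into `local_dir`."""
    # Imported here so parse_hf_url works even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    repo_id, revision = parse_hf_url(url)
    # Returns the local path of the downloaded snapshot.
    return snapshot_download(repo_id=repo_id, revision=revision, local_dir=local_dir)
```

For example, `download_weights("https://huggingface.co/Qwen/Qwen3-8B", "./weights/Qwen3-8B")` would download the `main` revision of the Qwen3-8B repository into `./weights/Qwen3-8B`. Keeping URL parsing separate from the network call lets the parsing be checked offline.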

Table 2 Supported multimodal models and their weight download addresses

| Series | Model | Training Scenario | Training Framework | Version | Open-Source Weight File Download Address |
|---|---|---|---|---|---|
| Qwen2 VL | Qwen2-VL-2B | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct/tree/main |
| | Qwen2-VL-7B | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/tree/main |
| | Qwen2-VL-72B | Pre-training and fine-tuning | LLaMA-Factory | 6.5.902 or later | https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct |
| Qwen2.5 VL | Qwen2.5-VL-3B | Reinforcement learning | VeRL | 6.5.906 or later | https://huggingface.co/Qwen/Qwen2.5-VL-3B-Instruct |
| | | Pre-training and fine-tuning | MindSpeed-MM | 6.5.907 or later | |
| | | Pre-training and fine-tuning | LLaMA-Factory | 6.5.907 or later | |
| | Qwen2.5-VL-7B | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct |
| | | Pre-training and fine-tuning | MindSpeed-MM | 6.5.907 or later | |
| | | Reinforcement learning | VeRL | 6.5.906 or later | |
| | Qwen2.5-VL-32B | Pre-training and fine-tuning | LLaMA-Factory | 6.5.906 or later | https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct |
| | | Reinforcement learning | VeRL | 6.5.905 or later | |
| | Qwen2.5-VL-72B | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct |
| | | Reinforcement learning | VeRL | 6.5.906 or later | |
| Gemma | Gemma3-27b | Pre-training and fine-tuning | LLaMA-Factory | 6.5.905 or later | https://huggingface.co/google/gemma-3-27b-it |
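Each entry in the Version column states a minimum release ("X or later"). This section does not say which software package that version number belongs to, so the sketch below treats it as an opaque dotted version string supplied by the caller; that is an assumption. It shows how a lookup against a few rows copied from Table 1 and Table 2, combined with a numeric, component-wise comparison, can check whether an installed version meets the requirement. `SUPPORT_MATRIX`, `parse_version`, and `is_supported` are hypothetical helpers, not part of any framework listed here.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn '6.5.902' or '6.5.902 or later' into (6, 5, 902)."""
    return tuple(int(part) for part in text.split()[0].split("."))


# Hypothetical subset of the tables:
# (model, training framework, training scenario) -> minimum version.
SUPPORT_MATRIX = {
    ("Qwen2.5-32B", "VeRL", "Reinforcement learning"): "6.5.907 or later",
    ("Qwen3-8B", "MindSpeed-LLM", "Pre-training and fine-tuning"): "6.5.905 or later",
    ("Qwen2.5-VL-7B", "MindSpeed-MM", "Pre-training and fine-tuning"): "6.5.907 or later",
    ("Gemma3-27b", "LLaMA-Factory", "Pre-training and fine-tuning"): "6.5.905 or later",
}


def is_supported(model: str, framework: str, scenario: str, installed_version: str) -> bool:
    """Return True if the combination is listed and the installed version is new enough."""
    minimum = SUPPORT_MATRIX.get((model, framework, scenario))
    if minimum is None:
        # Combination is not in this sketch's subset of the tables.
        return False
    # Tuples of ints compare component by component, so 6.5.907 > 6.5.906.
    return parse_version(installed_version) >= parse_version(minimum)
```

Comparing parsed integer tuples rather than raw strings matters: as strings, "6.5.1000" would sort before "6.5.902" because "1" precedes "9", which would wrongly reject a newer release.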