Supported Models
| No. | Model | FP16/BF16 Inference | W4A16 Quantization | W8A8 Quantization | kv-cache-int8 Quantization | Ascend_turbo Graph | Acl_graph | v0/v1 Backend | Open-Source Weight Address |
|---|---|---|---|---|---|---|---|---|---|
| 1 | DeepSeek-R1-Distill-Llama-8B | √ | x | x | x | x | x | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
| 2 | DeepSeek-R1-Distill-Llama-70B | √ | x | x | x | x | x | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B |
| 3 | DeepSeek-R1-Distill-Qwen-1.5B | √ | x | x | x | √ | √ | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| 4 | DeepSeek-R1-Distill-Qwen-7B | √ | x | x | x | √ | √ | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| 5 | DeepSeek-R1-Distill-Qwen-14B | √ | x | x | x | √ | √ | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
| 6 | DeepSeek-R1-0528-Qwen3-8B | √ | x | x | x | √ | √ | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
| 7 | glm-4-9b | √ | x | x | x | x | x | v1 | - |
| 8 | llama3-8b | √ | x | x | x | x | x | v1 | - |
| 9 | llama3-70b | √ | x | x | x | x | x | v1 | - |
| 10 | llama3.1-8b | √ | x | x | x | x | x | v1 | https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct |
| 11 | llama3.1-70b | √ | x | x | x | x | x | v1 | https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct |
| 12 | llama-3.2-1B | √ | x | x | x | x | x | v1 | - |
| 13 | llama-3.2-3B | √ | x | x | x | x | x | v1 | - |
| 14 | qwen2-0.5b | √ | √ | √ | x | √ | √ | v1 | - |
| 15 | qwen2-1.5b | √ | √ | √ | x | √ | √ | v1 | - |
| 16 | qwen2-7b | √ | √ | √ | x | √ | √ | v1 | - |
| 17 | qwen2-72b | √ | √ | √ | x | √ | √ | v1 | - |
| 18 | qwen2.5-0.5b | √ | √ | √ | x | √ | √ | v1 | - |
| 19 | qwen2.5-1.5b | √ | √ | √ | x | √ | √ | v1 | - |
| 20 | qwen2.5-3b | √ | √ | √ | x | √ | √ | v1 | - |
| 21 | qwen2.5-7b | √ | √ | √ | x | √ | √ | v1 | - |
| 22 | qwen2.5-14b | √ | √ | √ | x | √ | √ | v1 | - |
| 23 | qwen2.5-32b | √ | √ | √ | x | √ | √ | v1 | - |
| 24 | qwen2.5-72b | √ | √ | √ | x | √ | √ | v1 | - |
| 25 | qwen3-0.6b | √ | √ | √ | x | √ | √ | v1 | - |
| 26 | qwen3-1.7b | √ | √ | √ | x | √ | √ | v1 | - |
| 27 | qwen3-4b | √ | √ | √ | x | √ | √ | v1 | - |
| 28 | qwen3-8b | √ | √ | √ | x | √ | √ | v1 | - |
| 29 | qwen3-14b | √ | √ | √ | x | √ | √ | v1 | - |
| 30 | qwen3-30b-a3b | √ | x | x | x | √ | x | v1 | - |
| 31 | qwen3-32b | √ | √ | √ | x | √ | √ | v1 | - |
| 32 | qwen3-235b-a22b | √ | x | x | x | √ | x | v1 | - |
| 33 | QwQ-32B | √ | x | x | x | √ | √ | v1 | - |
| 34 | Qwen3-Coder-480B-A35B | √ | x | x | x | √ | √ | v1 | - |
| 35 | Qwen3-Embedding-0.6B | √ | x | x | x | x | √ | v0 | - |
| 36 | Qwen3-Embedding-4B | √ | x | x | x | x | √ | v0 | - |
| 37 | Qwen3-Embedding-8B | √ | x | x | x | x | √ | v0 | - |
| 38 | Qwen3-Reranker-0.6B | √ | x | x | x | x | √ | v0 | - |
| 39 | Qwen3-Reranker-4B | √ | x | x | x | x | √ | v0 | - |
| 40 | Qwen3-Reranker-8B | √ | x | x | x | x | √ | v0 | - |
| 41 | bge-reranker-v2-m3 | √ | x | x | x | x | √ | v0 | - |
| 42 | bge-base-en-v1.5 | √ | x | x | x | x | √ | v0 | - |
| 43 | bge-base-zh-v1.5 | √ | x | x | x | x | √ | v0 | - |
| 44 | bge-large-en-v1.5 | √ | x | x | x | x | √ | v0 | - |
| 45 | bge-large-zh-v1.5 | √ | x | x | x | x | √ | v0 | - |
| 46 | bge-m3 | √ | x | x | x | x | √ | v0 | - |
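The v0/v1 Backend column indicates which engine version each model runs on: the embedding and reranker models above are listed as v0-only, while the generative models run on v1. A minimal launch sketch, assuming the upstream vLLM convention of selecting the engine with the `VLLM_USE_V1` environment variable and the standard `vllm serve` CLI (the exact model identifiers are assumptions, not taken from this table):

```shell
# Sketch only: assumes the v0/v1 engine is selected via VLLM_USE_V1,
# as in upstream vLLM. Model identifiers are illustrative.

# Embedding/reranker models are listed as v0-only in the table:
VLLM_USE_V1=0 vllm serve Qwen/Qwen3-Embedding-0.6B --task embed

# Generative models listed as v1 run with the default (v1) engine:
vllm serve Qwen/Qwen2.5-7B-Instruct
```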
| No. | Model | FP16/BF16 Inference | W4A16 Quantization | W8A8 Quantization | W8A16 Quantization | kv-cache-int8 Quantization | Open-Source Weight Address | Remarks |
|---|---|---|---|---|---|---|---|---|
| 1 | qwen2-vl-2B | √ | x | x | x | x | - | - |
| 2 | qwen2-vl-7B | √ | x | x | x | x | - | - |
| 3 | qwen2-vl-72B | √ | √ | x | x | x | - | The AWQ version supports only eager mode (`--enforce-eager`). |
| 4 | qwen2.5-vl-7B | √ | x | x | x | x | https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct/tree/main | - |
| 5 | qwen2.5-vl-32B | √ | x | x | x | x | https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct/tree/main | - |
| 6 | qwen2.5-vl-72B | √ | √ | x | x | x | https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct/tree/main<br>https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct-AWQ/tree/main | The AWQ version supports only eager mode (`--enforce-eager`). |
| 7 | internvl2.5-26B | √ | x | x | x | x | - | - |
| 8 | internvl2-llama3-76B-awq | √ | x | x | x | x | https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B-AWQ/tree/main | The AWQ version supports only eager mode (`--enforce-eager`). |
| 9 | gemma3-27B | √ | x | x | x | x | - | - |
| 10 | internvl3-8B | √ | x | x | x | x | - | - |
| 11 | internvl3-14B | √ | x | x | x | x | - | - |
| 12 | internvl3-38B | √ | x | x | x | x | - | - |
| 13 | internvl3-78B | √ | x | x | x | x | - | - |
For details about the number of PUs supported by each model, see Minimum Number of PUs and Maximum Sequence Length Supported by Each Model.
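The AWQ-quantized multimodal variants noted above must run in eager mode. A minimal launch sketch, assuming vLLM's standard `vllm serve` CLI and weights already downloaded from the listed address (`--tensor-parallel-size 4` is an assumption sized for a 72B model, not a value from this table):

```shell
# Sketch only: --enforce-eager disables graph capture, as required for the
# AWQ variants noted in the table above. The parallel size is an assumption.
vllm serve Qwen/Qwen2.5-VL-72B-Instruct-AWQ \
  --tensor-parallel-size 4 \
  --enforce-eager
```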