Supported models

| No. | Model | FP16/BF16 inference support | W4A16 quantization support | W8A8 quantization support | KV-cache INT8 quantization support | Ascend_turbo graph support | Acl_graph support | v0/v1 backend | Open-source weights |
|---|---|---|---|---|---|---|---|---|---|
| 1 | DeepSeek-R1-Distill-Llama-8B | √ | x | x | x | x | x | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
| 2 | DeepSeek-R1-Distill-Llama-70B | √ | x | x | x | x | x | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B |
| 3 | DeepSeek-R1-Distill-Qwen-1.5B | √ | x | x | x | √ | √ | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| 4 | DeepSeek-R1-Distill-Qwen-7B | √ | x | x | x | √ | √ | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| 5 | DeepSeek-R1-Distill-Qwen-14B | √ | x | x | x | √ | √ | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
| 6 | DeepSeek-R1-0528-Qwen3-8B | √ | x | x | x | √ | √ | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
| 7 | GLM-4-9B | √ | x | x | x | x | x | v1 | |
| 8 | Llama3-8B | √ | x | x | x | x | x | v1 | |
| 9 | Llama3-70B | √ | x | x | x | x | x | v1 | |
| 10 | Llama3.1-8B | √ | x | x | x | x | x | v1 | https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct |
| 11 | Llama3.1-70B | √ | x | x | x | x | x | v1 | https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct |
| 12 | Llama3.2-1B | √ | x | x | x | x | x | v1 | |
| 13 | Llama3.2-3B | √ | x | x | x | x | x | v1 | |
| 14 | Qwen2-0.5B | √ | √ | √ | x | √ | √ | v1 | |
| 15 | Qwen2-1.5B | √ | √ | √ | x | √ | √ | v1 | |
| 16 | Qwen2-7B | √ | √ | √ | x | √ | √ | v1 | |
| 17 | Qwen2-72B | √ | √ | √ | x | √ | √ | v1 | |
| 18 | Qwen2.5-0.5B | √ | √ | √ | x | √ | √ | v1 | |
| 19 | Qwen2.5-1.5B | √ | √ | √ | x | √ | √ | v1 | |
| 20 | Qwen2.5-3B | √ | √ | √ | x | √ | √ | v1 | |
| 21 | Qwen2.5-7B | √ | √ | √ | x | √ | √ | v1 | |
| 22 | Qwen2.5-14B | √ | √ | √ | x | √ | √ | v1 | |
| 23 | Qwen2.5-32B | √ | √ | √ | x | √ | √ | v1 | |
| 24 | Qwen2.5-72B | √ | √ | √ | x | √ | √ | v1 | |
| 25 | Qwen3-0.6B | √ | √ | √ | x | √ | √ | v1 | |
| 26 | Qwen3-1.7B | √ | √ | √ | x | √ | √ | v1 | |
| 27 | Qwen3-4B | √ | √ | √ | x | √ | √ | v1 | |
| 28 | Qwen3-8B | √ | √ | √ | x | √ | √ | v1 | |
| 29 | Qwen3-14B | √ | √ | √ | x | √ | √ | v1 | |
| 30 | Qwen3-30B-A3B | √ | x | x | x | √ | x | v1 | |
| 31 | Qwen3-32B | √ | √ | √ | x | √ | √ | v1 | |
| 32 | Qwen3-235B-A22B | √ | x | √ | x | √ | x | v1 | |
| 33 | Qwen3-235B-A22B-Thinking-2507 | √ | x | √ | x | √ | x | v1 | |
| 34 | Qwen3-235B-A22B-Instruct-2507 | √ | x | √ | x | √ | x | v1 | |
| 35 | QwQ-32B | √ | x | x | x | √ | √ | v1 | |
| 36 | Qwen3-Coder-480B-A35B | √ | x | x | x | √ | √ | v1 | |
| 37 | Qwen3-Embedding-0.6B | √ | x | x | x | x | √ | v0 | |
| 38 | Qwen3-Embedding-4B | √ | x | x | x | x | √ | v0 | |
| 39 | Qwen3-Embedding-8B | √ | x | x | x | x | √ | v0 | |
| 40 | Qwen3-Reranker-0.6B | √ | x | x | x | x | √ | v0 | |
| 41 | Qwen3-Reranker-4B | √ | x | x | x | x | √ | v0 | |
| 42 | Qwen3-Reranker-8B | √ | x | x | x | x | √ | v0 | |
| 43 | bge-reranker-v2-m3 | √ | x | x | x | x | √ | v0 | |
| 44 | bge-base-en-v1.5 | √ | x | x | x | x | √ | v0 | |
| 45 | bge-base-zh-v1.5 | √ | x | x | x | x | √ | v0 | |
| 46 | bge-large-en-v1.5 | √ | x | x | x | x | √ | v0 | |
| 47 | bge-large-zh-v1.5 | √ | x | x | x | x | √ | v0 | |
| 48 | bge-m3 | √ | x | x | x | x | √ | v0 | |

| No. | Model | FP16/BF16 inference support | W4A16 quantization support | W8A8 quantization support | W8A16 quantization support | KV-cache INT8 quantization support | Open-source weights | Notes |
|---|---|---|---|---|---|---|---|---|
| 1 | Qwen2-VL-2B | √ | x | x | x | x | | - |
| 2 | Qwen2-VL-7B | √ | x | x | x | x | | - |
| 3 | Qwen2-VL-72B | √ | √ | x | x | x | | The AWQ version supports eager mode only (`--enforce-eager`) |
| 4 | Qwen2.5-VL-7B | √ | x | x | x | x | https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct/tree/main | - |
| 5 | Qwen2.5-VL-32B | √ | x | x | x | x | https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct/tree/main | - |
| 6 | Qwen2.5-VL-72B | √ | √ | x | x | x | https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct/tree/main https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct-AWQ/tree/main | The AWQ version supports eager mode only (`--enforce-eager`) |
| 7 | InternVL2.5-26B | √ | x | x | x | x | | - |
| 8 | InternVL2-llama3-76B-AWQ | √ | x | x | x | x | https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B-AWQ/tree/main | The AWQ version supports eager mode only (`--enforce-eager`) |
| 9 | GEMMA-3-27B | √ | x | x | x | x | | - |
| 10 | InternVL3-8B | √ | x | x | x | x | | - |
| 11 | InternVL3-14B | √ | x | x | x | x | | - |
| 12 | InternVL3-38B | √ | x | x | x | x | | - |
| 13 | InternVL3-78B | √ | x | x | x | x | | - |
