Last updated: 2025-08-27 (GMT+08:00)

Supported Models

Table 1 Supported large language models and weight download addresses

| No. | Model name | fp16/bf16 inference supported | W4A16 quantization supported | W8A8 quantization supported | kv-cache-int8 quantization supported | Ascend_turbo graph supported | Acl_graph supported | v0/v1 backend | Open-source weight download address |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | DeepSeek-R1-Distill-Llama-8B |  |  |  |  |  |  | v1 | deepseek-ai/DeepSeek-R1-Distill-Llama-8B · Hugging Face |
| 2 | DeepSeek-R1-Distill-Llama-70B |  |  |  |  |  |  | v1 | deepseek-ai/DeepSeek-R1-Distill-Llama-70B · Hugging Face |
| 3 | DeepSeek-R1-Distill-Qwen-1.5B |  |  |  |  |  |  | v1 | deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B · Hugging Face |
| 4 | DeepSeek-R1-Distill-Qwen-7B |  |  |  |  |  |  | v1 | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B · Hugging Face |
| 5 | DeepSeek-R1-Distill-Qwen-14B |  |  |  |  |  |  | v1 | deepseek-ai/DeepSeek-R1-Distill-Qwen-14B · Hugging Face |
| 6 | glm-4-9b |  |  |  |  |  |  | v1 | https://huggingface.co/THUDM/glm-4-9b-chat |
| 7 | llama3-8b |  |  |  |  |  |  | v1 | https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct |
| 8 | llama3-70b |  |  |  |  |  |  | v1 | https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct |
| 9 | llama3.1-8b |  |  |  |  |  |  | v1 | https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct |
| 10 | llama3.1-70b |  |  |  |  |  |  | v1 | https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct |
| 11 | llama-3.2-1B |  |  |  |  |  |  | v1 | Llama-3.2-1B-Instruct · Model Library (modelscope.cn) |
| 12 | llama-3.2-3B |  |  |  |  |  |  | v1 | Llama-3.2-3B-Instruct · Model Library (modelscope.cn) |
| 13 | qwen2-0.5b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2-0.5B-Instruct |
| 14 | qwen2-1.5b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2-1.5B-Instruct |
| 15 | qwen2-7b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2-7B-Instruct |
| 16 | qwen2-72b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2-72B-Instruct |
| 17 | qwen2.5-0.5b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct |
| 18 | qwen2.5-1.5b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct |
| 19 | qwen2.5-3b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2.5-3B-Instruct |
| 20 | qwen2.5-7b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2.5-7B-Instruct |
| 21 | qwen2.5-14b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2.5-14B-Instruct |
| 22 | qwen2.5-32b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2.5-32B-Instruct |
| 23 | qwen2.5-72b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen2.5-72B-Instruct |
| 24 | qwen3-0.6b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen3-0.6B |
| 25 | qwen3-1.7b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen3-1.7B |
| 26 | qwen3-4b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen3-4B |
| 27 | qwen3-8b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen3-8B |
| 28 | qwen3-14b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen3-14B |
| 29 | qwen3-30b-a3b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen3-30B-A3B |
| 30 | qwen3-32b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen3-32B |
| 31 | qwen3-235b-a22b |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/Qwen3-235B-A22B |
| 32 | QwQ-32B |  |  |  |  |  |  | v1 | https://huggingface.co/Qwen/QwQ-32B |
| 33 | bge-reranker-v2-m3 |  |  |  |  |  |  | v0 | https://huggingface.co/BAAI/bge-reranker-v2-m3 |
| 34 | bge-base-en-v1.5 |  |  |  |  |  |  | v0 | https://huggingface.co/BAAI/bge-base-en-v1.5 |
| 35 | bge-base-zh-v1.5 |  |  |  |  |  |  | v0 | https://huggingface.co/BAAI/bge-base-zh-v1.5 |
| 36 | bge-large-en-v1.5 |  |  |  |  |  |  | v0 | https://huggingface.co/BAAI/bge-large-en-v1.5 |
| 37 | bge-large-zh-v1.5 |  |  |  |  |  |  | v0 | https://huggingface.co/BAAI/bge-large-zh-v1.5 |
| 38 | bge-m3 |  |  |  |  |  |  | v0 | https://huggingface.co/BAAI/bge-m3 |
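The weights in Table 1 are plain Hugging Face or ModelScope repositories, and the backend column indicates which vLLM engine generation (v0 or v1) each model targets. As a minimal, illustrative sketch only (the model ID, the local directory, and the assumption that the v0/v1 column maps to vLLM's VLLM_USE_V1 environment switch are ours, not taken from this guide), the following shows one way to fetch a listed checkpoint and run a quick offline smoke test:

```python
# Minimal sketch, not an official example from this guide.
# Assumptions: vLLM and huggingface_hub are installed, the chosen model fits on
# the available devices, and the v0/v1 column corresponds to vLLM's
# VLLM_USE_V1 environment switch ("1" for v1-backend models, "0" for v0).
import os

os.environ.setdefault("VLLM_USE_V1", "1")  # set before vLLM is imported

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

# Download the open-source weights listed in Table 1 (illustrative model ID).
weights_dir = snapshot_download(
    repo_id="Qwen/Qwen2.5-7B-Instruct",
    local_dir="/data/models/Qwen2.5-7B-Instruct",  # illustrative path
)

# Load the checkpoint in bf16 and run a short generation as a smoke test.
llm = LLM(model=weights_dir, dtype="bfloat16")
outputs = llm.generate(
    ["Briefly introduce yourself."],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```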

Table 2 Supported multimodal models and weight download addresses

| No. | Model name | fp16/bf16 inference supported | W4A16 quantization supported | W8A8 quantization supported | W8A16 quantization supported | kv-cache-int8 quantization supported | Open-source weight download address | Remarks |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | qwen2-vl-2B |  |  |  |  |  | https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct/tree/main | - |
| 2 | qwen2-vl-7B |  |  |  |  |  | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/tree/main | - |
| 3 | qwen2-vl-72B |  |  |  |  |  | https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/tree/main<br>https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-AWQ | The AWQ version supports eager mode only (--enforce-eager); see the example after this table. |
| 4 | qwen2.5-vl-7B |  |  |  |  |  | https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct/tree/main | - |
| 5 | qwen2.5-vl-32B |  |  |  |  |  | https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct/tree/main | - |
| 6 | qwen2.5-vl-72B |  |  |  |  |  | https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct/tree/main<br>https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct-AWQ/tree/main | The AWQ version supports eager mode only (--enforce-eager). |
| 7 | internvl2.5-26B |  |  |  |  |  | https://huggingface.co/OpenGVLab/InternVL2_5-26B/tree/main | - |
| 8 | internvl2-llama3-76B-awq |  |  |  |  |  | https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B-AWQ/tree/main | The AWQ version supports eager mode only (--enforce-eager). |
| 9 | gemma3-27B |  |  |  |  |  | https://huggingface.co/google/gemma-3-27b-it/tree/main | - |
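The Remarks column notes that the AWQ variants must run in eager mode. A minimal sketch of what that looks like with vLLM's offline API, assuming illustrative values for the model ID, tensor parallel size, and context length (none of these settings come from this guide):

```python
# Minimal sketch, not an official example from this guide.
# tensor_parallel_size and max_model_len are illustrative assumptions; adjust
# them to the card count and sequence length planned for the deployment.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-VL-72B-Instruct-AWQ",  # AWQ weights from Table 2
    quantization="awq",       # load the 4-bit AWQ checkpoint
    enforce_eager=True,       # AWQ variants support eager mode only (see Remarks)
    tensor_parallel_size=4,   # illustrative; match the cards available
    max_model_len=8192,       # illustrative context length
)

# Text-only prompt used as a simple smoke test; image inputs would go through
# vLLM's multimodal prompt format instead.
outputs = llm.generate(
    ["Describe what a vision-language model can do."],
    SamplingParams(temperature=0.7, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```

When the model is instead served through an OpenAI-compatible API server, the same restriction corresponds to passing --enforce-eager on the launch command.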

For the number of cards supported by each model, see the section "Minimum number of cards and maximum sequence length supported by each model".
