Last updated: 2025-11-19 GMT+08:00

Supported Models

Table 1 Supported large language models and weight download addresses

| No. | Model | v0/v1 backend | Open-source weights |
| --- | --- | --- | --- |
| 1 | DeepSeek-R1-Distill-Llama-8B | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B |
| 2 | DeepSeek-R1-Distill-Llama-70B | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B |
| 3 | DeepSeek-R1-Distill-Qwen-1.5B | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B |
| 4 | DeepSeek-R1-Distill-Qwen-7B | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
| 5 | DeepSeek-R1-Distill-Qwen-14B | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-14B |
| 6 | DeepSeek-R1-0528-Qwen3-8B | v1 | https://huggingface.co/deepseek-ai/DeepSeek-R1-0528-Qwen3-8B |
| 7 | GLM-4-9B | v1 | https://huggingface.co/THUDM/glm-4-9b-chat |
| 8 | Llama3-8B | v1 | https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct |
| 9 | Llama3-70B | v1 | https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct |
| 10 | Llama3.1-8B | v1 | https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct |
| 11 | Llama3.1-70B | v1 | https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct |
| 12 | Llama3.2-1B | v1 | https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct |
| 13 | Llama3.2-3B | v1 | https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct |
| 14 | Qwen2-0.5B | v1 | https://huggingface.co/Qwen/Qwen2-0.5B-Instruct |
| 15 | Qwen2-1.5B | v1 | https://huggingface.co/Qwen/Qwen2-1.5B-Instruct |
| 16 | Qwen2-7B | v1 | https://huggingface.co/Qwen/Qwen2-7B-Instruct |
| 17 | Qwen2-72B | v1 | https://huggingface.co/Qwen/Qwen2-72B-Instruct |
| 18 | Qwen2.5-0.5B | v1 | https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct |
| 19 | Qwen2.5-1.5B | v1 | https://huggingface.co/Qwen/Qwen2.5-1.5B-Instruct |
| 20 | Qwen2.5-3B | v1 | https://huggingface.co/Qwen/Qwen2.5-3B-Instruct |
| 21 | Qwen2.5-7B | v1 | https://huggingface.co/Qwen/Qwen2.5-7B-Instruct |
| 22 | Qwen2.5-14B | v1 | https://huggingface.co/Qwen/Qwen2.5-14B-Instruct |
| 23 | Qwen2.5-32B | v1 | https://huggingface.co/Qwen/Qwen2.5-32B-Instruct |
| 24 | Qwen2.5-72B | v1 | https://huggingface.co/Qwen/Qwen2.5-72B-Instruct |
| 25 | Qwen3-0.6B | v1 | https://huggingface.co/Qwen/Qwen3-0.6B |
| 26 | Qwen3-1.7B | v1 | https://huggingface.co/Qwen/Qwen3-1.7B |
| 27 | Qwen3-4B | v1 | https://huggingface.co/Qwen/Qwen3-4B |
| 28 | Qwen3-8B | v1 | https://huggingface.co/Qwen/Qwen3-8B |
| 29 | Qwen3-14B | v1 | https://huggingface.co/Qwen/Qwen3-14B |
| 30 | Qwen3-30B-A3B | v1 | https://huggingface.co/Qwen/Qwen3-30B-A3B |
| 31 | Qwen3-32B | v1 | https://huggingface.co/Qwen/Qwen3-32B |
| 32 | Qwen3-235B-A22B | v1 | https://huggingface.co/Qwen/Qwen3-235B-A22B |
| 33 | Qwen3-235B-A22B-Thinking-2507 | v1 | https://huggingface.co/Qwen/Qwen3-235B-A22B-Thinking-2507 |
| 34 | Qwen3-235B-A22B-Instruct-2507 | v1 | https://huggingface.co/Qwen/Qwen3-235B-A22B-Instruct-2507 |
| 35 | QwQ-32B | v1 | https://huggingface.co/Qwen/QwQ-32B |
| 36 | Qwen3-Coder-480B-A35B | v1 | https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct |
| 37 | Qwen3-Embedding-0.6B | v0 | https://huggingface.co/Qwen/Qwen3-Embedding-0.6B |
| 38 | Qwen3-Embedding-4B | v0 | https://huggingface.co/Qwen/Qwen3-Embedding-4B |
| 39 | Qwen3-Embedding-8B | v0 | https://huggingface.co/Qwen/Qwen3-Embedding-8B |
| 40 | Qwen3-Reranker-0.6B | v0 | https://huggingface.co/Qwen/Qwen3-Reranker-0.6B |
| 41 | Qwen3-Reranker-4B | v0 | https://huggingface.co/Qwen/Qwen3-Reranker-4B |
| 42 | Qwen3-Reranker-8B | v0 | https://huggingface.co/Qwen/Qwen3-Reranker-8B |
| 43 | bge-reranker-v2-m3 | v0 | https://huggingface.co/BAAI/bge-reranker-v2-m3 |
| 44 | bge-base-en-v1.5 | v0 | https://huggingface.co/BAAI/bge-base-en-v1.5 |
| 45 | bge-base-zh-v1.5 | v0 | https://huggingface.co/BAAI/bge-base-zh-v1.5 |
| 46 | bge-large-en-v1.5 | v0 | https://huggingface.co/BAAI/bge-large-en-v1.5 |
| 47 | bge-large-zh-v1.5 | v0 | https://huggingface.co/BAAI/bge-large-zh-v1.5 |
| 48 | bge-m3 | v0 | https://huggingface.co/BAAI/bge-m3 |
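
As a quick usage reference, the sketch below shows one way to pull a weight package listed in Table 1 and run bf16 offline inference on it. It is a minimal sketch assuming the standard open-source vLLM Python API and the huggingface_hub client; Qwen2.5-7B-Instruct and the VLLM_USE_V1 setting are used purely as examples, and any Ascend-vLLM-specific launch options (quantization formats, graph modes) are not shown.

```python
# Minimal sketch. Assumptions: the standard open-source vLLM Python API and the
# huggingface_hub client are available; Qwen2.5-7B-Instruct is used purely as an
# example model from Table 1. Ascend-vLLM-specific options are not shown.
import os

# Table 1 lists this model under the v1 backend; open-source vLLM lets the
# engine version be pinned via the VLLM_USE_V1 environment variable.
os.environ.setdefault("VLLM_USE_V1", "1")

from huggingface_hub import snapshot_download
from vllm import LLM, SamplingParams

# Download the open-source weights into the local Hugging Face cache.
model_path = snapshot_download(repo_id="Qwen/Qwen2.5-7B-Instruct")

# Load the weights for bf16 offline inference and generate a short completion.
llm = LLM(model=model_path, dtype="bfloat16")
outputs = llm.generate(["Briefly introduce large language models."],
                       SamplingParams(max_tokens=64))
print(outputs[0].outputs[0].text)
```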

Table 2 Supported multimodal models and weight download addresses

| No. | Model | Open-source weights | Remarks |
| --- | --- | --- | --- |
| 1 | Qwen2-VL-2B | https://huggingface.co/Qwen/Qwen2-VL-2B-Instruct/tree/main | - |
| 2 | Qwen2-VL-7B | https://huggingface.co/Qwen/Qwen2-VL-7B-Instruct/tree/main | - |
| 3 | Qwen2-VL-72B | https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct/tree/main ; https://huggingface.co/Qwen/Qwen2-VL-72B-Instruct-AWQ | The AWQ version supports only eager mode (--enforce-eager); see the example after this table. |
| 4 | Qwen2.5-VL-7B | https://huggingface.co/Qwen/Qwen2.5-VL-7B-Instruct/tree/main | - |
| 5 | Qwen2.5-VL-32B | https://huggingface.co/Qwen/Qwen2.5-VL-32B-Instruct/tree/main | - |
| 6 | Qwen2.5-VL-72B | https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct/tree/main ; https://huggingface.co/Qwen/Qwen2.5-VL-72B-Instruct-AWQ/tree/main | The AWQ version supports only eager mode (--enforce-eager); see the example after this table. |
| 7 | InternVL2.5-26B | https://huggingface.co/OpenGVLab/InternVL2_5-26B/tree/main | - |
| 8 | InternVL2-llama3-76B-AWQ | https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B-AWQ/tree/main | The AWQ version supports only eager mode (--enforce-eager); see the example after this table. |
| 9 | GEMMA-3-27B | https://huggingface.co/google/gemma-3-27b-it/tree/main | - |
| 10 | InternVL3-8B | https://huggingface.co/OpenGVLab/InternVL3-8B/tree/main | - |
| 11 | InternVL3-14B | https://huggingface.co/OpenGVLab/InternVL3-14B/tree/main | - |
| 12 | InternVL3-38B | https://huggingface.co/OpenGVLab/InternVL3-38B/tree/main | - |
| 13 | InternVL3-78B | https://huggingface.co/OpenGVLab/InternVL3-78B/tree/main | - |
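
The remarks above note that the AWQ weight versions run only in eager mode (--enforce-eager). A minimal sketch of what that looks like with the standard open-source vLLM Python API is shown below; the model name, quantization argument, and tensor_parallel_size value are illustrative assumptions rather than a verified Ascend-vLLM launch recipe.

```python
# Minimal sketch. Assumptions: standard open-source vLLM Python API; the
# --enforce-eager CLI flag from the remarks corresponds to the enforce_eager
# argument of vllm.LLM, and tensor_parallel_size=8 is just an example value.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-72B-Instruct-AWQ",  # AWQ weights listed in Table 2
    quantization="awq",        # the weights are AWQ-quantized
    tensor_parallel_size=8,    # example value; match it to your card count
    enforce_eager=True,        # AWQ versions support eager mode only
)
# When serving instead of running offline inference, the equivalent flags would
# be passed on the command line, e.g. --quantization awq --enforce-eager.
```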

For the number of cards supported by each model, see the "Minimum Number of Cards and Maximum Sequence Length Supported by Each Model" section.
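
For illustration only, the sketch below assumes the standard open-source vLLM Python API, in which the number of cards used by a single instance maps to tensor_parallel_size and the maximum sequence length to max_model_len; the actual per-model limits are the ones given in that section.

```python
# Minimal sketch. Assumptions: standard open-source vLLM Python API; the card
# count and sequence length below are placeholder values, not the documented
# per-model minimums/maximums.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-72B-Instruct",  # example model from Table 1
    tensor_parallel_size=4,             # number of cards used by this instance
    max_model_len=8192,                 # maximum sequence length
)
```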

Related Documents