Resource Planning
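| No. | Model | FP16/BF16 inference supported | W8A8 quantization supported | v0/v1 backend | Min. cards (64 GB device memory each) | Max sequence length (K tokens), max-model-len | Open-source weights URL |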
|---|---|---|---|---|---|---|---|
| 1 | Qwen3-14B | √ | x | v1 | 1 | 32 | |
| 2 | Qwen3-30B-A3B-Instruct-2507 | √ | x | v1 | 2 | 32 | |
| 3 | Qwen3-32B | √ | x | v1 | 2 | 32 | |
| 4 | Qwen3-235B-A22B-Thinking-2507 | √ | x | v1 | 16 | 64 | |
| 5 | Qwen3-235B-A22B-Instruct-2507 | √ | x | v1 | 16 | 64 | |
| 6 | DeepSeek-R1-Distill-Llama-70B | √ | x | v1 | 4 | 32 | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B |
| 7 | Qwen3-Embedding-8B | √ | x | v0 | 1 | 40 | |
| 8 | Qwen3-Reranker-8B | √ | x | v0 | 1 | 40 | |
| 9 | bge-reranker-v2-m3 | √ | x | v0 | 1 | 8 | |
| 10 | bge-large-zh-v1.5 | √ | x | v0 | 1 | 0.5 | |
| 11 | bge-m3 | √ | x | v0 | 1 | 8 | |
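
The table rows can be turned into launch arguments mechanically. The following is a minimal sketch, not a definitive deployment script. It assumes that the serving framework is vLLM (suggested by the `max-model-len` and v0/v1 backend columns), that "K" means 1024 tokens (consistent with the 0.5 entry for bge-large-zh-v1.5), and that the minimum card count is used directly as the tensor-parallel size. That last point is only an assumption: the 16-card rows may in practice be split across nodes or combined with other parallelism. `PLAN`, `max_model_len`, and `serve_args` are hypothetical names introduced for illustration, and only three rows are copied in.

```python
# Hypothetical excerpt of the table above: model -> (min cards, max sequence in K).
PLAN = {
    "Qwen3-14B": {"min_cards": 1, "max_len_k": 32},
    "Qwen3-235B-A22B-Thinking-2507": {"min_cards": 16, "max_len_k": 64},
    "bge-large-zh-v1.5": {"min_cards": 1, "max_len_k": 0.5},
}


def max_model_len(max_len_k):
    """Convert the table's K value to a token count (assumes 1 K = 1024 tokens)."""
    return int(max_len_k * 1024)


def serve_args(model):
    """Build a `vllm serve` argument list for one table row.

    Assumption: min_cards is passed as --tensor-parallel-size; adjust this
    if the deployment spans multiple nodes or uses other parallelism.
    """
    row = PLAN[model]
    return [
        "vllm", "serve", model,
        "--tensor-parallel-size", str(row["min_cards"]),
        "--max-model-len", str(max_model_len(row["max_len_k"])),
    ]


if __name__ == "__main__":
    for name in PLAN:
        print(" ".join(serve_args(name)))
```

Keeping the plan as data and deriving the command from it means a change to a row's card count or sequence limit only has to be made in one place.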