Single-Instance Traffic Limit (QPS)
Notes
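The single-instance QPS limit depends on the input and output of each request. The recommended QPS values in Table 1 were estimated for multi-turn dialogue, summary generation, and information retrieval scenarios, and are for reference only. For recommended QPS values in other typical scenarios, contact technical support.
Unit: requests per second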
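If the error code "ModelArts.81101" is returned during deployment with the message "Too many requests, the rate limit is %s times per second.", the number of requests has reached the QPS limit. Wait until throttling ends, and then restart the service.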
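On the client side, waiting out the limit can be approximated with a retry loop that backs off whenever the rate-limit error code comes back. The following is a minimal sketch, not a documented ModelArts client: `send` is a hypothetical callable that returns the parsed response body as a dict, and the `error_code` field name is an assumption about the response format.

```python
import time
from typing import Callable

RATE_LIMIT_CODE = "ModelArts.81101"  # error code quoted in the note above


def call_with_backoff(
    send: Callable[[], dict],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send() and retry with exponential backoff while it is rate limited.

    `send` is a hypothetical callable; the "error_code" key is an assumed
    field name in the response body.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        body = send()
        if body.get("error_code") != RATE_LIMIT_CODE:
            # Success, or an unrelated error that the caller should handle.
            return body
        if attempt == max_attempts:
            break
        sleep(delay)  # give the throttling window time to pass
        delay *= 2    # double the wait before the next attempt
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

The `sleep` parameter is injectable so that the backoff schedule can be checked without real waiting. The delays and attempt count are illustrative defaults, not recommended values.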
| Model | Recommended QPS |
|---|---|
| Baichuan2-13B | 1 |
| Baichuan2-7B | 3 |
| ChatGLM3-6B | 3 |
| Llama2-13B | 1 |
| Llama2-13B-AWQ | 1 |
| Llama2-13B-SQ | 1 |
| Llama2-70B | 1 |
| Llama2-70B-AWQ | 1 |
| Llama2-70B-SQ | 1 |
| Llama2-7B | 3 |
| Llama2-7B-AWQ | 3 |
| Llama2-7B-SQ | 3 |
| Llama3-70B | 1 |
| Llama3-70B-AWQ | 1 |
| Llama3-70B-SQ | 1 |
| Llama3-8B | 3 |
| Llama3-8B-AWQ | 3 |
| Llama3-8B-SQ | 6 |
| Llama3.1-70B | 1 |
| Llama3.1-8B | 3 |
| Qwen-7B | 3 |
| Qwen-14B | 1 |
| Qwen-72B | 1 |
| QwQ-32B-16K | 1 |
| Qwen1.5-7B | 3 |
| Qwen1.5-7B-AWQ | 3 |
| Qwen1.5-7B-SQ | 3 |
| Qwen1.5-14B | 1 |
| Qwen1.5-14B-AWQ | 1 |
| Qwen1.5-14B-SQ | 1 |
| Qwen1.5-32B | 1 |
| Qwen1.5-72B | 1 |
| Qwen1.5-72B-AWQ | 1 |
| Qwen1.5-72B-SQ | 1 |
| Qwen2-0.5B | 9 |
| Qwen2-1.5B | 6 |
| Qwen2-7B | 3 |
| Qwen2-7B-AWQ | 3 |
| Qwen2-72B | 1 |
| Qwen2-72B-AWQ | 1 |
| Qwen2-72B-SQ | 1 |
| Qwen2-72B-1K | 1 |
| Qwen2-72B-32K | 1 |
| Qwen2.5-0.5B | 9 |
| Qwen2.5-1.5B | 6 |
| Qwen2.5-7B | 3 |
| Qwen2.5-14B | 1 |
| Qwen2.5-32B | 1 |
| Qwen2.5-32B-AWQ | 1 |
| Qwen2.5-32B-SQ | 1 |
| Qwen2.5-72B | 1 |
| Qwen2.5-72B-1K | 1 |
| Qwen2.5-72B-8K | 1 |
| Qwen2.5-72B-32K | 1 |
| Qwen2.5-72B-AWQ | 1 |
| Qwen2.5-72B-SQ | 1 |
| Qwen2-VL-7B | 1 |
| Glm-4-9B | 3 |
| Yi-6B | 3 |
| Yi-34B | 1 |
| Deepseek-Coder-33B | 1 |
| DeepSeek-R1 | 1 |
| DeepSeek-V3 | 1 |
| DeepSeek-R1-Distill-Qwen-14B | 1 |
| DeepSeek-R1-Distill-Qwen-14B-4K | 1 |
| DeepSeek-R1-Distill-Qwen-32B | 1 |
| DeepSeek-R1-Distill-Qwen-32B-4K | 1 |
| DeepSeek-R1-Distill-Qwen-32B-8K | 1 |
| DeepSeek-R1-Distill-Qwen-32B-32K | 1 |
| DeepSeek-R1-Distill-Llama-8B | 3 |
| DeepSeek-R1-Distill-Llama-8B-4K | 3 |
| DeepSeek-R1-Distill-Llama-70B-8K | 1 |
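
To stay under a recommended value from the table, a caller can pace its own requests before sending them. The sketch below is one simple way to do that, assuming fixed-interval pacing is acceptable for the workload. `QpsThrottle` and `RECOMMENDED_QPS` are illustrative names, not part of any ModelArts SDK, and the dictionary holds only three values copied from the table.

```python
import time

# A few recommended values copied from the table (requests per second).
RECOMMENDED_QPS = {"Qwen2-0.5B": 9, "Llama3-8B": 3, "DeepSeek-R1": 1}


class QpsThrottle:
    """Space calls at least 1/qps seconds apart (fixed-interval pacing)."""

    def __init__(self, qps, clock=time.monotonic, sleep=time.sleep):
        if qps <= 0:
            raise ValueError("qps must be positive")
        self.interval = 1.0 / qps
        self._clock = clock
        self._sleep = sleep
        self._next_allowed = clock()  # the first call may go out immediately

    def wait(self) -> float:
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        pause = max(0.0, self._next_allowed - now)
        if pause > 0:
            self._sleep(pause)
        # Schedule the following slot one interval after this one.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return pause
```

A caller would create one throttle per service instance, for example `QpsThrottle(RECOMMENDED_QPS["Llama3-8B"])`, and call `wait()` before each request. This paces a single process only; several clients sharing one instance would need a shared limiter.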