Updated on 2025-11-04 GMT+08:00
Adapting Mainstream Open-Source Models to Ascend-vLLM for NPU Inference Based on Lite Server (New)
- Introduction to Ascend-vLLM
- Supported Models
- Minimum Number of NPUs and Maximum Sequence Length Supported by Each Model
- Version Description and Requirements
- Inference Service Deployment
- Usage of Key Inference Features
- Inference Service Accuracy Evaluation
- Inference Service Performance Evaluation
- Appendix
Parent topic: LLM Inference