Version Description and Requirements
Version Differences
This guide applies to ModelArts 6.5.906 and later. The latest version is 6.5.907. Use the latest software package and image where possible.
| Version | Description |
|---|---|
| 6.5.907 | Compared with 6.5.906, 6.5.907 has the following changes: 1. LLM inference framework: Qwen3-Embedding series, Qwen3-Reranker series, and Qwen3-Coder-480B-A35B are added. 2. Multimodal inference framework: InternVL3 series and Qwen2.5 VL support 128K sequences. 3. Some stability issues in version 6.5.906 are resolved. |
Resource Specifications
In this document, the model runtime environment is ModelArts Lite Server. Snt9b and Snt9b23 resources are recommended.
Enable Lite Server resources and obtain passwords. Verify SSH access to all servers. Confirm proper network connectivity between them.
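The SSH and connectivity checks can be scripted. The following is a minimal sketch, not an official tool: it only tests whether each server accepts a TCP connection on the SSH port and does not attempt to log in. The function names and the default port 22 are assumptions for illustration.

```python
import socket


def ssh_port_open(host: str, port: int = 22, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolvable host names.
        return False


def check_servers(hosts, port: int = 22, timeout: float = 3.0) -> dict:
    """Map each host to whether its SSH port accepted a TCP connection."""
    return {host: ssh_port_open(host, port, timeout) for host in hosts}
```

For example, `check_servers(["192.0.2.10", "192.0.2.11"])` (placeholder addresses) returns a dictionary that flags any server that cannot be reached. Run it from each server in turn to confirm connectivity in both directions.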
If a container is used by or shared among multiple users, block its access to the OpenStack management address (169.254.169.254) so that it cannot obtain host machine metadata. For details, see Forbidding Containers to Obtain Host Machine Metadata.
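After applying the restriction, you may want to confirm from inside the container that the metadata address no longer answers. The sketch below is one possible check, under stated assumptions: it requests the standard OpenStack metadata path over HTTP and treats any HTTP response as "reachable". The function name and timeout are illustrative, and the restriction itself must still be configured as described in the linked guide.

```python
import urllib.error
import urllib.request

METADATA_BASE_URL = "http://169.254.169.254"
METADATA_PATH = "/openstack/latest/meta_data.json"


def metadata_reachable(base_url: str = METADATA_BASE_URL, timeout: float = 2.0) -> bool:
    """Return True if the metadata service answers any HTTP response."""
    # Bypass proxy settings: the link-local address must be contacted directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(base_url + METADATA_PATH, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server replied with an error status, so the address is reachable.
        return True
    except OSError:
        # Connection refused, timed out, or no route: access is blocked.
        return False
```

Inside a correctly restricted container, `metadata_reachable()` is expected to return `False`.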
Ascend-vLLM Version
This solution supports vLLM v0.9.0.
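To confirm that an environment matches this version, a small check such as the following could be used. It is a sketch under assumptions: it assumes the package is installed under the distribution name `vllm`, and it ignores a leading `v` and any local suffix after `+`, since the exact version string reported by the Ascend build is not stated here.

```python
from importlib import metadata

SUPPORTED_VLLM = "0.9.0"


def is_supported(version: str, supported: str = SUPPORTED_VLLM) -> bool:
    """Compare a version string with the supported one, ignoring 'v' and '+suffix'."""
    base = version.strip().lstrip("v").split("+", 1)[0]
    return base == supported


def installed_vllm_version():
    """Return the installed vllm version string, or None if vllm is not installed."""
    try:
        return metadata.version("vllm")  # distribution name is an assumption
    except metadata.PackageNotFoundError:
        return None
```

A typical use is `is_supported(installed_vllm_version() or "")`, which is `False` when vLLM is missing or at a different version.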
Image Version
The table below lists the base image addresses and their versions for this tutorial.
| Usage | Address | Version |
|---|---|---|
| Snt9b base image | CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129<br>CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129 | CANN: 8.2.RC1<br>PyTorch: 2.5.1 |
| Snt9b23 base image | CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129<br>CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129 | CANN: 8.2.RC1<br>PyTorch: 2.5.1 |
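The image tags encode the component versions, so a tag can be checked against the expected versions before an image is pulled. The parser below is an illustrative sketch: it assumes that tag fields are separated by `-` and that versioned fields use the `name_version` form seen in the addresses above. The meaning of the remaining fields (architecture, resource type, and what look like a build timestamp and a short hash) is inferred from their shape, not documented here.

```python
def parse_image_ref(ref: str) -> dict:
    """Split an image reference into repository, versioned fields, and other fields."""
    repository, sep, tag = ref.rpartition(":")
    if not sep:
        raise ValueError(f"no tag found in image reference: {ref!r}")
    versions = {}
    other = []
    for part in tag.split("-"):
        if "_" in part:
            # Fields such as 'cann_8.2.rc1' carry a component name and version.
            name, value = part.split("_", 1)
            versions[name] = value
        else:
            other.append(part)
    return {"repository": repository, "versions": versions, "other": other}


ref = (
    "swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:"
    "pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-"
    "20250729103313-3a25129"
)
info = parse_image_ref(ref)
print(info["versions"])  # {'pytorch': '2.5.1', 'cann': '8.2.rc1', 'py': '3.11', 'hce': '2.0.2503'}
```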
Software Version
Table 3 lists the supported software versions and dependency packages.
| Software Package Name | Description | How to Obtain |
|---|---|---|
| AscendCloud-6.5.907-20250910155849.zip | Inference framework and operator code package (suitable for Snt9b) | Download ModelArts 6.5.907 from Support-E.<br>NOTE: If the software information does not appear when you open the download link, you do not have access permissions. Contact your company's Huawei technical support for help with the download. |
| AscendCloud-6.5.907-20250910161027.zip | Inference framework and operator code package (suitable for Snt9b23) | Same as for the Snt9b package. |
Software Package Structure
|——AscendCloud-LLM
    ├──llm_inference                                   # Inference code
        ├──ascend_vllm
            ├── ascend_vllm                            # Inference source code
            ├── install.sh                             # Installation script
            ├── version.info                           # Version information
            ├── Dockerfile                             # Dockerfile for inference build image
            ├── vllm_list.patch                        # Incremental patch for inference based on vLLM
            ├── vllm_service_profile.patch             # Incremental patch for inference based on vLLM
            ├── vllm_serving_chat.patch                # Incremental patch for inference based on vLLM
            ├── vllm-log-rotating.patch                # Incremental patch for inference based on vLLM
    ├──llm_tools                                       # Inference tool package
        ├──best_practices                              # Best practice package
        ├──launch_server                               # One-click startup script
        ├──llm_evaluation                              # MME accuracy evaluation tool
        ├──PD_separate                                 # PD aggregation
        ├──simple_evals                                # Accuracy evaluation tool
        ├──acs_bench-1.0.1-py3-none-any.whl            # Benchmark performance test tool package
        ├──acs_service_profiler-1.0.1-py3-none-any.whl # Service profiling collection tool package
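To compare a downloaded package with the structure above, the archive contents can be listed without extracting them. The helper below is a minimal sketch using only the Python standard library. The function name is illustrative, and the source does not state whether the downloaded zip contains AscendCloud-LLM as a directory or as a nested archive, so inspect the result before relying on any particular path.

```python
import zipfile


def top_level_entries(zip_path: str) -> list:
    """Return the sorted, de-duplicated first path components in a zip archive."""
    with zipfile.ZipFile(zip_path) as zf:
        return sorted({name.split("/", 1)[0] for name in zf.namelist() if name})
```

For example, `top_level_entries("AscendCloud-6.5.907-20250910155849.zip")` lists the top-level files and directories of the Snt9b package.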