Updated on 2025-11-04 GMT+08:00

Version Description and Requirements

Version Differences

Use this guide for ModelArts version 6.5.906 or newer. The latest version is 6.5.907. You are advised to use the latest software package and image.

Table 1 Version differences

Version: 6.5.907

Description: Compared with 6.5.906, 6.5.907 has the following changes:
1. LLM inference framework: the Qwen3-Embedding series, Qwen3-Reranker series, and Qwen3-Coder-480B-A35B are added.
2. Multimodal inference framework: the InternVL3 series and Qwen2.5 VL support 128K sequences.
3. Some stability issues in version 6.5.906 are resolved.

Resource Specifications

In this document, the model runtime environment is ModelArts Lite Server. Snt9b and Snt9b23 resources are recommended.

Enable Lite Server resources and obtain the login passwords. Verify that every server is accessible over SSH and that the servers can communicate with each other over the network.
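
The snippet below is a minimal sketch, not part of the official procedure, for checking SSH reachability of each node; the node addresses and the root login are placeholders to replace with your own values.

# Minimal sketch: verify SSH access to every Lite Server node.
# NODE_IPS and the root login are assumptions; substitute your own servers.
import subprocess

NODE_IPS = ["192.168.0.10", "192.168.0.11"]  # placeholder addresses

for ip in NODE_IPS:
    # BatchMode=yes makes ssh fail immediately instead of prompting for a password.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", f"root@{ip}", "hostname"],
        capture_output=True,
        text=True,
    )
    status = "reachable" if result.returncode == 0 else "unreachable"
    print(f"{ip}: {status} {result.stdout.strip()}")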

If a container is used or shared by multiple users, restrict the container from accessing the OpenStack management address (169.254.169.254) so that it cannot obtain host machine metadata. For details, see Forbidding Containers to Obtain Host Machine Metadata.
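
As a quick self-check (a sketch only; the official steps are in the page referenced above), you can confirm from inside the container that the metadata address is no longer reachable. The URL below uses the standard OpenStack metadata path.

# Sketch: confirm the OpenStack metadata endpoint (169.254.169.254) is blocked
# from inside the container. Any HTTP response means it is still reachable.
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"

try:
    urllib.request.urlopen(METADATA_URL, timeout=3)
    reachable = True
except urllib.error.HTTPError:
    reachable = True   # an HTTP error still proves the address can be reached
except OSError:
    reachable = False  # connection refused or timed out: access is blocked

if reachable:
    print("WARNING: the metadata endpoint is reachable; it should be blocked.")
else:
    print("OK: the metadata endpoint cannot be reached from this container.")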

Ascend-vLLM Version

This solution supports vLLM v0.9.0.
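
To confirm the vLLM build inside your container, a quick check such as the following can be used (a sketch; it assumes Ascend-vLLM is already installed and exposes the standard vllm package):

# Sketch: check that the installed vLLM version matches the supported v0.9.0.
import vllm

EXPECTED = "0.9.0"
print(f"Installed vLLM version: {vllm.__version__}")
assert vllm.__version__.startswith(EXPECTED), (
    f"Expected vLLM {EXPECTED}, found {vllm.__version__}"
)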

Image Version

The table below lists the base image addresses and their versions for this tutorial.

Table 2 Base image addresses

Usage: Snt9b base image

Address:
CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129
CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129

Version:
CANN: 8.2.RC1
PyTorch: 2.5.1

Usage: Snt9b23 base image

Address:
CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129
CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129
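
As an example of using these addresses (a sketch; it assumes Docker is installed on the node and that you have already logged in to SWR), the Snt9b image for CN Southwest-Guiyang1 can be pulled as follows:

# Sketch: pull the Snt9b base image listed in Table 2 with the Docker CLI.
# Assumes a prior "docker login" to the SWR registry on this node.
import subprocess

IMAGE = (
    "swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:"
    "pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129"
)

subprocess.run(["docker", "pull", IMAGE], check=True)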

Software Version

Table 3 lists the supported software versions and dependency packages.

Table 3 Software packages

Software Package Name: AscendCloud-6.5.907-20250910155849.zip
Description: Inference framework and operator code package (suitable for Snt9b)
How to Obtain: Download ModelArts 6.5.907 from Support-E.

NOTE: If the software information does not appear when opening the download link, you lack access permissions. Contact your company's Huawei technical support for assistance with downloading.

Software Package Name: AscendCloud-6.5.907-20250910161027.zip
Description: Inference framework and operator code package (suitable for Snt9b23)

Software Package Structure

In this tutorial, the AscendCloud-LLM-xxx.zip package inside AscendCloud-xxx contains the key inference code files.
|——AscendCloud-LLM
 ├──llm_inference                                     # Inference code
 │   ├──ascend_vllm
 │   │   ├──ascend_vllm                               # Inference source code
 │   │   ├──install.sh                                # Installation script
 │   │   ├──version.info                              # Version information
 │   │   ├──Dockerfile                                # Dockerfile for building the inference image
 │   │   ├──vllm_list.patch                           # Incremental inference patch based on vLLM
 │   │   ├──vllm_service_profile.patch                # Incremental inference patch based on vLLM
 │   │   ├──vllm_serving_chat.patch                   # Incremental inference patch based on vLLM
 │   │   ├──vllm-log-rotating.patch                   # Incremental inference patch based on vLLM
 ├──llm_tools                                         # Inference tool package
 │   ├──best_practices                                # Best practices package
 │   ├──launch_server                                 # One-click startup script
 │   ├──llm_evaluation                                # MME accuracy evaluation tool
 │   ├──PD_separate                                   # PD separation
 │   ├──simple_evals                                  # Accuracy evaluation tool
 │   ├──acs_bench-1.0.1-py3-none-any.whl              # Benchmark performance test tool package
 │   ├──acs_service_profiler-1.0.1-py3-none-any.whl   # Service profiling collection tool package
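
As a sketch of how the package is typically unpacked (the xxx placeholder is kept from the package name above, and the wheel path follows the listing; adjust both to your actual file locations):

# Sketch: unpack the LLM package and install the benchmark wheel from llm_tools.
# "xxx" is the placeholder from the package name above; use the real file name.
import subprocess
import sys
import zipfile

zipfile.ZipFile("AscendCloud-LLM-xxx.zip").extractall(".")

# Per the listing above, the inference installation script is
# AscendCloud-LLM/llm_inference/ascend_vllm/install.sh.
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "AscendCloud-LLM/llm_tools/acs_bench-1.0.1-py3-none-any.whl"],
    check=True,
)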