Updated on 2025-11-04 GMT+08:00

Version Description and Requirements

Version Differences

Use this guide for ModelArts version 6.5.906 or newer. The latest version is 6.5.907. You are advised to use the latest software package and image.

Table 1 Version differences

Version: 6.5.907

Description: Compared with 6.5.906, 6.5.907 has the following changes:
1. LLM inference framework: the Qwen3-Embedding series, Qwen3-Reranker series, and Qwen3-Coder-480B-A35B are added.
2. Multimodal inference framework: the InternVL3 series and Qwen2.5 VL support 128K sequences.
3. Some stability issues in version 6.5.906 are resolved.

Resource Specifications

In this document, the model runtime environment is ModelArts Lite Server. Snt9b and Snt9b23 resources are recommended.

Enable Lite Server resources and obtain the login passwords. Verify that every server is accessible over SSH and that the servers can communicate with each other over the network.
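
The snippet below is a minimal sketch, not part of the official procedure, for checking SSH reachability of each node; the node addresses and the root login are placeholders to replace with your own values.

# Minimal sketch: verify SSH access to every Lite Server node.
# NODE_IPS and the root login are assumptions; substitute your own servers.
import subprocess

NODE_IPS = ["192.168.0.10", "192.168.0.11"]  # placeholder addresses

for ip in NODE_IPS:
    # BatchMode=yes makes ssh fail immediately instead of prompting for a password.
    result = subprocess.run(
        ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", f"root@{ip}", "hostname"],
        capture_output=True,
        text=True,
    )
    status = "reachable" if result.returncode == 0 else "unreachable"
    print(f"{ip}: {status} {result.stdout.strip()}")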

If a container is used or shared by multiple users, restrict the container from accessing the OpenStack management address (169.254.169.254) so that it cannot obtain host machine metadata. For details, see Forbidding Containers to Obtain Host Machine Metadata.
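
As a quick self-check (a sketch only; the official steps are in the page referenced above), you can confirm from inside the container that the metadata address is no longer reachable. The URL below uses the standard OpenStack metadata path.

# Sketch: confirm the OpenStack metadata endpoint (169.254.169.254) is blocked
# from inside the container. Any HTTP response means it is still reachable.
import urllib.error
import urllib.request

METADATA_URL = "http://169.254.169.254/openstack/latest/meta_data.json"

try:
    urllib.request.urlopen(METADATA_URL, timeout=3)
    reachable = True
except urllib.error.HTTPError:
    reachable = True   # an HTTP error still proves the address can be reached
except OSError:
    reachable = False  # connection refused or timed out: access is blocked

if reachable:
    print("WARNING: the metadata endpoint is reachable; it should be blocked.")
else:
    print("OK: the metadata endpoint cannot be reached from this container.")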

Ascend-vLLM Version

This solution supports vLLM v0.9.0.
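
To confirm the vLLM build inside your container, a quick check such as the following can be used (a sketch; it assumes Ascend-vLLM is already installed and exposes the standard vllm package):

# Sketch: check that the installed vLLM version matches the supported v0.9.0.
import vllm

EXPECTED = "0.9.0"
print(f"Installed vLLM version: {vllm.__version__}")
assert vllm.__version__.startswith(EXPECTED), (
    f"Expected vLLM {EXPECTED}, found {vllm.__version__}"
)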

Image Version

The table below lists the base image addresses and their versions for this tutorial.

Table 2 Base image addresses

Usage: Snt9b base image

Address:
CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129
CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129

Version:
CANN: 8.2.RC1
PyTorch: 2.5.1

Usage: Snt9b23 base image

Address:
CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129
CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129
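
As an example of using these addresses (a sketch; it assumes Docker is installed on the node and that you have already logged in to SWR), the Snt9b image for CN Southwest-Guiyang1 can be pulled as follows:

# Sketch: pull the Snt9b base image listed in Table 2 with the Docker CLI.
# Assumes a prior "docker login" to the SWR registry on this node.
import subprocess

IMAGE = (
    "swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:"
    "pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129"
)

subprocess.run(["docker", "pull", IMAGE], check=True)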

Software Version

Table 3 lists the supported software versions and dependency packages.

Table 3 Software packages

Software Package Name: AscendCloud-6.5.907-20250910155849.zip
Description: Inference framework and operator code package (suitable for Snt9b)
How to Obtain: Download ModelArts 6.5.907 from Support-E.

NOTE: If the software information does not appear when opening the download link, you lack access permissions. Contact your company's Huawei technical support for assistance with downloading.

Software Package Name: AscendCloud-6.5.907-20250910161027.zip
Description: Inference framework and operator code package (suitable for Snt9b23)

Software Package Structure

In this tutorial, the AscendCloud-LLM-xxx.zip package inside AscendCloud-xxx contains the key inference code files.
|——AscendCloud-LLM
 ├──llm_inference                                     # Inference code
 │   ├──ascend_vllm
 │   │   ├──ascend_vllm                               # Inference source code
 │   │   ├──install.sh                                # Installation script
 │   │   ├──version.info                              # Version information
 │   │   ├──Dockerfile                                # Dockerfile for building the inference image
 │   │   ├──vllm_list.patch                           # Incremental inference patch based on vLLM
 │   │   ├──vllm_service_profile.patch                # Incremental inference patch based on vLLM
 │   │   ├──vllm_serving_chat.patch                   # Incremental inference patch based on vLLM
 │   │   ├──vllm-log-rotating.patch                   # Incremental inference patch based on vLLM
 ├──llm_tools                                         # Inference tool package
 │   ├──best_practices                                # Best practices package
 │   ├──launch_server                                 # One-click startup script
 │   ├──llm_evaluation                                # MME accuracy evaluation tool
 │   ├──PD_separate                                   # PD separation
 │   ├──simple_evals                                  # Accuracy evaluation tool
 │   ├──acs_bench-1.0.1-py3-none-any.whl              # Benchmark performance test tool package
 │   ├──acs_service_profiler-1.0.1-py3-none-any.whl   # Service profiling collection tool package
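
As a sketch of how the package is typically unpacked (the xxx placeholder is kept from the package name above, and the wheel path follows the listing; adjust both to your actual file locations):

# Sketch: unpack the LLM package and install the benchmark wheel from llm_tools.
# "xxx" is the placeholder from the package name above; use the real file name.
import subprocess
import sys
import zipfile

zipfile.ZipFile("AscendCloud-LLM-xxx.zip").extractall(".")

# Per the listing above, the inference installation script is
# AscendCloud-LLM/llm_inference/ascend_vllm/install.sh.
subprocess.run(
    [sys.executable, "-m", "pip", "install",
     "AscendCloud-LLM/llm_tools/acs_bench-1.0.1-py3-none-any.whl"],
    check=True,
)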