Updated on 2025-11-04 GMT+08:00

Graph Mode

What is ASCEND-TURBO-GRAPH?

AscendTurboGraph is a host graph mode built on a capture-replay architecture. It removes host-side bottlenecks, supports dynamic shapes for model inputs, and builds graphs faster because it does not require bucketing. In the default mode (when the INFER_MODE environment variable is not set), some models automatically use the ACLGraph mode to improve performance.

ASCEND-TURBO-GRAPH Constraints

AscendTurboGraph currently supports only large language models (LLMs) built on the Qwen2, Qwen2.5, and Qwen3 series architectures, including their quantized models. Other scenarios are not yet supported because some operators have not been adapted.

ASCEND-TURBO-GRAPH Parameter Configuration

By default, the AscendTurboGraph mode is used, and the VLLM_PLUGINS environment variable is set as follows:

export VLLM_PLUGINS=ascend_vllm,kv_connectors

To use the eager or ACLGraph mode, set the VLLM_PLUGINS environment variable as follows:

export VLLM_PLUGINS=ascend
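
The two settings above can be wrapped in a small helper so that a launch script picks the plugin list from a single mode name. This is a minimal sketch: the function name select_plugins and the mode labels turbo-graph, eager, and acl-graph are hypothetical and exist only for this illustration. The VLLM_PLUGINS values are the ones given above.

```shell
# Hypothetical helper: map an illustrative mode label to the VLLM_PLUGINS
# value given in this section. The labels are not official option names.
select_plugins() {
  case "$1" in
    turbo-graph)
      # AscendTurboGraph uses the default plugin set.
      echo "ascend_vllm,kv_connectors" ;;
    eager|acl-graph)
      # The eager and ACLGraph modes use the ascend plugin.
      echo "ascend" ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Export the plugin list for the chosen mode before starting the service.
VLLM_PLUGINS="$(select_plugins turbo-graph)"
export VLLM_PLUGINS
echo "VLLM_PLUGINS=${VLLM_PLUGINS}"
```

Keeping the mapping in one place means a typo in the mode name fails early instead of silently starting the service with an unintended plugin set.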

Table 1 Execution mode settings

| Execution Mode | Configuration Item | Description |
|----------------|--------------------|-------------|
| eager | --enforce-eager | Defaults to False. When set, it takes the highest priority. |
| AscendTurboGraph (Recommended) | --additional-config='{"ascend_turbo_graph_config": {"enabled": true}}' | Must be set explicitly in additional_config. |
| ACLGraph | N/A | If no mode is set, ACLGraph is used by default. |

ACLGraph mode is an experimental feature in the current version and is not recommended. Use the AscendTurboGraph mode instead.
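
As an illustration of how the configuration items in Table 1 might be applied at launch time, the sketch below prints the flag for a chosen mode and assembles a start command. It rests on several assumptions: the helper name mode_flag and the mode labels are hypothetical, /path/to/model is a placeholder, and vllm serve is assumed to be the entry point (your deployment may start the service differently). The command is printed rather than executed.

```shell
# Hypothetical helper: print the launch flag that Table 1 lists for a mode.
# The mode labels passed as arguments are illustrative, not vLLM options.
mode_flag() {
  case "$1" in
    eager)
      printf '%s\n' '--enforce-eager' ;;
    turbo-graph)
      printf '%s\n' '--additional-config={"ascend_turbo_graph_config": {"enabled": true}}' ;;
    acl-graph)
      # Table 1 lists no configuration item for ACLGraph, so print nothing.
      : ;;
    *)
      echo "unknown mode: $1" >&2
      return 1 ;;
  esac
}

# Placeholder model path; replace it with a real path in your environment.
MODEL_PATH="${MODEL_PATH:-/path/to/model}"
FLAG="$(mode_flag turbo-graph)"

# Print the assembled command instead of running it. ${FLAG:+"$FLAG"} passes
# the flag as one argument and adds nothing when FLAG is empty.
echo vllm serve "$MODEL_PATH" ${FLAG:+"$FLAG"}
```

Passing the JSON value as a single quoted argument avoids word splitting on the spaces inside the braces.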