Graph Mode
What is ASCEND-TURBO-GRAPH?
AscendTurboGraph is a host-side graph mode built on a capture-and-replay architecture. It largely eliminates host-side bottlenecks, supports dynamic input shapes, and needs no shape bucketing when the graph is built, which makes graph construction faster. In the default mode (when the INFER_MODE environment variable is not set), some models automatically use ACLGraph mode to improve performance.
ASCEND-TURBO-GRAPH Constraints
AscendTurboGraph currently supports only large language models (LLMs) built on the Qwen2, Qwen2.5, and Qwen3 series architectures, including their quantized variants. Other scenarios are not yet supported because some operators have not been adapted.
ASCEND-TURBO-GRAPH Parameter Configuration
By default, the AscendTurboGraph mode is used, and the VLLM_PLUGINS environment variable is set as follows:
export VLLM_PLUGINS=ascend_vllm,kv_connectors
To use eager or ACLGraph mode instead, set the VLLM_PLUGINS environment variable as follows:
export VLLM_PLUGINS=ascend
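The two settings above can be wrapped in a small launcher helper. This is only a sketch: the select_plugins function, the GRAPH_MODE variable, and the mode names are hypothetical and introduced for illustration. Only the VLLM_PLUGINS values themselves come from this page.

```shell
# Hypothetical helper: maps a mode name to the VLLM_PLUGINS value given above.
select_plugins() {
  case "$1" in
    turbo-graph)     echo "ascend_vllm,kv_connectors" ;;
    eager|acl-graph) echo "ascend" ;;
    *)               echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# GRAPH_MODE is an illustrative variable for this script, not one read by vLLM.
GRAPH_MODE="${GRAPH_MODE:-turbo-graph}"

# Export the plugin list only if the mode name was recognized.
VLLM_PLUGINS="$(select_plugins "$GRAPH_MODE")" && export VLLM_PLUGINS
echo "VLLM_PLUGINS=$VLLM_PLUGINS"
```

Keeping the mapping in one function means a typo in the mode name fails loudly instead of silently loading the wrong plugin set.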
| Execution Mode | Configuration Item | Description |
|---|---|---|
| eager | --enforce-eager | The default value is False and has the highest priority. |
| AscendTurboGraph (Recommended) | --additional-config='{"ascend_turbo_graph_config": {"enabled": true}}' | Needs to be explicitly set in additional_config. |
| ACLGraph | N/A | If not set, the default mode is ACLGraph. ACLGraph mode is an experimental feature in the current version and is not recommended for use. It is recommended to use the AscendTurboGraph mode. |
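The options in the table can be turned into launch commands along these lines. This is a minimal sketch that assumes the server is started through the vllm serve entry point, which this page does not state. The build_launch_cmd function, the mode names, and the model name are hypothetical choices for illustration. The function only prints the command so that it can be reviewed before it is run.

```shell
# JSON passed through --additional-config, copied from the table.
TURBO_JSON='{"ascend_turbo_graph_config": {"enabled": true}}'

# Hypothetical helper: prints a launch command for the given model and mode.
# $1: model name or path; $2: turbo-graph | eager | acl-graph
build_launch_cmd() {
  case "$2" in
    turbo-graph) printf '%s\n' "vllm serve $1 --additional-config='$TURBO_JSON'" ;;
    eager)       printf '%s\n' "vllm serve $1 --enforce-eager" ;;
    acl-graph)   printf '%s\n' "vllm serve $1" ;;  # no flag: ACLGraph is the fallback
    *)           echo "unknown mode: $2" >&2; return 1 ;;
  esac
}

# The model name is an assumed example from the supported Qwen2.5 series.
build_launch_cmd "Qwen/Qwen2.5-7B-Instruct" turbo-graph
```

Because --enforce-eager has the highest priority, passing it together with the AscendTurboGraph configuration would still select eager mode, so the sketch emits at most one of the two options.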