Inference Guide for Wan Series Video Generation Models Adapted to PyTorch NPU via Lite Server
Solution Overview
This section describes how to use NPUs to perform text-to-video inference, image-to-video inference, and text-to-image inference using Wan2.1 and Wan2.2 video generation models in ModelArts Lite Server. To deploy the solution, contact Huawei technical support to purchase Server resources.
Wan series generation models are supported, including Wan2.1-T2V-14B-Diffusers, Wan2.1-T2V-1.3B-Diffusers, Wan2.1-I2V-14B-480P-Diffusers, Wan2.1-I2V-14B-720P-Diffusers, Wan2.2-T2V-A14B-Diffusers, and Wan2.2-I2V-A14B-Diffusers.
Resource Specifications
Use Snt9B or Snt9B23 single-node resources in the Lite Server environment.
The same driver and PyTorch versions apply to both Snt9B and Snt9B23 resources:

| Name | Version |
|---|---|
| Driver | 25.2.1 |
| PyTorch | pytorch_2.5.1 |
Obtaining the Software Package and Images
| Category | Name | How to Obtain |
|---|---|---|
| Plug-in code package | AscendCloud-AIGC-6.5.907-xxx.zip in the AscendCloud-6.5.907-xxx.zip software package. NOTE: xxx in the package name indicates the timestamp, which is subject to the actual package release time. | Download ModelArts 6.5.907.2 from Support-E. NOTE: If the software information does not appear when you open the download link, you lack access permissions. Contact your company's Huawei technical support for assistance with downloading. |
| Base image | Snt9B23 (CN North-Ulanqab1, CN East 2, CN Southwest-Guiyang1): swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129. Snt9B (CN East 2, CN Southwest-Guiyang1): swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129 | Pull the image from SWR. |
| Base image | Snt9B23 (CN-Hong Kong): swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129. Snt9B (CN-Hong Kong): swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129 | Pull the image from SWR. |
Constraints
- This document applies to ModelArts 6.5.907.2. Obtain the required software package and image by referring to Table 3. Follow the version mapping when using this document.
- Ensure that the container can access the Internet.
Step 1: Preparing the Environment
- Enable Lite Server resources and obtain passwords. Verify SSH access to all servers. Confirm proper network connectivity between them.
If a container is used or shared by multiple users, you should restrict the container from accessing the OpenStack management address (169.254.169.254) to prevent host machine metadata acquisition. For details, see Forbidding Containers to Obtain Host Machine Metadata.
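One common way to enforce this on the host is an iptables rule that drops container traffic destined for the metadata address. The rule below is only a sketch and assumes containers attached to Docker's default docker0 bridge; it does not cover containers started with --net=host, and the referenced document remains the authoritative procedure.
# On the host: drop bridged container traffic to the OpenStack metadata address (assumes the docker0 bridge).
iptables -I FORWARD -i docker0 -d 169.254.169.254 -j DROP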
- Log in to the server via SSH and check the NPUs. Obtain the NPU device information:
npu-smi info                  # Run this command on each instance node to view the NPU status.
npu-smi info -l | grep Total  # Run this command on each instance node to view the total number of NPUs.
If an error occurs, the NPU devices on the server may not be properly installed, or the NPUs may already be mounted to another container. Install the firmware and driver, or release the mounted NPUs.
- Check whether Docker is installed.
docker -v # Check whether Docker is installed.
If Docker is not installed, run this command:
yum install -y docker-engine.aarch64 docker-engine-selinux.noarch docker-runc.aarch64
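Depending on the host OS, the Docker daemon may need to be enabled and started after installation. A quick check, assuming a systemd-based host:
systemctl enable --now docker   # start Docker now and on every boot (systemd hosts)
docker info                     # verify that the daemon is running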
- Configure IP forwarding for container network access. Run the command below to check the value of net.ipv4.ip_forward. Skip this step if the value is 1.
sysctl -p | grep net.ipv4.ip_forward
If the value is not 1, configure IP forwarding:
sed -i 's/net\.ipv4\.ip_forward=0/net\.ipv4\.ip_forward=1/g' /etc/sysctl.conf
sysctl -p | grep net.ipv4.ip_forward
Step 2: Obtaining the Base Image
Use official images to deploy inference services. For details about the image path {image_url}, see Table 3.
docker pull {image_url}
Before pulling the image, log in to SWR: on the SWR console, obtain the login command for your region and run it on the server.
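The login command generated by the console generally has the following shape; the user string and key below are placeholders produced by the console for your account and region, not working values:
docker login -u <region>@<access_key> -p <login_key> swr.cn-southwest-2.myhuaweicloud.com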
Step 3: Starting the Container Image
- Start the container image. Before starting the container, modify the parameters in ${} according to the parameter description.
Start the Snt9B23 container:
export work_dir="Custom mounted working directory"
export container_work_dir="Custom working directory mounted to the container"
export container_name="Custom container name"
export image_name="Image name or ID"
# Start a container to run the image.
docker run -itd --net=host \
    --privileged \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    --shm-size=256g \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /var/log/npu/:/usr/slog \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v ${work_dir}:${container_work_dir} \
    --name ${container_name} \
    ${image_name} \
    /bin/bash
Start the Snt9B container:
export work_dir="Custom mounted working directory"
export container_work_dir="Custom working directory mounted to the container"
export container_name="Custom container name"
export image_name="Image name or ID"
# Start a container to run the image.
docker run -itd --net=bridge \
    --device=/dev/davinci0 \
    --device=/dev/davinci1 \
    --device=/dev/davinci2 \
    --device=/dev/davinci3 \
    --device=/dev/davinci4 \
    --device=/dev/davinci5 \
    --device=/dev/davinci6 \
    --device=/dev/davinci7 \
    --device=/dev/davinci_manager \
    --device=/dev/devmm_svm \
    --device=/dev/hisi_hdc \
    --shm-size=256g \
    -v /usr/local/dcmi:/usr/local/dcmi \
    -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
    -v /var/log/npu/:/usr/slog \
    -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
    -v ${work_dir}:${container_work_dir} \
    --name ${container_name} \
    ${image_name} \
    /bin/bash
Parameters (a concrete example with sample values follows this list):
- -v ${work_dir}:${container_work_dir}: host directory to be mounted to the container. The host and container use different file systems. work_dir indicates the working directory on the host. The directory stores files such as code and data required for the project. container_work_dir indicates the directory to be mounted to the container. The two paths can be the same.
- The /home/ma-user directory cannot be mounted to the container. This directory is the home directory of user ma-user. If the container is mounted to /home/ma-user, the container conflicts with the base image when being started. As a result, the base image is unavailable.
- Both the driver and npu-smi must be mounted to the container.
- --name ${container_name}: container name, which is used when you access the container. You can define a container name.
- ${image_name}: name of the base image of the corresponding model. For details, see Table 3.
- --device=/dev/davinci0: mounts the corresponding NPU to the container. If multiple NPUs need to be mounted, add one configuration item per NPU.
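For reference, the following is a minimal sketch of the variable setup before running the docker run command above. The paths and container name are illustrative assumptions; the image name is taken from Table 3 (Snt9B23 variant) and should match your resource type and region.
export work_dir=/data/wan             # example host directory holding code, data, and outputs
export container_work_dir=/data/wan   # example mount point inside the container (may differ from work_dir)
export container_name=wan_infer       # any container name not already in use
export image_name=swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129
Do not set container_work_dir to /home/ma-user; as noted above, mounting over the ma-user home directory makes the base image unavailable.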
- Access the container through the container name.
Log in to the Snt9B23 container as user root.
docker exec -it -u root ${container_name} bash
For Snt9B, the ma-user user is used by default, and all subsequent operations are performed as user ma-user.
docker exec -it ${container_name} bash
Step 4: Installing Dependencies and Software Packages
- To use the git clone and git lfs commands to download large models, perform the following operations:
- Enter the URL below in the browser to download the git-lfs package and upload it to the /home/ma-user directory of the container.
https://github.com/git-lfs/git-lfs/releases/download/v3.2.0/git-lfs-linux-arm64-v3.2.0.tar.gz
Alternatively, download git-lfs directly inside the container:
cd /home/ma-user
wget https://github.com/git-lfs/git-lfs/releases/download/v3.2.0/git-lfs-linux-arm64-v3.2.0.tar.gz
- Go to the container and run the git-lfs installation commands.
cd /home/ma-user
tar -zxvf git-lfs-linux-arm64-v3.2.0.tar.gz
cd git-lfs-3.2.0
sudo sh install.sh
- Disable SSL verification for Git configuration.
git config --global http.sslVerify false
- Install the AscendX_Video software package.
- Upload the AscendX_Video software package AscendCloud-AIGC-*.zip to the /home/ma-user directory of the container. For details about how to obtain the package, see Obtaining the Software Package and Images.
- Decompress the AscendCloud-AIGC-*.zip file, and run the following commands to install the Python dependencies:
cd /home/ma-user
unzip AscendCloud-AIGC-*.zip -d ./AscendCloud
cp -r /home/ma-user/AscendCloud/aigc_inference/torch_npu/ascendx_video ./
cd /home/ma-user/ascendx_video
pip install seal-*-linux_aarch64.whl
pip install check_device-*-linux_aarch64.whl
pip install ascendx_video-*-none-any.whl
- Install the operator environment.
If the Snt9B23 machine is used, run the following command:
cd /home/ma-user/AscendCloud/opp/A3
If the Snt9B machine is used, run the following command:
cd /home/ma-user/AscendCloud/opp/A2
Install the operator:
unzip AscendCloud-OPP-*.zip
unzip AscendCloud-OPP-*-torch-2.5.1-py311-*.zip -d ./AscendCloud_OPP
cd AscendCloud_OPP
pip install *.whl
mkdir -p /home/ma-user/operate
bash ./ascend_cloud_ops_ascend_turbo-*_linux_aarch64.run --install-path=/home/ma-user/operate
bash ./ascend_cloud_ops_custom_opp-*_linux_aarch64_ascend910b_ascend910_93.run --install-path=/home/ma-user/operate
cd ..
unzip AscendCloud-OPS-ADV-*.zip -d ./AscendCloud_OPS-ADV
cd AscendCloud_OPS-ADV
bash ./CANN-custom_ops-*-linux.aarch64.run --install-path=/home/ma-user/operate
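Before initializing the environment variables in the next step, you can optionally confirm that the installers created the files that will be sourced (a quick sanity check; the exact directory contents depend on the package version):
ls /home/ma-user/operate/AscendTurbo/set_env.bash
ls /home/ma-user/operate/vendors/customize/bin/set_env.bash
ls /home/ma-user/operate/vendors/customize_cloud/bin/set_env.bash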
- Initialize environment variables.
Note that the environment needs to be initialized each time you access the container.
source /home/ma-user/operate/AscendTurbo/set_env.bash
source /home/ma-user/operate/vendors/customize/bin/set_env.bash
source /home/ma-user/operate/vendors/customize_cloud/bin/set_env.bash
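Because these variables must be re-initialized on every login, one optional convenience is to append the source commands to the ma-user shell profile so they run automatically (assuming your shell reads ~/.bashrc):
cat >> /home/ma-user/.bashrc << 'EOF'
source /home/ma-user/operate/AscendTurbo/set_env.bash
source /home/ma-user/operate/vendors/customize/bin/set_env.bash
source /home/ma-user/operate/vendors/customize_cloud/bin/set_env.bash
EOF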
Step 5: Downloading Model Weights
Download the weight file to the container directory. The following lists the model addresses.
- Wan-AI/Wan2.1-T2V-14B-Diffusers: https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers
- Wan-AI/Wan2.1-T2V-1.3B-Diffusers: https://huggingface.co/Wan-AI/Wan2.1-T2V-1.3B-Diffusers
- Wan-AI/Wan2.1-I2V-14B-480P-Diffusers: https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers
- Wan-AI/Wan2.1-I2V-14B-720P-Diffusers: https://huggingface.co/Wan-AI/Wan2.1-I2V-14B-720P-Diffusers
- Wan-AI/Wan2.2-T2V-A14B-Diffusers: https://huggingface.co/Wan-AI/Wan2.2-T2V-A14B-Diffusers
- Wan-AI/Wan2.2-I2V-A14B-Diffusers: https://huggingface.co/Wan-AI/Wan2.2-I2V-A14B-Diffusers
Save the weights to the /home/ma-user/ascendx_video/weights directory, for example:
weights
└── Wan-AI
    ├── Wan2.1-I2V-14B-480P-Diffusers
    ├── Wan2.1-I2V-14B-720P-Diffusers
    ├── Wan2.1-T2V-14B-Diffusers
    ├── Wan2.1-T2V-1.3B-Diffusers
    ├── Wan2.2-I2V-A14B-Diffusers
    └── Wan2.2-T2V-A14B-Diffusers
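As an example, the following sketch downloads one of the weight repositories with git and git-lfs (installed in Step 4). Wan2.1-T2V-14B-Diffusers is used here; the other models can be cloned the same way from the addresses listed above.
cd /home/ma-user/ascendx_video
mkdir -p weights/Wan-AI
cd weights/Wan-AI
git lfs install
git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers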
Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model
The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:
- infer_wan2.1_14b_t2v_480p.sh: 480P inference script of the Wan text-to-video model Wan2.1-T2V-14B.
- infer_wan2.1_14b_t2v_720p.sh: 720P inference script of the Wan text-to-video model Wan2.1-T2V-14B.
- infer_wan2.1_1.3b_t2v.sh: inference script of the Wan text-to-video model Wan2.1-T2V-1.3B.
Run the commands below to start the inference task. The following uses infer_wan2.1_14b_t2v_480p.sh as an example.
cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.1_14b_t2v_480p.sh
The following describes the parameters of the text-to-video inference script infer_wan2.1_14b_t2v_480p.sh. The parameters of scripts infer_wan2.1_1.3b_t2v.sh and infer_wan2.1_14b_t2v_720p.sh are similar to those of infer_wan2.1_14b_t2v_480p.sh.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
N_NPUS=8
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
    --model Wan2.1-T2V-14B \
    --pretrained_model_name_or_path "../weights/Wan-AI/Wan2.1-T2V-14B-Diffusers" \
    --save_path ./output.mp4 \
    --num_inference_steps 50 \
    --width 832 \
    --height 480 \
    --frames 81 \
    --sp $N_NPUS \
    --fsdp \
    --vae_lightning \
    --turbo_mode faiz \
    --atten_a8w8 \
    --matmul_a8w8 \
    --rope_fused \
    --seed 42 \
    --prompt "A young boy with short brown hair, dressed in a dark blue t-shirt and red pants, is seen playing a KAWAI upright piano with skill and concentration. The piano's glossy black surface reflects the room's lighting, and its white and black keys are arranged in a standard layout, indicating a scene of musical practice or learning. The boy's hands move over the keys, suggesting he is engaged in playing or practicing a piece." \
    --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
- ASCEND_RT_VISIBLE_DEVICES: IDs of the NPUs to use.
- N_NPUS: number of NPUs to use. You are advised to use eight NPUs.
- model: supported inference model. Currently, Wan2.1-T2V-14B, Wan2.1-I2V-14B, Wan2.1-T2V-1.3B, Wan2.2-T2V-A14B, and Wan2.2-I2V-A14B are supported.
- pretrained_model_name_or_path: weight address of the corresponding model.
- save_path: path for storing the video generated during inference.
- num_inference_steps: number of inference steps.
- frames, height, width: dimensions of the generated video, including the number of frames, height, and width. Currently, 81 x 480 x 832, 121 x 480 x 832, 81 x 720 x 1280, and 121 x 720 x 1280 are supported.
- prompt, negative_prompt: positive and negative prompts for generating a video.
- sp: sequence parallelism parameter. It is recommended that the value be the same as the number of inference NPUs.
- fsdp: fully sharded data parallelism. The supported values are all, text_encoder, and transformer. If the flag is omitted, FSDP is disabled. If the flag is set without a value, it defaults to all, which enables parallelism for both text_encoder and transformer. Setting it to text_encoder or transformer enables parallelism only for the specified module.
- vae_lightning: VAE acceleration. This parameter is supported only in the multi-NPU scenario. If it is not set, VAE acceleration is disabled. Enabling it improves VAE performance.
- turbo_mode: acceleration mode. The supported values are default and faiz; default disables acceleration, and faiz is recommended for high performance. Enabling acceleration speeds up video inference but slightly affects accuracy.
- atten_a8w8: atten quantization acceleration. Set this parameter for high performance. If this parameter is not set, atten quantization acceleration is disabled. Enabling this function can accelerate video inference, but slightly affects the accuracy.
- matmul_a8w8: matmul quantization acceleration. Set this parameter for high performance. If this parameter is not set, matmul quantization acceleration is disabled. Enabling this function can accelerate video inference, but slightly affects the accuracy.
- rope_fused: rotary position encoding fusion operator. Set this parameter for high performance. If this parameter is not set, the fusion operator is disabled. Enabling this function can accelerate video inference, but slightly affects the accuracy.
- seed: random number seed. The default value is 42. The seed affects the generated output.
After the inference task is complete, the generated video file output.mp4 is saved to the path specified by save_path, which by default is relative to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.
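To quickly confirm the result from the shell, checks like the following are one option; ffprobe is available only if FFmpeg is installed in the container, which the base image does not guarantee:
cd /home/ma-user/ascendx_video/scripts
ls -lh output.mp4
# Optional: inspect resolution and frame count if FFmpeg is installed.
ffprobe -v error -select_streams v:0 -show_entries stream=width,height,nb_frames -of csv=p=0 output.mp4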
Step 7: Performing Inference Using the Wan2.1 Image-to-Video Model
Before starting inference using the image-to-video model, download the sample image and save it to the /home/ma-user/ascendx_video/scripts directory.
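The inference scripts in this step read the sample image from ./astronaut.jpg in the scripts directory (see the i2v_image_path parameter below). If you use your own image, place it there under that name; the source path in this sketch is a placeholder:
cd /home/ma-user/ascendx_video/scripts
cp /path/to/your/image.jpg ./astronaut.jpg   # replace with the actual location of your image inside the container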
The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:
- infer_wan2.1_14b_i2v_480p.sh: 480P inference script of the Wan2.1-I2V-14B image-to-video model.
- infer_wan2.1_14b_i2v_720p.sh: 720P inference script of the Wan2.1-I2V-14B image-to-video model.
Run the commands below to start the inference task. The following uses infer_wan2.1_14b_i2v_480p.sh as an example.
cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.1_14b_i2v_480p.sh
The following describes the parameters of the image-to-video inference script infer_wan2.1_14b_i2v_480p.sh. The parameters of script infer_wan2.1_14b_i2v_720p.sh are similar to those of infer_wan2.1_14b_i2v_480p.sh.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
N_NPUS=8
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
    --model Wan2.1-I2V-14B \
    --pretrained_model_name_or_path "../weights/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers" \
    --task_type i2v \
    --i2v_image_path ./astronaut.jpg \
    --save_path ./output.mp4 \
    --num_inference_steps 40 \
    --width 832 \
    --height 480 \
    --frames 81 \
    --sp $N_NPUS \
    --fsdp \
    --vae_lightning \
    --turbo_mode faiz \
    --atten_a8w8 \
    --matmul_a8w8 \
    --rope_fused \
    --seed 42 \
    --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
    --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
- task_type: inference task type. The value can be t2v (text-to-video), i2v (image-to-video), or t2i (text-to-image). The default value is i2v.
- i2v_image_path: path of the image used for video generation.
- For other parameters, use the same settings as those of infer_wan2.1_14b_t2v_480p.sh. For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.
After the inference task is complete, the generated video file output.mp4 is saved to the path specified by save_path, which by default is relative to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.
Step 8: Performing Inference Using the Wan2.1 Text-to-Image Model
The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:
- infer_wan2.1_14b_t2i_480p.sh: 480P inference script of the Wan2.1-T2V-14B text-to-image model.
- infer_wan2.1_14b_t2i_720p.sh: 720P inference script of the Wan2.1-T2V-14B text-to-image model.
Run the commands below to start the inference task. The following uses infer_wan2.1_14b_t2i_480p.sh as an example.
cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.1_14b_t2i_480p.sh
The following describes the parameters of the text-to-image inference script infer_wan2.1_14b_t2i_480p.sh. The parameters of script infer_wan2.1_14b_t2i_720p.sh are similar to those of infer_wan2.1_14b_t2i_480p.sh.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0
N_NPUS=1
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
    --model Wan2.1-T2V-14B \
    --pretrained_model_name_or_path "../weights/Wan-AI/Wan2.1-T2V-14B-Diffusers" \
    --task_type t2i \
    --save_path ./output.png \
    --num_inference_steps 40 \
    --width 832 \
    --height 480 \
    --frames 1 \
    --atten_a8w8 \
    --matmul_a8w8 \
    --rope_fused \
    --seed 42 \
    --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
    --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
The parameters are the same as those of the Wan2.1 text-to-video scripts (for example, infer_wan2.1_14b_t2v_480p.sh). For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.
After the inference task is complete, the generated image file output.png is saved to the path specified by save_path, which by default is relative to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.
Step 9: Performing Inference Using the Wan2.2 Text-to-Video Model
The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:
- infer_wan2.2_14b_t2v_480p.sh: 480P inference script of the Wan text-to-video model Wan2.2-T2V-A14B-Diffusers.
- infer_wan2.2_14b_t2v_720p.sh: 720P inference script of the Wan text-to-video model Wan2.2-T2V-A14B-Diffusers.
Run the commands below to start the inference task. The following uses infer_wan2.2_14b_t2v_480p.sh as an example.
cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.2_14b_t2v_480p.sh
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
N_NPUS=8
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
    --model Wan2.2-T2V-A14B \
    --pretrained_model_name_or_path ../weights/Wan-AI/Wan2.2-T2V-A14B-Diffusers \
    --task_type t2v \
    --save_path ./output.mp4 \
    --num_inference_steps 40 \
    --width 832 \
    --height 480 \
    --frames 81 \
    --sp $N_NPUS \
    --fsdp text_encoder \
    --vae_lightning \
    --inf_vram_blocks_num 1 \
    --atten_a8w8 \
    --matmul_a8w8 \
    --rope_fused \
    --guidance_scale 3.0 \
    --guidance_scale_2 4.0 \
    --seed 42 \
    --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
    --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
- inf_vram_blocks_num: device memory optimization. Currently, only 1 is supported. If this option is enabled, the fsdp text_encoder parameter is required.
- guidance_scale: classifier-free guidance scale for the transformer. Set this parameter based on the corresponding model.
- guidance_scale_2: classifier-free guidance scale for transformer_2 of Wan2.2. Set this parameter based on the corresponding model.
The remaining parameters are the same as those of the Wan2.1 text-to-video scripts (for example, infer_wan2.1_14b_t2v_480p.sh). For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.
After the inference task is complete, the generated video file output.mp4 is saved to the path specified by save_path, which by default is relative to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.
Step 10: Performing Inference Using the Wan2.2 Image-to-Video Model
Before starting inference using the image-to-video model, download the sample image and save it to the /home/ma-user/ascendx_video/scripts directory.
The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:
- infer_wan2.2_14b_i2v_480p.sh: 480P inference script of the Wan image-to-video model Wan2.2-I2V-A14B-Diffusers.
- infer_wan2.2_14b_i2v_720p.sh: 720P inference script of the Wan image-to-video model Wan2.2-I2V-A14B-Diffusers.
Run the commands below to start the inference task. The following uses infer_wan2.2_14b_i2v_480p.sh as an example.
cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.2_14b_i2v_480p.sh
The following describes the parameters of the image-to-video inference script infer_wan2.2_14b_i2v_480p.sh. The parameters of script infer_wan2.2_14b_i2v_720p.sh are similar to those of infer_wan2.2_14b_i2v_480p.sh.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
N_NPUS=8
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
    --model Wan2.2-I2V-A14B \
    --pretrained_model_name_or_path ../weights/Wan-AI/Wan2.2-I2V-A14B-Diffusers \
    --task_type i2v \
    --i2v_image_path ./astronaut.jpg \
    --save_path ./output.mp4 \
    --num_inference_steps 40 \
    --width 832 \
    --height 480 \
    --frames 81 \
    --sp $N_NPUS \
    --fsdp \
    --vae_lightning \
    --atten_a8w8 \
    --matmul_a8w8 \
    --rope_fused \
    --guidance_scale 3.5 \
    --guidance_scale_2 3.5 \
    --seed 42 \
    --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
    --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
The parameters are the same as those of the Wan2.1 text-to-video scripts (for example, infer_wan2.1_14b_t2v_480p.sh). For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.
After the inference task is complete, the generated video file output.mp4 is saved to the path specified by save_path, which by default is relative to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.
Step 11: Performing Inference Using the Wan2.2 Text-to-Image Model
The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:
- infer_wan2.2_14b_t2i_480p.sh: 480P inference script of the Wan text-to-image model Wan2.2-T2V-A14B-Diffusers.
- infer_wan2.2_14b_t2i_720p.sh: 720P inference script of the Wan text-to-image model Wan2.2-T2V-A14B-Diffusers.
Run the commands below to start the inference task. The following uses infer_wan2.2_14b_t2i_480p.sh as an example.
cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.2_14b_t2i_480p.sh
The following describes the parameters of the text-to-image inference script infer_wan2.2_14b_t2i_480p.sh. The parameters of script infer_wan2.2_14b_t2i_720p.sh are similar to those of infer_wan2.2_14b_t2i_480p.sh.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505
export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1
N_NPUS=2
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
    --model Wan2.2-T2V-A14B \
    --pretrained_model_name_or_path ../weights/Wan-AI/Wan2.2-T2V-A14B-Diffusers \
    --task_type t2i \
    --save_path ./output.png \
    --num_inference_steps 40 \
    --width 832 \
    --height 480 \
    --frames 1 \
    --atten_a8w8 \
    --matmul_a8w8 \
    --rope_fused \
    --guidance_scale 3.0 \
    --guidance_scale_2 4.0 \
    --seed 42 \
    --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
    --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
The parameters are the same as those of the Wan2.1 text-to-video scripts (for example, infer_wan2.1_14b_t2v_480p.sh). For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.
After the inference task is complete, the generated image file output.png is saved to the path specified by save_path, which by default is relative to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.