
Inference Guide for Wan Series Video Generation Models Adapted to PyTorch NPU via Lite Server

Solution Overview

This section describes how to run text-to-video, image-to-video, and text-to-image inference with the Wan2.1 and Wan2.2 video generation models on NPUs in ModelArts Lite Server. To deploy the solution, contact Huawei technical support to purchase Lite Server resources.

The following Wan series models are supported: Wan2.1-T2V-14B-Diffusers, Wan2.1-T2V-1.3B-Diffusers, Wan2.1-I2V-14B-480P-Diffusers, Wan2.1-I2V-14B-720P-Diffusers, Wan2.2-T2V-A14B-Diffusers, and Wan2.2-I2V-A14B-Diffusers.

Resource Specifications

Use Snt9B or Snt9B23 single-node resources in the Lite Server environment.

Table 1 Snt9B23 environment requirements

  • Driver: 25.2.1
  • PyTorch: pytorch_2.5.1

Table 2 Snt9B environment requirements

  • Driver: 25.2.1
  • PyTorch: pytorch_2.5.1

Obtaining the Software Package and Images

Table 3 Software package and images to be obtained

Plug-in code package
  • Name: AscendCloud-AIGC-6.5.907-xxx.zip in the AscendCloud-6.5.907-xxx.zip software package
    NOTE: xxx in the package name indicates the timestamp, which is subject to the actual package release time.
  • How to obtain: Download ModelArts 6.5.907.2 from Support-E.
    NOTE: If the software information does not appear when opening the download link, you lack access permissions. Contact your company's Huawei technical support for assistance with downloading.

Base image
  • Snt9B23 (CN North-Ulanqab1, CN East 2, and CN Southwest-Guiyang1): swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129
  • Snt9B (CN East 2 and CN Southwest-Guiyang1): swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129
  • How to obtain: Pull the image from SWR.

Base image
  • Snt9B23 (CN-Hong Kong): swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129
  • Snt9B (CN-Hong Kong): swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129
  • How to obtain: Pull the image from SWR.

Constraints

  • This document applies to ModelArts 6.5.907.2. Obtain the required software package and image by referring to Table 3. Follow the version mapping when using this document.
  • Ensure that the container can access the Internet.

Step 1: Preparing the Environment

  1. Enable Lite Server resources and obtain passwords. Verify SSH access to all servers. Confirm proper network connectivity between them.

    If a container is used or shared by multiple users, you should restrict the container from accessing the OpenStack management address (169.254.169.254) to prevent host machine metadata acquisition. For details, see Forbidding Containers to Obtain Host Machine Metadata.

  2. Log in to the server via SSH and check the NPUs. Obtain the NPU device information:
    npu-smi info                    # Run this command on each instance node to view the NPU status.
    npu-smi info -l | grep Total    # Run this command on each instance node to view the total number of NPUs.

    If an error occurs, the NPU firmware and driver may not be properly installed on the server, or the NPU devices may already be mounted to another container. Install the firmware and driver, or release the mounted NPUs.

  3. Check whether Docker is installed.
    docker -v   # Check whether Docker is installed.

    If Docker is not installed, run this command:

    yum install -y docker-engine.aarch64 docker-engine-selinux.noarch docker-runc.aarch64
  4. Configure IP forwarding for intra-container network access. Run the command below to check the value of net.ipv4.ip_forward. Skip this step if the value is 1.
    sysctl -p | grep net.ipv4.ip_forward
    If the value is not 1, configure IP forwarding:
    sed -i 's/net\.ipv4\.ip_forward=0/net\.ipv4\.ip_forward=1/g' /etc/sysctl.conf
    sysctl -p | grep net.ipv4.ip_forward

Step 2: Obtaining the Base Image

Use official images to deploy inference services. For details about the image path {image_url}, see Table 3.

docker pull {image_url}

Before pulling the image, log in to the SWR console, obtain the docker login command for your region, and run it on the server.
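For example, the following pulls the Snt9B23 base image from Table 3 (pull the Snt9B image the same way, using its own URL). This assumes the SWR login command has already been executed on the server.

# Example: pull the Snt9B23 base image listed in Table 3
docker pull swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129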

Step 3: Starting the Container Image

  1. Start the container image. Before starting the container, modify the parameters in ${} according to the parameter description.
    Start the Snt9B23 container:
    export work_dir="Custom mounted working directory"
    export container_work_dir="Custom working directory mounted to the container"
    export container_name="Custom container name"
    export image_name="Image name or ID"
    # Start a container to run the image.
    docker run  -itd --net=host \
        --privileged \
        --device=/dev/davinci_manager \
        --device=/dev/devmm_svm \
        --device=/dev/hisi_hdc \
        --shm-size=256g \
        -v /usr/local/dcmi:/usr/local/dcmi \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
        -v /var/log/npu/:/usr/slog \
        -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
        -v ${work_dir}:${container_work_dir} \
        --name ${container_name} \
        ${image_name} \
        /bin/bash

    Start the Snt9B container:

    export work_dir="Custom mounted working directory"
    export container_work_dir="Custom working directory mounted to the container"
    export container_name="Custom container name"
    export image_name="Image name or ID"
    # Start a container to run the image.
    docker run  -itd --net=bridge \
        --device=/dev/davinci0 \
        --device=/dev/davinci1 \
        --device=/dev/davinci2 \
        --device=/dev/davinci3 \
        --device=/dev/davinci4 \
        --device=/dev/davinci5 \
        --device=/dev/davinci6 \
        --device=/dev/davinci7 \
        --device=/dev/davinci_manager \
        --device=/dev/devmm_svm \
        --device=/dev/hisi_hdc \
        --shm-size=256g \
        -v /usr/local/dcmi:/usr/local/dcmi \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
        -v /var/log/npu/:/usr/slog \
        -v /usr/local/sbin/npu-smi:/usr/local/sbin/npu-smi \
        -v ${work_dir}:${container_work_dir} \
        --name ${container_name} \
        ${image_name} \
        /bin/bash

    Parameters:

    • -v ${work_dir}:${container_work_dir}: host directory to be mounted to the container. The host and container use different file systems. work_dir indicates the working directory on the host, which stores files such as the code and data required for the project. container_work_dir indicates the mount point inside the container. The two paths can be the same. (Example values for these variables are sketched at the end of this step.)
      • The /home/ma-user directory cannot be mounted to the container. This directory is the home directory of user ma-user. If the container is mounted to /home/ma-user, the container conflicts with the base image when being started. As a result, the base image is unavailable.
      • Both the driver and npu-smi must be mounted to the container.
    • --name ${container_name}: container name, which is used when you access the container. You can define a container name.
    • ${image_name}: name of the base image of the corresponding model. For details, see Table 3.
    • --device=/dev/davinci0: mounts the corresponding NPU to the container. If multiple NPUs need to be mounted, add a --device entry for each one.
  2. Access the container through the container name.

    For Snt9B23, log in to the container as user root.

    docker exec -it -u root ${container_name} bash
    For Snt9B, the ma-user user is used by default. All the subsequent operations are performed as user ma-user.
    docker exec -it ${container_name} bash
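For reference, the following is a hypothetical set of values for the variables used above; the paths and container name are placeholders, and the image is the Snt9B23 base image from Table 3. After entering the container, you can optionally confirm that the NPUs are visible.

# Hypothetical example values (adjust to your environment)
export work_dir="/data/wan_workspace"
export container_work_dir="/data/wan_workspace"
export container_name="wan_infer"
export image_name="swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129"

# After starting and entering the container, confirm that the NPUs are visible
npu-smi info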

Step 4: Installing Dependencies and Software Packages

  1. To use the git clone and git lfs commands to download large models, install git-lfs as follows:
    1. Enter the URL below in the browser to download the git-lfs package and upload it to the /home/ma-user directory of the container.
      https://github.com/git-lfs/git-lfs/releases/download/v3.2.0/git-lfs-linux-arm64-v3.2.0.tar.gz
      Alternatively, download git-lfs to the container for direct use.
      cd /home/ma-user
      wget https://github.com/git-lfs/git-lfs/releases/download/v3.2.0/git-lfs-linux-arm64-v3.2.0.tar.gz
    2. Go to the container and run the git-lfs installation commands.
      cd /home/ma-user
      tar -zxvf git-lfs-linux-arm64-v3.2.0.tar.gz 
      cd git-lfs-3.2.0 
      sudo sh install.sh
    3. Disable SSL verification for Git configuration.
      git config --global http.sslVerify false
  2. Install the AscendX_Video software package.
    1. Upload the AscendX_Video software package AscendCloud-AIGC-*.zip to the /home/ma-user directory of the container. For details about how to obtain the package, see Obtaining the Software Package and Images.
    2. Decompress the AscendCloud-AIGC-*.zip file, and run the following commands to install the Python dependencies:
      cd /home/ma-user
      unzip AscendCloud-AIGC-*.zip -d ./AscendCloud
      cp -r /home/ma-user/AscendCloud/aigc_inference/torch_npu/ascendx_video ./
      cd /home/ma-user/ascendx_video
      pip install seal-*-linux_aarch64.whl
      pip install check_device-*-linux_aarch64.whl
      pip install ascendx_video-*-none-any.whl
    3. Install the operator environment.

      If the Snt9B23 machine is used, run the following command:

      cd /home/ma-user/AscendCloud/opp/A3

      If the Snt9B machine is used, run the following command:

      cd /home/ma-user/AscendCloud/opp/A2
      Install the operator:
      unzip AscendCloud-OPP-*.zip
      unzip AscendCloud-OPP-*-torch-2.5.1-py311-*.zip -d ./AscendCloud_OPP
      cd AscendCloud_OPP
      pip install *.whl
      mkdir -p /home/ma-user/operate
      bash ./ascend_cloud_ops_ascend_turbo-*_linux_aarch64.run --install-path=/home/ma-user/operate
      bash ./ascend_cloud_ops_custom_opp-*_linux_aarch64_ascend910b_ascend910_93.run --install-path=/home/ma-user/operate
      cd ..
      unzip AscendCloud-OPS-ADV-*.zip -d ./AscendCloud_OPS-ADV
      cd AscendCloud_OPS-ADV
      bash ./CANN-custom_ops-*-linux.aarch64.run --install-path=/home/ma-user/operate

  3. Initialize the environment variables. Note that the environment needs to be initialized each time you access the container.

source /home/ma-user/operate/AscendTurbo/set_env.bash
source /home/ma-user/operate/vendors/customize/bin/set_env.bash
source /home/ma-user/operate/vendors/customize_cloud/bin/set_env.bash
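Optionally, to avoid re-running these commands in every new shell, you can append them to the shell startup file of the user you work as. This is a convenience sketch, not a required step.

# Optional: source the operator environment automatically in every new shell
cat >> ~/.bashrc << 'EOF'
source /home/ma-user/operate/AscendTurbo/set_env.bash
source /home/ma-user/operate/vendors/customize/bin/set_env.bash
source /home/ma-user/operate/vendors/customize_cloud/bin/set_env.bash
EOF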

Step 5: Downloading Model Weights

Download the weight files of the required models to the container.

Save the weights to the /home/ma-user/ascendx_video/weights directory, for example:

weights
└──Wan-AI
    ├──Wan2.1-I2V-14B-480P-Diffusers
    ├──Wan2.1-I2V-14B-720P-Diffusers
    ├──Wan2.1-T2V-14B-Diffusers
    ├──Wan2.1-T2V-1.3B-Diffusers
    ├──Wan2.2-I2V-A14B-Diffusers
    └──Wan2.2-T2V-A14B-Diffusers
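As a hedged example, one model's weights can be fetched with git clone and git lfs (installed in Step 4). The Hugging Face repository URL below assumes the Diffusers weights are published under the Wan-AI organization; verify the actual model address before downloading.

# Hedged example: download one model's weights into the expected directory layout
mkdir -p /home/ma-user/ascendx_video/weights/Wan-AI
cd /home/ma-user/ascendx_video/weights/Wan-AI
git lfs install
git clone https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers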

Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model

The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:

  • infer_wan2.1_14b_t2v_480p.sh: 480P inference script of the Wan text-to-video model Wan2.1-T2V-14B.
  • infer_wan2.1_14b_t2v_720p.sh: 720P inference script of the Wan text-to-video model Wan2.1-T2V-14B.
  • infer_wan2.1_1.3b_t2v.sh: inference script of the Wan text-to-video model Wan2.1-T2V-1.3B.

Run the commands below to start the inference task. The following uses infer_wan2.1_14b_t2v_480p.sh as an example.

cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.1_14b_t2v_480p.sh

The following describes the parameters of the text-to-video inference script infer_wan2.1_14b_t2v_480p.sh. The parameters of scripts infer_wan2.1_1.3b_t2v.sh and infer_wan2.1_14b_t2v_720p.sh are similar to those of infer_wan2.1_14b_t2v_480p.sh.

export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505

export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false

export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
N_NPUS=8
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
         --model Wan2.1-T2V-14B \
         --pretrained_model_name_or_path "../weights/Wan-AI/Wan2.1-T2V-14B-Diffusers" \
         --save_path ./output.mp4 \
         --num_inference_steps 50 \
         --width 832 \
         --height 480 \
         --frames 81 \
         --sp $N_NPUS \
         --fsdp \
         --vae_lightning \
         --turbo_mode faiz \
         --atten_a8w8 \
         --matmul_a8w8 \
         --rope_fused \
         --seed 42 \
         --prompt "A young boy with short brown hair, dressed in a dark blue t-shirt and red pants, is seen playing a KAWAI upright piano with skill and concentration. The piano's glossy black surface reflects the room's lighting, and its white and black keys are arranged in a standard layout, indicating a scene of musical practice or learning. The boy's hands move over the keys, suggesting he is engaged in playing or practicing a piece." \
         --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
  • ASCEND_RT_VISIBLE_DEVICES: IDs of the NPUs to use.
  • N_NPUS: number of NPUs to use. Eight NPUs are recommended.
  • model: inference model to run. Currently, Wan2.1-T2V-14B, Wan2.1-I2V-14B, Wan2.1-T2V-1.3B, Wan2.2-T2V-A14B, and Wan2.2-I2V-A14B are supported.
  • pretrained_model_name_or_path: weight path of the corresponding model.
  • save_path: path for storing the video generated during inference.
  • num_inference_steps: number of inference steps.
  • frames, height, width: dimensions of the generated video, that is, the number of frames, height, and width. Currently, 81 x 480 x 832, 121 x 480 x 832, 81 x 720 x 1280, and 121 x 720 x 1280 are supported (a sketch of switching sizes follows this list).
  • prompt, negative_prompt: positive and negative prompts for generating the video.
  • sp: sequence parallelism parameter. It is recommended that the value be the same as the number of inference NPUs.
  • fsdp: fully sharded data parallelism. The supported values are None, all, text_encoder, and transformer. If this parameter is omitted, the value is None and the function is disabled. If --fsdp is specified without a value, it defaults to all, enabling parallelism for both text_encoder and transformer. If it is set to text_encoder or transformer, parallelism is enabled only for the specified module.
  • vae_lightning: VAE acceleration. This parameter is supported only in the multi-NPU scenario. If it is not set, VAE acceleration is disabled. Enabling it improves VAE performance.
  • turbo_mode: acceleration mode. default and faiz are supported. The default value is default, which disables acceleration. faiz is recommended for high performance. Enabling acceleration speeds up video inference but slightly affects accuracy.
  • atten_a8w8: attention quantization acceleration. Set this parameter for high performance. If it is not set, attention quantization is disabled. Enabling it speeds up video inference but slightly affects accuracy.
  • matmul_a8w8: matmul quantization acceleration. Set this parameter for high performance. If it is not set, matmul quantization is disabled. Enabling it speeds up video inference but slightly affects accuracy.
  • rope_fused: rotary position embedding (RoPE) fusion operator. Set this parameter for high performance. If it is not set, the fusion operator is disabled. Enabling it speeds up video inference but slightly affects accuracy.
  • seed: random number seed. The default value is 42. The seed affects the generated content.
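As a small, hedged sketch of adjusting these parameters, the snippet below copies the 480P script and changes only --frames to 121 (121 x 480 x 832 is listed as a supported size above); the name of the copied script is arbitrary.

# Hedged sketch: derive a 121-frame 480P variant of the example script
cd /home/ma-user/ascendx_video/scripts/
cp infer_wan2.1_14b_t2v_480p.sh infer_wan2.1_14b_t2v_480p_121f.sh
sed -i 's/--frames 81/--frames 121/' infer_wan2.1_14b_t2v_480p_121f.sh
bash infer_wan2.1_14b_t2v_480p_121f.sh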

After the inference task is complete, the generated video file output.mp4 is saved to the path specified by save_path, which defaults to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.
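To view the video outside the container, one option is to copy it to the host with docker cp, as sketched below. Run this on the host; replace ${container_name} with the container name used in Step 3 if the variable is no longer set.

# Optional: copy the generated video from the container to the host
docker cp ${container_name}:/home/ma-user/ascendx_video/scripts/output.mp4 ./output.mp4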

Step 7: Performing Inference Using the Wan2.1 Image-to-Video Model

Before starting inference with the image-to-video model, download or prepare a sample image and save it to the /home/ma-user/ascendx_video/scripts directory (a placement sketch follows Figure 1).

Figure 1 Example
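A minimal, hedged sketch of placing the image where the 480P script expects it: the script below references ./astronaut.jpg, and the source path in the copy command is a placeholder for wherever your sample image is stored.

# Place the sample image where the I2V script expects it (source path is a placeholder)
cd /home/ma-user/ascendx_video/scripts/
cp /path/to/your/sample_image.jpg ./astronaut.jpg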

The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:

  • infer_wan2.1_14b_i2v_480p.sh: 480P inference script of the Wan2.1-I2V-14B image-to-video model.
  • infer_wan2.1_14b_i2v_720p.sh: 720P inference script of the Wan2.1-I2V-14B image-to-video model.

Run the commands below to start the inference task. The following uses infer_wan2.1_14b_i2v_480p.sh as an example.

cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.1_14b_i2v_480p.sh

The following describes the parameters of the image-to-video inference script infer_wan2.1_14b_i2v_480p.sh. The parameters of script infer_wan2.1_14b_i2v_720p.sh are similar to those of infer_wan2.1_14b_i2v_480p.sh.

export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505

export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
N_NPUS=8
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
         --model Wan2.1-I2V-14B \
         --pretrained_model_name_or_path "../weights/Wan-AI/Wan2.1-I2V-14B-480P-Diffusers" \
         --task_type i2v \
         --i2v_image_path ./astronaut.jpg \
         --save_path ./output.mp4 \
         --num_inference_steps 40 \
         --width 832 \
         --height 480 \
         --frames 81 \
         --sp $N_NPUS \
         --fsdp \
         --vae_lightning \
         --turbo_mode faiz \
         --atten_a8w8 \
         --matmul_a8w8 \
         --rope_fused \
         --seed 42 \
         --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
         --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
  • task_type: inference task type. The value can be t2v (text-to-video), i2v (image-to-video), or t2i (text-to-image). The default value is i2v.
  • i2v_image_path: path of the image used for video generation.
  • For other parameters, use the same settings as those of infer_wan2.1_14b_t2v_480p.sh. For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.

After the inference task is complete, the generated video file output.mp4 is saved to the path specified by save_path, which defaults to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.

Step 8: Performing Inference Using the Wan2.1 Text-to-Image Model

The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:

  • infer_wan2.1_14b_t2i_480p.sh: 480P inference script of the Wan2.1-T2V-14B text-to-image model.
  • infer_wan2.1_14b_t2i_720p.sh: 720P inference script of the Wan2.1-T2V-14B text-to-image model.

Run the commands below to start the inference task. The following uses infer_wan2.1_14b_t2i_480p.sh as an example.

cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.1_14b_t2i_480p.sh

The following describes the parameters of the text-to-image inference script infer_wan2.1_14b_t2i_480p.sh. The parameters of script infer_wan2.1_14b_t2i_720p.sh are similar to those of infer_wan2.1_14b_t2i_480p.sh.

export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505

export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0
N_NPUS=1
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
         --model Wan2.1-T2V-14B \
         --pretrained_model_name_or_path "../weights/Wan-AI/Wan2.1-T2V-14B-Diffusers" \
         --task_type t2i \
         --save_path ./output.png \
         --num_inference_steps 40 \
         --width 832 \
         --height 480 \
         --frames 1 \
         --atten_a8w8 \
         --matmul_a8w8 \
         --rope_fused \
         --seed 42 \
         --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
         --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"

The remaining parameters are the same as those of infer_wan2.1_14b_t2v_480p.sh. For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.

After the inference task is complete, the generated image file output.png is saved to the path specified by save_path, which defaults to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.

Step 9: Performing Inference Using the Wan2.2 Text-to-Video Model

The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:

  • infer_wan2.2_14b_t2v_480p.sh: 480P inference script of the Wan text-to-video model Wan2.2-T2V-A14B-Diffusers.
  • infer_wan2.2_14b_t2v_720p.sh: 720P inference script of the Wan text-to-video model Wan2.2-T2V-A14B-Diffusers.

Run the commands below to start the inference task. The following uses infer_wan2.2_14b_t2v_480p.sh as an example.

cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.2_14b_t2v_480p.sh
The following describes the parameters of the text-to-video inference script infer_wan2.2_14b_t2v_480p.sh. The parameters of script infer_wan2.2_14b_t2v_720p.sh are similar to those of infer_wan2.2_14b_t2v_480p.sh.
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505

export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
N_NPUS=8
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
         --model Wan2.2-T2V-A14B \
         --pretrained_model_name_or_path ../weights/Wan-AI/Wan2.2-T2V-A14B-Diffusers \
         --task_type t2v \
         --save_path ./output.mp4 \
         --num_inference_steps 40 \
         --width 832 \
         --height 480 \
         --frames 81 \
         --sp $N_NPUS \
         --fsdp text_encoder \
         --vae_lightning \
         --inf_vram_blocks_num 1 \
         --atten_a8w8 \
         --matmul_a8w8 \
         --rope_fused \
         --guidance_scale 3.0 \
         --guidance_scale_2 4.0 \
         --seed 42 \
         --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
         --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"
  • inf_vram_blocks_num: NPU memory optimization. Currently, only 1 is supported. If this parameter is set, the fsdp text_encoder parameter is required.
  • guidance_scale: classifier-free guidance scale for the transformer. Set this parameter based on the corresponding model.
  • guidance_scale_2: classifier-free guidance scale for transformer_2 of Wan2.2. Set this parameter based on the corresponding model.

The remaining parameters are the same as those of infer_wan2.1_14b_t2v_480p.sh. For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.

After the inference task is complete, the generated video file output.mp4 is saved to the path specified by save_path, which defaults to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.

Step 10: Performing Inference Using the Wan2.2 Image-to-Video Model

Before starting inference with the image-to-video model, download or prepare a sample image and save it to the /home/ma-user/ascendx_video/scripts directory, as described in Step 7: Performing Inference Using the Wan2.1 Image-to-Video Model.

Figure 2 Example

The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:

  • infer_wan2.2_14b_i2v_480p.sh: 480P inference script of the Wan image-to-video model Wan2.2-I2V-A14B-Diffusers.
  • infer_wan2.2_14b_i2v_720p.sh: 720P inference script of the Wan image-to-video model Wan2.2-I2V-A14B-Diffusers.

Run the commands below to start the inference task. The following uses infer_wan2.2_14b_i2v_480p.sh as an example.

cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.2_14b_i2v_480p.sh

The following describes the parameters of the image-to-video inference script infer_wan2.2_14b_i2v_480p.sh. The parameters of script infer_wan2.2_14b_i2v_720p.sh are similar to those of infer_wan2.2_14b_i2v_480p.sh.

export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505

export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
N_NPUS=8
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
         --model Wan2.2-I2V-A14B \
         --pretrained_model_name_or_path ../weights/Wan-AI/Wan2.2-I2V-A14B-Diffusers \
         --task_type i2v \
         --i2v_image_path ./astronaut.jpg \
         --save_path ./output.mp4 \
         --num_inference_steps 40 \
         --width 832 \
         --height 480 \
         --frames 81 \
         --sp $N_NPUS \
         --fsdp \
         --vae_lightning \
         --atten_a8w8 \
         --matmul_a8w8 \
         --rope_fused \
         --guidance_scale 3.5 \
         --guidance_scale_2 3.5 \
         --seed 42 \
         --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
         --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"

The remaining parameters are the same as those of infer_wan2.1_14b_t2v_480p.sh. For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.

After the inference task is complete, the generated video file output.mp4 is saved to the path specified by save_path, which defaults to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.

Step 11: Performing Inference Using the Wan2.2 Text-to-Image Model

The following scripts are stored in the /home/ma-user/ascendx_video/scripts/ directory:

  • infer_wan2.2_14b_t2i_480p.sh: 480P inference script of the Wan text-to-image model Wan2.2-T2V-A14B-Diffusers.
  • infer_wan2.2_14b_t2i_720p.sh: 720P inference script of the Wan text-to-image model Wan2.2-T2V-A14B-Diffusers.

Run the commands below to start the inference task. The following uses infer_wan2.2_14b_t2i_480p.sh as an example.

cd /home/ma-user/ascendx_video/scripts/
bash infer_wan2.2_14b_t2i_480p.sh

The following describes the parameters of the text-to-image inference script infer_wan2.2_14b_t2i_480p.sh. The parameters of script infer_wan2.2_14b_t2i_720p.sh are similar to those of infer_wan2.2_14b_t2i_480p.sh.

export MASTER_ADDR=127.0.0.1
export MASTER_PORT=29505

export PYTORCH_NPU_ALLOC_CONF=expandable_segments:True
export MEMORY_FRAGMENTATION=1
export COMBINED_ENABLE=1
export TASK_QUEUE_ENABLE=2
export TOKENIZERS_PARALLELISM=false
export ASCEND_RT_VISIBLE_DEVICES=0,1
N_NPUS=2
torchrun --nproc_per_node=$N_NPUS --master_addr $MASTER_ADDR --master_port $MASTER_PORT ../infer.py \
         --model Wan2.2-T2V-A14B \
         --pretrained_model_name_or_path ../weights/Wan-AI/Wan2.2-T2V-A14B-Diffusers \
         --task_type t2i \
         --save_path ./output.png \
         --num_inference_steps 40 \
         --width 832 \
         --height 480 \
         --frames 1 \
         --atten_a8w8 \
         --matmul_a8w8 \
         --rope_fused \
         --guidance_scale 3.0 \
         --guidance_scale_2 4.0 \
         --seed 42 \
         --prompt "An astronaut hatching from an egg, on the surface of the moon, the darkness and depth of space realised in the background. High quality, ultrarealistic detail and breath-taking movie-like camera shot." \
         --negative_prompt "vivid tone, overexposure, static, blurred details, subtitle, style, work, painting, picture, static, overall gray, worst quality, low quality, JPEG compression residue, ugly, incomplete, redundant fingers, poorly drawn hands, poorly drawn face, deformed, disfigured, deformed limbs, finger fusion, static picture, messy background, three legs, many people in the background, walking backward"

The remaining parameters are the same as those of infer_wan2.1_14b_t2v_480p.sh. For details, see Step 6: Performing Inference Using the Wan2.1 Text-to-Video Model.

After the inference task is complete, the generated image file output.png is saved to the path specified by save_path, which defaults to the /home/ma-user/ascendx_video/scripts directory where the script runs. View the inference result there.