Adapting Diffusers and ComfyUI Kits to PyTorch NPU for Inference Using ModelArts Lite Server (6.5.907)
This guide explains how to deploy the Stable Diffusion and Hunyuan text-to-image models with the Diffusers and ComfyUI frameworks on ModelArts Lite Server, and how to run NPU-based inference with them.
Solution Overview
This solution describes how to use NPU compute resources to deploy the Diffusers and ComfyUI frameworks for Server-based inference. First, contact Huawei's enterprise technical support team to buy the required Server resources.
This solution is designed exclusively for enterprise users.
Resource Specifications
You are advised to use ModelArts Lite Server's Snt9B and Snt9B23 resources for inference deployment.
Name | Version
---|---
driver | 25.2.1
PyTorch | pytorch_2.5.1
Obtaining Software Packages and Images
Table 2 Software packages and images

Category | Name | How to Obtain
---|---|---
Plug-in code package | AscendCloud-6.5.907-xxx.zip in the AscendCloud-6.5.907 software package. xxx in the file name is the timestamp, that is, the actual release time of the package. | Download ModelArts 6.5.907.2 from Support-E. NOTE: If the software information does not appear when you open the download link, you lack access permissions. Contact your company's Huawei technical support for help with downloading.
Snt9b base image | CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129 CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129 | Pull the image from SWR.
Snt9b23 base image | CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129 CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129 | Pull the image from SWR.
Kit | Model
---|---
Diffusers | SD1.5, SDXL, SD3.5, HUNYUAN
ComfyUI | SD1.5, SDXL, SD3.5
Step 1: Preparing the Environment
- Enable Lite Server resources and obtain passwords. Verify SSH access to all servers. Confirm proper network connectivity between them.
If no resource specifications are available when you purchase Server resources, contact Huawei Cloud technical support.
If a container is used or shared by multiple users, you should restrict the container from accessing the OpenStack management address (169.254.169.254) to prevent host machine metadata acquisition. For details, see Forbidding Containers to Obtain Host Machine Metadata.
- Check the environment.
- Log in to the server via SSH and check the NPU status. Obtain the NPU device information:
npu-smi info
If an error occurs, the NPU devices on the server may not be properly installed, or the NPUs may already be mounted to another container. Install the firmware and driver, or release the mounted NPUs.
- Check whether Docker is installed.
docker -v # Check whether Docker is installed.
If Docker is not installed, run this command:
yum install -y docker-engine.aarch64 docker-engine-selinux.noarch docker-runc.aarch64
- Configure IP forwarding for intra-container network accesses. Run the following command to check the value of net.ipv4.ip_forward. Skip this step if the value is 1.
sysctl -p | grep net.ipv4.ip_forward
If the value is not 1, configure IP forwarding:
sed -i 's/net\.ipv4\.ip_forward=0/net\.ipv4\.ip_forward=1/g' /etc/sysctl.conf
sysctl -p | grep net.ipv4.ip_forward
- Obtain the base image. Use official images to deploy inference services. For details about the image path {image_url}, see Table 2.
docker pull {image_url}
Before pulling the image, run the docker login command for SWR. Log in to the SWR console and obtain the login command by referring to the figure below.
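For reference only, the console-generated login command generally takes the form below; the AK and temporary login key are placeholders, so copy the exact command shown on the SWR console for your region and account:
# Placeholders only; use the command generated by the SWR console.
docker login -u cn-southwest-2@<AK> -p <temporary login key> swr.cn-southwest-2.myhuaweicloud.com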
Step 2: Starting the Container Image
- Start the snt9b container image. Before starting the container, modify the parameters in ${} according to the parameter description. Add or modify parameters as needed.
docker run -itd \
  --name ${container_name} \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -p 8183:8183 \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  --shm-size 60g \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  --device=/dev/davinci3 \
  --network=host \
  ${image_name} bash
Parameter description:
- --name ${container_name}: container name, which is used when you access the container. You can define a container name, for example, comfyui.
- --device=/dev/davinci3: Mounts /dev/davinci3 of the host to /dev/davinci3 of the container. Run the npu-smi info command to find an idle NPU, and change the davinci number to mount a different NPU.
- To run multi-NPU inference, mount multiple NPUs, for example, by adding --device=/dev/davinci2 (see the sketch after this parameter list).
- ${image_name} indicates the image name.
- -p 8183:8183: Maps a port. You can access the container service at http://<host IP address>:8183. If the port number is in use, change it to another one.
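The following is a minimal sketch of a two-NPU start command, assuming /dev/davinci2 and /dev/davinci3 are both idle (check with npu-smi info); all other parameters are the same as in the command above:
docker run -itd \
  --name ${container_name} \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -p 8183:8183 \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  --shm-size 60g \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  --network=host \
  ${image_name} bash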
- Access the snt9b container. Replace ${container_name} with the actual container name, for example, comfyui.
docker exec -it ${container_name} bash
- Start the snt9b23 container image. Before starting the container, modify the parameters in ${} according to the parameter description. Add or modify parameters as needed.
docker run -itd \
  --privileged \
  --name ${container_name} \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -p 8183:8183 \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  --shm-size 60g \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  --device=/dev/davinci3 \
  --network=host \
  ${image_name} bash
Parameter description:
- --name ${container_name}: container name, which is used when you access the container. You can define a container name, for example, comfyui.
- --device=/dev/davinci3: Mounts /dev/davinci3 of the host to /dev/davinci3 of the container. Run the npu-smi info command to find an idle NPU, and change the davinci number to mount a different NPU.
- To run multi-NPU inference, mount multiple NPUs, for example, by adding --device=/dev/davinci2.
- ${image_name} indicates the image name.
- -p 8183:8183: Maps a port. You can access the container service at http://<host IP address>:8183. If the port number is in use, change it to another one.
- Access the snt9b23 container. Replace ${container_name} with the actual container name, for example, comfyui.
docker exec -itu root ${container_name} bash
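After entering the container, you can optionally confirm that the mounted NPUs are visible inside it; the npu-smi binary is available because the start commands above mount /usr/local/bin/npu-smi into the container:
npu-smi info  # Should list the davinci device(s) mounted with --device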
Step 3: Deploying Diffusers
Installing Dependencies and Model Packages
- Run the command below to log in to Hugging Face and enter your account token when prompted:
huggingface-cli login
After the login is successful, start the Diffusers inference script; it downloads the model weights automatically.
You can also manually download the model weights and upload them to the /home/ma-user directory of the container, or download them from inside the container (see the sketch after this list). The official download addresses (login required) are as follows:
- Stable Diffusion 1.5: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
- Stable Diffusion XL: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main
- Stable Diffusion 3.5 Medium: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/tree/main
- Stable Diffusion 3.5 Large: https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main
- Hunyuan: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-Diffusers/tree/main
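If the container has internet access to Hugging Face and you have accepted the model license, you can also fetch the weights from inside the container with the Hugging Face CLI instead of uploading them manually. This is a sketch for Stable Diffusion 3.5 Medium only; the target directory is an example that matches the MODEL_PATH value used later:
huggingface-cli download stabilityai/stable-diffusion-3.5-medium --local-dir /home/ma-user/stable-diffusion-3.5-medium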
- Install the plug-in code package.
- Upload the AscendCloud-AIGC-xxx.zip plug-in code package to the /home/ma-user/temp directory of the container and decompress the package. For details about how to obtain the plug-in code package, see Table 2.
mkdir -p /home/ma-user/temp
cd /home/ma-user/temp
unzip AscendCloud-AIGC-*.zip  # Decompress the package.
- Go to the /home/ma-user/temp/aigc_inference/torch_npu/utils/ascend_diffusers directory in the decompressed AIGC package and install the ascend_diffusers package.
cd /home/ma-user/temp/aigc_inference/torch_npu/utils/ascend_diffusers
pip install -e .
- Go to the /home/ma-user/temp/aigc_inference/torch_npu/utils/AscendX-MM directory in the decompressed AIGC package and install the AscendX-MM package.
cd /home/ma-user/temp/aigc_inference/torch_npu/utils/AscendX-MM
pip install -e .
Starting Inference
Set MODEL_PATH to the path of the downloaded Hugging Face model, for example, /home/ma-user/stable-diffusion-3.5-medium. To let the system download the model automatically, omit the --model_id parameter. Then go to the examples directory:
export MODEL_PATH='Path of the downloaded Hugging Face model'
cd /home/ma-user/temp/aigc_inference/torch_npu/diffusers/0.31.0/examples
The commands below start single-NPU model inference. For details about the parameters, see the Readme file in the /home/ma-user/temp/aigc_inference/torch_npu/diffusers directory.
- Commands for starting Stable Diffusion 1.5 model inference:
pip install diffusers==0.30.2
python sd_inference_example.py --model_name sd15 --model_id ${MODEL_PATH} --prompt 'a dog' --num_inference_steps 20 --width 512 768 1024 --height 512 768 1024
- Commands for starting Stable Diffusion XL model inference:
pip install diffusers==0.30.2
python sd_inference_example.py --model_name sdxl --model_id ${MODEL_PATH} --prompt 'a dog' --num_inference_steps 20 --width 768 1024 --height 768 1024
- Commands for starting Stable Diffusion 3.5 model inference:
pip install diffusers==0.31.0
python sd_inference_example.py --model_name sd35 --model_id ${MODEL_PATH} --prompt 'a dog' --num_inference_steps 28 --width 512 768 1024 --height 512 768 1024
- Commands for starting Hunyuan model inference:
pip install diffusers==0.30.2
export INF_NAN_MODE_FORCE_DISABLE=1
python sd_inference_example.py --model_name hunyuan --model_id ${MODEL_PATH} --prompt 'a dog' --num_inference_steps 20 --width 512 768 1024 --height 512 768 1024
Step 4: Deploying ComfyUI
Installing Dependencies and Model Packages
- Download the ComfyUI software package.
Download the ComfyUI source code.
git clone -b as0.3.45 https://github.com/mountain-lee1/ComfyUI.git
cd ComfyUI
If you cannot download the ComfyUI source code with the previous method, download the source code to your PC and then upload it to the container, as shown in Figure 1.
- Log in to https://github.com/mountain-lee1/ComfyUI, switch the tag to as0.3.45, click Code, and click Download ZIP to download the ComfyUI source code to your local PC.
Internet access to GitHub is required to download the open-source software. Set up a network proxy if needed.
- Upload the downloaded ComfyUI-as0.3.45.zip file to the /home/ma-user/ directory of the container and decompress the package.
cd /home/ma-user/
unzip ComfyUI-as0.3.45.zip
cd ComfyUI-as0.3.45
- Change torch in requirements.txt to torch==2.5.1, and then install the dependencies.
pip install -r requirements.txt
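As one way to pin the version before installing, the following sketch assumes torch appears in requirements.txt on its own line without a version constraint; adjust the pattern to the actual file contents:
sed -i 's/^torch$/torch==2.5.1/' requirements.txt  # Pin torch to 2.5.1
grep '^torch' requirements.txt  # Verify the change before running pip install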
- Download the model weights (a download-and-placement sketch follows this list).
sd1.5: Copy v1-5-pruned-emaonly.safetensors to the ComfyUI/models/checkpoints directory.
https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
sdxl: Copy sd_xl_base_1.0.safetensors to the ComfyUI/models/checkpoints directory.
sd3.5: Copy sd3.5_medium.safetensors to the ComfyUI/models/checkpoints directory.
Copy diffusion_pytorch_model.safetensors to the ComfyUI/models/vae directory.
In addition, you need to download three text_encoder-related models and copy them to the ComfyUI/models/clip directory.
https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_l.safetensors
https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_g.safetensors
You also need to download the workflow required for inference using the ComfyUI framework:
https://openart.ai/workflows/sneakyrobot/sd35-basic/CX6pkiT9lzJPlTpF9Cgu
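The following sketch downloads and places the weights with wget, using only the links listed above (the /blob/ page links are rewritten to /resolve/ for direct download). It assumes the container has internet access to Hugging Face and that the files are downloadable with your access rights; otherwise, download them on your PC and upload them to the corresponding directories. Run it from the ComfyUI root directory:
# SD 1.5 checkpoint
wget -P models/checkpoints https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
# SD 3.5 text encoders listed above (clip_l and clip_g)
wget -P models/clip https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/resolve/main/text_encoders/clip_l.safetensors
wget -P models/clip https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/resolve/main/text_encoders/clip_g.safetensors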
- Install the plug-in code package.
- Upload the AscendCloud-AIGC-xxx.zip plug-in code package to the /home/ma-user/ directory of the container and decompress the package. For details about how to obtain the plug-in code package, see Table 2.
cd /home/ma-user/
unzip AscendCloud-AIGC-*.zip
- Go to the ComfyUI/custom_nodes directory and copy the aigc_inference/torch_npu/comfyui/0.3.45/comfyui_ascend_node folder extracted from the AIGC package to the ComfyUI/custom_nodes directory.
cd ComfyUI/custom_nodes
cp -r /home/ma-user/aigc_inference/torch_npu/comfyui/0.3.45/comfyui_ascend_node /home/ma-user/ComfyUI/custom_nodes
- Go to the aigc_inference/torch_npu/utils/ascend_diffusers directory and install the ascend_diffusers package.
cd /home/ma-user/aigc_inference/torch_npu/utils/ascend_diffusers
pip install -e .
- Go to the aigc_inference/torch_npu/utils/AscendX-MM directory and install the AscendX-MM package.
cd /home/ma-user/aigc_inference/torch_npu/utils/AscendX-MM
pip install -e .
Enabling the High-Performance Mode
Set the following environment variable in the shell that will start the ComfyUI service:
export CACHE_MODE=1
Starting the Service
- Run the ifconfig command to obtain the container IP address. (If the ifconfig command is invalid, use the ip addr command or other methods to obtain the container IP address.)
Figure 2 Obtaining the snt9b container IP address
Figure 3 Obtaining the snt9b23 container IP address
- Go to the custom_nodes directory and download the nodes required by your workflows.
cd /home/ma-user/ComfyUI/custom_nodes
git config --global http.sslVerify false
# Download nodes based on different workflows. ComfyUI Manager is used to download nodes later.
git clone https://github.com/ltdrdata/ComfyUI-Manager
cd /home/ma-user/ComfyUI
- Start the service. Replace 172.17.0.7 with the container IP address obtained in the previous step:
python main.py --port 8183 --listen 172.17.0.7 --force-fp16 --bf16-unet
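Before opening the browser, you can optionally confirm that the service is listening; this assumes curl is available on the host or in the container:
curl -I http://{Host IP address}:8183  # Any HTTP response indicates the ComfyUI service is reachable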
- Access the frontend page using http://{Host IP address}:8183.
- Execute text-to-image.
Figure 4 Accessing the frontend page
Besides the default workflow, you can load additional workflows, such as the Stable Diffusion 3.5 workflow downloaded earlier. If a workflow contains nodes that are not installed, errors are reported. Use ComfyUI Manager to download and install the missing nodes, then restart the ComfyUI service, preferably from the terminal using the start command. Finally, select the correct model for each node and run the inference service.
Arrange the new NPU checkpoint nodes as indicated by the arrows in the preceding figure.
Figure 5 Planning checkpoints
Select the weight file to be used from ckpt_name and click Queue Prompt to send it to the inference queue for inference.
Figure 6 Entering the inference queue
The following figure shows the success result.
Figure 7 Inference succeeded