Adapting Diffusers and ComfyUI Kits to PyTorch NPU for Inference Using ModelArts Lite Server (6.5.907)
This guide explains how to deploy the Stable Diffusion and Hunyuan text-to-image models with the Diffusers and ComfyUI frameworks on ModelArts Lite Server, and how to run NPU-based inference with them.
Solution Overview
This solution describes how to use NPU compute resources to deploy the Diffusers and ComfyUI frameworks for Server-based inference. First, contact Huawei's enterprise technical support team to buy the required Server resources.
This solution is designed exclusively for enterprise users.
Resource Specifications
You are advised to use ModelArts Lite Server's Snt9B and Snt9B23 resources for inference deployment.
Name | Version
---|---
driver | 25.2.1
PyTorch | pytorch_2.5.1
Obtaining Software Packages and Images
Table 2 Software packages and images

Category | Name | How to Obtain
---|---|---
Plug-in code package | AscendCloud-6.5.907-xxx.zip in the AscendCloud-6.5.907 software package. xxx in the file name is the timestamp, that is, the actual release time of the package. | Download ModelArts 6.5.907.2 from Support-E. NOTE: If the software information does not appear when you open the download link, you lack access permissions. Contact your company's Huawei technical support for help with downloading.
Snt9b base image | CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129 CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b-20250729103313-3a25129 | Pull the image from SWR.
Snt9b23 base image | CN Southwest-Guiyang1: swr.cn-southwest-2.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129 CN-Hong Kong: swr.ap-southeast-1.myhuaweicloud.com/atelier/pytorch_ascend:pytorch_2.5.1-cann_8.2.rc1-py_3.11-hce_2.0.2503-aarch64-snt9b23-20250729103313-3a25129 | Pull the image from SWR.
Kit | Model
---|---
Diffusers | SD1.5, SDXL, SD3.5, HUNYUAN
ComfyUI | SD1.5, SDXL, SD3.5
Step 1: Preparing the Environment
- Enable Lite Server resources and obtain passwords. Verify SSH access to all servers. Confirm proper network connectivity between them.
If no resource specifications are available when you purchase Server resources, contact Huawei Cloud technical support.
If a container is used or shared by multiple users, you should restrict the container from accessing the OpenStack management address (169.254.169.254) to prevent host machine metadata acquisition. For details, see Forbidding Containers to Obtain Host Machine Metadata.
- Check the environment.
- Log in to the server via SSH and check the NPU status. Obtain the NPU device information:
npu-smi info
If an error occurs, the NPU devices on the server may not be properly installed, or the NPUs may already be mounted to another container. Install the firmware and driver, or release the mounted NPUs.
- Check whether Docker is installed.
docker -v # Check whether Docker is installed.
If Docker is not installed, run this command:
yum install -y docker-engine.aarch64 docker-engine-selinux.noarch docker-runc.aarch64
- Configure IP forwarding for intra-container network accesses. Run the following command to check the value of net.ipv4.ip_forward. Skip this step if the value is 1.
sysctl -p | grep net.ipv4.ip_forward
If the value is not 1, configure IP forwarding:
sed -i 's/net\.ipv4\.ip_forward=0/net\.ipv4\.ip_forward=1/g' /etc/sysctl.conf
sysctl -p | grep net.ipv4.ip_forward
- Obtain the base image. Use official images to deploy inference services. For details about the image path {image_url}, see Table 2.
docker pull {image_url}
Before pulling the image, run the docker login command for SWR. Log in to the SWR console and obtain the login command by referring to the figure below.
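For reference only, the console-generated login command generally takes the form below; the AK and temporary login key are placeholders, so copy the exact command shown on the SWR console for your region and account:
# Placeholders only; use the command generated by the SWR console.
docker login -u cn-southwest-2@<AK> -p <temporary login key> swr.cn-southwest-2.myhuaweicloud.com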
Step 2: Starting the Container Image
- Start the snt9b container image. Before starting the container, modify the parameters in ${} according to the parameter description. Add or modify parameters as needed.
docker run -itd \
  --name ${container_name} \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -p 8183:8183 \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  --shm-size 60g \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  --device=/dev/davinci3 \
  --network=host \
  ${image_name} bash
Parameter description:
- --name ${container_name}: container name, which is used when you access the container. You can define a container name, for example, comfyui.
- --device=/dev/davinci3: Mounts /dev/davinci3 of the host to /dev/davinci3 of the container. Run the npu-smi info command to find an idle NPU, and change the davinci number to mount a different NPU.
- To run multi-NPU inference, mount multiple NPUs, for example, by adding --device=/dev/davinci2 (see the sketch after this parameter list).
- ${image_name} indicates the image name.
- -p 8183:8183: Maps a port. You can access the container service at http://<host IP address>:8183. If the port number is in use, change it to another one.
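The following is a minimal sketch of a two-NPU start command, assuming /dev/davinci2 and /dev/davinci3 are both idle (check with npu-smi info); all other parameters are the same as in the command above:
docker run -itd \
  --name ${container_name} \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -p 8183:8183 \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  --shm-size 60g \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  --device=/dev/davinci2 \
  --device=/dev/davinci3 \
  --network=host \
  ${image_name} bash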
- Access the snt9b container. Replace ${container_name} with the actual container name, for example, comfyui.
docker exec -it ${container_name} bash
- Start the snt9b23 container image. Before starting the container, modify the parameters in ${} according to the parameter description. Add or modify parameters as needed.
docker run -itd \
  --privileged \
  --name ${container_name} \
  -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
  -p 8183:8183 \
  -v /etc/localtime:/etc/localtime \
  -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
  -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
  --shm-size 60g \
  --device=/dev/davinci_manager \
  --device=/dev/hisi_hdc \
  --device=/dev/devmm_svm \
  --device=/dev/davinci3 \
  --network=host \
  ${image_name} bash
Parameter description:
- --name ${container_name}: container name, which is used when you access the container. You can define a container name, for example, comfyui.
- --device=/dev/davinci3: Mounts /dev/davinci3 of the host to /dev/davinci3 of the container. Run the npu-smi info command to find an idle NPU, and change the davinci number to mount a different NPU.
- To run multi-NPU inference, mount multiple NPUs, for example, by adding --device=/dev/davinci2.
- ${image_name} indicates the image name.
- -p 8183:8183: Maps a port. You can access the container service at http://<host IP address>:8183. If the port number is in use, change it to another one.
- Access the snt9b23 container. Replace ${container_name} with the actual container name, for example, comfyui.
docker exec -itu root ${container_name} bash
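After entering the container, you can optionally confirm that the mounted NPUs are visible inside it; the npu-smi binary is available because the start commands above mount /usr/local/bin/npu-smi into the container:
npu-smi info  # Should list the davinci device(s) mounted with --device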
Step 3: Deploying Diffusers
Installing Dependencies and Model Packages
- Run the command below to log in to Hugging Face and enter your account token when prompted:
huggingface-cli login
After the login is successful, start the Diffusers inference script; it downloads the model weights automatically.
You can also manually download the model weights and upload them to the /home/ma-user directory of the container, or download them from inside the container (see the sketch after this list). The official download addresses (login required) are as follows:
- Stable Diffusion 1.5: https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5
- Stable Diffusion XL: https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/tree/main
- Stable Diffusion 3.5 Medium: https://huggingface.co/stabilityai/stable-diffusion-3.5-medium/tree/main
- Stable Diffusion 3.5 Large: https://huggingface.co/stabilityai/stable-diffusion-3.5-large/tree/main
- Hunyuan: https://huggingface.co/Tencent-Hunyuan/HunyuanDiT-Diffusers/tree/main
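If the container has internet access to Hugging Face and you have accepted the model license, you can also fetch the weights from inside the container with the Hugging Face CLI instead of uploading them manually. This is a sketch for Stable Diffusion 3.5 Medium only; the target directory is an example that matches the MODEL_PATH value used later:
huggingface-cli download stabilityai/stable-diffusion-3.5-medium --local-dir /home/ma-user/stable-diffusion-3.5-medium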
- Install the plug-in code package.
- Upload the AscendCloud-AIGC-xxx.zip plug-in code package to the /home/ma-user/temp directory of the container and decompress the package. For details about how to obtain the plug-in code package, see Table 2.
mkdir -p /home/ma-user/temp
cd /home/ma-user/temp
unzip AscendCloud-AIGC-*.zip  # Decompress the package.
- Go to the /home/ma-user/temp/aigc_inference/torch_npu/utils/ascend_diffusers directory in the decompressed AIGC package and install the ascend_diffusers package.
cd /home/ma-user/temp/aigc_inference/torch_npu/utils/ascend_diffusers
pip install -e .
- Go to the /home/ma-user/temp/aigc_inference/torch_npu/utils/AscendX-MM directory in the decompressed AIGC package and install the AscendX-MM package.
cd /home/ma-user/temp/aigc_inference/torch_npu/utils/AscendX-MM
pip install -e .
Starting Inference
Set MODEL_PATH to the path of the downloaded Hugging Face model, for example, /home/ma-user/stable-diffusion-3.5-medium. To let the system download the model automatically, omit the --model_id parameter. Then go to the examples directory:
export MODEL_PATH='Path of the downloaded Hugging Face model'
cd /home/ma-user/temp/aigc_inference/torch_npu/diffusers/0.31.0/examples
The commands below start single-NPU model inference. For details about the parameters, see the Readme file in the /home/ma-user/temp/aigc_inference/torch_npu/diffusers directory.
- Commands for starting Stable Diffusion 1.5 model inference:
pip install diffusers==0.30.2
python sd_inference_example.py --model_name sd15 --model_id ${MODEL_PATH} --prompt 'a dog' --num_inference_steps 20 --width 512 768 1024 --height 512 768 1024
- Commands for starting Stable Diffusion XL model inference:
pip install diffusers==0.30.2
python sd_inference_example.py --model_name sdxl --model_id ${MODEL_PATH} --prompt 'a dog' --num_inference_steps 20 --width 768 1024 --height 768 1024
- Commands for starting Stable Diffusion 3.5 model inference:
pip install diffusers==0.31.0
python sd_inference_example.py --model_name sd35 --model_id ${MODEL_PATH} --prompt 'a dog' --num_inference_steps 28 --width 512 768 1024 --height 512 768 1024
- Commands for starting Hunyuan model inference:
pip install diffusers==0.30.2
export INF_NAN_MODE_FORCE_DISABLE=1
python sd_inference_example.py --model_name hunyuan --model_id ${MODEL_PATH} --prompt 'a dog' --num_inference_steps 20 --width 512 768 1024 --height 512 768 1024
Step 4: Deploying ComfyUI
Installing Dependencies and Model Packages
- Download the ComfyUI software package.
Download the ComfyUI source code.
git clone -b as0.3.45 https://github.com/mountain-lee1/ComfyUI.git
cd ComfyUI
If you cannot download the ComfyUI source code with the previous method, download the source code to your PC and then upload it to the container, as shown in Figure 1.
- Log in to https://github.com/mountain-lee1/ComfyUI, switch the tag to as0.3.45, click Code, and click Download ZIP to download the ComfyUI source code to your local PC.
Internet access to GitHub is required to download the open-source software. Set up a network proxy if needed.
- Upload the downloaded ComfyUI-as0.3.45.zip file to the /home/ma-user/ directory of the container and decompress the package.
cd /home/ma-user/
unzip ComfyUI-as0.3.45.zip
cd ComfyUI-as0.3.45
- Change torch in requirements.txt to torch==2.5.1, and then install the dependencies.
pip install -r requirements.txt
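As one way to pin the version before installing, the following sketch assumes torch appears in requirements.txt on its own line without a version constraint; adjust the pattern to the actual file contents:
sed -i 's/^torch$/torch==2.5.1/' requirements.txt  # Pin torch to 2.5.1
grep '^torch' requirements.txt  # Verify the change before running pip install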
- Download the model weights (a download-and-placement sketch follows this list).
sd1.5: Copy v1-5-pruned-emaonly.safetensors to the ComfyUI/models/checkpoints directory.
https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
sdxl: Copy sd_xl_base_1.0.safetensors to the ComfyUI/models/checkpoints directory.
sd3.5: Copy sd3.5_medium.safetensors to the ComfyUI/models/checkpoints directory.
Copy diffusion_pytorch_model.safetensors to the ComfyUI/models/vae directory.
In addition, you need to download three text_encoder-related models and copy them to the ComfyUI/models/clip directory.
https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_l.safetensors
https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/blob/main/text_encoders/clip_g.safetensors
You also need to download the workflow required for inference using the ComfyUI framework:
https://openart.ai/workflows/sneakyrobot/sd35-basic/CX6pkiT9lzJPlTpF9Cgu
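The following sketch downloads and places the weights with wget, using only the links listed above (the /blob/ page links are rewritten to /resolve/ for direct download). It assumes the container has internet access to Hugging Face and that the files are downloadable with your access rights; otherwise, download them on your PC and upload them to the corresponding directories. Run it from the ComfyUI root directory:
# SD 1.5 checkpoint
wget -P models/checkpoints https://huggingface.co/runwayml/stable-diffusion-v1-5/resolve/main/v1-5-pruned-emaonly.safetensors
# SD 3.5 text encoders listed above (clip_l and clip_g)
wget -P models/clip https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/resolve/main/text_encoders/clip_l.safetensors
wget -P models/clip https://huggingface.co/Comfy-Org/stable-diffusion-3.5-fp8/resolve/main/text_encoders/clip_g.safetensors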
- Install the plug-in code package.
- Upload the AscendCloud-AIGC-xxx.zip plug-in code package to the /home/ma-user/ directory of the container and decompress the package. For details about how to obtain the plug-in code package, see Table 2.
cd /home/ma-user/
unzip AscendCloud-AIGC-*.zip
- Go to the ComfyUI/custom_nodes directory and copy the aigc_inference/torch_npu/comfyui/0.3.45/comfyui_ascend_node folder extracted from the AIGC package to the ComfyUI/custom_nodes directory.
cd ComfyUI/custom_nodes
cp -r /home/ma-user/aigc_inference/torch_npu/comfyui/0.3.45/comfyui_ascend_node /home/ma-user/ComfyUI/custom_nodes
- Go to the aigc_inference/torch_npu/utils/ascend_diffusers directory and install the ascend_diffusers package.
cd /home/ma-user/aigc_inference/torch_npu/utils/ascend_diffusers
pip install -e .
- Go to the aigc_inference/torch_npu/utils/AscendX-MM directory and install the AscendX-MM package.
cd /home/ma-user/aigc_inference/torch_npu/utils/AscendX-MM
pip install -e .
Enabling the High-Performance Mode
Set the following environment variable in the shell that will start the ComfyUI service:
export CACHE_MODE=1
Starting the Service
- Run the ifconfig command to obtain the container IP address. (If the ifconfig command is invalid, use the ip addr command or other methods to obtain the container IP address.)
Figure 2 Obtaining the snt9b container IP address
Figure 3 Obtaining the snt9b23 container IP address
- Go to the custom_nodes directory and download the nodes required by your workflows.
cd /home/ma-user/ComfyUI/custom_nodes
git config --global http.sslVerify false
# Download nodes based on different workflows. ComfyUI Manager is used to download nodes later.
git clone https://github.com/ltdrdata/ComfyUI-Manager
cd /home/ma-user/ComfyUI
- Start the service. Replace 172.17.0.7 with the container IP address obtained in the previous step:
python main.py --port 8183 --listen 172.17.0.7 --force-fp16 --bf16-unet
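Before opening the browser, you can optionally confirm that the service is listening; this assumes curl is available on the host or in the container:
curl -I http://{Host IP address}:8183  # Any HTTP response indicates the ComfyUI service is reachable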
- Access the frontend page using http://{Host IP address}:8183.
- Execute text-to-image.
Figure 4 Accessing the frontend page
Besides the default workflow, you can load additional workflows, such as the Stable Diffusion 3.5 workflow downloaded earlier. If a workflow contains nodes that are not installed, errors are reported. Use ComfyUI Manager to download and install the missing nodes, then restart the ComfyUI service, preferably from the terminal using the start command. Finally, select the correct model for each node and run the inference service.
Arrange the new NPU checkpoint nodes as indicated by the arrows in the preceding figure.
Figure 5 Planning checkpoints
Select the weight file to be used from ckpt_name and click Queue Prompt to send it to the inference queue for inference.
Figure 6 Entering the inference queue
The following figure shows the success result.
Figure 7 Inference succeeded