Multimodal Model Inference Performance Test
The current version supports performance testing of multimodal model inference in the "language + image" and "language + video" modes. These tests are run with the acs-bench tool. For details about how to install acs-bench, see Installing the acs-bench Tool.
Constraints
The current version supports "language + image" and "language + video" multimodal performance tests.
Obtaining Datasets
- Generate an offline dataset containing images and text. For details about the parameters, see Dataset Generation Parameter Description.
# Generate a random dataset (image and text). Assume that the token length is 100 and the image height and width are (250, 250).
acs-bench generate dataset \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --output-path ./ \
  --num-requests 10 \
  --input-length 100 \
  --modal-type "image-text" \
  --config-option image_height:250,image_width:250
The generated JSON dataset file ${input-length}_image_${image_height}_${image_width}.json is stored in the --output-path directory.
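As a quick sanity check on a generated file, you can count its request entries. The sketch below is illustrative only: it assumes the file holds either a JSON array of request records or an object containing such a list; the exact schema depends on the acs-bench version, and the demo file name `sample_dataset.json` is a stand-in for the real output file.

```python
import json
from pathlib import Path

def count_requests(path: str) -> int:
    """Count request entries in a generated dataset file.

    Assumption: the file is either a JSON array of requests or an
    object whose first list-valued field holds the requests. The
    real schema depends on the acs-bench version.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        return len(data)
    if isinstance(data, dict):
        for value in data.values():
            if isinstance(value, list):
                return len(value)
    raise ValueError(f"unrecognized dataset layout in {path}")

# Demo with a synthetic file standing in for the generated dataset.
sample = [{"prompt": f"req-{i}"} for i in range(10)]
Path("sample_dataset.json").write_text(json.dumps(sample), encoding="utf-8")
print(count_requests("sample_dataset.json"))  # 10, matching --num-requests 10
```

If the reported count differs from the `--num-requests` value you passed, regenerate the dataset before running the benchmark.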
- Generate an offline dataset containing videos and text. For details about the parameters, see Dataset Generation Parameter Description.
# Generate a random dataset (video and text). Assume that the token length is 100, the video frame height and width are (250, 250), the video duration is 3 seconds, and the frame rate is 25 fps.
acs-bench generate dataset \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --output-path ./ \
  --num-requests 10 \
  --input-length 100 \
  --modal-type "video-text" \
  --config-option image_height:250,image_width:250,duration:3,fps:25
The generated JSON dataset file ${input-length}_video_${image_height}_${image_width}_${duration}_${fps}.json is stored in the --output-path directory.
The --input-length and --num-requests parameters in the dataset generation command each accept only a single value. To generate datasets with different specifications, change --input-length or --num-requests to the desired value and run the command again.
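Because each run accepts a single input length, a small loop can generate several specifications in one pass. This is a sketch under the flag layout shown above; it prints each command (dry run) via a helper function so you can review it before executing, and the chosen lengths are arbitrary examples.

```shell
#!/bin/sh
# Print the dataset-generation command for one input length (dry run).
# Pipe the output to sh to actually execute the commands.
gen_cmd() {
  printf 'acs-bench generate dataset --tokenizer ./model/Qwen-2.5-VL-72B/ --output-path ./ --num-requests 10 --input-length %s --modal-type "image-text" --config-option image_height:250,image_width:250\n' "$1"
}

# One generation command per desired input length.
for len in 100 512 1024; do
  gen_cmd "$len"
done
```

Each iteration produces its own `${input-length}_image_250_250.json` file in the output directory, so the files do not overwrite one another.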
Performance Stress Testing Mode Verification
An example of using the acs-bench prof command for multimodal model performance stress testing is shown below. For details about parameter descriptions, see Parameter Descriptions for Usage Example. For details about output artifact descriptions, see Artifact Description.
# "Image + text" example: use a thread pool for concurrent testing. The default backend concurrency mode is threading-pool; you can also choose the asynchronous coroutine mode asyncio, the multi-process mode processing-pool, or the multi-thread mode threading-pool.
$ acs-bench prof \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --provider ./provider/providers.yaml \
  --input-path ./built_in_dataset/ \
  --concurrency-backend threading-pool \
  --backend openai-chat \
  --warmup 1 \
  --epochs 2 \
  --num-requests 1,2,4,8 \
  --concurrency 1,2,4,8 \
  --input-length 128,2048 \
  --output-length 128,2048 \
  --modal-type "image-text" \
  --config-option image_height:250,image_width:250 \
  --benchmark-save-path ./output_path/

# "Video + text" example: same setup; only the modal type and the video parameters differ.
$ acs-bench prof \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --provider ./provider/providers.yaml \
  --input-path ./built_in_dataset/ \
  --concurrency-backend threading-pool \
  --backend openai-chat \
  --warmup 1 \
  --epochs 2 \
  --num-requests 1,2,4,8 \
  --concurrency 1,2,4,8 \
  --input-length 128,2048 \
  --output-length 128,2048 \
  --modal-type "video-text" \
  --config-option image_height:250,image_width:250,duration:3,fps:25 \
  --benchmark-save-path ./output_path/
Ramp-Up Mode Verification
An example of using the acs-bench prof command for multimodal model ramp-up stress testing is shown below. For details about parameter descriptions, see Parameter Descriptions for Usage Example. For details about output artifact descriptions, see Artifact Description.
# "Image + text" example: use a thread pool for ramp-up testing. The default backend concurrency mode is threading-pool; you can also choose the asynchronous coroutine mode asyncio, the multi-process mode processing-pool, or the multi-thread mode threading-pool.
$ acs-bench prof \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --provider ./provider/providers.yaml \
  --input-path ./built_in_dataset/ \
  --concurrency-backend threading-pool \
  --backend openai-chat \
  --warmup 1 \
  --epochs 2 \
  --use-climb --climb-mode linear --growth-rate 2 --init-concurrency 1 --growth-interval 5000 \
  --num-requests 1,2,4,8 \
  --concurrency 1,2,4,8 \
  --input-length 128,2048 \
  --output-length 128,2048 \
  --modal-type "image-text" \
  --config-option image_height:250,image_width:250 \
  --benchmark-save-path ./output_path/

# "Video + text" example: same setup; only the modal type and the video parameters differ.
$ acs-bench prof \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --provider ./provider/providers.yaml \
  --input-path ./built_in_dataset/ \
  --concurrency-backend threading-pool \
  --backend openai-chat \
  --warmup 1 \
  --epochs 2 \
  --use-climb --climb-mode linear --growth-rate 2 --init-concurrency 1 --growth-interval 5000 \
  --num-requests 1,2,4,8 \
  --concurrency 1,2,4,8 \
  --input-length 128,2048 \
  --output-length 128,2048 \
  --modal-type "video-text" \
  --config-option image_height:250,image_width:250,duration:3,fps:25 \
  --benchmark-save-path ./output_path/
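To see how the linear ramp-up options interact, the sketch below models the concurrency schedule they describe. This is an illustrative model, not acs-bench's internal scheduler: it assumes --growth-interval is in milliseconds and that concurrency starts at --init-concurrency and increases by --growth-rate per interval until it reaches the target concurrency.

```python
def climb_schedule(init: int, growth_rate: int,
                   growth_interval_ms: int, target: int):
    """Model a linear ramp-up: start at `init` concurrency and add
    `growth_rate` every `growth_interval_ms` until `target` is reached.
    Illustrative only; acs-bench's actual scheduler may differ.
    Returns a list of (time_ms, concurrency) steps."""
    steps = []
    t, c = 0, init
    while True:
        steps.append((t, c))
        if c >= target:
            break
        t += growth_interval_ms
        c = min(c + growth_rate, target)  # never overshoot the target
    return steps

# --init-concurrency 1 --growth-rate 2 --growth-interval 5000, target 8:
print(climb_schedule(1, 2, 5000, 8))
# [(0, 1), (5000, 3), (10000, 5), (15000, 7), (20000, 8)]
```

Under these assumptions, reaching the highest tested concurrency of 8 takes four growth intervals (20 seconds), which is worth keeping in mind when choosing --num-requests so the run lasts long enough to complete the climb.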
The --input-length parameter values in the performance stress testing and ramp-up testing examples must exist in the pre-generated dataset. If they do not exist, refer to Obtaining Datasets to generate a dataset with the corresponding input length.