Multimodal Model Inference Performance Test
The current version supports performance testing of multimodal model inference in the "language + image" and "language + video" modes. These tests are run with the acs-bench tool. For details about how to install acs-bench, see Installing the acs-bench Tool.
Constraints
The current version supports "language + image" and "language + video" multimodal performance tests.
Obtaining Datasets
- Generate an offline dataset containing images and text. For details about the parameters, see Dataset Generation Parameter Description.
# Generate a random dataset (image and text). Assume that the token length is 100 and the image height and width are (250, 250).
acs-bench generate dataset \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --output-path ./ \
  --num-requests 10 \
  --input-length 100 \
  --modal-type "image-text" \
  --config-option image_height:250,image_width:250
The generated JSON dataset file ${input-length}_image_${image_height}_${image_width}.json is stored in the --output-path directory.
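As a quick sanity check on a generated file, you can count its request entries. The sketch below is illustrative only: it assumes the file holds either a JSON array of request records or an object containing such a list; the exact schema depends on the acs-bench version, and the demo file name `sample_dataset.json` is a stand-in for the real output file.

```python
import json
from pathlib import Path

def count_requests(path: str) -> int:
    """Count request entries in a generated dataset file.

    Assumption: the file is either a JSON array of requests or an
    object whose first list-valued field holds the requests. The
    real schema depends on the acs-bench version.
    """
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, list):
        return len(data)
    if isinstance(data, dict):
        for value in data.values():
            if isinstance(value, list):
                return len(value)
    raise ValueError(f"unrecognized dataset layout in {path}")

# Demo with a synthetic file standing in for the generated dataset.
sample = [{"prompt": f"req-{i}"} for i in range(10)]
Path("sample_dataset.json").write_text(json.dumps(sample), encoding="utf-8")
print(count_requests("sample_dataset.json"))  # 10, matching --num-requests 10
```

If the reported count differs from the `--num-requests` value you passed, regenerate the dataset before running the benchmark.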
- Generate an offline dataset containing videos and text. For details about the parameters, see Dataset Generation Parameter Description.
# Generate a random dataset (video and text). Assume that the token length is 100, the video frame height and width are (250, 250), the video duration is 3 seconds, and the frame rate is 25 fps.
acs-bench generate dataset \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --output-path ./ \
  --num-requests 10 \
  --input-length 100 \
  --modal-type "video-text" \
  --config-option image_height:250,image_width:250,duration:3,fps:25
The generated JSON dataset file ${input-length}_video_${image_height}_${image_width}_${duration}_${fps}.json is stored in the --output-path directory.
The --input-length and --num-requests parameters in the dataset generation command each accept only a single value. To generate datasets with different specifications, change --input-length or --num-requests to the desired value and run the command again.
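Because each run accepts a single input length, a small loop can generate several specifications in one pass. This is a sketch under the flag layout shown above; it prints each command (dry run) via a helper function so you can review it before executing, and the chosen lengths are arbitrary examples.

```shell
#!/bin/sh
# Print the dataset-generation command for one input length (dry run).
# Pipe the output to sh to actually execute the commands.
gen_cmd() {
  printf 'acs-bench generate dataset --tokenizer ./model/Qwen-2.5-VL-72B/ --output-path ./ --num-requests 10 --input-length %s --modal-type "image-text" --config-option image_height:250,image_width:250\n' "$1"
}

# One generation command per desired input length.
for len in 100 512 1024; do
  gen_cmd "$len"
done
```

Each iteration produces its own `${input-length}_image_250_250.json` file in the output directory, so the files do not overwrite one another.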
Performance Stress Testing Mode Verification
An example of using the acs-bench prof command for multimodal model performance stress testing is shown below. For details about parameter descriptions, see Parameter Descriptions for Usage Example. For details about output artifact descriptions, see Artifact Description.
# "Image + text" example: use a thread pool for concurrent testing. The default backend concurrency mode is threading-pool; you can also choose the asynchronous coroutine mode asyncio, the multi-process mode processing-pool, or the multi-thread mode threading-pool.
$ acs-bench prof \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --provider ./provider/providers.yaml \
  --input-path ./built_in_dataset/ \
  --concurrency-backend threading-pool \
  --backend openai-chat \
  --warmup 1 \
  --epochs 2 \
  --num-requests 1,2,4,8 \
  --concurrency 1,2,4,8 \
  --input-length 128,2048 \
  --output-length 128,2048 \
  --modal-type "image-text" \
  --config-option image_height:250,image_width:250 \
  --benchmark-save-path ./output_path/

# "Video + text" example: same setup; only the modal type and the video parameters differ.
$ acs-bench prof \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --provider ./provider/providers.yaml \
  --input-path ./built_in_dataset/ \
  --concurrency-backend threading-pool \
  --backend openai-chat \
  --warmup 1 \
  --epochs 2 \
  --num-requests 1,2,4,8 \
  --concurrency 1,2,4,8 \
  --input-length 128,2048 \
  --output-length 128,2048 \
  --modal-type "video-text" \
  --config-option image_height:250,image_width:250,duration:3,fps:25 \
  --benchmark-save-path ./output_path/
Ramp-Up Mode Verification
An example of using the acs-bench prof command for multimodal model ramp-up stress testing is shown below. For details about parameter descriptions, see Parameter Descriptions for Usage Example. For details about output artifact descriptions, see Artifact Description.
# "Image + text" example: use a thread pool for ramp-up testing. The default backend concurrency mode is threading-pool; you can also choose the asynchronous coroutine mode asyncio, the multi-process mode processing-pool, or the multi-thread mode threading-pool.
$ acs-bench prof \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --provider ./provider/providers.yaml \
  --input-path ./built_in_dataset/ \
  --concurrency-backend threading-pool \
  --backend openai-chat \
  --warmup 1 \
  --epochs 2 \
  --use-climb --climb-mode linear --growth-rate 2 --init-concurrency 1 --growth-interval 5000 \
  --num-requests 1,2,4,8 \
  --concurrency 1,2,4,8 \
  --input-length 128,2048 \
  --output-length 128,2048 \
  --modal-type "image-text" \
  --config-option image_height:250,image_width:250 \
  --benchmark-save-path ./output_path/

# "Video + text" example: same setup; only the modal type and the video parameters differ.
$ acs-bench prof \
  --tokenizer ./model/Qwen-2.5-VL-72B/ \
  --provider ./provider/providers.yaml \
  --input-path ./built_in_dataset/ \
  --concurrency-backend threading-pool \
  --backend openai-chat \
  --warmup 1 \
  --epochs 2 \
  --use-climb --climb-mode linear --growth-rate 2 --init-concurrency 1 --growth-interval 5000 \
  --num-requests 1,2,4,8 \
  --concurrency 1,2,4,8 \
  --input-length 128,2048 \
  --output-length 128,2048 \
  --modal-type "video-text" \
  --config-option image_height:250,image_width:250,duration:3,fps:25 \
  --benchmark-save-path ./output_path/
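To see how the linear ramp-up options interact, the sketch below models the concurrency schedule they describe. This is an illustrative model, not acs-bench's internal scheduler: it assumes --growth-interval is in milliseconds and that concurrency starts at --init-concurrency and increases by --growth-rate per interval until it reaches the target concurrency.

```python
def climb_schedule(init: int, growth_rate: int,
                   growth_interval_ms: int, target: int):
    """Model a linear ramp-up: start at `init` concurrency and add
    `growth_rate` every `growth_interval_ms` until `target` is reached.
    Illustrative only; acs-bench's actual scheduler may differ.
    Returns a list of (time_ms, concurrency) steps."""
    steps = []
    t, c = 0, init
    while True:
        steps.append((t, c))
        if c >= target:
            break
        t += growth_interval_ms
        c = min(c + growth_rate, target)  # never overshoot the target
    return steps

# --init-concurrency 1 --growth-rate 2 --growth-interval 5000, target 8:
print(climb_schedule(1, 2, 5000, 8))
# [(0, 1), (5000, 3), (10000, 5), (15000, 7), (20000, 8)]
```

Under these assumptions, reaching the highest tested concurrency of 8 takes four growth intervals (20 seconds), which is worth keeping in mind when choosing --num-requests so the run lasts long enough to complete the climb.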
The --input-length parameter values in the performance stress testing and ramp-up testing examples must exist in the pre-generated dataset. If they do not exist, refer to Obtaining Datasets to generate a dataset with the corresponding input length.