Obtaining Model Inference Profiling Data
PyTorch Profiler is a performance analysis tool provided by PyTorch, used to deeply analyze performance bottlenecks during the model training/inference process, helping developers optimize computational efficiency, memory usage, and hardware utilization.
Ascend PyTorch Profiler is fully aligned with the usage in PyTorch-GPU scenarios, supporting the collection of PyTorch layer operator information, CANN layer operator information, underlying NPU operator information, and operator memory usage information, enabling a comprehensive analysis of the performance status of PyTorch AI tasks.
However, using PyTorch Profiler can produce large data volumes, long collection times, and performance overhead that may distort the collected data and skew results. To address these issues, a lightweight performance analysis tool called Service Profiler has been introduced for analyzing performance at the service request level. Service Profiler gathers the profiling data users care about through pre-instrumented key points in the service framework. It currently supports observing the batch size and sequence length of internal service requests and the execution time of a single batch iteration.
Constraints
Before using Service Profiler, ensure that the inference service starts and handles requests normally. Service Profiler is included in the versioned image as a Python library.
Checking if the Service Profiler Tool is Installed
In ModelArts 6.5.906 and later, the acs_service_profiler-1.0.1-py3-none-any.whl package is installed by default, so there is no need for a separate installation. The package is located in the llm_tools directory within the AscendCloud-LLM-xxx.zip software package.
Check if the acs-service-profiler tool is already installed:
$ pip show acs-service-profiler
If it is not installed, install it in the same way as the acs-bench tool (see Installing the acs-bench Tool). The installation command is as follows:
$ pip install llm_tools/acs_service_profiler-*-py3-none-any.whl
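If you want a single step that installs the wheel only when it is missing, a minimal shell sketch is shown below; it assumes the AscendCloud-LLM package has been extracted so that the llm_tools directory sits in the current working directory:
# Install the wheel from llm_tools only if the tool is not already present
pip show acs-service-profiler >/dev/null 2>&1 || \
    pip install llm_tools/acs_service_profiler-*-py3-none-any.whl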
Note: Both Ascend PyTorch Profiler and Service Profiler are intended for the performance tuning phase of development and are not recommended in production. Typically, Ascend PyTorch Profiler is used to collect a small amount of request data (one or two requests) for analysis, whereas Service Profiler collects data across many requests (hundreds or thousands). The following section explains how to collect data with Ascend PyTorch Profiler and Service Profiler in a real-time service scenario.
Real-Time Service Profiling Through start_profile and stop_profile
- Before starting the inference service, set the environment variables:
export VLLM_TORCH_PROFILER_DIR=/home/ma-user/profiler_dir    # Enable Ascend PyTorch Profiler
# export VLLM_SERVICE_PROFILER_DIR=/home/ma-user/profiler_dir    # Enable Service Profiler
VLLM_TORCH_PROFILER_DIR and VLLM_SERVICE_PROFILER_DIR enable Ascend PyTorch Profiler and Service Profiler, respectively. The collected profiler data is stored in the path specified by the environment variable. Note that the two profilers cannot be enabled simultaneously.
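Because the two profilers cannot be enabled together, it can help to explicitly clear the variable you are not using before starting the service. A minimal sketch for a Service Profiler run (the directory path is a placeholder):
# Make sure only Service Profiler is enabled for this run
unset VLLM_TORCH_PROFILER_DIR
export VLLM_SERVICE_PROFILER_DIR=/home/ma-user/profiler_dir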
- After setting the environment variables, start the inference service.
For details about how to start the inference service, see Starting an LLM-powered Inference Service.
- Send a start_profile POST request.
curl -X POST http://${IP}:${PORT}/start_profile
Parameters
- IP: The IP address where the service is deployed.
- PORT: The port where the service is deployed.
- Send an actual request.
For sending actual requests, see LLM Inference Performance Test.
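As an illustration, a minimal request against a vLLM-style OpenAI-compatible endpoint could look like the following; the /v1/chat/completions path, model name, and payload are assumptions about your particular deployment, not part of the profiling tools:
# Hypothetical example request; adjust the endpoint, model, and payload to your service
curl -X POST http://${IP}:${PORT}/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "your-model", "messages": [{"role": "user", "content": "Hello"}]}'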
- Send a stop_profile POST request.
curl -X POST http://${IP}:${PORT}/stop_profile
The parameters are the same as for the start_profile POST request.
- Perform post-processing and visualization.
For visualizing data collected by Ascend PyTorch Profiler, it is recommended to use the MindStudio Insight tool.
[Figure: MindStudio Insight visualization of the collected profiling data]
For more information on MindStudio Insight, see the MindStudio Insight tool documentation.
To visualize data collected by Service Profiler, use the acsprof tool for post-processing and then visualize the data in a web page that supports the Google tracing format. The specific steps are as follows:
Post-Processing to Generate Visualization Files
acsprof export -i ${input_path}
The following table describes the parameters.
| Parameter | Type | Description | Mandatory |
| --- | --- | --- | --- |
| -i / --input_path | String | Path to the Service Profiler collection folder; both parent folders and subfolders are supported. | Yes |
| -o / --output_path | String | Output path for the post-processed files. Defaults to the input folder path. | No |
| -f / --force_reparse | Bool | Whether to force re-parsing of folders that have already been parsed. The default is False (no forced re-parsing). When multiple batches of data are collected, only the first batch is parsed automatically; set this to True to re-parse subsequent batches. | No |
Example:
acsprof export -i /home/ma-user/profiler_dir
[Figure: Normal log output of acsprof export]
Post-processing parses the collected profiler data a second time and exports metrics such as TTFT, TPOT, and framework throughput. It also generates a visualization file named trace_view.json; when multiple instances are profiled, their timeline data is merged into an overview_trace_view.json file. You can drag these files into chrome://tracing/ or https://ui.perfetto.dev/ for visual analysis.
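After the export completes, you can list the generated trace files before loading them into a viewer; a simple sketch assuming the default output path under the collection directory:
# List the visualization files produced by acsprof export
find /home/ma-user/profiler_dir -name "*trace_view.json"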