
How Do I Speed Up Real-Time Service Prediction in ModelArts?

  • When deploying a real-time service, select instance specifications with better performance for faster prediction. For example, use GPUs instead of CPUs.
  • When deploying a real-time service, increase the number of instances (see the sketch after this list).

    If you set the number of instances to 1, the standalone computing mode is used. If you set the number of instances to a value greater than 1, the distributed computing mode is used. Configure this parameter based on site requirements.

  • Inference speed is closely related to model complexity. Optimize the model for faster prediction.

    ModelArts provides model version management to facilitate traceability and iterative model tuning.
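
For reference, the sketch below shows how the instance specification and instance count might be set when deploying a real-time service with the ModelArts Python SDK. The model ID, service name, and GPU flavor used here are placeholders; the flavor names actually available depend on your region and account.

```python
# Minimal sketch using the ModelArts Python SDK (values below are placeholders).
from modelarts.session import Session
from modelarts.model import Model
from modelarts.config.model_config import ServiceConfig

session = Session()                                  # uses credentials from the environment
model_instance = Model(session, model_id="your_model_id")  # placeholder model ID

# A higher-performance specification speeds up each request, and
# instance_count > 1 enables distributed computing across instances.
configs = [
    ServiceConfig(
        model_id=model_instance.model_id,
        weight="100",
        specification="modelarts.vm.gpu.p4",  # example GPU flavor; pick one available in your region
        instance_count=2,                     # a value greater than 1 uses distributed computing
    )
]

predictor = model_instance.deploy_predictor(
    service_name="my_realtime_service",       # placeholder service name
    infer_type="real-time",
    configs=configs,
)
```

The same two parameters, instance specification and instance count, can also be set on the deployment page of the ModelArts console.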