Updated on 2024-06-15 GMT+08:00

How Do I Speed Up Real-Time Prediction?

  • When deploying a real-time service, select compute nodes with higher specifications for better performance. For example, use GPUs instead of CPUs.
  • When deploying a real-time service, increase the number of compute nodes.

    If you set Compute Nodes to 1, standalone computing is used. If you set Compute Nodes to a value greater than 1, distributed computing is used. Configure this parameter based on site requirements.

  • Inference speed depends heavily on model complexity. Optimize the model to make prediction faster.
    ModelArts provides model version management to facilitate source tracing and repeated model tuning.
    Figure 1 Deploying a real-time service
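
To check whether any of the changes above actually speeds up prediction, it can help to measure latency before and after each change. The following is a minimal sketch, not a ModelArts API: measure_latency and fake_predict are hypothetical names introduced for illustration, and fake_predict is a stand-in that you would replace with your own call to the deployed service.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(
    predict: Callable[[], object], runs: int = 50, warmup: int = 5
) -> Dict[str, float]:
    """Call `predict` repeatedly and return latency statistics in milliseconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")

    # Warm-up calls are not timed, so one-off startup costs do not skew results.
    for _ in range(warmup):
        predict()

    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    # Simple nearest-rank style index for the 95th percentile.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


def fake_predict() -> None:
    # Hypothetical stand-in for a real prediction request (assumption):
    # replace the body with your own call to the real-time service.
    time.sleep(0.002)


if __name__ == "__main__":
    print(measure_latency(fake_predict, runs=20))
```

Run the same measurement with the same input before and after changing the node specifications, the number of compute nodes, or the model, and compare the median and 95th-percentile values rather than a single request.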