Model Deployment

Deploying AI models and implementing them at scale is generally complex.

For example, in a smart transportation project, a trained model must be deployed to the cloud, to edge nodes, and to devices. Deploying a model on devices takes time and effort, for example, on cameras of different specifications from different vendors. ModelArts supports one-click deployment of a trained model on various devices for different application scenarios, and it provides a secure, reliable, one-stop set of deployment modes for individual developers, enterprises, and device manufacturers.

Figure 1 Process of deploying a model
  • Models can be deployed as real-time or batch inference services on the cloud, at the edge, and on devices.
  • Real-time inference services feature high concurrency, low latency, and elastic scaling, and support gray release and A/B testing across multiple models.
  • Models can be pushed directly to edge nodes with one click: you only need to select the target edge node.
  • ModelArts is optimized for the high-performance Ascend 310 AI inference chip. It can process petabytes of inference data within a single day, publish more than one million inference APIs on the cloud, and keep inference network latency at the millisecond level.
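To make the gray-release idea concrete, the sketch below simulates splitting traffic between two versions of a model behind one real-time inference endpoint. The configuration shape, the service name, and the routing helper are illustrative assumptions for this document, not the actual ModelArts API: in a real deployment the platform performs this weighted routing for you.

```python
import random

# Hypothetical gray-release configuration: two versions of one model share
# a single real-time service, with traffic split by weight. The field names
# ("service_name", "weight", ...) are assumptions, not the ModelArts schema.
SERVICE_CONFIG = {
    "service_name": "traffic-sign-detector",  # hypothetical service name
    "infer_type": "real-time",
    "models": [
        {"model_version": "1.0", "weight": 90},  # stable version
        {"model_version": "2.0", "weight": 10},  # canary (gray) version
    ],
}

def route_request(config, rng=random):
    """Pick a model version in proportion to its configured weight."""
    versions = [m["model_version"] for m in config["models"]]
    weights = [m["weight"] for m in config["models"]]
    return rng.choices(versions, weights=weights, k=1)[0]

# Over many requests, roughly 90% land on 1.0 and 10% on the 2.0 canary,
# which is what lets you compare the versions (A/B test) on live traffic.
counts = {"1.0": 0, "2.0": 0}
rng = random.Random(42)  # fixed seed so the simulation is repeatable
for _ in range(10_000):
    counts[route_request(SERVICE_CONFIG, rng)] += 1
```

Raising the canary's weight step by step (10 → 50 → 100) while watching its metrics is the usual gray-release workflow; rolling back is just restoring the old weights.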