Best Practices
- Free Trial: One-Click Deployment of the Model for Identifying Shopping Mall and Supermarket Commodities: ModelArts AI Gallery provides a large number of free models that you can deploy with one click to experience AI.
- Creating an AI Application Using a Custom Image: This section provides an example of importing a model from a custom image on ModelArts to help you quickly get familiar with the platform.
- Managing Atlas 500 and Deploying a Model as an Edge Service: The Huawei Atlas AI computing platform is built on Huawei Ascend AI processors and provides a full-scenario AI infrastructure solution across devices, edge, and cloud. Atlas edge devices connect to Huawei Cloud Intelligent EdgeFabric (IEF) and the AI development platform ModelArts so that AI models can be quickly deployed on Atlas devices to meet application requirements in complex scenarios such as security protection, transportation, communities, campuses, shopping malls, and supermarkets.
- Enabling an Inference Service to Access the Internet: This section describes how to enable an inference service to access the Internet.
- End-to-End O&M of Inference Services: The end-to-end O&M of ModelArts inference services involves the entire AI process, including algorithm development, service O&M, and service running.
- Creating an AI Application Using a Custom Engine: You can select an image of yours stored in SWR as the engine and specify an OBS file directory as the model package. This way, you can bring your own image to meet dedicated requirements (a minimal serving sketch follows this list).
- High-Speed Access to Inference Services Through VPC Peering: With a VPC peering connection, your service requests are sent directly to the service instances, shortening the access path (a request sketch follows this list).
- Full-Process Development of WebSocket Real-Time Services: If you select WebSocket when deploying a real-time service, the API URL becomes a WebSocket address once the service is deployed. This case describes the full development process of a WebSocket real-time service (a client sketch follows this list).
- Using a Large Model to Create an AI Application and Deploying a Real-Time Service: Large models keep growing and can contain hundreds of billions or even trillions of parameters. A model with hundreds of billions of parameters can exceed 200 GB, which places new demands on platform version management and production deployment. For example, importing such an AI application requires dynamic adjustment of the tenant storage quota; slow model loading and startup call for a flexible timeout configuration during deployment; and the service recovery time must be shortened when the model has to be reloaded after a restart caused by a load exception. To meet these requirements, the ModelArts inference platform provides an AI application management and service deployment solution for large model scenarios.
- Migrating the TensorFlow Serving Framework to a Custom Inference Engine: Migrating the TensorFlow Serving framework to the ModelArts inference framework requires rebuilding the native TF Serving image. After that, ModelArts model version management and dynamic model loading can be used. This section describes how to adapt a native TF Serving image to a ModelArts custom inference engine (a TF Serving request sketch follows this list).
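For the custom engine case above, the sketch below shows a minimal, hypothetical HTTP inference server of the kind a bring-your-own image might package. The port, route names, and model mount path are illustrative assumptions, not a verified platform contract; the actual health-check path and model package location come from your AI application configuration.

```python
# Minimal, hypothetical inference server for a bring-your-own image.
# Assumptions (not verified platform requirements): the container listens
# on port 8080, /health is the configured health check, and the model
# package from OBS is mounted under /home/mind/model.
from flask import Flask, jsonify, request

app = Flask(__name__)

MODEL_DIR = "/home/mind/model"  # assumed mount point of the OBS model package

def load_model(model_dir):
    # Placeholder: load whatever framework your image actually bundles.
    return lambda inputs: {"outputs": inputs}

model = load_model(MODEL_DIR)

@app.route("/health", methods=["GET"])
def health():
    # Health-check route; use whatever path you configure for the service.
    return jsonify({"status": "ok"})

@app.route("/", methods=["POST"])
def infer():
    # Inference route: echo-style placeholder for a real predict call.
    payload = request.get_json(force=True)
    return jsonify(model(payload))

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
```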
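For the VPC peering case, the following is a minimal sketch of calling a real-time service from a host inside the peered VPC. The instance address, port, request body, and authentication header are placeholders; substitute the values shown on your service's details page.

```python
# Minimal sketch: calling a real-time service over a VPC peering
# connection from inside the peered VPC. All values are placeholders.
import requests

ENDPOINT = "http://192.168.0.10:8080/"  # hypothetical in-VPC instance address
TOKEN = "<your-auth-token>"             # auth scheme depends on your setup

resp = requests.post(
    ENDPOINT,
    json={"data": [1, 2, 3]},           # request body defined by your model
    headers={"X-Auth-Token": TOKEN},
    timeout=10,
)
resp.raise_for_status()
print(resp.json())
```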
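For the WebSocket case, here is a minimal client sketch using the third-party websocket-client package. The URL path, token header, and message payload are placeholders; the real address is the WebSocket API URL shown after you deploy the service with WebSocket selected.

```python
# Minimal WebSocket client sketch (pip install websocket-client).
# URL, header, and payload are placeholders for your deployed service.
import json
import websocket  # from the websocket-client package

URL = "wss://<service-endpoint>/v1/infer"  # hypothetical path
HEADERS = {"X-Auth-Token": "<your-auth-token>"}

ws = websocket.create_connection(URL, header=HEADERS, timeout=30)
try:
    ws.send(json.dumps({"data": [1, 2, 3]}))  # payload defined by your model
    print(ws.recv())                          # one inference result frame
finally:
    ws.close()
```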
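For the TF Serving migration case, the request below uses the standard TF Serving REST predict API that a native image serves before adaptation; the host, model name, and input tensor are placeholders for your deployment.

```python
# Native TF Serving REST predict call. Port 8501, the :predict route,
# and the "instances"/"predictions" fields are standard TF Serving REST
# API conventions; host, model name, and input shape are placeholders.
import requests

url = "http://<tf-serving-host>:8501/v1/models/<model_name>:predict"
body = {"instances": [[1.0, 2.0, 3.0]]}  # shape must match your model

resp = requests.post(url, json=body, timeout=10)
resp.raise_for_status()
print(resp.json()["predictions"])
```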