Creating an Endpoint on ModelArts Studio (MaaS)

MaaS allows users to create endpoints. By specifying model parameters through these endpoints, users can manage traffic distribution and fine-tune operations across various service scenarios or model versions.

Operation Scenarios

During the development and operation of AI applications, enterprises and developers encounter issues such as chaotic inference service call management, difficulties in traffic control, and ambiguous cost accounting. Multiple business lines sharing the same inference service lead to resource competition and unstable service performance. Additionally, the lack of effective call restrictions makes it challenging to track the resource consumption of each business module.

MaaS offers endpoints, enabling users to establish independent call entry points, set rate-limiting rules, and achieve precise fee tracking based on endpoint names. This assists users in effectively managing inference service resources and optimizing usage costs.

Constraints

This function is only available in the CN-Hong Kong region. Resources cannot be called across regions.
Each account can have up to ten endpoints.
Each endpoint must have a unique name under the same account. A deleted endpoint name cannot be used when you create a new endpoint.
After an endpoint is created, the model service cannot be modified.
The created endpoints must comply with the rules and specifications of the platform and cannot be called in violation of regulations.

Billing

Endpoints are free to create. However, you may be charged for calling the model service or using resources. Check your service costs in the Billing Center by searching with the endpoint name.

ModelArts bills you for using its real-time services. For details, see Inference Deployment Billing Items.

Prerequisites

You have deployed a model as a real-time service on ModelArts. The configuration requirements for deploying real-time services are as follows:

In Service Call Settings, set Authentication Mode to None.
In Network Settings, enable Public Network Access and Private Network Connection Approval.

Creating an Endpoint

Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
In the navigation pane on the left, choose Real-Time Inference.
Click the Endpoint tab and click Create Endpoint in the upper right corner of the page.

On the Create Endpoint dialog box, configure parameters.

**Table 1** Parameters for creating an endpoint
Parameter	Description
Name	Name of an endpoint. The endpoint name must be unique and cannot contain special characters. The value can contain 1 to 64 characters.
Description	Description of an endpoint. The value can contain up to 256 characters.
Service Source	Select AI Development Platform ModelArts - Online Services. AI Development Platform ModelArts - Online Services: new-version real-time services, which are charged by ModelArts.
Model service	Click Select Model Service. In the Select Model Service dialog box, select the region and service as required, and click OK.
Access Point Flow Control	Select Access Point Flow Control and manually set the RPM and TPM traffic controls for the endpoint. When all endpoints using the same model share its total traffic limit equally, they will not compete for the available quota. Each endpoint allows you to configure specific RPM and TPM traffic controls, as long as they stay within your account's traffic limits. The values of RPM and TPM must be positive integers.

Confirm the configuration and billing information and click Create Now.
Once you create the endpoint, its details appear in the Endpoint tab. From there, you can see that its status is In use and call or test it online.

Figure 1 Endpoint created

To obtain bill details, click next to the endpoint ID to copy the ID. Go to the Billing Center, choose Billing > Transactions and Detailed Bills, and click Bill Details. From there, view the bill details using the obtained ID.

Figure 2 Copying an ID

Trying an Endpoint Online

Online experience is available only when the endpoint is in the In use status.

Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
In the navigation pane on the left, choose Real-Time Inference.
Click the Endpoint tab and click Try in the Operation column of the target endpoint.
For more information about online experience, see Trying Text-based Dialogue in ModelArts Studio (MaaS).

Calling an Endpoint

An endpoint can be called only when it is in the In use status. AI generates content from service calls. It does not reflect MaaS's views. The platform does not guarantee the legality, authenticity, and accuracy of the content, and does not assume any legal liability.

Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
In the navigation pane on the left, choose Real-Time Inference.
Click the Endpoint tab and click View Call Description in the Operation column of the target endpoint.
On the View Call Description page, obtain the API key as prompted, copy the call example, replace the API information and API key, and call the API.
- The model Parameter column in the Endpoint tab shows the model's name used in the code during service calls. You can call different endpoints based on different model parameters.
- For details about how to create an API key, see Managing API Keys in ModelArts Studio (MaaS).
- For details about the parameters in the call example, see Sending a Chat Request (Chat/Post).

Viewing the Call Statistics of an Endpoint

View call data and metrics for a custom endpoint during a chosen time frame. This includes call counts, failures, and token totals. These insights help learn about service usage, monitor performance shifts, assess models, identify issues, fix problems, and improve efficiency.

Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
In the navigation pane on the left, choose Real-Time Inference.
Click the Endpoint tab. Click in the Call Statistics column of the target endpoint to go to the Service Call Details page and check the details.
For more information about call statistics, see Viewing the Call Data and Monitoring Metrics of Real-Time Inference on ModelArts Studio (MaaS).

Editing an Endpoint

You can modify the information about an endpoint, such as the description and traffic limit. The model service of an endpoint cannot be modified.

Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
In the navigation pane on the left, choose Real-Time Inference.
Click the Endpoint tab and choose More > Edit in the Operation column of the target endpoint.
In the Edit Endpoint dialog box, modify parameters as required and click Update.
For details about the parameters, see Table 1.

Disabling or Enabling an Endpoint

An endpoint can be disabled when its status is In use. When you disable an endpoint, its inference feature turns off but can be reactivated later. Because of billing delays, you might continue receiving bills for the service even after disabling it.

An endpoint can be enabled when its status is Disabled.

Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
In the navigation pane on the left, choose Real-Time Inference. Choose Endpoint and perform the following operations as required.
- Disabling an endpoint
  1. Choose More > Disable in the Operation column of the target endpoint.
  2. In the displayed dialog box, enter YES and click OK.
    After the endpoint is disabled, the status of the endpoint is Disabled.
- Enabling an Endpoint
  1. Choose More > Enable in the Operation column of the target endpoint.
  2. In the displayed dialog box, click OK.
    After the endpoint is enabled, the status of the endpoint is In use.

Deleting an Endpoint

You can delete an endpoint if it is no longer needed. After an endpoint is deleted, the inference capability of the endpoint is disabled, and all information about the endpoint is deleted and cannot be restored. Exercise caution when performing this operation.

Because of billing delays, you might continue receiving bills for the service even after deleting it.

Log in to the ModelArts Studio (MaaS) console and select the target region on the top navigation bar.
In the navigation pane on the left, choose Real-Time Inference.
Click the Endpoint tab and choose More > Delete in the Operation column of the target endpoint.
In the displayed dialog box, confirm the information, enter DELETE, and click OK.
After the endpoint is deleted, it is not displayed in the Endpoint tab.

FAQs

What should I do if the number of endpoints I have created reaches the limit?
You can delete the endpoints that are no longer needed and then create new ones.
How can I determine the number of tokens consumed?
You can view the total number of tokens, input tokens, output tokens, and other information for model service calls on the Call Statistics page. For details, see "Viewing the Call Statistics of an Endpoint".
How long does it take for changes to the traffic limiting settings of an endpoint to take effect?
After you save the changes, the traffic limiting settings will take effect immediately, and subsequent calls will be executed according to the new rules.
What should I do if the billing status of my endpoint changes to invalid?
If the billing status of your endpoint becomes invalid, it indicates that the resource was frozen due to account arrears and exceeded the resource retention period, resulting in deletion. You may proceed to delete this endpoint and recreate it as needed.