Updated on 2026-02-26 GMT+08:00

Creating an Endpoint

Description

During the development and operation of AI applications, enterprises and developers often face disorganized inference service call management, difficulty enforcing throttling, and ambiguous cost accounting. When multiple business lines share the same inference service, resource competition makes service performance unstable, and the lack of effective call restrictions makes it hard to track each business module's resource consumption. MaaS offers endpoints, which let users establish independent call entry points, set rate-limiting rules, and track fees precisely by endpoint name. This helps users manage inference service resources effectively and optimize usage costs.

Constraints

  • This function is only supported in CN-Hong Kong.
  • Each account can have up to ten endpoints.
  • Endpoint names must be unique within an account. The name of a deleted endpoint cannot be reused when creating a new endpoint.
  • After an endpoint is created, the model service cannot be modified.
  • The created endpoints must comply with the rules and specifications of the platform and cannot be called in violation of regulations.

Billing

Endpoints are free to create. However, you may be charged for calling the model service or using resources. Check your service costs in the Billing Center by searching with the endpoint name.

  • Calling a built-in real-time inference service: The service is billed by token. The billing mode is the same as that of the selected foundation model. For details, see Model Service Prices.
  • ModelArts bills you for using its real-time services. For details, see Inference Deployment Billing Items.

Prerequisites

You have subscribed to a built-in service on MaaS or created a real-time service on ModelArts.

URI

POST /v1/{project_id}/maas/services/custom-endpoint/endpoint

Table 1 URI parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID and Name. |
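The URI template above can be filled in programmatically. The following minimal sketch only performs the substitution; the service host is not shown here and must come from your region's actual MaaS API address.

```python
def build_create_endpoint_uri(project_id: str) -> str:
    """Substitute the project ID into the documented URI template."""
    return f"/v1/{project_id}/maas/services/custom-endpoint/endpoint"

# Example with a hypothetical project ID:
print(build_create_endpoint_uri("0a1b2c3d4e5f"))
# → /v1/0a1b2c3d4e5f/maas/services/custom-endpoint/endpoint
```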

Request Parameters

Table 2 Request header parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| X-Auth-Token | Yes | String | User token. Obtain it by calling the IAM API for obtaining a user token; the value of X-Subject-Token in the response header is the user token. For details, see Authentication. |
| Content-Type | Yes | String | Type of the message body. The value is fixed to application/json. |

Table 3 Request body parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| model_name | No | String | Name of the associated model. Set this parameter when the resource type is custom_from_maas. The name is case-sensitive. For details, see the ID in Obtaining the Model List (Models/GET). |
| source | Yes | String | Resource type. Options: custom_from_maas (built-in service on the MaaS Real-Time Inference page, billed by MaaS); custom_from_modelarts_v2 (ModelArts new-version real-time service, billed by ModelArts). |
| rpm | No | Integer | Number of requests processed per minute. Required when configuring throttling for an endpoint. For endpoints created from a built-in model, the value cannot exceed the throttling limit of the foundation model. Each model's allowable range is shown in the Model flow limiting column under ModelArts Studio (MaaS) console > Real-Time Inference > Built-in Services. |
| tpm | No | Integer | Number of tokens processed per minute (input + output). Required when configuring throttling for an endpoint. For endpoints created from a built-in service, the value cannot exceed the throttling limit of the foundation model. Each model's allowable range is shown in the Model flow limiting column under ModelArts Studio (MaaS) console > Real-Time Inference > Built-in Services. |
| endpoint_name | Yes | String | User-defined endpoint name. The name must be unique within the account. Enter 1 to 64 characters. The name must start with a letter, and can only contain letters, digits, hyphens (-), underscores (_), and dots (.). |
| remark | No | String | Description of the endpoint. A string of up to 256 characters. |
| region | No | String | Region to associate with the model. Set this parameter when the resource type is custom_from_modelarts_v2. For details about how to obtain the value, see region_id in Obtaining Region Information. |
| infer_service_id | No | String | ID of the inference service to associate with the model. Set this parameter when the resource type is custom_from_modelarts_v2. For details about how to obtain the value, see id of InferServerInfo in Obtaining Inference Service Information. |
| workspace_id | No | String | Your workspace ID. Set this parameter when the resource type is custom_from_modelarts_v2. For details about how to obtain the value, see ID in Obtaining Workspace Information. |
| moderation | No | Boolean | Whether to enable content guard. Configure content guard if you have a V2 or higher account level. Default value: true. When true, content guard blocks harmful content in inputs and outputs during model inference; enabling it might slow down processing. When false, the model relies on its native security features. To view your account level, log in to the ModelArts Studio (MaaS) console, click the username in the upper right corner, click Basic Information, and view the level next to the account name. |
| agreement_id | No | String | ID of the agreement that must be signed to enable or disable content guard. Mandatory when moderation is set to false. For details about how to obtain the value, see agreement_id in Obtaining the Latest Content Guard Disclaimer. |
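The conditional constraints above (source-dependent fields, the endpoint_name format, and the moderation/agreement_id coupling) can be checked on the client side before sending a request. The following is an illustrative sketch, not the server's actual validation logic; it treats the source-dependent fields as required for their respective resource types, as the examples below do.

```python
import re

def validate_endpoint_request(body: dict) -> list[str]:
    """Return a list of constraint violations (empty means the body looks valid)."""
    problems = []
    name = body.get("endpoint_name", "")
    # 1-64 characters, starting with a letter; letters, digits, -, _, . allowed.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9._-]{0,63}", name):
        problems.append("endpoint_name must be 1-64 chars and start with a letter")
    source = body.get("source")
    if source == "custom_from_maas" and not body.get("model_name"):
        problems.append("model_name is required when source is custom_from_maas")
    if source == "custom_from_modelarts_v2":
        for key in ("region", "infer_service_id", "workspace_id"):
            if not body.get(key):
                problems.append(f"{key} is required when source is custom_from_modelarts_v2")
    # Disabling content guard requires a signed agreement ID.
    if body.get("moderation") is False and not body.get("agreement_id"):
        problems.append("agreement_id is mandatory when moderation is false")
    return problems
```

For example, a custom_from_maas body without model_name, or one whose name starts with a digit, would be reported before the request is ever sent.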

Response Parameters

Table 4 Response body parameters

| Parameter | Type | Description |
|---|---|---|
| id | String | ID of the endpoint, generated after the endpoint is created. |
| served_model_name | String | Name of the model called by the endpoint, which consists of the foundation model name and six random characters. |
| created_at | String | Creation time. |

Table 5 Error response parameters

| Parameter | Type | Description |
|---|---|---|
| error_msg | String | Error description. |
| error_code | String | Error code, indicating the error type. |
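A caller can dispatch on the two response shapes above: a 200 body carries id, served_model_name, and created_at, while an error body carries error_code and error_msg. A minimal handling sketch:

```python
def parse_create_endpoint_response(status: int, payload: dict) -> str:
    """Return the new endpoint ID on success; raise with the documented
    error fields otherwise."""
    if status == 200:
        return payload["id"]
    raise RuntimeError(f"{payload.get('error_code')}: {payload.get('error_msg')}")
```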

Request Example

  • Use the DeepSeek-V3.1 model to create an endpoint of the custom_from_maas type. Replace the parameter values as required.

    /v1/{project_id}/maas/services/custom-endpoint/endpoint

    {
      "model_name": "DeepSeek-V3.1",
      "endpoint_name": "DeepSeek-Test",
      "remark": "DeepSeek endpoint test",
      "source": "custom_from_maas",
      "moderation": true,
      "tpm": 0,
      "rpm": 0,
      "agreement_id": "af247c14-2bee-4d78-a5e8-a419ea62b6c6"
    }

  • Create an endpoint of the custom_from_modelarts_v2 type. Replace the parameter values as required.

    /v1/{project_id}/maas/services/custom-endpoint/endpoint

    {
      "endpoint_name": "test_endpoint",
      "source": "custom_from_modelarts_v2",
      "moderation": true,
      "agreement_id": "af247c14-2bee-4d78-a5e8-a419ea62b6c6",
      "region": "cn-southwest-2",
      "infer_service_id": "1b27760b-f9d9-42f1-8eea-e68aba09f039",
      "workspace_id": "0"
    }
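The first request example can be assembled into a complete HTTP request as follows. This sketch uses the Python standard library and only builds the request without sending it; the host name is a placeholder assumption, so substitute your region's actual MaaS API address and a valid IAM token.

```python
import json
import urllib.request

HOST = "https://maas.example.com"  # hypothetical host; use your region's address

def build_create_endpoint_request(project_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for creating an endpoint."""
    url = f"{HOST}/v1/{project_id}/maas/services/custom-endpoint/endpoint"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Auth-Token": token,            # user token from IAM (see Table 2)
            "Content-Type": "application/json",
        },
    )

req = build_create_endpoint_request("myproj", "my-token", {
    "model_name": "DeepSeek-V3.1",
    "endpoint_name": "DeepSeek-Test",
    "source": "custom_from_maas",
})
print(req.get_method(), req.full_url)
```

Sending the request would then be a matter of passing it to urllib.request.urlopen (or an equivalent HTTP client) and parsing the JSON body described under Response Parameters.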

Response Example

  • The following is a response example for using the DeepSeek-V3.1 model to create an endpoint of the custom_from_maas type.
    • Success response. Status code: 200.

      {
        "id": "c4513589-df2e-4d58-ab0c-d5a6f2******",
        "served_model_name": "deepseek-v3.1-4ZGlnU",
        "created_at": "2025-12-09T11:32:46Z"
      }

    • Failure response. Status code: 400.

      {
        "error_code": "ModelArts.0103",
        "error_msg": "error reason"
      }

  • The following is a response example for creating an endpoint of the custom_from_modelarts_v2 type.
    • Success response. Status code: 200.

      {
        "id": "c4513589-df2e-4d58-ab0c-d5a6f2******",
        "served_model_name": "dpsk-v3-vllm-a3-02-4ZGlnU",
        "created_at": "2025-12-09T11:32:46Z"
      }

    • Failure response. Status code: 400.

      {
        "error_code": "ModelArts.0103",
        "error_msg": "error reason"
      }

Status Codes

For details, see Status Codes.

Error Codes

For details, see Error Codes.