Creating an Endpoint
Description
When developing and operating AI applications, enterprises and developers often struggle with disorganized management of inference service calls, difficulty applying throttling, and unclear cost accounting. When multiple business lines share the same inference service, they compete for resources and service performance becomes unstable. Without effective call limits, it is also hard to track the resource consumption of each business module. MaaS endpoints address these problems: each endpoint is an independent call entry point with its own rate-limiting rules, and fees can be tracked precisely by endpoint name. This helps you manage inference service resources and optimize usage costs.
Constraints
- This function is only supported in CN-Hong Kong.
- Each account can have up to ten endpoints.
- Endpoint names must be unique within an account. The name of a deleted endpoint cannot be reused when creating an endpoint.
- After an endpoint is created, the model service cannot be modified.
- The created endpoints must comply with the rules and specifications of the platform and cannot be called in violation of regulations.
Billing
Endpoints are free to create. However, you may be charged for calling the model service or using resources. Check your service costs in the Billing Center by searching with the endpoint name.
- Calling a built-in real-time inference service: The service is billed by token. The billing mode is the same as that of the selected foundation model. For details, see Model Service Prices.
- Calling a ModelArts real-time service: ModelArts bills you for using the real-time service. For details, see Inference Deployment Billing Items.
Prerequisites
You have subscribed to a built-in service on MaaS or created a real-time service on ModelArts.
URI
POST /v1/{project_id}/maas/services/custom-endpoint/endpoint
Parameter | Mandatory | Type | Description |
|---|---|---|---|
project_id | Yes | String | Definition: Project ID. For details about how to obtain the project ID, see Obtaining a Project ID and Name. Constraints: N/A. Range: N/A. Default Value: N/A. |
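As a minimal sketch, the request URL can be assembled from the project ID as shown below. The host `https://maas.example.com` is a placeholder assumption, because this page does not state the API domain, and `endpoint_creation_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Placeholder host (assumption): replace it with the API domain for your region.
API_HOST = "https://maas.example.com"


def endpoint_creation_url(project_id: str, host: str = API_HOST) -> str:
    """Return the URL for POST /v1/{project_id}/maas/services/custom-endpoint/endpoint."""
    if not project_id:
        raise ValueError("project_id is required")
    # Percent-encode the ID so that it always stays a single path segment.
    segment = quote(project_id, safe="")
    return f"{host.rstrip('/')}/v1/{segment}/maas/services/custom-endpoint/endpoint"
```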
Request Parameters
Parameter | Mandatory | Type | Description |
|---|---|---|---|
X-Auth-Token | Yes | String | Definition: User token. The token can be obtained by calling the IAM API used to obtain a user token. The value of X-Subject-Token in the response header is the user token. For details, see Authentication. Constraints: N/A. Range: N/A. Default Value: N/A. |
Content-Type | Yes | String | Definition: Type of the message body. The value is fixed to application/json. Constraints: N/A. Range: N/A. Default Value: N/A. |
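The two headers above could be derived from the IAM token response roughly as follows. This is a sketch under assumptions: `headers_from_iam_response` is a hypothetical helper, and it only expects a mapping of the response headers returned by the IAM token API, whatever HTTP client produced it.

```python
from collections.abc import Mapping


def headers_from_iam_response(iam_response_headers: Mapping[str, str]) -> dict[str, str]:
    """Build the request headers from the headers of an IAM token response."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in iam_response_headers.items()}
    token = lowered.get("x-subject-token")
    if not token:
        raise KeyError("X-Subject-Token is missing from the IAM response headers")
    return {
        "X-Auth-Token": token,               # user token taken from X-Subject-Token
        "Content-Type": "application/json",  # fixed value for this API
    }
```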
Parameter | Mandatory | Type | Description |
|---|---|---|---|
model_name | No | String | Definition: When the resource type is custom_from_maas, set this parameter to the name of the associated model. The name is case sensitive. For details, see the ID in Obtaining the Model List (Models/GET). Constraints: N/A. Range: N/A. Default Value: N/A. |
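source | Yes | String | Definition: Resource type. Constraints: N/A. Range: custom_from_maas (endpoint based on a MaaS built-in service) or custom_from_modelarts_v2 (endpoint based on a ModelArts real-time service). Default Value: N/A. |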
rpm | No | Integer | Definition: Number of requests processed per minute. Constraints: This parameter is required when configuring throttling for an endpoint. For endpoints created from a built-in model, the maximum throttling value cannot exceed the throttling limit of the foundation model. Range: Each model has its own allowable range. You can view the corresponding range in ModelArts Studio (MaaS) console > Real-Time Inference > Built-in Services, under the Model flow limiting column for each model. Default Value: N/A. |
tpm | No | Integer | Definition: Tokens processed per minute (input + output). Constraints: This parameter is required when configuring throttling for an endpoint. For endpoints created from a built-in service, the maximum throttling value cannot exceed the throttling limit of the foundation model. Range: Each model has its own allowable range. You can view the corresponding range in ModelArts Studio (MaaS) console > Real-Time Inference > Built-in Services, under the Model flow limiting column for each model. Default Value: N/A. |
endpoint_name | Yes | String | Definition: Endpoint name entered by you. Constraints: The name must be unique within an account. Range: Enter 1 to 64 characters. The name must start with a letter, and can only contain letters, digits, hyphens (-), underscores (_), and dots (.). Default Value: N/A. |
remark | No | String | Definition: Description of an endpoint. Constraints: N/A. Range: A string of up to 256 characters. Default Value: N/A. |
region | No | String | Definition: When the resource type is custom_from_modelarts_v2, set this parameter to the region to be associated with the model. For details about how to obtain the value, see region_id in Obtaining Region Information. Constraints: N/A. Range: N/A. Default Value: N/A. |
infer_service_id | No | String | Definition: When the resource type is custom_from_modelarts_v2, set this parameter to the inference service ID to be associated with the model. For details about how to obtain the value, see id of InferServerInfo in Obtaining Inference Service Information. Constraints: N/A. Range: N/A. Default Value: N/A. |
workspace_id | No | String | Definition: When the resource type is custom_from_modelarts_v2, set this parameter to your workspace ID. For details about how to obtain the value, see ID in Obtaining Workspace Information. Constraints: N/A. Range: N/A. Default Value: N/A. |
moderation | No | Boolean | Definition: Specifies whether to enable content guard. You can configure content guard if your account level is V2 or higher. Content guard is enabled by default. NOTE: To view your account level, log in to the ModelArts Studio (MaaS) console, click the username in the upper right corner, click Basic Information, and view the level next to the account name. Constraints: This parameter is optional. Range: true (content guard enabled) or false (content guard disabled). Default Value: true. |
agreement_id | No | String | Definition: An agreement must be signed to enable or disable content guard. Pass the ID of that agreement in this parameter. For details about how to obtain the value, see agreement_id in Obtaining the Latest Content Guard Disclaimer. Constraints: This parameter is mandatory when moderation is set to false. Range: N/A. Default Value: N/A. |
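The body rules in the table above can be checked on the client side before sending. The sketch below is illustrative only: `build_endpoint_body` is a hypothetical helper, "letters" is assumed to mean ASCII letters, and treating the source-specific parameters as required for their resource type is one reading of the Definition text, not a documented rule.

```python
import re

# 1 to 64 characters, starting with a letter; letters, digits, '-', '_', '.' allowed.
# Assumption: "letters" means ASCII letters.
ENDPOINT_NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9._-]{0,63}")

# Assumption: these parameters are needed for the matching resource type.
REQUIRED_BY_SOURCE = {
    "custom_from_maas": ("model_name",),
    "custom_from_modelarts_v2": ("region", "infer_service_id", "workspace_id"),
}


def build_endpoint_body(endpoint_name: str, source: str, **optional) -> dict:
    """Return a request body dict, raising ValueError on an obvious rule violation."""
    if not ENDPOINT_NAME_RE.fullmatch(endpoint_name):
        raise ValueError("endpoint_name must be 1 to 64 characters, start with a "
                         "letter, and contain only letters, digits, '-', '_', '.'")
    body = {"endpoint_name": endpoint_name, "source": source}
    # Drop parameters that were passed as None so that they are not sent at all.
    body.update({key: value for key, value in optional.items() if value is not None})
    if len(body.get("remark", "")) > 256:
        raise ValueError("remark must be at most 256 characters")
    # Unknown source values are passed through; the service decides on them.
    missing = [name for name in REQUIRED_BY_SOURCE.get(source, ()) if not body.get(name)]
    if missing:
        raise ValueError(f"{source} needs: {', '.join(missing)}")
    if body.get("moderation") is False and not body.get("agreement_id"):
        raise ValueError("agreement_id is mandatory when moderation is false")
    return body
```

For example, `build_endpoint_body("DeepSeek-Test", "custom_from_maas", model_name="DeepSeek-V3.1")` returns a dict with exactly those three keys, while passing `moderation=False` without `agreement_id` raises `ValueError`. The rpm and tpm limits are not checked here because the allowed range differs per model.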
Response Parameters
Parameter | Type | Description |
|---|---|---|
id | String | Definition: ID of an endpoint, which is generated after the endpoint is created. Range: N/A. |
served_model_name | String | Definition: Name of the model called by the endpoint, which consists of the foundation model and six random characters. Range: N/A. |
created_at | String | Definition: Creation time. Range: N/A. |
Parameter | Type | Description |
|---|---|---|
error_msg | String | Definition: Error description. Range: N/A. |
error_code | String | Definition: Error code, indicating the error type. Range: N/A. |
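One way to consume the two response shapes above is to map them to a small result type and an exception. This is a sketch only: `CreatedEndpoint`, `EndpointApiError`, and `parse_create_response` are hypothetical names, and the presence of `error_code` in the body is assumed to be a sufficient signal of failure.

```python
import json
from dataclasses import dataclass


@dataclass
class CreatedEndpoint:
    id: str
    served_model_name: str
    created_at: str  # kept as the raw string returned by the service


class EndpointApiError(Exception):
    """Raised when the response body carries error_code and error_msg."""

    def __init__(self, error_code: str, error_msg: str):
        super().__init__(f"{error_code}: {error_msg}")
        self.error_code = error_code
        self.error_msg = error_msg


def parse_create_response(text: str) -> CreatedEndpoint:
    """Turn a JSON response body into CreatedEndpoint, or raise EndpointApiError."""
    data = json.loads(text)
    if "error_code" in data:
        raise EndpointApiError(data["error_code"], data.get("error_msg", ""))
    return CreatedEndpoint(
        id=data["id"],
        served_model_name=data["served_model_name"],
        created_at=data["created_at"],
    )
```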
Request Example
- Use the DeepSeek-V3.1 model to create an endpoint of the custom_from_maas type. Replace the model and parameter values as required.

      POST /v1/{project_id}/maas/services/custom-endpoint/endpoint

      {
        "model_name": "DeepSeek-V3.1",
        "endpoint_name": "DeepSeek-Test",
        "remark": "DeepSeek endpoint test",
        "source": "custom_from_maas",
        "moderation": true,
        "tpm": 0,
        "rpm": 0,
        "agreement_id": "af247c14-2bee-4d78-a5e8-a419ea62b6c6"
      }

- Create an endpoint of the custom_from_modelarts_v2 type. Replace the parameter values as required.

      POST /v1/{project_id}/maas/services/custom-endpoint/endpoint

      {
        "endpoint_name": "test_endpoint",
        "source": "custom_from_modelarts_v2",
        "moderation": true,
        "agreement_id": "af247c14-2bee-4d78-a5e8-a419ea62b6c6",
        "region": "cn-southwest-2",
        "infer_service_id": "1b27760b-f9d9-42f1-8eea-e68aba09f039",
        "workspace_id": "0"
      }
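The custom_from_modelarts_v2 example above might be prepared with the Python standard library as sketched here. The host, project ID, and token are placeholder assumptions, `make_create_request` is a hypothetical helper, and the request is only built, not sent.

```python
import json
import urllib.request

API_HOST = "https://maas.example.com"  # placeholder host (assumption)


def make_create_request(project_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request that creates an endpoint."""
    url = f"{API_HOST}/v1/{project_id}/maas/services/custom-endpoint/endpoint"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"X-Auth-Token": token, "Content-Type": "application/json"},
        method="POST",
    )


# Values copied from the request example; project ID and token are placeholders.
request = make_create_request(
    "my-project-id",
    "my-user-token",
    {
        "endpoint_name": "test_endpoint",
        "source": "custom_from_modelarts_v2",
        "moderation": True,
        "agreement_id": "af247c14-2bee-4d78-a5e8-a419ea62b6c6",
        "region": "cn-southwest-2",
        "infer_service_id": "1b27760b-f9d9-42f1-8eea-e68aba09f039",
        "workspace_id": "0",
    },
)
```

Passing `request` to `urllib.request.urlopen` would send it. Note that the JSON literal `true` corresponds to `True` in Python, and `json.dumps` converts it back.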
Response Example
- The following is a response example for using the DeepSeek-V3.1 model to create an endpoint of the custom_from_maas type.
  - Success response. Status code: 200.

        {
          "id": "c4513589-df2e-4d58-ab0c-d5a6f2******",
          "served_model_name": "deepseek-v3.1-4ZGlnU",
          "created_at": "2025-12-09T11:32:46Z"
        }

  - Failure response. Status code: 400.

        {
          "error_code": "ModelArts.0103",
          "error_msg": "error reason"
        }

- The following is a response example for creating an endpoint of the custom_from_modelarts_v2 type.
  - Success response. Status code: 200.

        {
          "id": "c4513589-df2e-4d58-ab0c-d5a6f2******",
          "served_model_name": "dpsk-v3-vllm-a3-02-4ZGlnU",
          "created_at": "2025-12-09T11:32:46Z"
        }

  - Failure response. Status code: 400.

        {
          "error_code": "ModelArts.0103",
          "error_msg": "error reason"
        }
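With Python's `urllib`, a 400 response such as the failure example above surfaces as an `HTTPError` exception rather than a normal return value, so both cases need handling. A minimal sketch, in which `send_create_request` is a hypothetical helper and the `opener` parameter exists only so the function can be exercised without a network call:

```python
import json
import urllib.error
import urllib.request


def send_create_request(request, opener=urllib.request.urlopen):
    """Send the request and return (status_code, decoded_json_body)."""
    try:
        with opener(request) as response:
            return response.status, json.loads(response.read())
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx and 5xx statuses; the JSON error body
        # (error_code, error_msg) is still readable from the exception.
        return err.code, json.loads(err.read())
```

A caller can then branch on the status code: 200 carries `id`, `served_model_name`, and `created_at`, while 400 carries `error_code` and `error_msg`.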
Status Codes
For details, see Status Codes.
Error Codes
For details, see Error Codes.
