Updated on 2026-02-26 GMT+08:00

Creating an Endpoint

Description

During the development and operation of AI applications, enterprises and developers often face disorganized inference service call management, difficulty enforcing throttling, and ambiguous cost accounting. When multiple business lines share the same inference service, resource competition makes service performance unstable, and the lack of effective call restrictions makes it hard to track each business module's resource consumption. MaaS offers endpoints, which let users establish independent call entry points, set rate-limiting rules, and track fees precisely by endpoint name. This helps users manage inference service resources effectively and optimize usage costs.

Constraints

  • This function is only supported in CN-Hong Kong.
  • Each account can have up to ten endpoints.
  • Endpoint names must be unique within an account. The name of a deleted endpoint cannot be reused when creating a new endpoint.
  • After an endpoint is created, the model service cannot be modified.
  • The created endpoints must comply with the rules and specifications of the platform and cannot be called in violation of regulations.

Billing

Endpoints are free to create. However, you may be charged for calling the model service or using resources. Check your service costs in the Billing Center by searching with the endpoint name.

  • Calling a built-in real-time inference service: The service is billed by token. The billing mode is the same as that of the selected foundation model. For details, see Model Service Prices.
  • ModelArts bills you for using its real-time services. For details, see Inference Deployment Billing Items.

Prerequisites

You have subscribed to a built-in service on MaaS or created a real-time service on ModelArts.

URI

POST /v1/{project_id}/maas/services/custom-endpoint/endpoint

Table 1 URI parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Project ID. For details about how to obtain the project ID, see Obtaining a Project ID and Name. |
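The URI template above can be filled in programmatically. The following minimal sketch only performs the substitution; the service host is not shown here and must come from your region's actual MaaS API address.

```python
def build_create_endpoint_uri(project_id: str) -> str:
    """Substitute the project ID into the documented URI template."""
    return f"/v1/{project_id}/maas/services/custom-endpoint/endpoint"

# Example with a hypothetical project ID:
print(build_create_endpoint_uri("0a1b2c3d4e5f"))
# → /v1/0a1b2c3d4e5f/maas/services/custom-endpoint/endpoint
```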

Request Parameters

Table 2 Request header parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| X-Auth-Token | Yes | String | User token. Obtain it by calling the IAM API for obtaining a user token; the value of X-Subject-Token in the response header is the user token. For details, see Authentication. |
| Content-Type | Yes | String | Type of the message body. The value is fixed to application/json. |

Table 3 Request body parameters

| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| model_name | No | String | Name of the associated model. Set this parameter when the resource type is custom_from_maas. The name is case-sensitive. For details, see the ID in Obtaining the Model List (Models/GET). |
| source | Yes | String | Resource type. Options: custom_from_maas (built-in service on the MaaS Real-Time Inference page, billed by MaaS); custom_from_modelarts_v2 (ModelArts new-version real-time service, billed by ModelArts). |
| rpm | No | Integer | Number of requests processed per minute. Required when configuring throttling for an endpoint. For endpoints created from a built-in model, the value cannot exceed the throttling limit of the foundation model. Each model's allowable range is shown in the Model flow limiting column under ModelArts Studio (MaaS) console > Real-Time Inference > Built-in Services. |
| tpm | No | Integer | Number of tokens processed per minute (input + output). Required when configuring throttling for an endpoint. For endpoints created from a built-in service, the value cannot exceed the throttling limit of the foundation model. Each model's allowable range is shown in the Model flow limiting column under ModelArts Studio (MaaS) console > Real-Time Inference > Built-in Services. |
| endpoint_name | Yes | String | User-defined endpoint name. The name must be unique within the account. Enter 1 to 64 characters. The name must start with a letter, and can only contain letters, digits, hyphens (-), underscores (_), and dots (.). |
| remark | No | String | Description of the endpoint. A string of up to 256 characters. |
| region | No | String | Region to associate with the model. Set this parameter when the resource type is custom_from_modelarts_v2. For details about how to obtain the value, see region_id in Obtaining Region Information. |
| infer_service_id | No | String | ID of the inference service to associate with the model. Set this parameter when the resource type is custom_from_modelarts_v2. For details about how to obtain the value, see id of InferServerInfo in Obtaining Inference Service Information. |
| workspace_id | No | String | Your workspace ID. Set this parameter when the resource type is custom_from_modelarts_v2. For details about how to obtain the value, see ID in Obtaining Workspace Information. |
| moderation | No | Boolean | Whether to enable content guard. Configure content guard if you have a V2 or higher account level. Default value: true. When true, content guard blocks harmful content in inputs and outputs during model inference; enabling it might slow down processing. When false, the model relies on its native security features. To view your account level, log in to the ModelArts Studio (MaaS) console, click the username in the upper right corner, click Basic Information, and view the level next to the account name. |
| agreement_id | No | String | ID of the agreement that must be signed to enable or disable content guard. Mandatory when moderation is set to false. For details about how to obtain the value, see agreement_id in Obtaining the Latest Content Guard Disclaimer. |
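The conditional constraints above (source-dependent fields, the endpoint_name format, and the moderation/agreement_id coupling) can be checked on the client side before sending a request. The following is an illustrative sketch, not the server's actual validation logic; it treats the source-dependent fields as required for their respective resource types, as the examples below do.

```python
import re

def validate_endpoint_request(body: dict) -> list[str]:
    """Return a list of constraint violations (empty means the body looks valid)."""
    problems = []
    name = body.get("endpoint_name", "")
    # 1-64 characters, starting with a letter; letters, digits, -, _, . allowed.
    if not re.fullmatch(r"[A-Za-z][A-Za-z0-9._-]{0,63}", name):
        problems.append("endpoint_name must be 1-64 chars and start with a letter")
    source = body.get("source")
    if source == "custom_from_maas" and not body.get("model_name"):
        problems.append("model_name is required when source is custom_from_maas")
    if source == "custom_from_modelarts_v2":
        for key in ("region", "infer_service_id", "workspace_id"):
            if not body.get(key):
                problems.append(f"{key} is required when source is custom_from_modelarts_v2")
    # Disabling content guard requires a signed agreement ID.
    if body.get("moderation") is False and not body.get("agreement_id"):
        problems.append("agreement_id is mandatory when moderation is false")
    return problems
```

For example, a custom_from_maas body without model_name, or one whose name starts with a digit, would be reported before the request is ever sent.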

Response Parameters

Table 4 Response body parameters

| Parameter | Type | Description |
|---|---|---|
| id | String | ID of the endpoint, generated after the endpoint is created. |
| served_model_name | String | Name of the model called by the endpoint, which consists of the foundation model name and six random characters. |
| created_at | String | Creation time. |

Table 5 Error response parameters

| Parameter | Type | Description |
|---|---|---|
| error_msg | String | Error description. |
| error_code | String | Error code, indicating the error type. |
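A caller can dispatch on the two response shapes above: a 200 body carries id, served_model_name, and created_at, while an error body carries error_code and error_msg. A minimal handling sketch:

```python
def parse_create_endpoint_response(status: int, payload: dict) -> str:
    """Return the new endpoint ID on success; raise with the documented
    error fields otherwise."""
    if status == 200:
        return payload["id"]
    raise RuntimeError(f"{payload.get('error_code')}: {payload.get('error_msg')}")
```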

Request Example

  • Use the DeepSeek-V3.1 model to create an endpoint of the custom_from_maas type. Replace the parameter values as required.

    /v1/{project_id}/maas/services/custom-endpoint/endpoint

    {
      "model_name": "DeepSeek-V3.1",
      "endpoint_name": "DeepSeek-Test",
      "remark": "DeepSeek endpoint test",
      "source": "custom_from_maas",
      "moderation": true,
      "tpm": 0,
      "rpm": 0,
      "agreement_id": "af247c14-2bee-4d78-a5e8-a419ea62b6c6"
    }

  • Create an endpoint of the custom_from_modelarts_v2 type. Replace the parameter values as required.

    /v1/{project_id}/maas/services/custom-endpoint/endpoint

    {
      "endpoint_name": "test_endpoint",
      "source": "custom_from_modelarts_v2",
      "moderation": true,
      "agreement_id": "af247c14-2bee-4d78-a5e8-a419ea62b6c6",
      "region": "cn-southwest-2",
      "infer_service_id": "1b27760b-f9d9-42f1-8eea-e68aba09f039",
      "workspace_id": "0"
    }
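The first request example can be assembled into a complete HTTP request as follows. This sketch uses the Python standard library and only builds the request without sending it; the host name is a placeholder assumption, so substitute your region's actual MaaS API address and a valid IAM token.

```python
import json
import urllib.request

HOST = "https://maas.example.com"  # hypothetical host; use your region's address

def build_create_endpoint_request(project_id: str, token: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) the POST request for creating an endpoint."""
    url = f"{HOST}/v1/{project_id}/maas/services/custom-endpoint/endpoint"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "X-Auth-Token": token,            # user token from IAM (see Table 2)
            "Content-Type": "application/json",
        },
    )

req = build_create_endpoint_request("myproj", "my-token", {
    "model_name": "DeepSeek-V3.1",
    "endpoint_name": "DeepSeek-Test",
    "source": "custom_from_maas",
})
print(req.get_method(), req.full_url)
```

Sending the request would then be a matter of passing it to urllib.request.urlopen (or an equivalent HTTP client) and parsing the JSON body described under Response Parameters.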

Response Example

  • The following is a response example for using the DeepSeek-V3.1 model to create an endpoint of the custom_from_maas type.
    • Success response. Status code: 200.

      {
        "id": "c4513589-df2e-4d58-ab0c-d5a6f2******",
        "served_model_name": "deepseek-v3.1-4ZGlnU",
        "created_at": "2025-12-09T11:32:46Z"
      }

    • Failure response. Status code: 400.

      {
        "error_code": "ModelArts.0103",
        "error_msg": "error reason"
      }

  • The following is a response example for creating an endpoint of the custom_from_modelarts_v2 type.
    • Success response. Status code: 200.

      {
        "id": "c4513589-df2e-4d58-ab0c-d5a6f2******",
        "served_model_name": "dpsk-v3-vllm-a3-02-4ZGlnU",
        "created_at": "2025-12-09T11:32:46Z"
      }

    • Failure response. Status code: 400.

      {
        "error_code": "ModelArts.0103",
        "error_msg": "error reason"
      }

Status Codes

For details, see Status Codes.

Error Codes

For details, see Error Codes.