Sending a Chat Request (Chat/POST)
MaaS provides real-time inference for models that you deploy on dedicated instances. This section describes the specifications for calling the chat API.
Description

Table 1 Parameters

| Parameter | Description | Example Value |
|---|---|---|
| url | API URL for calling the model service. | https://api.modelarts-maas.com/v1/chat/completions |
| model | Name of the model to call, passed as the model parameter in the API request. | Obtain the value from the View Call Description page. For more information, see Calling a Model Service in ModelArts Studio (MaaS). |
Creating a Chat Request
- Authentication description
MaaS inference services support API key authentication. The authentication header has the following format (replace the placeholder with the API key of the region where the service is deployed):
'Authorization': 'Bearer <your API key>'
- The request and response parameters are described in the following tables.
Table 2 Request parameters

| Parameter | Mandatory | Default Value | Type | Description |
|---|---|---|---|---|
| model | Yes | None | Str | Name of the model to call. For details about the value, see Table 1. |
| messages | Yes | N/A | Array | Input question. role indicates the role and content indicates the dialog content, for example:<br>"messages": [ {"role": "system","content": "You are a helpful AI assistant."}, {"role": "user","content": "Which number is larger, 9.11 or 9.8?"} ]<br>For more information, see Table 3. |
| stream_options | No | None | Object | Specifies whether to return the number of used tokens during streaming output. This parameter is valid only when stream is set to True. To print the number of tokens used, set stream_options to {"include_usage": true}. For more information, see Table 4. |
| max_tokens | No | None | Int | Maximum length of a model reply, in tokens. |
| top_k | No | -1 | Int | Number of highest-ranking tokens considered during sampling. The value -1 indicates that all tokens are considered. Decreasing the value can reduce the sampling time. |
| top_p | No | 1.0 | Float | Floating-point number that controls the cumulative probability of the tokens to be considered. Value range: 0 to 1. The value 1 indicates that all tokens are considered. |
| temperature | No | 1.0 | Float | Floating-point number that controls the sampling randomness. Smaller values make the model more deterministic, while larger values make it more creative. The value 0 indicates greedy sampling. |
| stop | No | None | None/Str/List | String or list of strings that stop generation. The output does not contain the stop strings. For example, if the value is set to ["You", "Good"], text generation stops once either You or Good is generated. |
| stream | No | False | Bool | Controls whether to enable streaming inference. The default value is False, indicating that streaming inference is disabled. |
| n | No | 1 | Int | Number of results to return for the request.<br>- If beam_search is not used, the recommended value range is 1 ≤ n ≤ 10. If n is greater than 1, ensure that greedy sampling is not used, that is, top_k is greater than 1 and temperature is greater than 0.<br>- If beam_search is used, the recommended value range is 1 < n ≤ 10. If n is 1, the inference request fails.<br>NOTE:<br>- For optimal performance, keep n at 10 or below. Large values of n can significantly slow down processing, and insufficient GPU memory may cause inference requests to fail.<br>- n cannot be greater than 1 for DeepSeek-R1 and DeepSeek-V3. |
| use_beam_search | No | False | Bool | Controls whether to use beam search instead of sampling. When this parameter is used, the following parameters must be set as required:<br>- n: > 1<br>- top_p: 1.0<br>- top_k: -1<br>- temperature: 0.0<br>NOTE: n cannot be greater than 1 for DeepSeek-R1 and DeepSeek-V3. |
| presence_penalty | No | 0.0 | Float | Applies rewards or penalties based on whether new words appear in the generated text. Value range: [-2.0, 2.0]. |
| frequency_penalty | No | 0.0 | Float | Applies rewards or penalties based on the frequency of each word in the generated text. Value range: [-2.0, 2.0]. |
| length_penalty | No | 1.0 | Float | Imposes a larger penalty on longer sequences during beam search. When this parameter is used, the following parameters must be set as required:<br>- top_k: -1<br>- use_beam_search: true<br>- best_of: > 1<br>NOTE: length_penalty cannot be set for DeepSeek-R1 and DeepSeek-V3. |
| ignore_eos | No | False | Bool | Specifies whether to ignore the EOS token and continue generating tokens. |
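The parameters in Table 2 are passed directly in the JSON request body. The following is a minimal sketch, not an authoritative recipe: the endpoint and API key placeholder follow the examples later on this page, and the model name and parameter values are illustrative only.

```python
import json
import requests

# Illustrative only: combine several sampling parameters from Table 2 in one request.
url = "https://api.modelarts-maas.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer yourApiKey"  # Replace yourApiKey with the actual API key.
}
payload = {
    "model": "DeepSeek-R1",
    "messages": [{"role": "user", "content": "Write a one-sentence summary of photosynthesis."}],
    "max_tokens": 256,      # cap the reply length
    "temperature": 0.7,     # lower than the default 1.0 for more predictable output
    "top_p": 0.9,           # consider tokens within the top 90% cumulative probability
    "top_k": 40,            # consider only the 40 highest-ranking tokens
    "stop": ["\n\n"]        # stop generation at the first blank line
}
resp = requests.post(url, headers=headers, data=json.dumps(payload))
print(resp.json()["choices"][0]["message"]["content"])
```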
Table 3 Request parameter messages

| Parameter | Mandatory | Default Value | Type | Description |
|---|---|---|---|---|
| role | Yes | None | Str | Role of the message. The following roles are supported:<br>- user: user<br>- assistant: chat assistant (model)<br>- system: assistant's personality |
| content | Yes | None | Str | Dialog content.<br>- When role is set to system, this parameter defines the AI model's personality, for example: {"role": "system","content": "You are a helpful AI assistant."}<br>- When role is set to user, this parameter is the question asked by the user, for example: {"role": "user","content": "Which number is larger, 9.11 or 9.8?"}<br>- When role is set to assistant, this parameter is the content output by the AI model, for example: {"role": "assistant","content": "9.8 is larger than 9.11."} |
Table 4 Request parameter stream_options

| Parameter | Mandatory | Default Value | Type | Description |
|---|---|---|---|---|
| include_usage | No | true | Bool | Specifies whether the streaming response outputs token usage information.<br>- true: Each chunk outputs a usage field that shows the total token usage.<br>- false: The token usage is not displayed. |
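The sketch below shows one way to consume a streaming response with include_usage enabled using the requests library. It assumes the service returns OpenAI-style server-sent events (lines prefixed with data: and terminated by data: [DONE]); adapt the parsing if the actual chunk format differs.

```python
import json
import requests

url = "https://api.modelarts-maas.com/v1/chat/completions"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer yourApiKey"  # Replace yourApiKey with the actual API key.
}
payload = {
    "model": "DeepSeek-R1",
    "messages": [{"role": "user", "content": "Hello"}],
    "stream": True,                              # enable streaming inference
    "stream_options": {"include_usage": True}    # also return token usage (see Table 4)
}

with requests.post(url, headers=headers, data=json.dumps(payload), stream=True) as resp:
    for raw_line in resp.iter_lines():
        if not raw_line:
            continue
        line = raw_line.decode("utf-8")
        if not line.startswith("data: "):
            continue
        chunk = line[len("data: "):]
        if chunk == "[DONE]":
            break
        event = json.loads(chunk)
        # Print incremental text as it arrives.
        if event.get("choices"):
            delta = event["choices"][0].get("delta", {})
            if delta.get("content"):
                print(delta["content"], end="")
        # Print token usage when the chunk carries it.
        if event.get("usage"):
            print("\nusage:", event["usage"])
```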
Table 5 Response parameters

| Parameter | Type | Description |
|---|---|---|
| id | Str | Unique ID of the request. |
| object | Str | Object type. The value chat.completion indicates a chat (multi-turn dialog) response. |
| created | Int | Timestamp when the response was created. |
| model | Str | Name of the model that was called. |
| choices | Array | Model output, including the index and message parameters. In message:<br>- content is the model's final reply.<br>- reasoning_content is the model's deep thinking content (DeepSeek models only). |
| usage | Object | Statistics on the tokens consumed by the request.<br>- This parameter is returned by default for non-streaming requests.<br>- For streaming requests, each chunk outputs a usage field that shows the token usage when stream_options is set to {"include_usage": true} (see Table 4).<br>Fields:<br>- prompt_tokens: number of input tokens.<br>- completion_tokens: number of output tokens.<br>- total_tokens: total number of tokens. |
| prompt_logprobs | Float | Log probability information. You can use it to measure the model's confidence in its output or to explore other options the model considered. |
Example Request
- Python request example:
```python
# coding=utf-8
import requests
import json

if __name__ == '__main__':
    url = "https://api.modelarts-maas.com/v1/chat/completions"

    # Send request.
    headers = {
        'Content-Type': 'application/json',
        'Authorization': 'Bearer yourApiKey'  # Replace yourApiKey with the actual API key.
    }
    data = {
        "model": "DeepSeek-R1",
        "max_tokens": 20,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Hello"}
        ],
        # Controls whether to enable streaming inference. The default value is False, indicating that streaming inference is disabled.
        "stream": False,
        # Controls whether to show the number of tokens used during streaming output. This parameter is valid only when stream is set to True.
        # "stream_options": {"include_usage": True},
        # A floating-point number that controls the sampling randomness. Smaller values make the model more deterministic, while larger values make it more creative. The value 0 indicates greedy sampling. The default value is 1.0.
        "temperature": 1.0
    }
    resp = requests.post(url, headers=headers, data=json.dumps(data), verify=False)

    # Print result.
    print(resp.status_code)
    print(resp.text)
```
- OpenAI SDK request example:
```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.modelarts-maas.com/v1",
    api_key="yourApiKey"  # Replace yourApiKey with the actual API key.
)

completion = client.chat.completions.create(
    model="DeepSeek-R1",
    messages=[{"role": "user", "content": "Which number is larger, 9.11 or 9.8?"}],
    temperature=0.6,
    top_p=0.7,
    max_tokens=4096,
    stream=True
)

for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
```
Response Example
{ "id":"chat-71406e38b0d248c9b284709f8435****", "object":"chat.completion", "created":1740809549, "model":"DeepSeek-R1", "choices":[ { "index":0, "message":{ "role":"assistant", "content":"\n\n Compare 9.11 and 9.8.:\n\n1. **Compare the integer part**: The integer part of both is 9, which is equal.\n2. **Compare the tenths place**:\n - The tenths place of 9.11 is **1**\n - 9.8 can be considered as 9.80, and its tenths place is **8**\n - **8 > 1**, so 9.8 is larger.\n\n**Conclusion**:\n**9.8 > 9.11**\n(When comparing decimals, line up the digits and compare them directly.)", "reasoning_content": "Well, I now need to compare 9.11 and 9.8 which is larger. First of all, I have to recall the method of comparing decimals. When comparing decimals, start by comparing the integer parts. If the integer parts are the same, compare the tenths and hundredths of the decimal parts in sequence until the larger number is determined. \n\n The integer parts of the two numbers are both 9, so they are the same. Next, compare the tenths. The tenth digit of 9.11 is 1, while the tenth digit of 9.8 is 8. This can be problematic, as some people might directly treat 9.8 as 9.80, or focus on comparing the digits in the tenths place. \n\n Now, comparing the tenths place, 9.8 has an 8, while 9.11 has a 1. Clearly, 8 is greater than 1. So, should we conclude that 9.8 is greater than 9.11? \n\n However, it is important to note that some people might incorrectly assume that the more decimal places a number has, the larger its value. But this is not true; for instance, 0.9 is greater than 0.8999. Thus, having more decimal places does not necessarily mean a larger value. \n\n Additionally, the decimal parts of the two numbers can be aligned to have the same number of digits for comparison. For instance, 9.8 can be written as 9.80, where the tenths place is 8 and the hundredths place is 0. On the other hand, for 9.11, the tenths place is 1 and the hundredths place is 1. Since 8 in the tenths place is greater than 1, 9.80 (which is 9.8) is greater than 9.11. \n\n Therefore, the final conclusion is that 9.8 is larger than 9.11.\n", "tool_calls":[] }, "logprobs":null, "finish_reason":"stop", "stop_reason":null } ], "usage":{ "prompt_tokens":21, "total_tokens":437, "completion_tokens":416 }, "prompt_logprobs":null }