Qwen Third-Party VL Model
Function
This Qwen2.5-VL model provides capabilities including image recognition, precise visual positioning, text recognition and understanding, document parsing, and video comprehension.
URI
The multi-modal inference service provides two inference APIs:
- Pangu inference API (V1 inference API)
- OpenAI-compatible API (V2 inference API)
The authentication modes of V1 and V2 APIs are different, and the request body and response body of V1 and V2 APIs are slightly different. Inference API lists the URIs of the two APIs.
| API Type | API URI |
|---|---|
| V1 inference API | POST /v1/{project_id}/deployments/{deployment_id}/chat/completions |
| V2 inference API | POST /api/v2/chat/completions |
Additional parameters are required for the URI of the V1 inference API. For details about the parameters, see URI parameters.
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| project_id | Yes | String | Definition Project ID. For details about how to obtain a project ID, see Obtaining the Project ID. Constraints N/A Range N/A Default Value N/A |
| deployment_id | Yes | String | Definition Model deployment ID. For details about how to obtain the deployment ID, see Obtaining the Model Deployment ID. Constraints N/A Range N/A Default Value N/A |
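As a rough illustration of how the two URIs differ, the sketch below assembles each one in Python. The endpoint host is the one used in the example request on this page; the helper names `v1_uri` and `v2_uri` are hypothetical, and the IDs passed in are placeholders.

```python
from urllib.parse import quote

# Endpoint copied from the example request on this page; replace it with your region's endpoint.
ENDPOINT = "https://mastudio.cn-southwest-2.myhuaweicloud.com"


def v1_uri(project_id: str, deployment_id: str, endpoint: str = ENDPOINT) -> str:
    """Build the V1 (Pangu) inference URI, which carries both IDs in the path."""
    if not project_id or not deployment_id:
        raise ValueError("project_id and deployment_id are both mandatory for V1")
    return (
        f"{endpoint}/v1/{quote(project_id, safe='')}"
        f"/deployments/{quote(deployment_id, safe='')}/chat/completions"
    )


def v2_uri(endpoint: str = ENDPOINT) -> str:
    """Build the V2 (OpenAI-compatible) inference URI, which has no path parameters."""
    return f"{endpoint}/api/v2/chat/completions"
```

Only the V1 path needs `project_id` and `deployment_id`; the V2 API identifies the deployment through the `model` field in the request body instead.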
Request Parameters
The authentication modes of the V1 and V2 inference APIs are different, and the request and response parameters are also different. The details are as follows:
Header parameters
- The V1 inference API supports both token-based authentication and API key authentication. The request header parameters for the two authentication modes are as follows:
- Request header parameters (token-based authentication) lists the request header parameters for token-based authentication.
Table 3 Request header parameters (token-based authentication)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| X-Auth-Token | Yes | String | Definition User token. Used to obtain the permission required to call APIs. The token is the value of X-Subject-Token in the response header in Token-based Authentication. Constraints N/A Range N/A Default Value N/A |
| Content-Type | Yes | String | Definition MIME type of the request body. Constraints N/A Range N/A Default Value application/json |
- Request header parameters (API key authentication) lists the request header parameters for API key authentication.
Table 4 Request header parameters (API key authentication)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| X-Apig-AppCode | Yes | String | Definition API key. Used to obtain the permission required to call APIs. The API key is the value of X-Apig-AppCode in the response header in API key authentication. Constraints N/A Range N/A Default Value N/A |
| Content-Type | Yes | String | Definition MIME type of the request body. Constraints N/A Range N/A Default Value application/json |
- The V2 inference API supports only API key authentication. For details about the request header parameters, see Table 5.
Table 5 Request header parameters (OpenAI-compatible API key authentication)
| Parameter | Mandatory | Type | Description |
|---|---|---|---|
| Authorization | Yes | String | Definition A character string consisting of Bearer and the API key obtained from created application access. A space is required between Bearer and the API key. An example is Bearer d59******9C3. Constraints N/A Range N/A Default Value N/A |
| Content-Type | Yes | String | Definition MIME type of the request body. Constraints N/A Range N/A Default Value application/json |
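The three header layouts above can be summarized in a small helper. This is a minimal sketch, not an official client: the function name `build_headers` and its `api`, `token`, and `api_key` arguments are hypothetical, and the credential values are supplied by the caller.

```python
from typing import Dict, Optional


def build_headers(api: str, token: Optional[str] = None,
                  api_key: Optional[str] = None) -> Dict[str, str]:
    """Return request headers for the given API ("v1" or "v2") and credential."""
    headers = {"Content-Type": "application/json"}
    if api == "v1":
        # V1 accepts either a user token or an API key.
        if token:
            headers["X-Auth-Token"] = token
        elif api_key:
            headers["X-Apig-AppCode"] = api_key
        else:
            raise ValueError("V1 needs a token or an API key")
    elif api == "v2":
        # V2 accepts only an API key, sent as "Bearer <key>" with a single space.
        if not api_key:
            raise ValueError("V2 needs an API key")
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        raise ValueError(f"unknown API version: {api!r}")
    return headers
```

When both a token and an API key are passed for V1, this sketch prefers the token; that ordering is an arbitrary choice for illustration.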
Request body parameters
The request body parameters of the V1 and V2 inference APIs are the same, as described in Table 6.
|
Parameter |
Mandatory |
Type |
Description |
|---|---|---|---|
|
messages |
Yes |
Array of message objects |
Definition Multi-turn dialogue question-answer pairs. Constraints N/A Range Array length: 1–20 Default Value N/A |
|
model |
V1 inference API: No V2 inference API: Yes |
String |
Definition Name of the inference service model, which is the value of Deployed_Model specified during inference service deployment. You can obtain the value on the inference service details page. This parameter is mandatory for the V2 inference API and is not required for the V1 inference API. Constraints N/A Range The value contains 1 to 64 characters. Default Value N/A |
|
stream |
No |
Boolean |
Definition Whether to enable streaming mode. Constraints N/A Range true or false Default Value false |
|
temperature |
No |
Float |
Definition Controls the diversity and creativity of the generated text. A lower temperature produces more deterministic output, with 0 giving the lowest diversity. A higher temperature, for example 0.9, produces more creative output. temperature is one of the key parameters that affect the output quality and diversity of an LLM. Other parameters, such as top_p, also adjust the model's behavior, but do not adjust temperature and top_p at the same time. Constraints N/A Range Minimum value: 0 Maximum value: 1 Default Value 0.3 |
|
top_p |
No |
Float |
Definition Nucleus sampling, an alternative to sampling with temperature, in which the model considers only the tokens within the probability mass set by top_p. To adjust the tendency of the generated text, change either top_p or temperature, but not both. Constraints N/A Range [0, 1] Default Value 0 |
|
max_tokens |
No |
Integer |
Definition Used to control the length and quality of chat replies. Generally, a large max_tokens value can generate a long and complete reply, but may also increase the risk of generating irrelevant or duplicate content. A small max_tokens value can generate short and concise replies, but may also cause incomplete or discontinuous content. Therefore, you need to select a proper max_tokens value based on scenarios and requirements. Constraints The minimum value is 1. Range N/A Default Value N/A |
|
presence_penalty |
No |
Float |
Definition Penalty given to repetition in the generated text. Positive presence penalty values penalize new tokens based on whether they have appeared in the text so far, increasing the model's likelihood of talking about new topics. This parameter helps to make the output more creative and diverse by reducing the likelihood of repetition. Constraints N/A Range Minimum value: -2 Maximum value: 2 Default Value Default value: 0 |
|
frequency_penalty |
No |
Float |
Definition How the model penalizes new tokens based on their frequency to decrease the likelihood of repeated words and phrases in the output. Constraints N/A Range Minimum value: -2 Maximum value: 2 Default Value 0 |
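Before sending a request, the documented ranges can be checked on the client side. The following is a minimal sketch under the ranges listed in the table above; the function name `check_sampling_params` is hypothetical, and the service itself remains the authority on what it accepts.

```python
from typing import Any, Dict, List

# Ranges as documented in the request body parameter table.
_RANGES = {
    "temperature": (0.0, 1.0),
    "top_p": (0.0, 1.0),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def check_sampling_params(body: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means no issue was found."""
    problems = []
    for name, (low, high) in _RANGES.items():
        if name in body and not (low <= body[name] <= high):
            problems.append(f"{name}={body[name]} is outside [{low}, {high}]")
    if "max_tokens" in body and body["max_tokens"] < 1:
        problems.append("max_tokens must be at least 1")
    if not 1 <= len(body.get("messages", [])) <= 20:
        problems.append("messages must contain 1 to 20 entries")
    if "temperature" in body and "top_p" in body:
        # The table advises adjusting only one of the two.
        problems.append("temperature and top_p are both set")
    return problems
```

Returning a list rather than raising on the first problem lets a caller report every issue in one pass.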
|
Parameter |
Mandatory |
Type |
Description |
|---|---|---|---|
|
role |
V1 inference API: No V2 inference API: Yes |
String |
Definition Role in a dialogue. The value can be system, user, or assistant. If you want the model to answer questions as a specific persona, set role to system. If you do not use a specific persona, set role to user. In a dialogue request, you need to set role only once. In a multi-turn dialogue, set role to user for the scenario when a user enters prompts and to assistant for inference results. Constraints N/A Range [system, user, assistant] Default Value N/A |
|
content |
Yes |
Array of content objects |
Definition Q&A pair text. Constraints Minimum length: 1 Range N/A Default Value N/A |
|
Parameter |
Mandatory |
Type |
Description |
|---|---|---|---|
|
type |
Yes |
String |
Definition Input content type. Constraints N/A Range text or image_url Default Value N/A |
|
text |
No |
String |
Definition Q&A pair text. Constraints Minimum length: 1 This parameter is mandatory when type is set to text. Range N/A Default Value N/A |
|
image_url |
No. text and image_url cannot both be empty. |
image_url object |
Definition Images in question-answer pairs. Constraints This parameter is mandatory when type is set to image_url. Range N/A Default Value N/A |
|
Parameter |
Mandatory |
Type |
Description |
|---|---|---|---|
|
url |
Yes |
String |
Definition A character string consisting of the identifier and Base64 code of the image. Constraints The value must be in the format of data:image/jpg;base64,{base64_str}. base64_str indicates the Base64 code of an image, for example, ......qkf/z. Range N/A Default Value N/A |
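To show how the `content` array fits together, here is a minimal sketch that Base64-encodes image bytes into the `data:image/jpg;base64,` format described above and pairs the image with a text prompt. The helper names `image_item`, `text_item`, and `user_message` are hypothetical.

```python
import base64
from typing import Any, Dict


def image_item(image_bytes: bytes) -> Dict[str, Any]:
    """Wrap raw image bytes as an image_url content item."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "type": "image_url",
        "image_url": {"url": f"data:image/jpg;base64,{encoded}"},
    }


def text_item(text: str) -> Dict[str, Any]:
    """Wrap a prompt as a text content item (minimum length 1)."""
    if not text:
        raise ValueError("text must contain at least one character")
    return {"type": "text", "text": text}


def user_message(text: str, image_bytes: bytes) -> Dict[str, Any]:
    """Build one user message holding an image followed by a question about it."""
    return {"role": "user", "content": [image_item(image_bytes), text_item(text)]}
```

In practice the bytes would come from reading an image file in binary mode, for example `open(path, "rb").read()`.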
Response Parameters
Non-streaming response (with stream set to false or not specified)
Status code: 200
|
Parameter |
Type |
Description |
|---|---|---|
|
id |
String |
Definition Response ID Constraints N/A Range N/A Default Value N/A |
|
created |
Integer |
Definition Response time Constraints N/A Range N/A Default Value N/A |
|
choices |
Array of ChatChoice objects |
Definition Model's replies Constraints N/A Range N/A Default Value N/A |
|
usage |
CompletionUsage object |
Definition Token statistics Constraints N/A Range N/A Default Value N/A |
|
Parameter |
Type |
Description |
|---|---|---|
|
index |
Integer |
Definition Reply index Constraints N/A Range N/A Default Value N/A |
|
message |
Array of MessageItem objects |
Definition Model's response Constraints N/A Range N/A Default Value N/A |
|
Parameter |
Type |
Description |
|---|---|---|
|
role |
String |
Definition Role Constraints N/A Range N/A Default Value N/A |
|
content |
String |
Definition Model's response Constraints N/A Range N/A Default Value N/A |
|
Parameter |
Type |
Description |
|---|---|---|
|
completion_tokens |
Number |
Definition Number of tokens in the generated completion Constraints N/A Range N/A Default Value N/A |
|
prompt_tokens |
Number |
Definition Number of tokens in the provided prompt Constraints N/A Range N/A Default Value N/A |
|
total_tokens |
Number |
Definition Total number of tokens used, including both the prompt and completion tokens Constraints N/A Range N/A Default Value N/A |
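Reading the reply out of a non-streaming response might look like the sketch below. The function name `extract_reply` is hypothetical. The table lists `message` as an array, whereas the example response on this page shows a single object, so the sketch accepts both shapes as a precaution.

```python
from typing import Any, Dict, Tuple


def extract_reply(response: Dict[str, Any]) -> Tuple[str, int]:
    """Return (reply text, total tokens) from a non-streaming response body."""
    choices = response.get("choices") or []
    if not choices:
        raise ValueError("response contains no choices")
    # Pick the choice with the lowest index.
    first = min(choices, key=lambda choice: choice.get("index", 0))
    message = first["message"]
    if isinstance(message, list):  # tolerate the array shape listed in the table
        message = message[0]
    total_tokens = response.get("usage", {}).get("total_tokens", 0)
    return message["content"], total_tokens
```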
Streaming response (with stream set to true)
Status code: 200
|
Parameter |
Type |
Description |
|---|---|---|
|
data |
Definition If stream is set to true, messages generated by the model will be returned in streaming mode. The generated text is returned incrementally. Each data field contains a part of the generated text until all data fields are returned. Constraints N/A Range N/A Default Value N/A |
|
Parameter |
Type |
Description |
|---|---|---|
|
id |
String |
Definition Unique identifier of the dialogue. Constraints N/A Range N/A Default Value N/A |
|
created |
Integer |
Definition Unix timestamp (in seconds) when the chat was created. The timestamps of each chunk in the streaming response are the same. Constraints N/A Range N/A Default Value N/A |
|
model |
String |
Definition Name of the model that generates the completion. Constraints N/A Range N/A Default Value N/A |
|
object |
String |
Definition Object type, which is chat.completion.chunk. Constraints N/A Range N/A Default Value N/A |
|
choices |
ChatCompletionResponseStreamChoice |
Definition A list of completion choices generated by the model. Constraints N/A Range N/A Default Value N/A |
|
usage |
UsageInfo |
Definition Token usage for the dialogue. You can use these statistics to keep the model from generating excessive tokens. Constraints N/A Range N/A Default Value N/A |
|
Parameter |
Type |
Description |
|---|---|---|
|
index |
Integer |
Definition Index of the completion in the completion choice list generated by the model. Constraints N/A Range N/A Default Value N/A |
|
finish_reason |
String |
Definition Reason why the model stops generating tokens. Constraints N/A Range [stop, length, content_filter, tool_calls, insufficient_system_resource] Default Value N/A |
|
delta |
DeltaMessage |
Definition Completion increment returned by the V2 inference API in streaming mode. This parameter is not included in the response body of the V1 inference API. Constraints N/A Range N/A Default Value N/A |
|
message |
DeltaMessage |
Definition Completion increment returned by the V1 inference API in streaming mode. This parameter is not included in the response body of the V2 inference API. Constraints N/A Range N/A Default Value N/A |
|
Parameter |
Type |
Description |
|---|---|---|
|
role |
String |
Definition Role that generates the message. Constraints N/A Range N/A Default Value N/A |
|
content |
String |
Definition Content of the completion increment. Constraints N/A Range N/A Default Value N/A |
|
reasoning_content |
String |
Definition Reasoning steps that led to the final conclusion (thinking process of the model). Constraints This parameter applies only to models that support the thinking process. Range N/A Default Value N/A |
|
Parameter |
Type |
Description |
|---|---|---|
|
prompt_tokens |
Integer |
Definition Number of tokens in the prompt and the default persona. Constraints N/A Range N/A Default Value N/A |
|
completion_tokens |
Integer |
Definition Number of tokens in the completion returned by the inference service. Constraints N/A Range N/A Default Value N/A |
|
total_tokens |
Integer |
Definition Total number of consumed tokens. Constraints N/A Range N/A Default Value N/A |
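The streaming fields above can be consumed with a small line parser. This is a minimal sketch, assuming each chunk arrives as one `data:` line of JSON and the stream ends with `data:[DONE]`; the function name `iter_stream_text` is hypothetical. It reads the increment from `delta` (V2) or `message` (V1), whichever is present.

```python
import json
from typing import Iterable, Iterator


def iter_stream_text(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text increments carried by a sequence of streaming lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines and non-data lines such as "event:"
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # V2 carries the increment in "delta"; V1 carries it in "message".
            increment = choice.get("delta") or choice.get("message") or {}
            text = increment.get("content")
            if text:
                yield text
```

Joining the yielded pieces with `"".join(...)` reconstructs the full reply once the stream finishes.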
Status code: 400
If an error is reported, the error information returned by the V1 inference API complies with Huawei Cloud specifications. The V2 inference API transparently transmits the error information returned by the inference service, which usually complies with the OpenAI API format.
|
Parameter |
Type |
Description |
|---|---|---|
|
error_msg |
String |
Error message |
|
error_code |
String |
Error code |
|
details |
List<Object> |
Error information returned by the inference service. The format and content are determined by the inference service. |
|
Parameter |
Type |
Description |
|---|---|---|
|
error |
Error message |
|
|
id |
String |
Request ID |
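Because the two APIs report errors in different shapes, a client may want to normalize them into one message. The sketch below is illustrative only: `describe_error` is a hypothetical name, and treating `error` as either a plain string or an OpenAI-style object with a `message` key is an assumption, since this page does not specify its type.

```python
from typing import Any, Dict


def describe_error(body: Dict[str, Any]) -> str:
    """Return a one-line summary of an error body from either API."""
    if "error_code" in body or "error_msg" in body:
        # V1 shape: error_code, error_msg, and optional details.
        summary = f"[{body.get('error_code', 'unknown')}] {body.get('error_msg', '')}"
        if body.get("details"):
            summary += f" (details: {body['details']})"
        return summary.strip()
    # V2 shape: error plus the request ID.
    error = body.get("error", "unknown error")
    if isinstance(error, dict):  # assumption: OpenAI-style error object
        error = error.get("message", str(error))
    request_id = body.get("id")
    return f"{error} (request {request_id})" if request_id else str(error)
```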
Example Request
Request URL and request headers:
V1 inference API
POST https://mastudio.cn-southwest-2.myhuaweicloud.com/v1/{project_id}/deployments/{deployment_id}/chat/completions
Request Header:
Content-Type: application/json
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...
V2 inference API
POST https://mastudio.cn-southwest-2.myhuaweicloud.com/api/v2/chat/completions
Request Header:
Content-Type: application/json
Authorization: Bearer 201ca68f-45f9-4e19-8fa4-831e...
Request body example:
{
  "temperature": 0.5,
  "model": "Qwen25-vl-32b", // This parameter is required only for the V2 inference API.
  "messages": [
    {
      "role": "user", // This parameter is required only for the V2 inference API.
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "......qVKgqkf/Z"
          }
        },
        {
          "type": "text",
          "text": "What's in the image?"
        }
      ]
    }
  ],
  "presence_penalty": 0.5,
  "frequency_penalty": 0.5,
  "max_tokens": 2048,
  "stream": false
}
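A request body like the one above could be sent with Python's standard library roughly as follows. This is a sketch, not a tested client for this service: `make_request` and `send` are hypothetical helper names, and the URL and headers are whatever applies to the chosen API. Note that the `//` comments in the example body are explanatory only and must not appear in the JSON that is actually sent.

```python
import json
import urllib.request
from typing import Any, Dict


def make_request(url: str, headers: Dict[str, str],
                 body: Dict[str, Any]) -> urllib.request.Request:
    """Prepare a POST request with a JSON-encoded body (nothing is sent yet)."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(url, data=data, headers=headers, method="POST")


def send(request: urllib.request.Request, timeout: float = 60.0) -> Dict[str, Any]:
    """Send the request and decode a non-streaming JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Separating request construction from sending makes it possible to inspect the method, headers, and payload before any network call is made.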
Example of a multi-turn dialogue request:
{
  "temperature": 0.5,
  "model": "Qwen25-vl-32b", // This parameter is required only for the V2 inference API.
  "messages": [
    {
      "role": "user", // This parameter is required only for the V2 inference API.
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "......qVKgqkf/Z"
          }
        },
        {
          "type": "text",
          "text": "What's in the image?"
        }
      ]
    },
    {
      "role": "assistant",
      "content": [
        {
          "type": "text",
          "text": "This is a picture of a military jet fighter flying in the sky. You can see the size of the plane, its wingspan, and engine, as well as the amount of air it's moving through. The plane has a black fuselage and a large propeller in the front. The clouds in the background suggest that the plane is flying at high altitudes, possibly in the middle of the voyage."
        }
      ]
    },
    {
      "role": "user", // This parameter is required only for the V2 inference API.
      "content": [
        {
          "type": "image_url",
          "image_url": {
            "url": "......qVKgqkf/Z"
          }
        },
        {
          "type": "text",
          "text": "What are the differences between this image and the first image?"
        }
      ]
    }
  ],
  "presence_penalty": 0.5,
  "frequency_penalty": 0.5,
  "max_tokens": 2048,
  "stream": false
}
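The multi-turn pattern in the example above, where each assistant reply is appended to the history before the next user turn, can be sketched as follows. The function name `append_turn` is hypothetical; the limit of 20 comes from the documented length range of the `messages` array.

```python
from typing import Any, Dict, List

MAX_MESSAGES = 20  # documented upper bound on the length of the messages array


def append_turn(history: List[Dict[str, Any]], assistant_text: str,
                next_user_content: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Return a new messages list extended with the last reply and the next user turn."""
    updated = list(history)  # copy so the caller's list is left untouched
    updated.append({
        "role": "assistant",
        "content": [{"type": "text", "text": assistant_text}],
    })
    updated.append({"role": "user", "content": next_user_content})
    if len(updated) > MAX_MESSAGES:
        raise ValueError(f"messages would exceed {MAX_MESSAGES} entries")
    return updated
```

How to shorten a history that reaches the limit (for example, by dropping the oldest turns) is left to the caller.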
Example Response
Status code: 200
Non-streaming Q&A response example
{
  "id": "chat-38ea6118a5d14e38b7d592211bbd31a6",
  "object": "chat.completion",
  "created": 1749894390,
  "model": "Qwen25-vl-32b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "reasoning_content": null,
        "content": "This is a picture of a military jet fighter flying in the sky. You can see the size of the plane, its wingspan, and engine, as well as the amount of air it's moving through. The plane has a black fuselage and a large propeller in the front. The clouds in the background suggest that the plane is flying at high altitudes, possibly in the middle of the voyage.",
        "tool_calls": []
      },
      "logprobs": null,
      "finish_reason": "stop",
      "stop_reason": null
    }
  ],
  "usage": {
    "prompt_tokens": 3189,
    "total_tokens": 3236,
    "completion_tokens": 47
  },
  "prompt_logprobs": null
}
Streaming Q&A response example
Response of the V1 inference API
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"message":{"role":"assistant"}}],"usage":{"prompt_tokens":64,"total_tokens":64,"completion_tokens":0}}
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"message":{"content":"In"}}],"usage":{"prompt_tokens":64,"total_tokens":65,"completion_tokens":1}}
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"message":{"content":"this"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":2}}
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"message":{"content":" image"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":3}}
......
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":9}}
event:{"usage":{"completionTokens":9,"promptTokens":64,"totalTokens":73},"tokens":64,"token_number":9}
data:[DONE]
Response of the V2 inference API
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":{"prompt_tokens":64,"total_tokens":64,"completion_tokens":0}}
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"delta":{"content":"In"}}],"usage":{"prompt_tokens":64,"total_tokens":65,"completion_tokens":1}}
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"delta":{"content":"this"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":2}}
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"delta":{"content":"image"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":3}}
......
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":9}}
event:{"usage":{"completionTokens":9,"promptTokens":64,"totalTokens":73},"tokens":64,"token_number":9}
data:[DONE]
Status Codes
For details, see Status Codes.
Error Codes
For details, see Error Codes.