Updated on 2025-11-03 GMT+08:00

Qwen Third-Party VL Model

Function

The Qwen2.5-VL model provides capabilities including image recognition, precise visual positioning, text recognition and understanding, document parsing, and video comprehension.

URI

The multi-modal inference service provides two inference APIs:

  • Pangu inference API (V1 inference API)
  • OpenAI-compatible API (V2 inference API)

The V1 and V2 APIs use different authentication modes, and their request and response bodies differ slightly. Table 1 lists the URIs of the two APIs.

Table 1 Inference API

API Type

API URI

V1 inference API

POST /v1/{project_id}/deployments/{deployment_id}/chat/completions

V2 inference API

POST /api/v2/chat/completions

The URI of the V1 inference API requires additional path parameters. For details, see Table 2.

Table 2 URI parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition

Project ID. For details about how to obtain a project ID, see Obtaining the Project ID.

Constraints

N/A

Range

N/A

Default Value

N/A

deployment_id

Yes

String

Definition

Model deployment ID. For details about how to obtain the deployment ID, see Obtaining the Model Deployment ID.

Constraints

N/A

Range

N/A

Default Value

N/A

Request Parameters

The authentication modes of the V1 and V2 inference APIs are different, and the request and response parameters are also different. The details are as follows:

Header parameters

  1. The V1 inference API supports both token-based authentication and API key authentication. The request header parameters for the two authentication modes are as follows:
    Table 3 Request header parameters (token-based authentication)

    Parameter

    Mandatory

    Type

    Description

    X-Auth-Token

    Yes

    String

    Definition

    User token.

    Used to obtain the permission required to call APIs. The token is the value of X-Subject-Token in the response header in Token-based Authentication.

    Constraints

    N/A

    Range

    N/A

    Default Value

    N/A

    Content-Type

    Yes

    String

    Definition

    MIME type of the request body.

    Constraints

    N/A

    Range

    N/A

    Default Value

    application/json

    • Table 4 lists the request header parameters for API key authentication.
      Table 4 Request header parameters (API key authentication)

      Parameter

      Mandatory

      Type

      Description

      X-Apig-AppCode

      Yes

      String

      Definition

      API key.

      Used to obtain the permission required to call APIs. The API key is the value of X-Apig-AppCode in the response header in API key authentication.

      Constraints

      N/A

      Range

      N/A

      Default Value

      N/A

      Content-Type

      Yes

      String

      Definition

      MIME type of the request body.

      Constraints

      N/A

      Range

      N/A

      Default Value

      application/json

  2. The V2 inference API supports only API key authentication. For details about the request header parameters, see Table 5.
    Table 5 Request header parameters (OpenAI-compatible API key authentication)

    Parameter

    Mandatory

    Type

    Description

    Authorization

    Yes

    String

    Definition

A string consisting of Bearer and the API key obtained when application access was created. A space is required between Bearer and the API key. An example is Bearer d59******9C3.

    Constraints

    N/A

    Range

    N/A

    Default Value

    N/A

    Content-Type

    Yes

    String

    Definition

    MIME type of the request body.

    Constraints

    N/A

    Range

    N/A

    Default Value

    application/json

Request body parameters

The request body parameters of the V1 and V2 inference APIs are the same, as described in Table 6.

Table 6 Request body parameters

Parameter

Mandatory

Type

Description

messages

Yes

Array of message objects

Definition

Multi-turn dialogue question-answer pairs.

Constraints

N/A

Range

Array length: 1–20

Default Value

N/A

model

V1 inference API: No

V2 inference API: Yes

String

Definition

Name of the inference service model, which is the value of Deployed_Model specified during inference service deployment. You can obtain the value on the inference service details page. This parameter is mandatory for the V2 inference API and is not required for the V1 inference API.

Constraints

N/A

Range

The value contains 1 to 64 characters.

Default Value

N/A

stream

No

Boolean

Definition

Whether to enable streaming mode.

Constraints

N/A

Range

  • true: Enable streaming mode.
  • false: Disable streaming mode.

Default Value

false

temperature

No

Float

Definition

Controls the diversity and creativity of the generated text. The value ranges from 0 to 1, where 0 indicates the lowest diversity. A lower temperature produces more deterministic outputs; a higher temperature (for example, 0.9) produces more creative outputs. temperature is one of the key parameters affecting the output quality and diversity of an LLM. Other parameters, such as top_p, can also adjust the model's behavior and preferences, but do not set both at the same time.

Constraints

N/A

Range

Minimum value: 0

Maximum value: 1

Default Value

0.3

top_p

No

Float

Definition

An alternative to sampling with temperature, called nucleus sampling, in which the model considers only the tokens whose cumulative probability mass is within top_p. You are advised to change either top_p or temperature to adjust the tendency of the generated text, but not both.

Constraints

N/A

Range

[0, 1]

Default Value

0

max_tokens

No

Integer

Definition

Used to control the length and quality of chat replies. Generally, a large max_tokens value can generate a long and complete reply, but may also increase the risk of generating irrelevant or duplicate content. A small max_tokens value can generate short and concise replies, but may also cause incomplete or discontinuous content. Therefore, you need to select a proper max_tokens value based on scenarios and requirements.

Constraints

The minimum value is 1.

Range

N/A

Default Value

N/A

presence_penalty

No

Float

Definition

Penalty given to repetition in the generated text. Positive presence penalty values penalize new tokens based on whether they have appeared in the text so far, increasing the model's likelihood of talking about new topics. This parameter helps to make the output more creative and diverse by reducing the likelihood of repetition.

Constraints

N/A

Range

Minimum value: -2

Maximum value: 2

Default Value

0

frequency_penalty

No

Float

Definition

Penalizes new tokens based on their frequency in the text so far, decreasing the likelihood of repeated words and phrases in the output.

Constraints

N/A

Range

Minimum value: -2

Maximum value: 2

Default Value

0

Table 7 message

Parameter

Mandatory

Type

Description

role

V1 inference API: No

V2 inference API: Yes

String

Definition

Role in a dialogue. The value can be system, user, or assistant.

If you want the model to answer questions as a specific persona, set role to system; if you do not use a specific persona, set role to user. A system role needs to be set only once per dialogue request. In a multi-turn dialogue, set role to user for the turns where the user enters prompts and to assistant for the inference results.

Constraints

N/A

Range

[system, user, assistant]

Default Value

N/A

content

Yes

Array of content objects

Definition

Q&A pair text.

Constraints

Minimum length: 1

Range

N/A

Default Value

N/A

Table 8 content

Parameter

Mandatory

Type

Description

type

Yes

String

Definition

Input content type.

Constraints

N/A

Range

  • text: The content is text.
  • image_url: The content is an image.

Default Value

N/A

text

No

String

Definition

Q&A pair text.

Constraints

Minimum length: 1

This parameter is mandatory when type is set to text.

Range

N/A

Default Value

N/A

image_url

No (text and image_url cannot both be empty)

image_url object

Definition

Images in question-answer pairs.

Constraints

This parameter is mandatory when type is set to image_url.

Range

N/A

Default Value

N/A

Table 9 image_url

Parameter

Mandatory

Type

Description

url

Yes

String

Definition

A character string consisting of the identifier and Base64 code of the image.

Constraints

The value must be in the format of data:image/jpg;base64,{base64_str}. base64_str indicates the Base64 code of an image, for example, data:image/jpg;base64,/9j/4AAQSKZJRg......qkf/z.

Range

N/A

Default Value

N/A
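The url value described in Table 9 can be produced from raw image bytes with the standard library. A minimal Python sketch, assuming a local JPEG image; the helper name is illustrative:

```python
import base64

def to_image_data_url(image_bytes: bytes) -> str:
    """Encode raw JPEG bytes as the data:image/jpg;base64,{base64_str} string
    expected in image_url.url (see Table 9)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/jpg;base64,{b64}"

# Typical usage: to_image_data_url(open("photo.jpg", "rb").read())
```
Note that JPEG files begin with the bytes 0xFF 0xD8 0xFF, which Base64-encode to the "/9j/" prefix visible in the examples in this document.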

Response Parameters

Non-streaming response (with stream set to false or not specified)

Status code: 200

Table 10 Response body parameters

Parameter

Type

Description

id

String

Definition

Response ID

Constraints

N/A

Range

N/A

Default Value

N/A

created

Integer

Definition

Response time

Constraints

N/A

Range

N/A

Default Value

N/A

choices

Array of ChatChoice objects

Definition

Model's replies

Constraints

N/A

Range

N/A

Default Value

N/A

usage

CompletionUsage object

Definition

Token statistics

Constraints

N/A

Range

N/A

Default Value

N/A

Table 11 ChatChoice

Parameter

Type

Description

index

Integer

Definition

Reply index

Constraints

N/A

Range

N/A

Default Value

N/A

message

Array of MessageItem objects

Definition

Model's response

Constraints

N/A

Range

N/A

Default Value

N/A

Table 12 MessageItem

Parameter

Type

Description

role

String

Definition

Role

Constraints

N/A

Range

N/A

Default Value

N/A

content

String

Definition

Model's response

Constraints

N/A

Range

N/A

Default Value

N/A

Table 13 CompletionUsage

Parameter

Type

Description

completion_tokens

Number

Definition

Number of tokens in the generated completion

Constraints

N/A

Range

N/A

Default Value

N/A

prompt_tokens

Number

Definition

Number of tokens in the provided prompt

Constraints

N/A

Range

N/A

Default Value

N/A

total_tokens

Number

Definition

Total number of tokens used, including both the prompt and completion tokens

Constraints

N/A

Range

N/A

Default Value

N/A

Streaming response (with stream set to true)

Status code: 200

Table 14 Data units output in streaming mode

Parameter

Type

Description

data

CompletionStreamResponse

Definition

If stream is set to true, messages generated by the model will be returned in streaming mode. The generated text is returned incrementally. Each data field contains a part of the generated text until all data fields are returned.

Constraints

N/A

Range

N/A

Default Value

N/A

Table 15 CompletionStreamResponse

Parameter

Type

Description

id

String

Definition

Unique identifier of the dialogue.

Constraints

N/A

Range

N/A

Default Value

N/A

created

Integer

Definition

Unix timestamp (in seconds) when the chat was created. The timestamps of each chunk in the streaming response are the same.

Constraints

N/A

Range

N/A

Default Value

N/A

model

String

Definition

Name of the model that generates the completion.

Constraints

N/A

Range

N/A

Default Value

N/A

object

String

Definition

Object type, which is chat.completion.chunk.

Constraints

N/A

Range

N/A

Default Value

N/A

choices

ChatCompletionResponseStreamChoice

Definition

A list of completion choices generated by the model.

Constraints

N/A

Range

N/A

Default Value

N/A

usage

UsageInfo

Definition

Token usage statistics for the dialogue. You can use them to monitor consumption and keep the model from generating excessive tokens.

Constraints

N/A

Range

N/A

Default Value

N/A

Table 16 ChatCompletionResponseStreamChoice

Parameter

Type

Description

index

Integer

Definition

Index of the completion in the completion choice list generated by the model.

Constraints

N/A

Range

N/A

Default Value

N/A

finish_reason

String

Definition

Reason why the model stops generating tokens.

Constraints

N/A

Range

[stop, length, content_filter, tool_calls, insufficient_system_resource]

  • stop: The model stops generating text after the task is complete or when a pre-defined stop sequence is encountered.
  • length: The output length reaches the context length limit of the model or the max_tokens limit.
  • content_filter: The output content is filtered due to filter conditions.
  • tool_calls: The model determines to call an external tool (function/API) to complete the task.
  • insufficient_system_resource: Generation is interrupted because system inference resources are insufficient.

Default Value

N/A

delta

DeltaMessage

Definition

Completion increment returned by the V2 inference API in streaming mode.

This parameter is not included in the response body of the V1 inference API.

Constraints

N/A

Range

N/A

Default Value

N/A

message

DeltaMessage

Definition

Completion increment returned by the V1 inference API in streaming mode.

This parameter is not included in the response body of the V2 inference API.

Constraints

N/A

Range

N/A

Default Value

N/A

Table 17 DeltaMessage

Parameter

Type

Description

role

String

Definition

Role that generates the message.

Constraints

N/A

Range

N/A

Default Value

N/A

content

String

Definition

Content of the completion increment.

Constraints

N/A

Range

N/A

Default Value

N/A

reasoning_content

String

Definition

Reasoning steps that led to the final conclusion (thinking process of the model).

Constraints

This parameter applies only to models that support the thinking process.

Range

N/A

Default Value

N/A

Table 18 UsageInfo

Parameter

Type

Description

prompt_tokens

Integer

Definition

Number of tokens in the prompt and the default persona.

Constraints

N/A

Range

N/A

Default Value

N/A

completion_tokens

Integer

Definition

Number of tokens in the completion returned by the inference service.

Constraints

N/A

Range

N/A

Default Value

N/A

total_tokens

Integer

Definition

Total number of consumed tokens.

Constraints

N/A

Range

N/A

Default Value

N/A

Status code: 400

If an error is reported, the error information returned by the V1 inference API complies with Huawei Cloud specifications. The V2 inference API transparently transmits the error information returned by the inference service, which usually complies with the OpenAI API format.

Table 19 Response body parameters

Parameter

Type

Description

error_msg

String

Error message

error_code

String

Error code

details

List<Object>

Error information returned by the inference service. The format and content are determined by the inference service.

Table 20 Body parameters in the response error information of the V2 inference API

Parameter

Type

Description

error

ErrorResp

Error message

id

String

Request ID

Table 21 ErrorResp

Parameter

Type

Description

code

String

Error code

type

String

Error type

message

String

Error details
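A client can extract the fields defined in Table 20 and Table 21 from a V2 error response body. A minimal Python sketch, assuming a JSON body in that layout; the function name is illustrative:

```python
import json

def extract_v2_error(body: str) -> tuple:
    """Return (code, type, message) from a V2 inference API error body
    shaped as {"error": {"code": ..., "type": ..., "message": ...}, "id": ...}."""
    err = json.loads(body).get("error", {})
    return err.get("code", ""), err.get("type", ""), err.get("message", "")
```
V1 error bodies use the error_code/error_msg fields from Table 19 instead and would need separate handling.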

Example Request

Interface URL and message header:

V1 inference API
POST https://mastudio.cn-southwest-2.myhuaweicloud.com/v1/{project_id}/deployments/{deployment_id}/chat/completions 

Request Header:   
Content-Type: application/json   
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...      

V2 inference API
POST https://mastudio.cn-southwest-2.myhuaweicloud.com/api/v2/chat/completions

Request Header:   
Content-Type: application/json   
Authorization: Bearer 201ca68f-45f9-4e19-8fa4-831e... 

Request body example:

{
    "temperature": 0.5,
    "model": "Qwen25-vl-32b", // This parameter is required only for the V2 inference API.
    "messages": [
        {
            "role":"user", // This parameter is required only for the V2 inference API.
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAA......qVKgqkf/Z"
                    }
                },
                {
                    "type": "text",
                    "text": "What's in the image?"
                }
            ]
        }
    ],
    "presence_penalty": 0.5,
    "frequency_penalty": 0.5,
    "max_tokens": 2048,
    "stream": false
}
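The request body and headers above can be assembled programmatically. A minimal Python sketch following the definitions in Table 5 through Table 9; the helper names are illustrative, and the resulting dict can be sent with any HTTP client:

```python
def build_headers(api_key: str) -> dict:
    # V2 API key authentication (Table 5): "Bearer", a space, then the API key.
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }

def build_body(model: str, image_data_url: str, question: str) -> dict:
    # Single-turn request body (Tables 6-9).
    # model is mandatory only for the V2 inference API.
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_data_url}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 2048,
        "stream": False,
    }
```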

Example of a multi-turn dialogue request:

{
	"temperature": 0.5,
	"model": "Qwen25-vl-32b", // This parameter is required only for the V2 inference API.
	"messages": [{
			"role": "user", // This parameter is required only for the V2 inference API.
			"content": [{
					"type": "image_url",
					"image_url": {
						"url": "data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAA......qVKgqkf/Z"
					}
				},
				{
					"type": "text",
					"text": "What's in the image?"
				}
			]
		},
		{
			"role": "assistant",
			"content": [{
				"type": "text",
				"text" : "This is a picture of a military jet fighter flying in the sky. You can see the size of the plane, its wingspan, and engine, as well as the amount of air it's moving through. The plane has a black fuselage and a large propeller in the front. The clouds in the background suggest that the plane is flying at high altitudes, possibly in the middle of the voyage."
			}]
		},
		{
			"role": "user", // This parameter is required only for the V2 inference API.
			"content": [{
					"type": "image_url",
					"image_url": {
						"url": "data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAA......qVKgqkf/Z"
					}
				},
				{
					"type": "text",
					"text": "What are the differences between this image and the first image?"
				}
			]
		}
	],
	"presence_penalty": 0.5,
	"frequency_penalty": 0.5,
	"max_tokens": 2048,
	"stream": false
}

Example Response

Status code: 200

Non-streaming Q&A response example

{
    "id": "chat-38ea6118a5d14e38b7d592211bbd31a6",
    "object": "chat.completion",
    "created": 1749894390,
    "model": "Qwen25-vl-32b",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": null,
                "content" : "This is a picture of a military jet fighter flying in the sky. You can see the size of the plane, its wingspan, and engine, as well as the amount of air it's moving through. The plane has a black fuselage and a large propeller in the front. The clouds in the background suggest that the plane is flying at high altitudes, possibly in the middle of the voyage.",
                "tool_calls": [
                ]
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "usage": {
        "prompt_tokens": 3189,
        "total_tokens": 3236,
        "completion_tokens": 47
    },
    "prompt_logprobs": null
}

Streaming Q&A response example

Response of the V1 inference API
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"message":{"role":"assistant"}}],"usage":{"prompt_tokens":64,"total_tokens":64,"completion_tokens":0}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices": [ {"index":0,"logprobs":null,"finish_reason":null,"message":{"content":"In"}}],"usage":{"prompt_tokens":64,"total_tokens":65,"completion_tokens":1}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"message":{"content":"this"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":2}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"message":{"content":" image"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":3}}

......

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":9}}

event:{"usage":{"completionTokens":9,"promptTokens":64,"totalTokens":73},"tokens":64,"token_number":9}

data:[DONE]


Response of the V2 inference API
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":{"prompt_tokens":64,"total_tokens":64,"completion_tokens":0}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"delta":{"content":"In"}}],"usage":{"prompt_tokens":64,"total_tokens":65,"completion_tokens":1}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"delta":{"content":"this"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":2}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"delta":{"content":"image"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":3}}

......

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":9}}

event:{"usage":{"completionTokens":9,"promptTokens":64,"totalTokens":73},"tokens":64,"token_number":9}

data:[DONE]
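The chunked output shown above can be reassembled on the client. A minimal Python sketch assuming the field layout of Tables 16 and 17 (V1 chunks carry the increment in message, V2 chunks in delta); the function name is illustrative:

```python
import json

def collect_stream_text(lines, api_version="v2"):
    """Accumulate the generated text from the data: lines of a streaming
    response, stopping at data:[DONE]. event: and blank lines are skipped."""
    key = "delta" if api_version == "v2" else "message"
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip event: lines and blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get(key, {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```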

Status Codes

For details, see Status Codes.

Error Codes

For details, see Error Codes.