Updated on 2025-11-03 GMT+08:00

Qwen Third-Party VL Model

Function

The Qwen2.5-VL model provides capabilities including image recognition, precise visual positioning, text recognition and understanding, document parsing, and video comprehension.

URI

The multi-modal inference service provides two inference APIs:

  • Pangu inference API (V1 inference API)
  • OpenAI-compatible API (V2 inference API)

The V1 and V2 APIs use different authentication modes, and their request and response bodies differ slightly. Table 1 lists the URIs of the two APIs.

Table 1 Inference API

API Type

API URI

V1 inference API

POST /v1/{project_id}/deployments/{deployment_id}/chat/completions

V2 inference API

POST /api/v2/chat/completions

The URI of the V1 inference API requires additional path parameters. For details, see Table 2.

Table 2 URI parameters

Parameter

Mandatory

Type

Description

project_id

Yes

String

Definition

Project ID. For details about how to obtain a project ID, see Obtaining the Project ID.

Constraints

N/A

Range

N/A

Default Value

N/A

deployment_id

Yes

String

Definition

Model deployment ID. For details about how to obtain the deployment ID, see Obtaining the Model Deployment ID.

Constraints

N/A

Range

N/A

Default Value

N/A

Request Parameters

The authentication modes of the V1 and V2 inference APIs are different, and the request and response parameters are also different. The details are as follows:

Header parameters

  1. The V1 inference API supports both token-based authentication and API key authentication. The request header parameters for the two authentication modes are as follows:
    Table 3 Request header parameters (token-based authentication)

    Parameter

    Mandatory

    Type

    Description

    X-Auth-Token

    Yes

    String

    Definition

    User token.

    Used to obtain the permission required to call APIs. The token is the value of X-Subject-Token in the response header in Token-based Authentication.

    Constraints

    N/A

    Range

    N/A

    Default Value

    N/A

    Content-Type

    Yes

    String

    Definition

    MIME type of the request body.

    Constraints

    N/A

    Range

    N/A

    Default Value

    application/json

    • Table 4 lists the request header parameters for API key authentication.
      Table 4 Request header parameters (API key authentication)

      Parameter

      Mandatory

      Type

      Description

      X-Apig-AppCode

      Yes

      String

      Definition

      API key.

      Used to obtain the permission required to call APIs. The API key is the value of X-Apig-AppCode in the response header in API key authentication.

      Constraints

      N/A

      Range

      N/A

      Default Value

      N/A

      Content-Type

      Yes

      String

      Definition

      MIME type of the request body.

      Constraints

      N/A

      Range

      N/A

      Default Value

      application/json

  2. The V2 inference API supports only API key authentication. For details about the request header parameters, see Table 5.
    Table 5 Request header parameters (OpenAI-compatible API key authentication)

    Parameter

    Mandatory

    Type

    Description

    Authorization

    Yes

    String

    Definition

A string consisting of Bearer and the API key obtained when application access was created. A space is required between Bearer and the API key. An example is Bearer d59******9C3.

    Constraints

    N/A

    Range

    N/A

    Default Value

    N/A

    Content-Type

    Yes

    String

    Definition

    MIME type of the request body.

    Constraints

    N/A

    Range

    N/A

    Default Value

    application/json

Request body parameters

The request body parameters of the V1 and V2 inference APIs are the same, as described in Table 6.

Table 6 Request body parameters

Parameter

Mandatory

Type

Description

messages

Yes

Array of message objects

Definition

Multi-turn dialogue question-answer pairs.

Constraints

N/A

Range

Array length: 1–20

Default Value

N/A

model

V1 inference API: No

V2 inference API: Yes

String

Definition

Name of the inference service model, which is the value of Deployed_Model specified during inference service deployment. You can obtain the value on the inference service details page. This parameter is mandatory for the V2 inference API and is not required for the V1 inference API.

Constraints

N/A

Range

The value contains 1 to 64 characters.

Default Value

N/A

stream

No

Boolean

Definition

Whether to enable streaming mode.

Constraints

N/A

Range

  • true: Enable streaming mode.
  • false: Disable streaming mode.

Default Value

false

temperature

No

Float

Definition

Controls the diversity and creativity of the generated text. The value ranges from 0 to 1, where 0 indicates the lowest diversity. A lower temperature produces more deterministic outputs; a higher temperature (for example, 0.9) produces more creative outputs. temperature is one of the key parameters affecting the output quality and diversity of an LLM. Other parameters, such as top_p, can also adjust the model's behavior and preferences, but do not set both at the same time.

Constraints

N/A

Range

Minimum value: 0

Maximum value: 1

Default Value

0.3

top_p

No

Float

Definition

An alternative to sampling with temperature, called nucleus sampling, in which the model considers only the tokens whose cumulative probability mass is within top_p. You are advised to change either top_p or temperature to adjust the tendency of the generated text, but not both.

Constraints

N/A

Range

[0, 1]

Default Value

0

max_tokens

No

Integer

Definition

Used to control the length and quality of chat replies. Generally, a large max_tokens value can generate a long and complete reply, but may also increase the risk of generating irrelevant or duplicate content. A small max_tokens value can generate short and concise replies, but may also cause incomplete or discontinuous content. Therefore, you need to select a proper max_tokens value based on scenarios and requirements.

Constraints

The minimum value is 1.

Range

N/A

Default Value

N/A

presence_penalty

No

Float

Definition

Penalty given to repetition in the generated text. Positive presence penalty values penalize new tokens based on whether they have appeared in the text so far, increasing the model's likelihood of talking about new topics. This parameter helps to make the output more creative and diverse by reducing the likelihood of repetition.

Constraints

N/A

Range

Minimum value: -2

Maximum value: 2

Default Value

0

frequency_penalty

No

Float

Definition

Penalizes new tokens based on their frequency in the text so far, decreasing the likelihood of repeated words and phrases in the output.

Constraints

N/A

Range

Minimum value: -2

Maximum value: 2

Default Value

0

Table 7 message

Parameter

Mandatory

Type

Description

role

V1 inference API: No

V2 inference API: Yes

String

Definition

Role in a dialogue. The value can be system, user, or assistant.

If you want the model to answer questions as a specific persona, set role to system; if you do not use a specific persona, set role to user. A system role needs to be set only once per dialogue request. In a multi-turn dialogue, set role to user for the turns where the user enters prompts and to assistant for the inference results.

Constraints

N/A

Range

[system, user, assistant]

Default Value

N/A

content

Yes

Array of content objects

Definition

Q&A pair text.

Constraints

Minimum length: 1

Range

N/A

Default Value

N/A

Table 8 content

Parameter

Mandatory

Type

Description

type

Yes

String

Definition

Input content type.

Constraints

N/A

Range

  • text: The content is text.
  • image_url: The content is an image.

Default Value

N/A

text

No

String

Definition

Q&A pair text.

Constraints

Minimum length: 1

This parameter is mandatory when type is set to text.

Range

N/A

Default Value

N/A

image_url

No (text and image_url cannot both be empty)

image_url object

Definition

Images in question-answer pairs.

Constraints

This parameter is mandatory when type is set to image_url.

Range

N/A

Default Value

N/A

Table 9 image_url

Parameter

Mandatory

Type

Description

url

Yes

String

Definition

A character string consisting of the identifier and Base64 code of the image.

Constraints

The value must be in the format of data:image/jpg;base64,{base64_str}. base64_str indicates the Base64 code of an image, for example, data:image/jpg;base64,/9j/4AAQSKZJRg......qkf/z.

Range

N/A

Default Value

N/A
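The url value described in Table 9 can be produced from raw image bytes with the standard library. A minimal Python sketch, assuming a local JPEG image; the helper name is illustrative:

```python
import base64

def to_image_data_url(image_bytes: bytes) -> str:
    """Encode raw JPEG bytes as the data:image/jpg;base64,{base64_str} string
    expected in image_url.url (see Table 9)."""
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return f"data:image/jpg;base64,{b64}"

# Typical usage: to_image_data_url(open("photo.jpg", "rb").read())
```
Note that JPEG files begin with the bytes 0xFF 0xD8 0xFF, which Base64-encode to the "/9j/" prefix visible in the examples in this document.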

Response Parameters

Non-streaming response (with stream set to false or not specified)

Status code: 200

Table 10 Response body parameters

Parameter

Type

Description

id

String

Definition

Response ID

Constraints

N/A

Range

N/A

Default Value

N/A

created

Integer

Definition

Response time

Constraints

N/A

Range

N/A

Default Value

N/A

choices

Array of ChatChoice objects

Definition

Model's replies

Constraints

N/A

Range

N/A

Default Value

N/A

usage

CompletionUsage object

Definition

Token statistics

Constraints

N/A

Range

N/A

Default Value

N/A

Table 11 ChatChoice

Parameter

Type

Description

index

Integer

Definition

Reply index

Constraints

N/A

Range

N/A

Default Value

N/A

message

Array of MessageItem objects

Definition

Model's response

Constraints

N/A

Range

N/A

Default Value

N/A

Table 12 MessageItem

Parameter

Type

Description

role

String

Definition

Role

Constraints

N/A

Range

N/A

Default Value

N/A

content

String

Definition

Model's response

Constraints

N/A

Range

N/A

Default Value

N/A

Table 13 CompletionUsage

Parameter

Type

Description

completion_tokens

Number

Definition

Number of tokens in the generated completion

Constraints

N/A

Range

N/A

Default Value

N/A

prompt_tokens

Number

Definition

Number of tokens in the provided prompt

Constraints

N/A

Range

N/A

Default Value

N/A

total_tokens

Number

Definition

Total number of tokens used, including both the prompt and completion tokens

Constraints

N/A

Range

N/A

Default Value

N/A

Streaming response (with stream set to true)

Status code: 200

Table 14 Data units output in streaming mode

Parameter

Type

Description

data

CompletionStreamResponse

Definition

If stream is set to true, messages generated by the model will be returned in streaming mode. The generated text is returned incrementally. Each data field contains a part of the generated text until all data fields are returned.

Constraints

N/A

Range

N/A

Default Value

N/A

Table 15 CompletionStreamResponse

Parameter

Type

Description

id

String

Definition

Unique identifier of the dialogue.

Constraints

N/A

Range

N/A

Default Value

N/A

created

Integer

Definition

Unix timestamp (in seconds) when the chat was created. The timestamps of each chunk in the streaming response are the same.

Constraints

N/A

Range

N/A

Default Value

N/A

model

String

Definition

Name of the model that generates the completion.

Constraints

N/A

Range

N/A

Default Value

N/A

object

String

Definition

Object type, which is chat.completion.chunk.

Constraints

N/A

Range

N/A

Default Value

N/A

choices

ChatCompletionResponseStreamChoice

Definition

A list of completion choices generated by the model.

Constraints

N/A

Range

N/A

Default Value

N/A

usage

UsageInfo

Definition

Token usage statistics for the dialogue. You can use them to monitor consumption and keep the model from generating excessive tokens.

Constraints

N/A

Range

N/A

Default Value

N/A

Table 16 ChatCompletionResponseStreamChoice

Parameter

Type

Description

index

Integer

Definition

Index of the completion in the completion choice list generated by the model.

Constraints

N/A

Range

N/A

Default Value

N/A

finish_reason

String

Definition

Reason why the model stops generating tokens.

Constraints

N/A

Range

[stop, length, content_filter, tool_calls, insufficient_system_resource]

  • stop: The model stops generating text after the task is complete or when a pre-defined stop sequence is encountered.
  • length: The output length reaches the context length limit of the model or the max_tokens limit.
  • content_filter: The output content is filtered due to filter conditions.
  • tool_calls: The model determines to call an external tool (function/API) to complete the task.
  • insufficient_system_resource: Generation is interrupted because system inference resources are insufficient.

Default Value

N/A

delta

DeltaMessage

Definition

Completion increment returned by the V2 inference API in streaming mode.

This parameter is not included in the response body of the V1 inference API.

Constraints

N/A

Range

N/A

Default Value

N/A

message

DeltaMessage

Definition

Completion increment returned by the V1 inference API in streaming mode.

This parameter is not included in the response body of the V2 inference API.

Constraints

N/A

Range

N/A

Default Value

N/A

Table 17 DeltaMessage

Parameter

Type

Description

role

String

Definition

Role that generates the message.

Constraints

N/A

Range

N/A

Default Value

N/A

content

String

Definition

Content of the completion increment.

Constraints

N/A

Range

N/A

Default Value

N/A

reasoning_content

String

Definition

Reasoning steps that led to the final conclusion (thinking process of the model).

Constraints

This parameter applies only to models that support the thinking process.

Range

N/A

Default Value

N/A

Table 18 UsageInfo

Parameter

Type

Description

prompt_tokens

Integer

Definition

Number of tokens in the prompt and the default persona.

Constraints

N/A

Range

N/A

Default Value

N/A

completion_tokens

Integer

Definition

Number of tokens in the completion returned by the inference service.

Constraints

N/A

Range

N/A

Default Value

N/A

total_tokens

Integer

Definition

Total number of consumed tokens.

Constraints

N/A

Range

N/A

Default Value

N/A

Status code: 400

If an error is reported, the error information returned by the V1 inference API complies with Huawei Cloud specifications. The V2 inference API transparently transmits the error information returned by the inference service, which usually complies with the OpenAI API format.

Table 19 Response body parameters

Parameter

Type

Description

error_msg

String

Error message

error_code

String

Error code

details

List<Object>

Error information returned by the inference service. The format and content are determined by the inference service.

Table 20 Body parameters in the response error information of the V2 inference API

Parameter

Type

Description

error

ErrorResp

Error message

id

String

Request ID

Table 21 ErrorResp

Parameter

Type

Description

code

String

Error code

type

String

Error type

message

String

Error details
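A client can extract the fields defined in Table 20 and Table 21 from a V2 error response body. A minimal Python sketch, assuming a JSON body in that layout; the function name is illustrative:

```python
import json

def extract_v2_error(body: str) -> tuple:
    """Return (code, type, message) from a V2 inference API error body
    shaped as {"error": {"code": ..., "type": ..., "message": ...}, "id": ...}."""
    err = json.loads(body).get("error", {})
    return err.get("code", ""), err.get("type", ""), err.get("message", "")
```
V1 error bodies use the error_code/error_msg fields from Table 19 instead and would need separate handling.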

Example Request

Interface URL and message header:

V1 inference API
POST https://mastudio.cn-southwest-2.myhuaweicloud.com/v1/{project_id}/deployments/{deployment_id}/chat/completions 

Request Header:   
Content-Type: application/json   
X-Auth-Token: MIINRwYJKoZIhvcNAQcCoIINODCCDTQCAQExDTALBglghkgBZQMEAgEwgguVBgkqhkiG...      

V2 inference API
POST https://mastudio.cn-southwest-2.myhuaweicloud.com/api/v2/chat/completions

Request Header:   
Content-Type: application/json   
Authorization: Bearer 201ca68f-45f9-4e19-8fa4-831e... 

Request body example:

{
    "temperature": 0.5,
    "model": "Qwen25-vl-32b", // This parameter is required only for the V2 inference API.
    "messages": [
        {
            "role":"user", // This parameter is required only for the V2 inference API.
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAA......qVKgqkf/Z"
                    }
                },
                {
                    "type": "text",
                    "text": "What's in the image?"
                }
            ]
        }
    ],
    "presence_penalty": 0.5,
    "frequency_penalty": 0.5,
    "max_tokens": 2048,
    "stream": false
}
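The request body and headers above can be assembled programmatically. A minimal Python sketch following the definitions in Table 5 through Table 9; the helper names are illustrative, and the resulting dict can be sent with any HTTP client:

```python
def build_headers(api_key: str) -> dict:
    # V2 API key authentication (Table 5): "Bearer", a space, then the API key.
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {api_key}",
    }

def build_body(model: str, image_data_url: str, question: str) -> dict:
    # Single-turn request body (Tables 6-9).
    # model is mandatory only for the V2 inference API.
    return {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": image_data_url}},
                {"type": "text", "text": question},
            ],
        }],
        "max_tokens": 2048,
        "stream": False,
    }
```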

Example of a multi-turn dialogue request:

{
	"temperature": 0.5,
	"model": "Qwen25-vl-32b", // This parameter is required only for the V2 inference API.
	"messages": [{
			"role": "user", // This parameter is required only for the V2 inference API.
			"content": [{
					"type": "image_url",
					"image_url": {
						"url": "data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAA......qVKgqkf/Z"
					}
				},
				{
					"type": "text",
					"text": "What's in the image?"
				}
			]
		},
		{
			"role": "assistant",
			"content": [{
				"type": "text",
				"text" : "This is a picture of a military jet fighter flying in the sky. You can see the size of the plane, its wingspan, and engine, as well as the amount of air it's moving through. The plane has a black fuselage and a large propeller in the front. The clouds in the background suggest that the plane is flying at high altitudes, possibly in the middle of the voyage."
			}]
		},
		{
			"role": "user", // This parameter is required only for the V2 inference API.
			"content": [{
					"type": "image_url",
					"image_url": {
						"url": "data:image/jpg;base64,/9j/4AAQSkZJRgABAQAAAQABAA......qVKgqkf/Z"
					}
				},
				{
					"type": "text",
					"text": "What are the differences between this image and the first image?"
				}
			]
		}
	],
	"presence_penalty": 0.5,
	"frequency_penalty": 0.5,
	"max_tokens": 2048,
	"stream": false
}

Example Response

Status code: 200

Non-streaming Q&A response example

{
    "id": "chat-38ea6118a5d14e38b7d592211bbd31a6",
    "object": "chat.completion",
    "created": 1749894390,
    "model": "Qwen25-vl-32b",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "reasoning_content": null,
                "content" : "This is a picture of a military jet fighter flying in the sky. You can see the size of the plane, its wingspan, and engine, as well as the amount of air it's moving through. The plane has a black fuselage and a large propeller in the front. The clouds in the background suggest that the plane is flying at high altitudes, possibly in the middle of the voyage.",
                "tool_calls": [
                ]
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "usage": {
        "prompt_tokens": 3189,
        "total_tokens": 3236,
        "completion_tokens": 47
    },
    "prompt_logprobs": null
}

Streaming Q&A response example

Response of the V1 inference API
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"message":{"role":"assistant"}}],"usage":{"prompt_tokens":64,"total_tokens":64,"completion_tokens":0}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices": [ {"index":0,"logprobs":null,"finish_reason":null,"message":{"content":"In"}}],"usage":{"prompt_tokens":64,"total_tokens":65,"completion_tokens":1}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"message":{"content":"this"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":2}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"message":{"content":" image"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":3}}

......

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":9}}

event:{"usage":{"completionTokens":9,"promptTokens":64,"totalTokens":73},"tokens":64,"token_number":9}

data:[DONE]


Response of the V2 inference API
data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"delta":{"role":"assistant"}}],"usage":{"prompt_tokens":64,"total_tokens":64,"completion_tokens":0}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":null,"delta":{"content":"In"}}],"usage":{"prompt_tokens":64,"total_tokens":65,"completion_tokens":1}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"delta":{"content":"this"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":2}}

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[{"index":0,"logprobs":null,"finish_reason":"stop","stop_reason":null,"delta":{"content":"image"}}],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":3}}

......

data:{"id":"chat-59170add0fd1427bbca0388431058d45","object":"chat.completion.chunk","created":1745725837,"model":"Qwen25-vl-32b","choices":[],"usage":{"prompt_tokens":64,"total_tokens":73,"completion_tokens":9}}

event:{"usage":{"completionTokens":9,"promptTokens":64,"totalTokens":73},"tokens":64,"token_number":9}

data:[DONE]
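The chunked output shown above can be reassembled on the client. A minimal Python sketch assuming the field layout of Tables 16 and 17 (V1 chunks carry the increment in message, V2 chunks in delta); the function name is illustrative:

```python
import json

def collect_stream_text(lines, api_version="v2"):
    """Accumulate the generated text from the data: lines of a streaming
    response, stopping at data:[DONE]. event: and blank lines are skipped."""
    key = "delta" if api_version == "v2" else "message"
    parts = []
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip event: lines and blank lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            content = choice.get(key, {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts)
```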

Status Codes

For details, see Status Codes.

Error Codes

For details, see Error Codes.