Structured Outputs

What are Structured Outputs?

Structured outputs are generated text that strictly adheres to a format specified by the user in a foundation model request, such as JSON, SQL, or another structured data format. The guided decoding feature is what makes the generated text conform to the user-specified format.

What is Guided Decoding?

Guided decoding is a generation strategy that applies additional context or constraints during decoding so that the model's output matches the desired format or outcome.

For example, when calling an OpenAI-compatible service, you can set the guided_json parameter in the request so that a JSON Schema guides the generation process.

JSON Schema uses keywords such as title, type, properties, required, and definitions to describe data structures. By specifying an object's attributes, types, and formats, a schema can, for example, guide the model to generate a JSON object containing user information.
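To make those keywords concrete, the sketch below writes a small schema as a Python dictionary and serializes it to the string form that a guided_json field could carry. The User object and its fields (name, age, role) are hypothetical and chosen only to show how the keywords fit together.

```python
import json

# Hypothetical schema for illustration; the field names are assumptions.
user_schema = {
    "title": "User",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "type": "string", "maxLength": 20},
        "age": {"title": "Age", "type": "integer"},
        # "$ref" points at a reusable definition declared under "definitions".
        "role": {"$ref": "#/definitions/Role"},
    },
    "required": ["name", "age", "role"],
    "definitions": {
        "Role": {
            "title": "Role",
            "type": "string",
            "enum": ["admin", "member", "guest"],
        },
    },
}

# Serialize the schema to a JSON string.
schema_text = json.dumps(user_schema)

# An object of the shape this schema describes.
example = {"name": "Alice", "age": 30, "role": "member"}

# Minimal check: list the required keys that the example lacks.
missing = [key for key in user_schema["required"] if key not in example]
print(missing)
```

The check at the end only looks at required keys. It is not a full schema validator.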

The main advantages of guided decoding are:

Context guidance: By providing specific format templates, the model can better understand the attributes of each field and filter out irrelevant information.
Constrained generation: You can set certain constraints, such as keywords, topics, or styles, to ensure that the generated content is more consistent and relevant.
Improved quality: The generated text typically adheres strictly to the user-specified format requirements, ensuring logical and rigorous content.

Principles

Guided decoding converts the user-provided format template (such as a JSON Schema, a context-free grammar (CFG), or a regular expression) into a finite state machine (FSM). The FSM guides and filters the output tokens: its state transitions define what content is allowed in each field of the formatted output. At each decoding step, the token probabilities are adjusted so that tokens that do not meet the conditions of the current state are excluded, which keeps the generated text within the specified format.
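The filtering step can be sketched with a toy example. The code below is not the xgrammar implementation. It is a minimal illustration that assumes a six-token vocabulary, a hand-written FSM for "one or more digits, then end of sequence", and a stand-in function in place of a real model. All names in it are hypothetical.

```python
import math

VOCAB = ["0", "1", "2", "a", "b", "<eos>"]
DIGITS = {"0", "1", "2"}

# FSM for "one or more digits, then end of sequence".
# Each state maps an allowed token to the next state.
TRANSITIONS = {
    "start": {d: "digits" for d in DIGITS},
    "digits": {**{d: "digits" for d in DIGITS}, "<eos>": "done"},
}

def mask_logits(logits, state):
    """Set the score of every token the FSM forbids in this state to -inf."""
    allowed = TRANSITIONS[state]
    return [s if tok in allowed else -math.inf for tok, s in zip(VOCAB, logits)]

def fake_model_logits(step):
    """Stand-in for a real model; it always prefers "a", which the FSM forbids."""
    table = [
        [0.1, 2.0, 0.3, 5.0, 1.0, 0.0],
        [1.5, 0.2, 0.1, 4.0, 0.5, 1.0],
        [0.1, 0.2, 0.3, 3.0, 0.5, 2.5],
    ]
    return table[min(step, len(table) - 1)]

def guided_greedy_decode(max_steps=10):
    state, output = "start", []
    for step in range(max_steps):
        masked = mask_logits(fake_model_logits(step), state)
        # Greedy choice among the tokens that survived the mask.
        token = VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]
        state = TRANSITIONS[state][token]
        if state == "done":
            break
        output.append(token)
    return "".join(output)

print(guided_greedy_decode())  # prints "10"
```

Without the mask, greedy decoding over the same scores would pick "a" at every step. With the mask, only tokens that have a transition from the current state can be selected.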

Using Guided Decoding for Offline Inference

To use guided decoding for offline inference, configure GuidedDecodingParams and pass it to the SamplingParams class.

The following is an example of using the offline mode:

from vllm import LLM, SamplingParams
from vllm.sampling_params import GuidedDecodingParams

# Replace the placeholder with the model path.
MODEL_NAME = "${MODEL_NAME}"
llm = LLM(model=MODEL_NAME)

# Restrict the output to exactly one of the listed choices.
guided_decoding_params = GuidedDecodingParams(choice=["Positive", "Negative"])
sampling_params = SamplingParams(guided_decoding=guided_decoding_params)
outputs = llm.generate(
    prompts="Classify this sentiment: vLLM is wonderful!",
    sampling_params=sampling_params,
)
# Print the generated text of the first (and only) request.
print(outputs[0].outputs[0].text)

MODEL_NAME indicates the model path.
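The same pattern can be extended from a fixed choice list to a JSON Schema. The sketch below assumes that GuidedDecodingParams in the installed vLLM version also accepts a json argument holding a schema. The schema, the max_tokens value, and the helper name generate_review are hypothetical. The vLLM imports sit inside the function so that the schema can be inspected without loading a model.

```python
import json

# Hypothetical schema: a sentiment label plus an integer score.
REVIEW_SCHEMA = {
    "title": "Review",
    "type": "object",
    "properties": {
        "sentiment": {"type": "string", "enum": ["Positive", "Negative"]},
        "score": {"type": "integer"},
    },
    "required": ["sentiment", "score"],
}

def generate_review(model_path: str, prompt: str) -> dict:
    """Generate one JSON object constrained by REVIEW_SCHEMA (illustrative helper)."""
    # Imported lazily so this module loads even where vLLM is not installed.
    from vllm import LLM, SamplingParams
    from vllm.sampling_params import GuidedDecodingParams

    llm = LLM(model=model_path)
    # Assumption: the json field takes a JSON Schema as a dict or a string.
    guided = GuidedDecodingParams(json=REVIEW_SCHEMA)
    params = SamplingParams(guided_decoding=guided, max_tokens=100, temperature=0)
    outputs = llm.generate(prompts=prompt, sampling_params=params)
    # Parsing can still fail if generation stops at max_tokens mid-object.
    return json.loads(outputs[0].outputs[0].text)
```

A call might look like generate_review("${MODEL_NAME}", "Rate this review: vLLM is wonderful!"), with the placeholder replaced by the model path.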

Startup Parameters

**Table 1** Parameters
Parameter	Type	Range	Description
--guided-decoding-backend	str	{xgrammar}	Enables guided decoding and selects its backend. Currently, only the xgrammar backend is supported.

Using Guided Decoding for Online Inference

For details about how to start the inference service, see Starting an LLM-powered Inference Service. When starting the service, add the --guided-decoding-backend startup parameter described in Table 1.

When guided decoding is used for online inference, the request carries the JSON Schema in the guided_json field. The following is an example request:

curl -X POST http://${docker_ip}:8080/v1/completions \
-H "Content-Type: application/json" \
-d '{
    "model": "${container_model_path}",
    "prompt": "Meet our valorous character, named Knight, who has reached the age of 32. Clad in impenetrable plate armor, Knight is well-prepared for any battle. Armed with a trusty sword and boasting a strength score of 90, this character stands as a formidable warrior on the field. Please provide details for this character, including their Name, Age, preferred Armor, Weapon, and Strength",
    "max_tokens": 200,
    "temperature": 0,
    "guided_json": "{\"title\": \"Character\", \"type\": \"object\", \"properties\": {\"name\": {\"title\": \"Name\", \"maxLength\": 10, \"type\": \"string\"}, \"age\": {\"title\": \"Age\", \"type\": \"integer\"}, \"armor\": {\"$ref\": \"#/definitions/Armor\"}, \"weapon\": {\"$ref\": \"#/definitions/Weapon\"}, \"strength\": {\"title\": \"Strength\", \"type\": \"integer\"}}, \"required\": [\"name\", \"age\", \"armor\", \"weapon\", \"strength\"], \"definitions\": {\"Armor\": {\"title\": \"Armor\", \"description\": \"An enumeration.\", \"enum\": [\"leather\", \"chainmail\", \"plate\"], \"type\": \"string\"}, \"Weapon\": {\"title\": \"Weapon\", \"description\": \"An enumeration.\", \"enum\": [\"sword\", \"axe\", \"mace\", \"spear\", \"bow\", \"crossbow\"], \"type\": \"string\"}}}"
}'
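The same request can be assembled in Python, which avoids hand-escaping the schema string. This is a sketch under stated assumptions: the schema is a shortened, hypothetical variant of the Character schema above, the helper names build_payload and send_completion are illustrative, and the response is assumed to follow the OpenAI-style completions layout with the generated text in choices[0].text.

```python
import json
import urllib.request

# Shortened, hypothetical variant of the Character schema from the curl example.
CHARACTER_SCHEMA = {
    "title": "Character",
    "type": "object",
    "properties": {
        "name": {"title": "Name", "maxLength": 10, "type": "string"},
        "age": {"title": "Age", "type": "integer"},
        "armor": {
            "title": "Armor",
            "enum": ["leather", "chainmail", "plate"],
            "type": "string",
        },
    },
    "required": ["name", "age", "armor"],
}

def build_payload(model: str, prompt: str) -> dict:
    """Build the request body; guided_json carries the schema as a JSON string."""
    return {
        "model": model,
        "prompt": prompt,
        "max_tokens": 200,
        "temperature": 0,
        "guided_json": json.dumps(CHARACTER_SCHEMA),
    }

def send_completion(base_url: str, payload: dict) -> dict:
    """POST the payload to /v1/completions and parse the generated JSON text."""
    request = urllib.request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    # Assumption: OpenAI-style response with the text in choices[0].text.
    return json.loads(body["choices"][0]["text"])
```

For example, send_completion("http://${docker_ip}:8080", build_payload("${container_model_path}", "Describe Knight, a 32-year-old warrior in plate armor.")) would return the parsed character object, with both placeholders replaced by real values.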
