Skip to main content

Generate valid JSON with small LLMs using stop token control and field-by-field completion

Project description

Prefilled JSON

A Python library that helps low-parameter LLMs generate valid JSON by controlling the generation process through iterative field-by-field completion.

Small/low-parameter LLMs struggle to generate valid JSON, this library helps them out by prefilling JSON field names and using pattern matching to extract clean field values.

What this does:

  1. Controls the generation process: The library fills in JSON field names and structure
  2. Letting the LLM focus on values: The LLM only generates field values
  3. Using pattern extraction: Uses regex patterns to extract precise field values from model output
  4. Ensuring valid structure: The library maintains proper JSON syntax throughout

How It Works

The library uses a stop token approach that has proven to be the most reliable method for JSON generation. After extensive testing, the stop token driver with precise field extraction achieves 100% success rates on complex conversations, significantly outperforming streaming approaches and other alternatives.

The stop token driver:

  1. Fills in JSON field names and structure
  2. Uses stop tokens (like , and }) for precise control
  3. Extracts clean field values with robust pattern matching
  4. Handles over-generation gracefully through sophisticated text processing

This approach has been validated through comprehensive benchmarks showing 100% reliability compared to 50% success rates for alternatives like VLLM JSON mode.

Architecture

Core Components

  • StopTokenJsonDriver: Primary driver using stop tokens for reliable JSON generation (recommended)
  • JsonFieldDriver: Legacy interface for custom implementations
  • VLLM Plugin: Seamless integration with VLLM using the stop token approach

Note: The streaming approach has been deprecated due to reliability issues. The stop token driver is now the recommended approach for all use cases.

VLLM Integration

The library includes a VLLM plugin with intelligent model compatibility detection that runs a bunch of checks to see if the loaded model is compatible.

Model Compatibility

The plugin automatically detects compatible models by testing:

  • Assistant message resumption capabilities
  • Chat template flexibility
  • continue_final_message parameter support
  • Custom template acceptance

Sample Models

Chat:

# Qwen models (excellent JSON generation)
"Qwen/Qwen2.5-0.5B-Instruct"     # 0.5B - Ultra lightweight
"Qwen/Qwen2.5-1.5B-Instruct"     # 1.5B - Best balance
"Qwen/Qwen2.5-3B-Instruct"       # 3B - Production ready
"Qwen/Qwen2.5-7B-Instruct"       # 7B - Maximum performance
"Qwen/Qwen2.5-Coder-1.5B-Instruct" # 1.5B - Code/JSON specialized

# Microsoft Phi models (excellent chat flexibility)
"microsoft/phi-2"                 # 2.7B - Versatile base/chat
"microsoft/Phi-3-mini-4k-instruct" # 3.8B - Strong reasoning
"microsoft/Phi-3.5-mini-instruct" # 3.8B - Latest with 128K context

# Google Gemma models (production tested)
"google/gemma-2b-it"             # 2B - Efficient chat
"google/gemma-7b-it"             # 7B - High performance chat

Base Models

"meta-llama/Llama-3.2-1B"        # 1B - Latest Llama base
"meta-llama/Llama-3.2-3B"        # 3B - Balanced base model
"TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T" # 1.1B - Ultra efficient
"microsoft/DialoGPT-medium"      # 345M - Proven compatibility

Incompatible Models

Models with rigid chat templates that enforce strict role alternation:

  • meta-llama/Llama-2-7b-chat-hf (rigid template)
  • meta-llama/Llama-3.1-8B-Instruct (strict turn-taking)
  • Most models with very strict chat formatting

Quick VLLM Usage

from vllm import LLM
from vllm_plugin import generate_with_json_prefilled

# Initialize with a compatible model
llm = LLM(model="microsoft/Phi-3.5-mini-instruct", 
          enable_prefix_caching=True,
          disable_sliding_window=True)  # Required for some models

# Generate JSON with simple API
outputs = generate_with_json_prefilled(
    engine=llm,
    prompts=["Generate user data:"],
    json_prefilled_fields=[{"name": "string"}, {"age": "number"}]
)

print(outputs[0])
# Output: Generate user data:
# {"name": "Alice", "age": 30}

Testing Model Compatibility

from vllm import LLM
from vllm_plugin.json_prefilled_plugin import VLLMJSONPrefilledPlugin

def test_model(model_name):
    try:
        llm = LLM(model=model_name, trust_remote_code=True)
        plugin = VLLMJSONPrefilledPlugin(llm)
        print(f"{model_name} is compatible!")
        return True
    except Exception as e:
        print(f"{model_name}: {e}")
        return False

# Test any model
test_model("your-model-here")

See examples/vllm_plugin_example.py for more detailed usage examples and TESTING.md for comprehensive testing instructions.

The library uses pattern matching to extract clean field values from model output, automatically handling over-generation and ensuring valid JSON structure.

What it doesn't do

Because we focus on reliable JSON generation, some advanced features are not supported:

  1. Fancy JSON schema restrictions on field values
  2. Types other than string and number (object nesting is supported)
  3. Optional fields

Usage

VLLM Integration (Recommended)

from vllm import LLM
from vllm_plugin import generate_with_json_prefilled

# Initialize VLLM with proven model configuration
llm = LLM(model="microsoft/Phi-3.5-mini-instruct", 
          enable_prefix_caching=True,
          disable_sliding_window=True,
          trust_remote_code=True)

# Basic JSON generation
outputs = generate_with_json_prefilled(
    engine=llm,
    prompts=["Create user profile:"],
    json_prefilled_fields=[
        {"name": "string"},
        {"age": "number"},
        {"city": "string"}
    ]
)

print(outputs[0])
# Output: Create user profile:
# {"name": "Alice", "age": 30, "city": "Seattle"}

# Customer support conversation extraction
conversation_prompt = """
Customer: Hi, I need to check my order status. My order ID is 12345 and my email is john.smith@example.com
Support: I can help you with that! Let me look up your order.

Extract the customer information:
"""

customer_outputs = generate_with_json_prefilled(
    engine=llm,
    prompts=[conversation_prompt],
    json_prefilled_fields=[
        {"order_id": "string"},
        {"email": "string"},
        {"name": "string"}
    ]
)

print(customer_outputs[0])
# Output: Extract the customer information:
# {"order_id": "12345", "email": "john.smith@example.com", "name": "John Smith"}

# Complex nested structures
nested_outputs = generate_with_json_prefilled(
    engine=llm,
    prompts=["Generate business contact data:"],
    json_prefilled_fields=[
        {"company": "string"},
        {"contact": {
            "name": "string",
            "email": "string",
            "phone": "string"
        }},
        {"address": {
            "street": "string",
            "city": "string",
            "state": "string",
            "zip": "number"
        }}
    ]
)

print(nested_outputs[0])
# Output: Generate business contact data:
# {"company": "TechCorp Inc", "contact": {"name": "Alice Johnson", "email": "alice@techcorp.com", "phone": "555-0123"}, "address": {"street": "123 Business Ave", "city": "New York", "state": "NY", "zip": 10001}}

Custom Driver Usage (Advanced)

For custom LLM implementations, use the stop token driver directly:

from driver.stop_token_json_driver import StopTokenJsonDriver

# Define your generation function
def my_generate_func(prompt: str, stop_token: str = None) -> str:
    # Your LLM call here - should respect stop_token parameter
    # Example with hypothetical LLM API:
    # return my_llm.generate(prompt, stop=stop_token, max_tokens=50)
    return your_llm_response

# Configure stop tokens for your model
model_config = {
    "stop_tokens": [",", "}", "\n"],
    "stop_reliable": True
}

# Create driver and generate JSON
driver = StopTokenJsonDriver(my_generate_func, model_config)
result = driver.generate_json([{"name": "string"}, {"age": "number"}])

print(result)
# Output: {"name": "Alice", "age": 30}

How the Stop Token Approach Works

The library uses a sophisticated stop token approach combined with pattern matching for reliable JSON generation:

  1. Step 1: Sends '{"name": ' to LLM with stop token ,

    • LLM generates: '"Alice" (stops at comma)
    • Library extracts and validates: '"Alice"'
  2. Step 2: Sends '{"name": "Alice", "age": ' to LLM with stop token ,

    • LLM generates: '25' (stops at comma)
    • Library extracts and validates: 25
  3. Step 3: Sends '{"name": "Alice", "age": 25, "city": ' to LLM with no stop token (final field)

    • LLM generates: '"Seattle"'
    • Library extracts: '"Seattle"'
  4. Final result: '{"name": "Alice", "age": 25, "city": "Seattle"}'

This approach achieves 100% reliability through:

  • Precise stop token control preventing over-generation
  • Robust field value extraction handling any edge cases
  • Full conversation context preservation across field generations
  • Intelligent handling of model output variations

Features

  • Field Types: Supports "string" and "number" field types, plus nested objects
  • Stop Token Control: Precise generation control using stop tokens for reliability
  • Pattern Extraction: Robust regex-based field value extraction handling over-generation
  • 100% Reliability: Proven 100% success rate on realistic conversation benchmarks
  • Modern Model Support: Works reliably with instruction-tuned models (Phi-3.5, Qwen, Gemma, etc.)
  • Automatic Validation: Validates numeric fields and handles string quoting automatically
  • Error Handling: Clear error messages for invalid field types or malformed values
  • VLLM Integration: Seamless integration with VLLM using the stop token approach
  • Compatibility Detection: Automatic technical testing of model capabilities
  • Context Preservation: Maintains full conversation context across field generations

Performance & Reliability

Based on comprehensive benchmarks with realistic conversation scenarios:

Approach Success Rate Average Time Reliability
Prefilled-JSON (Stop Tokens) 100.0% 2.333s Perfect
VLLM JSON Mode 50.0% 1.667s Unreliable
Simple Prompting 0.0% 2.250s Fails

Key Results:

  • 100% success rate on complex multi-turn conversations (~1000 tokens)
  • Perfect JSON validity across all test scenarios
  • Robust handling of long context windows and nested structures
  • Production ready with proven reliability

See benchmark_results.md for detailed performance analysis.

Installation

pip install -e .

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
isort .

# Type check
mypy driver/

Field Schema Format

Each field is specified as a dictionary with exactly one key-value pair:

  • Key: The field name (string)
  • Value: The field type ("string" or "number")
fields = [
    {"username": "string"},
    {"score": "number"},
    {"active": "string"}  # booleans can be represented as strings
]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

prefilled_json-0.2.0.tar.gz (23.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

prefilled_json-0.2.0-py3-none-any.whl (14.6 kB view details)

Uploaded Python 3

File details

Details for the file prefilled_json-0.2.0.tar.gz.

File metadata

  • Download URL: prefilled_json-0.2.0.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.10

File hashes

Hashes for prefilled_json-0.2.0.tar.gz
Algorithm Hash digest
SHA256 2042c9fdeeb8fd10b155e6abc1ffca7d94fa00e3b75891322a2c7079cf598035
MD5 f5914aaa9441dd8c034e9f285bed2fe6
BLAKE2b-256 c67837ed64682c3ae2eff558623c1e62e700cff33d28b422b83018c155af265f

See more details on using hashes here.

File details

Details for the file prefilled_json-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: prefilled_json-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 14.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.10

File hashes

Hashes for prefilled_json-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9c400d77f605e1455099ed7761fe6878d035bccf584ee9e345acc3bc73631843
MD5 aee6b532ed56a32762ceb493e41b591c
BLAKE2b-256 f104ea8e046d830530c4235bb2ee83cad7a38f92e5a84e1a4dd3fffaea355638

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page