Generate valid JSON with small LLMs using iterative field-by-field completion

Prefilled JSON

A Python library that helps low-parameter LLMs generate valid JSON by controlling the generation process through iterative field-by-field completion.

Small, low-parameter LLMs often struggle to generate valid JSON. This library helps by prefilling JSON field names and using pattern matching to extract clean field values.

What this does:

  1. Controls the generation process: the library fills in JSON field names and structure
  2. Lets the LLM focus on values: the LLM generates only the field values
  3. Extracts with patterns: regex patterns pull precise field values out of the model output
  4. Ensures valid structure: the library maintains proper JSON syntax throughout

How It Works

The library uses a streaming approach with pattern matching. Stop tokens are not 100% reliable across models; streaming the output was the only way I found to make extraction fully reliable. Models may therefore over-generate content, which the library backtracks through, extracting precise field values with regex patterns. This approach works reliably with state-of-the-art models like Qwen, Phi-3.5, and Gemma, and should hold up even with heavily quantized low-parameter models.
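The extraction step can be sketched roughly like this (a simplified illustration, not the library's actual patterns; `extract_value` is a hypothetical helper):

```python
import re

# Simplified illustration of pattern-based value extraction: the model
# may over-generate past the field we asked for, so we match only the
# leading value and discard the rest.
STRING_VALUE = re.compile(r'^\s*"((?:[^"\\]|\\.)*)"')  # leading JSON string
NUMBER_VALUE = re.compile(r'^\s*(-?\d+(?:\.\d+)?)')    # leading JSON number

def extract_value(raw: str, field_type: str) -> str:
    """Pull the first complete value of the requested type out of
    over-generated model output."""
    pattern = STRING_VALUE if field_type == "string" else NUMBER_VALUE
    match = pattern.match(raw)
    if match is None:
        raise ValueError(f"no leading {field_type} value in: {raw!r}")
    return f'"{match.group(1)}"' if field_type == "string" else match.group(1)

# The model over-generates; we keep only the leading value.
print(extract_value('"Alice", "age": 30, "city": "Seattle"', "string"))  # "Alice"
print(extract_value('25, "city": "Seattle", "active": true', "number"))  # 25
```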

Architecture

Core Components

  • StreamingJsonFieldDriver: Pattern-matching based JSON generation that works with modern models
  • JsonFieldDriver: Traditional stop-token based driver (for custom implementations)
  • VLLM Plugin: Seamless integration with VLLM using the streaming approach

VLLM Integration

The library includes a VLLM plugin with intelligent model compatibility detection that runs a series of checks to verify that the loaded model can resume a partial assistant message.

Model Compatibility

The plugin automatically detects compatible models by testing:

  • Assistant message resumption capabilities
  • Chat template flexibility
  • continue_final_message parameter support
  • Custom template acceptance

Sample Models

Chat:

# Qwen models (excellent JSON generation)
"Qwen/Qwen2.5-0.5B-Instruct"     # 0.5B - Ultra lightweight
"Qwen/Qwen2.5-1.5B-Instruct"     # 1.5B - Best balance
"Qwen/Qwen2.5-3B-Instruct"       # 3B - Production ready
"Qwen/Qwen2.5-7B-Instruct"       # 7B - Maximum performance
"Qwen/Qwen2.5-Coder-1.5B-Instruct" # 1.5B - Code/JSON specialized

# Microsoft Phi models (excellent chat flexibility)
"microsoft/phi-2"                 # 2.7B - Versatile base/chat
"microsoft/Phi-3-mini-4k-instruct" # 3.8B - Strong reasoning
"microsoft/Phi-3.5-mini-instruct" # 3.8B - Latest with 128K context

# Google Gemma models (production tested)
"google/gemma-2b-it"             # 2B - Efficient chat
"google/gemma-7b-it"             # 7B - High performance chat

Base Models

"meta-llama/Llama-3.2-1B"        # 1B - Latest Llama base
"meta-llama/Llama-3.2-3B"        # 3B - Balanced base model
"TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T" # 1.1B - Ultra efficient
"microsoft/DialoGPT-medium"      # 345M - Proven compatibility

Incompatible Models

Models with rigid chat templates that enforce strict role alternation:

  • meta-llama/Llama-2-7b-chat-hf (rigid template)
  • meta-llama/Llama-3.1-8B-Instruct (strict turn-taking)
  • Most models with very strict chat formatting

Quick VLLM Usage

from vllm import LLM
from vllm_plugin import generate_with_json_prefilled

# Initialize with a compatible model (auto-detected)
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", enable_prefix_caching=True)

# Generate JSON with simple API
outputs = generate_with_json_prefilled(
    engine=llm,
    prompts=["Generate user data:"],
    json_prefilled_fields=[{"name": "string"}, {"age": "number"}]
)

print(outputs[0])
# Output: Generate user data:
# {"name": "Alice", "age": 30}

Testing Model Compatibility

from vllm import LLM
from vllm_plugin.json_prefilled_plugin import VLLMJSONPrefilledPlugin

def test_model(model_name):
    try:
        llm = LLM(model=model_name, trust_remote_code=True)
        plugin = VLLMJSONPrefilledPlugin(llm)
        print(f"✅ {model_name} is compatible!")
        return True
    except Exception as e:
        print(f"❌ {model_name}: {e}")
        return False

# Test any model
test_model("your-model-here")

See examples/vllm_plugin_example.py for more detailed usage examples and TESTING.md for comprehensive testing instructions.

The library uses pattern matching to extract clean field values from model output, automatically handling over-generation and ensuring valid JSON structure.

What it doesn't do

Because we focus on reliable JSON generation, some advanced features are not supported:

  1. Fancy JSON schema restrictions on field values
  2. Types other than string and number (object nesting is supported)
  3. Optional fields

Usage

VLLM Integration (Recommended)

from vllm import LLM
from vllm_plugin import generate_with_json_prefilled

# Initialize VLLM with any compatible model
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", trust_remote_code=True)

# Generate JSON with simple API
outputs = generate_with_json_prefilled(
    engine=llm,
    prompts=["Create user profile:"],
    json_prefilled_fields=[
        {"name": "string"},
        {"age": "number"},
        {"city": "string"}
    ]
)

print(outputs[0])
# Output: Create user profile:
# {"name": "Alice", "age": 30, "city": "Seattle"}

# Nested object usage
nested_outputs = generate_with_json_prefilled(
    engine=llm,
    prompts=["Generate complete user data:"],
    json_prefilled_fields=[
        {"name": "string"},
        {"address": {
            "street": "string",
            "city": "string",
            "zip": "number"
        }},
        {"age": "number"}
    ]
)

print(nested_outputs[0])
# Output: Generate complete user data:
# {"name": "Alice", "address": {"street": "123 Main St", "city": "Seattle", "zip": 98101}, "age": 30}
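One way to picture the nested case: every leaf value still gets its own generation step, visited depth-first. A hypothetical helper (not part of the library) that lists those steps:

```python
def leaf_fields(fields, path=()):
    """List (path, type) pairs in the order values would be requested."""
    leaves = []
    for field in fields:
        (name, spec), = field.items()  # exactly one key per field dict
        if isinstance(spec, dict):
            # Nested object: recurse over its entries as single-key dicts.
            nested = [{k: v} for k, v in spec.items()]
            leaves += leaf_fields(nested, path + (name,))
        else:
            leaves.append((path + (name,), spec))
    return leaves

schema = [
    {"name": "string"},
    {"address": {"street": "string", "city": "string", "zip": "number"}},
    {"age": "number"},
]
for field_path, ftype in leaf_fields(schema):
    print(".".join(field_path), ftype)
# name string
# address.street string
# address.city string
# address.zip number
# age number
```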

Custom Driver Usage (Advanced)

For custom LLM implementations, you can use the core driver directly:

from driver.json_driver import JsonFieldDriver

# Define your generation function
def my_generate_func(prompt: str, stop_token: str = None) -> str:
    # Your LLM call here - stop_token parameter available but not required
    return your_llm_response

# Create driver and generate JSON
driver = JsonFieldDriver(generate=my_generate_func)
result = driver.generate_json([{"name": "string"}, {"age": "number"}])

How the Streaming Approach Works

The library uses a sophisticated pattern-matching approach that works reliably with modern instruction-tuned models:

  1. Step 1: Sends '{"name": ' to LLM (no stop tokens)

    • LLM generates: '"Alice", "age": 30, "city": "Seattle", "email": "alice@example.com"'
    • Library extracts: '"Alice"' using regex pattern matching
  2. Step 2: Sends '{"name": "Alice", "age": ' to LLM

    • LLM generates: '25, "city": "Seattle", "active": true'
    • Library extracts: '25' as the numeric value
  3. Step 3: Sends '{"name": "Alice", "age": 25, "city": ' to LLM

    • LLM generates: '"Seattle"}, "country": "USA"'
    • Library extracts: '"Seattle"' using pattern matching
  4. Final result: '{"name": "Alice", "age": 25, "city": "Seattle"}'

This approach works with modern instruction-tuned models because it doesn't fight their tendency to over-generate; instead, it extracts exactly what's needed using robust pattern matching.
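The whole loop above can be condensed into a toy driver. This is a sketch with a canned fake LLM and simplified regexes, not the library's implementation:

```python
import re

def fake_llm(prompt: str) -> str:
    # Stands in for a real model; happily over-generates extra fields.
    canned = {
        '{"name": ': '"Alice", "age": 30, "city": "Seattle"',
        '{"name": "Alice", "age": ': '25, "city": "Seattle", "active": true',
        '{"name": "Alice", "age": 25, "city": ': '"Seattle"}, "country": "USA"',
    }
    return canned[prompt]

def generate_json(fields, generate):
    # Prefill each field name, let the LLM over-generate, keep only
    # the leading value of the expected type.
    prompt = "{"
    for i, field in enumerate(fields):
        (name, ftype), = field.items()
        prompt += f'"{name}": '
        raw = generate(prompt)
        if ftype == "string":
            value = '"%s"' % re.match(r'\s*"((?:[^"\\]|\\.)*)"', raw).group(1)
        else:
            value = re.match(r'\s*(-?\d+(?:\.\d+)?)', raw).group(1)
        prompt += value + (", " if i < len(fields) - 1 else "}")
    return prompt

result = generate_json(
    [{"name": "string"}, {"age": "number"}, {"city": "string"}], fake_llm
)
print(result)  # {"name": "Alice", "age": 25, "city": "Seattle"}
```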

Features

  • Field Types: Supports "string" and "number" field types, plus nested objects
  • Pattern Extraction: Robust regex-based field value extraction from over-generated content
  • Modern Model Support: Works reliably with instruction-tuned models (Qwen, Phi-3.5, Gemma, etc.)
  • Automatic Validation: Validates numeric fields and handles string quoting automatically
  • Error Handling: Clear error messages for invalid field types or malformed values
  • VLLM Integration: Seamless integration with VLLM using streaming approach
  • Compatibility Detection: Automatic technical testing of model capabilities

Installation

pip install -e .

Development

# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Format code
black .
isort .

# Type check
mypy driver/

Field Schema Format

Each field is specified as a dictionary with exactly one key-value pair:

  • Key: The field name (string)
  • Value: The field type ("string" or "number")
fields = [
    {"username": "string"},
    {"score": "number"},
    {"active": "string"}  # booleans can be represented as strings
]
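A hypothetical validator for this schema format (not a library API) makes the rules concrete: each field is a dict with exactly one key, and each value is either "string", "number", or a nested mapping of the same shape:

```python
def validate_fields(fields):
    # Enforce the schema format described above: one key per field dict,
    # values limited to "string", "number", or a nested object.
    for field in fields:
        if not isinstance(field, dict) or len(field) != 1:
            raise ValueError(f"each field needs exactly one key: {field!r}")
        (name, ftype), = field.items()
        if isinstance(ftype, dict):
            validate_fields([{k: v} for k, v in ftype.items()])  # nested object
        elif ftype not in ("string", "number"):
            raise ValueError(f"{name}: unsupported type {ftype!r}")

validate_fields([{"username": "string"}, {"score": "number"}])  # OK
try:
    validate_fields([{"active": "boolean"}])
except ValueError as e:
    print(e)  # active: unsupported type 'boolean'
```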
