A unified interface for streaming structured JSON from OpenAI, Anthropic, and Google Gemini.

LLM JSON Streaming

A unified Python library for streaming structured JSON outputs from OpenAI, Anthropic (Claude), and Google Gemini.

This library abstracts the differences between providers' structured output APIs and provides a consistent interface to stream JSON data and parsed Pydantic objects.

Features

  • Unified Interface: Use a single API to interact with OpenAI, Anthropic, and Google Gemini.
  • JSON Streaming: Access raw JSON chunks as they are generated (delta).
  • Structured Outputs: Enforce schema validation using Pydantic models.
  • Partial Parsing: Access accumulated JSON strings during streaming.
  • Claude Structured Outputs: Automatically upgrades Claude Sonnet 4.5 / Opus 4.1 requests to Anthropic's JSON outputs for guaranteed schemas.
  • Claude Prefill Strategy: Older Claude models avoid tool calls entirely. Schema-aware prefilling keeps responses JSON-only while still streaming deltas, and JSON repair enables partial object support.
  • Google Gemini Support: Native structured outputs with JSON repair for enhanced partial object support.

Installation

This project is managed with uv.

# Clone the repository
git clone https://github.com/yourusername/llm-json-streaming.git
cd llm-json-streaming

# Install dependencies
uv sync

Configuration

Set your API keys in a .env file:

OPENAI_API_KEY=your_openai_api_key
OPENAI_BASE_URL=https://api.openai.com/v1

ANTHROPIC_API_KEY=your_anthropic_api_key
ANTHROPIC_BASE_URL=https://api.anthropic.com

GEMINI_API_KEY=your_gemini_api_key
GOOGLE_BASE_URL=https://generativelanguage.googleapis.com  # Optional
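
Before creating a provider, it can help to check which keys are actually set. The following is a minimal sketch that assumes the variables above are already present in the process environment; `REQUIRED_KEYS` and `missing_api_keys` are hypothetical helper names, not part of the library.

```python
import os

# Provider name -> environment variable, as listed in the configuration above.
REQUIRED_KEYS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GEMINI_API_KEY",
}


def missing_api_keys(env=None):
    """Return the provider names whose API key is unset or empty."""
    env = os.environ if env is None else env
    return [name for name, var in REQUIRED_KEYS.items() if not env.get(var)]


if __name__ == "__main__":
    for name in missing_api_keys():
        print(f"No API key configured for provider: {name}")
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.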

Usage

Define your output schema using Pydantic and pass it to the provider.

import asyncio
from pydantic import BaseModel
from llm_json_streaming import create_provider

# 1. Define your schema
class UserProfile(BaseModel):
    name: str
    age: int
    bio: str

async def main():
    # 2. Initialize provider using the factory
    # Available: "openai", "anthropic", "claude", "google"
    # Ensure environment variables are set, or pass api_key="..."
    try:
        # For Anthropic, you can optionally specify mode:
        # provider = create_provider("anthropic", mode="structured")  # Force structured outputs
        # provider = create_provider("anthropic", mode="prefill")     # Force prefill mode
        # provider = create_provider("anthropic", mode="auto")        # Auto-detect (default)
        provider = create_provider("openai")
    except ValueError as e:
        print(e)
        return

    prompt = "Generate a profile for a fictional software engineer."

    # 3. Stream results
    print("Streaming JSON...")
    try:
        async for chunk in provider.stream_json(prompt, UserProfile):
            # Real-time partial parsed object (recommended for streaming updates)
            if "partial_object" in chunk:
                # Early chunks may carry a plain dict rather than a UserProfile,
                # so normalize to a dict before reading fields
                partial = chunk["partial_object"]
                data = partial.model_dump() if hasattr(partial, "model_dump") else partial
                print(f"\rCurrent: {data.get('name') or '...'}, {data.get('age') or '?'}", end="", flush=True)

            # Raw text delta (for character-by-character display)
            if "delta" in chunk:
                print(chunk["delta"], end="", flush=True)

            # Final parsed object (complete and validated)
            if "final_object" in chunk:
                user_profile = chunk["final_object"]
                print(f"\n\nComplete: {user_profile.name}, {user_profile.age}")
    except Exception as e:
        print(f"\nError during streaming: {e}")

if __name__ == "__main__":
    asyncio.run(main())

Streaming Interface

The stream_json() method yields dictionaries with different types of content during streaming:

Chunk Fields

  • partial_object: The current best parsed object. Available from the beginning of streaming in all modes:
    • Early stage: Returns partial dictionaries for incomplete JSON
    • Later stage: Returns validated Pydantic model instances for complete/repairable JSON
  • delta: Raw text characters as they are generated by the LLM.
  • final_object: The complete, validated Pydantic object when streaming finishes.
  • partial_json: The current accumulated JSON text string.
  • final_json: The complete JSON text string when streaming finishes.
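
For type checkers and editors, the chunk fields listed above could be described with a `TypedDict`. This is only a sketch: `StreamChunk` and `present_fields` are hypothetical names, the library may not export such a type, and which fields arrive together in a single chunk is not specified here, so every key is marked optional.

```python
from typing import Any, TypedDict


class StreamChunk(TypedDict, total=False):
    """Assumed shape of one dictionary yielded by stream_json()."""

    delta: str            # raw text generated since the previous chunk
    partial_json: str     # accumulated JSON text so far
    partial_object: Any   # dict early on, Pydantic model later
    final_json: str       # complete JSON text
    final_object: Any     # validated Pydantic model


FIELD_ORDER = ("delta", "partial_json", "partial_object", "final_json", "final_object")


def present_fields(chunk: StreamChunk) -> list:
    """List which of the documented fields a chunk carries, in a fixed order."""
    return [name for name in FIELD_ORDER if name in chunk]
```

A consumer can call `present_fields(chunk)` while debugging to see what a given provider emits at each step.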

Recommended Usage Pattern

async for chunk in provider.stream_json(prompt, UserProfile):
    # Use partial_object for real-time updates (recommended)
    if "partial_object" in chunk:
        user_profile = chunk["partial_object"]
        # Available from the beginning - starts as dict, becomes Pydantic object
        # Handle both types gracefully for consistent UI updates
        if hasattr(user_profile, 'model_dump'):
            # Pydantic model (complete/repairable JSON)
            name = user_profile.name or "..."
        else:
            # Dictionary (incomplete JSON)
            name = user_profile.get('name', "...")

        update_ui(name)  # Update UI with current best data

    # Use final_object for the final result
    if "final_object" in chunk:
        final_profile = chunk["final_object"]
        # Process the complete validated object
        save_result(final_profile)
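
The dict-versus-model branching above can be folded into a small helper so that UI code reads fields the same way at every stage. This is a sketch, not part of the library: `as_dict` and `field_or` are hypothetical names, and the check relies only on the `model_dump` method that Pydantic v2 models provide.

```python
from typing import Any


def as_dict(partial: Any) -> dict:
    """Normalize a partial_object (dict or Pydantic model) to a plain dict."""
    if hasattr(partial, "model_dump"):
        return partial.model_dump()
    return dict(partial)


def field_or(partial: Any, key: str, default: Any) -> Any:
    """Read one field, falling back to a default while it is missing or None."""
    value = as_dict(partial).get(key)
    return default if value is None else value
```

With this in place, `update_ui(field_or(chunk["partial_object"], "name", "..."))` works for both early and late chunks. Unlike the `or "..."` idiom, it keeps falsy values such as `0` or an empty string.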

Supported Providers & Models

Provider  | Default Model | Method Used
OpenAI    | gpt-4o-2024-08-06 | response_format (Structured Outputs) via beta.chat.completions
Anthropic | claude-3-5-sonnet-20240620 (auto-switches to Structured Outputs for claude-sonnet-4.5* / claude-opus-4.1*) | Prefill JSON streaming for legacy models; Structured Outputs (output_format + beta header) for Sonnet 4.5 / Opus 4.1
Google    | gemini-2.5-flash | response_mime_type="application/json" with structured outputs via Google GenAI SDK

Anthropic Mode Configuration

You can choose which strategy Anthropic models use in two ways:

Method 1: Constructor Mode (Recommended)

from llm_json_streaming import create_provider

# Force structured outputs mode
provider = create_provider("anthropic", mode="structured")

# Force prefill mode
provider = create_provider("anthropic", mode="prefill")

# Auto-detection based on model (default)
provider = create_provider("anthropic", mode="auto")

Method 2: Method Parameter Override

# Temporary override per request
async for chunk in provider.stream_json(prompt, UserProfile,
                                       model="claude-3-5-sonnet-20240620",
                                       use_structured_outputs=True):
    # Uses structured outputs regardless of auto-detection
    ...  # handle the chunk here

Mode Priority

  1. Constructor mode (mode= parameter) - Highest priority
  2. Method parameter (use_structured_outputs=) - Medium priority
  3. Auto-detection - Based on model capabilities - Lowest priority
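
The priority order above can be sketched as a small resolver. This is an illustration under stated assumptions, not the library's implementation: `resolve_mode` and `supports_structured` are hypothetical names, a constructor mode of "auto" is assumed to defer to the lower-priority rules, and the model patterns are copied from the provider table.

```python
from fnmatch import fnmatchcase
from typing import Optional

# Patterns taken from the "Supported Providers & Models" table.
STRUCTURED_PATTERNS = ("claude-sonnet-4.5*", "claude-opus-4.1*")


def supports_structured(model: str) -> bool:
    """Auto-detection: does the model name match a structured-output pattern?"""
    return any(fnmatchcase(model, pattern) for pattern in STRUCTURED_PATTERNS)


def resolve_mode(constructor_mode: str = "auto",
                 use_structured_outputs: Optional[bool] = None,
                 model: str = "claude-3-5-sonnet-20240620") -> str:
    """Return "structured" or "prefill" following the documented priority."""
    # 1. An explicit constructor mode wins (assumption: "auto" defers).
    if constructor_mode in ("structured", "prefill"):
        return constructor_mode
    # 2. A per-request flag comes next.
    if use_structured_outputs is not None:
        return "structured" if use_structured_outputs else "prefill"
    # 3. Otherwise fall back to auto-detection from the model name.
    return "structured" if supports_structured(model) else "prefill"
```

Under these assumptions, `resolve_mode("prefill", True)` stays in prefill mode because the constructor setting outranks the per-request flag.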

Anthropic Structured Outputs

Claude Sonnet 4.5 and Claude Opus 4.1 support Anthropic's structured output beta. When using structured mode, chunks include partial JSON text and final Pydantic objects automatically.

Anthropic Prefill Mode

All other Claude models receive schema-derived instructions and an assistant prefill (e.g., { or {"field":) so they skip generic preambles and stream JSON directly. No tool definitions or tool-use deltas are required.
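
To illustrate the prefill idea: when the assistant turn is prefilled, the model continues from that text and the response does not repeat it, so the prefill has to be prepended to the streamed deltas to recover the full JSON. The sketch below is hypothetical (`build_prefill` and `accumulate` are not library functions) and takes field names as a plain list rather than reading them from a Pydantic model.

```python
import json


def build_prefill(field_names):
    """Build an assistant prefill such as '{' or '{"name":' from schema fields."""
    if not field_names:
        return "{"
    # json.dumps quotes and escapes the key correctly.
    return "{" + json.dumps(field_names[0]) + ":"


def accumulate(prefill, deltas):
    """Yield the accumulated JSON text after each delta, starting from the prefill."""
    text = prefill
    for delta in deltas:
        text += delta
        yield text


if __name__ == "__main__":
    prefill = build_prefill(["name", "age"])
    for partial_json in accumulate(prefill, [' "Ada"', ', "age": 36}']):
        print(partial_json)
```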

Enhanced with multi-level partial object support:

  • Real-time partial objects: Available from the first token, even with incomplete JSON
  • Progressive improvement: Starts with partial dictionaries, upgrades to Pydantic objects when JSON becomes complete
  • JSON repair: Automatically fixes incomplete JSON to enable better partial parsing
  • Consistent interface: Behaves like structured outputs while maintaining backward compatibility
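
As a rough picture of what JSON repair involves, the sketch below closes any open string, array, or object in a truncated JSON prefix and then tries to parse it. It is a deliberately naive stand-in, not the repair logic the library uses: `close_partial_json` and `try_parse_partial` are hypothetical names, and inputs that end on a dangling key, colon, or comma are simply reported as unparseable.

```python
import json


def close_partial_json(text: str) -> str:
    """Append the closers needed to balance an incomplete JSON prefix."""
    stack = []          # closing brackets still owed, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            stack.append("}")
        elif ch == "[":
            stack.append("]")
        elif ch in "}]" and stack:
            stack.pop()
    repaired = text
    if in_string:
        if escaped:
            repaired = repaired[:-1]  # drop a dangling backslash
        repaired += '"'
    return repaired + "".join(reversed(stack))


def try_parse_partial(text: str):
    """Return the parsed value of the repaired text, or None if it still fails."""
    try:
        return json.loads(close_partial_json(text))
    except json.JSONDecodeError:
        return None
```

Calling `try_parse_partial` on each accumulated `partial_json` string gives a partial dictionary whenever the prefix is repairable and `None` otherwise, which matches the "starts with partial dictionaries" behavior described above.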

Google Gemini Support

Google Gemini models use the Google GenAI SDK with native structured outputs:

from llm_json_streaming import create_provider

provider = create_provider("google")
async for chunk in provider.stream_json(prompt, UserProfile, model="gemini-2.5-flash"):
    # Handle streaming chunks
    if "partial_object" in chunk:
        print(chunk["partial_object"])

Key Features:

  • Native Structured Outputs: Uses response_mime_type="application/json" for guaranteed JSON responses
  • JSON Repair: Automatic repair of incomplete JSON for enhanced partial object support
  • Schema Validation: Direct Pydantic schema integration for type-safe responses
  • Streaming: Real-time partial objects with progressive enhancement

Configuration:

  • Set GEMINI_API_KEY environment variable (required)
  • Optionally set GOOGLE_BASE_URL for custom endpoints
  • Default model: gemini-2.5-flash

Testing

To run the tests with uv:

uv run pytest

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

MIT
