Lightweight orchestration toolkit to generate, validate, repair and enforce structured output from LLMs

parsec

⚡ Lightweight orchestration toolkit to generate, validate, repair and enforce structured output from large language models (LLMs). The project provides a provider-agnostic adapter interface, validators (JSON/Pydantic), prompt template management with versioning, caching, dataset collection, and an enforcement engine that retries and repairs LLM output until it conforms to a schema.

This repository contains:

  • Adapter abstractions for OpenAI, Anthropic, and Google Gemini.
  • Validation and repair utilities for JSON and Pydantic schemas.
  • An EnforcementEngine that generates, validates, repairs, and retries.
  • Prompt template system with versioning and YAML persistence.
  • LRU caching to reduce redundant API calls and costs.
  • Dataset collection for training and fine-tuning.
  • Examples and comprehensive test suite.

Features

Core Enforcement

  • Provider-agnostic adapters: OpenAI, Anthropic (Claude), Google Gemini
  • Multiple validators: JSON Schema, Pydantic models
  • Automatic repair: Schema-based heuristics fix common formatting issues
  • Retry loop: Progressive feedback to model for iterative repair
  • Dataset collection: Capture and export training data (JSONL, JSON, CSV)
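
The generate, validate, repair, and retry cycle listed above can be pictured roughly as follows. This is a minimal stand-alone sketch under stated assumptions, not parsec's implementation: `enforce_sketch`, `strip_fences`, `validate`, and `make_fake_model` are hypothetical names, the "model" is a canned stub, and validation only checks for required keys.

```python
import json
from typing import Callable, Optional

FENCE = "`" * 3  # Markdown code-fence marker that models often wrap JSON in


def strip_fences(text: str) -> str:
    """Hypothetical repair heuristic: drop a surrounding Markdown code fence."""
    text = text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text


def validate(raw: str, required: list[str]) -> tuple[Optional[dict], str]:
    """Return (data, "") on success or (None, error message) on failure."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return None, f"invalid JSON: {exc.msg}"
    if not isinstance(data, dict):
        return None, "expected a JSON object"
    missing = [key for key in required if key not in data]
    if missing:
        return None, f"missing required keys: {missing}"
    return data, ""


def enforce_sketch(generate: Callable[[str], str], prompt: str,
                   required: list[str], max_retries: int = 3) -> dict:
    """Generate, repair, validate; on failure, retry with the error as feedback."""
    feedback = ""
    for attempt in range(max_retries + 1):
        raw = generate(prompt + feedback)
        data, error = validate(strip_fences(raw), required)
        if data is not None:
            return {"success": True, "data": data, "retry_count": attempt}
        feedback = f"\n\nPrevious output was rejected ({error}). Return only valid JSON."
    return {"success": False, "data": None, "retry_count": max_retries}


def make_fake_model() -> Callable[[str], str]:
    """Stub model (assumption): first reply lacks 'age', second is fenced but valid."""
    replies = iter([
        '{"name": "John Doe"}',
        FENCE + 'json\n{"name": "John Doe", "age": 30}\n' + FENCE,
    ])
    return lambda prompt: next(replies)


result = enforce_sketch(make_fake_model(), "Extract: John Doe is 30 years old",
                        ["name", "age"])
print(result)
```

With the stub above, the first reply is rejected for the missing key and the second succeeds after the fence is stripped, so the loop reports one retry.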

Prompt Management

  • Template system: Type-safe variable substitution with validation
  • Version control: Semantic versioning (1.0.0, 2.0.0, etc.)
  • YAML persistence: Save/load templates from files
  • Template registry: Centralized management of all templates
  • Template manager: One-line API for template + enforcement
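
To illustrate what type-checked variable substitution might involve, here is a small sketch. `TemplateSketch` is a hypothetical stand-in written for this example; it mirrors the `variables`, `required`, and `defaults` arguments used elsewhere in this README but makes no claim about how parsec's `PromptTemplate` works internally.

```python
from dataclasses import dataclass, field


@dataclass
class TemplateSketch:
    """Hypothetical prompt template with typed variables (illustration only)."""
    template: str
    variables: dict                      # variable name -> expected Python type
    required: list = field(default_factory=list)
    defaults: dict = field(default_factory=dict)

    def render(self, **values) -> str:
        merged = {**self.defaults, **values}  # explicit values override defaults
        missing = [name for name in self.required if name not in merged]
        if missing:
            raise ValueError(f"missing required variables: {missing}")
        for name, value in merged.items():
            expected = self.variables.get(name)
            if expected is None:
                raise ValueError(f"unknown variable: {name}")
            if not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return self.template.format(**merged)


tmpl = TemplateSketch(
    template="Extract person info from: {text}\n\nReturn as JSON.",
    variables={"text": str},
    required=["text"],
)
rendered = tmpl.render(text="John Doe, age 30")
print(rendered)
```

Because the sketch relies on `str.format`, a template containing literal JSON braces would need them doubled (`{{` and `}}`).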

Performance & Caching

  • LRU cache: In-memory caching with TTL support
  • Cost reduction: Avoid redundant API calls for identical requests
  • Cache integration: Seamless integration with enforcement engine
  • Statistics tracking: Monitor cache hits, misses, and hit rates
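
For readers unfamiliar with the combination of LRU eviction and TTL expiry, the idea can be sketched with the standard library alone. `LRUTTLCacheSketch` and its methods are hypothetical names for this illustration and are not parsec's `InMemoryCache`; the injectable `clock` parameter is an assumption added to make expiry easy to test.

```python
import time
from collections import OrderedDict


class LRUTTLCacheSketch:
    """Illustrative in-memory cache: least-recently-used eviction plus per-entry TTL."""

    def __init__(self, max_size=100, default_ttl=3600.0, clock=time.monotonic):
        self.max_size = max_size
        self.default_ttl = default_ttl
        self.clock = clock
        self._items = OrderedDict()  # key -> (expires_at, value), oldest first
        self.hits = 0
        self.misses = 0

    def get(self, key):
        entry = self._items.get(key)
        if entry is None or entry[0] <= self.clock():
            self._items.pop(key, None)  # discard the entry if it has expired
            self.misses += 1
            return None
        self._items.move_to_end(key)    # mark as most recently used
        self.hits += 1
        return entry[1]

    def set(self, key, value, ttl=None):
        ttl = self.default_ttl if ttl is None else ttl
        self._items[key] = (self.clock() + ttl, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used entry

    def stats(self):
        total = self.hits + self.misses
        rate = (self.hits / total * 100) if total else 0.0
        return {"hits": self.hits, "misses": self.misses, "hit_rate": f"{rate:.2f}%"}


demo = LRUTTLCacheSketch(max_size=2)
demo.get("prompt-1")            # miss: nothing stored yet
demo.set("prompt-1", {"age": 30})
demo.get("prompt-1")            # hit
demo_stats = demo.stats()
print(demo_stats)
```

One design caveat of this sketch: `get` returns `None` for a miss, so it cannot distinguish a miss from a cached `None` value.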

Installation

pip install parsec-llm

Or for development:

git clone https://github.com/olliekm/parsec.git
cd parsec
pip install -e ".[dev]"

Quick Start

Basic Usage

import asyncio

from parsec.models.adapters import OpenAIAdapter
from parsec.validators import JSONValidator
from parsec.enforcement import EnforcementEngine

# Set up components
adapter = OpenAIAdapter(api_key="your-api-key", model="gpt-4o-mini")
validator = JSONValidator()
engine = EnforcementEngine(adapter, validator, max_retries=3)

# Define your schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}

async def main():
    # Enforce structured output
    result = await engine.enforce(
        "Extract: John Doe is 30 years old",
        schema
    )

    print(result.data)  # {"name": "John Doe", "age": 30}
    print(result.success)  # True
    print(result.retry_count)  # 0

# enforce() is awaited, so it has to run inside an event loop.
# Later snippets that use a bare `await` assume an async context like main().
asyncio.run(main())

With Caching

from parsec.cache import InMemoryCache

# Add cache to reduce redundant API calls
cache = InMemoryCache(max_size=100, default_ttl=3600)
engine = EnforcementEngine(adapter, validator, cache=cache)

# First call hits API
result1 = await engine.enforce(prompt, schema)

# Second identical call returns cached result (no API call!)
result2 = await engine.enforce(prompt, schema)

# Check cache performance
stats = cache.get_stats()
print(stats)  # {'hits': 1, 'misses': 1, 'hit_rate': '50.00%'}
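
A cache can only recognise "identical requests" if each request maps to a stable key. One common way to do that is to hash a canonical encoding of the request. The sketch below shows the idea; `cache_key` and its `model` parameter are hypothetical, and parsec may derive its keys differently.

```python
import hashlib
import json


def cache_key(prompt: str, schema: dict, model: str = "") -> str:
    """Hypothetical key: SHA-256 of a canonical JSON encoding of the request."""
    payload = json.dumps(
        {"prompt": prompt, "schema": schema, "model": model},
        sort_keys=True,              # key order must not change the hash
        separators=(",", ":"),       # no incidental whitespace
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


schema_a = {"type": "object", "required": ["name", "age"]}
schema_b = {"required": ["name", "age"], "type": "object"}  # same schema, reordered

key_a = cache_key("Extract: John Doe is 30 years old", schema_a)
key_b = cache_key("Extract: John Doe is 30 years old", schema_b)
print(key_a == key_b)
```

Sorting the keys means two schemas that differ only in dictionary ordering share one cache entry, while any change to the prompt, schema content, or model name produces a different key.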

With Prompt Templates

from parsec.prompts import PromptTemplate, TemplateRegistry, TemplateManager

# Create a reusable template
template = PromptTemplate(
    name="extract_person",
    template="Extract person info from: {text}\n\nReturn as JSON.",
    variables={"text": str},
    required=["text"]
)

# Register with version
registry = TemplateRegistry()
registry.register(template, "1.0.0")

# Use with enforcement
manager = TemplateManager(registry, engine)
result = await manager.enforce_with_template(
    template_name="extract_person",
    variables={"text": "John Doe, age 30"},
    schema=schema
)

# Save templates to file
registry.save_to_disk("templates.yaml")

# Load templates later
registry.load_from_disk("templates.yaml")

With Pydantic Models

from pydantic import BaseModel
from parsec.validators import PydanticValidator

class Person(BaseModel):
    name: str
    age: int
    email: str

validator = PydanticValidator()
engine = EnforcementEngine(adapter, validator)

result = await engine.enforce(
    "Extract: John Doe, 30 years old, john@example.com",
    Person
)

print(result.data)  # {"name": "John Doe", "age": 30, "email": "john@example.com"}

Development Setup

Requirements: Python 3.9+

  1. Install dependencies:
pip install -e ".[dev]"
  2. Run tests:
poetry run pytest -q
  3. Run the OpenAI example (requires OPENAI_API_KEY):
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o-mini"  # optional
poetry run python examples/run_with_openai.py

The example demonstrates using OpenAIAdapter, JSONValidator and EnforcementEngine to extract structured data using a JSON schema.

Code Structure

  • src/parsec/core/ — Core abstractions and schemas
  • src/parsec/models/ — LLM provider adapters (OpenAI, Anthropic, Gemini)
  • src/parsec/validators/ — Validator implementations (JSON, Pydantic)
  • src/parsec/enforcement/ — Enforcement and orchestration engine
  • src/parsec/prompts/ — Prompt template system with versioning
  • src/parsec/cache/ — Caching implementations (InMemoryCache)
  • src/parsec/training/ — Dataset collection for fine-tuning
  • src/parsec/utils/ — Utility functions (partial JSON parsing)
  • examples/ — Working examples with real API calls
  • tests/ — Comprehensive test suite with pytest
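
The utilities entry above mentions partial JSON parsing. As a rough idea of what recovering data from a truncated response can look like, here is one possible approach: close any open string and brackets, and if that still fails, trim trailing characters and try again. `close_partial_json` and `parse_partial_json` are hypothetical helpers written for this example, not functions from `parsec.utils`.

```python
import json


def close_partial_json(text: str) -> str:
    """Append the closing quote and brackets that a truncated JSON prefix lacks."""
    stack = []          # expected closing brackets, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(stack))


def parse_partial_json(text: str):
    """Trim trailing characters until the closed-up prefix parses; None if nothing does."""
    for end in range(len(text), 0, -1):
        try:
            return json.loads(close_partial_json(text[:end]))
        except json.JSONDecodeError:
            continue
    return None


print(parse_partial_json('{"name": "John Do'))
print(parse_partial_json('{"items": [1, 2,'))
print(parse_partial_json('{"a": 1, "b":'))
```

The trimming loop is quadratic in the input length and silently accepts truncated values (a cut-off number or string is kept as-is), so this is suited to illustration rather than production use.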

Examples

Check out the examples/ directory for complete working examples:

  • basic_usage.py - Simple extraction with JSON schema
  • prompt_template_example.py - Template system with versioning
  • prompt_persistence_example.py - Save/load templates from YAML
  • template_manager_example.py - TemplateManager integration
  • template_manager_live_example.py - Live demo with real API calls
  • streaming_example.py - Streaming support (experimental)

Run any example:

python3 examples/template_manager_live_example.py

Testing

Run the test suite with:

poetry run pytest -q

Advanced Features

Dataset Collection

Collect and export training data for fine-tuning:

from parsec.training import DatasetCollector

collector = DatasetCollector(
    output_path="./training_data",
    format="jsonl",  # or "json", "csv"
    auto_flush=True
)

engine = EnforcementEngine(adapter, validator, collector=collector)

# Data is automatically collected during enforcement
result = await engine.enforce(prompt, schema)

# Export collected data
collector.flush()  # Writes to disk

Template Versioning Workflow

# v1.0.0 - Initial template
template_v1 = PromptTemplate(
    name="extract_person",
    template="Extract: {text}",
    variables={"text": str},
    required=["text"]
)
registry.register(template_v1, "1.0.0")

# v2.0.0 - Improved with validation rules
template_v2 = PromptTemplate(
    name="extract_person",
    template="Extract: {text}\n\nValidation: {rules}",
    variables={"text": str, "rules": str},
    required=["text"],
    defaults={"rules": "Strict validation"}
)
registry.register(template_v2, "2.0.0")

# Use specific version
result = await manager.enforce_with_template(
    template_name="extract_person",
    version="2.0.0",  # Explicit version
    variables={"text": "John Doe, 30"}
)

# Or use latest automatically
result = await manager.enforce_with_template(
    template_name="extract_person",  # Gets v2.0.0
    variables={"text": "John Doe, 30"}
)
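
Picking the "latest" version requires comparing version numbers numerically rather than as strings. A minimal sketch of that comparison follows; `parse_version` and `latest_version` are hypothetical helpers, and they handle only plain `MAJOR.MINOR.PATCH` strings with no pre-release or build tags.

```python
def parse_version(version: str) -> tuple:
    """Turn "MAJOR.MINOR.PATCH" into a tuple of ints so versions compare numerically."""
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


def latest_version(versions: list) -> str:
    """Return the highest version string from a non-empty list."""
    return max(versions, key=parse_version)


print(latest_version(["1.0.0", "2.0.0"]))
print(latest_version(["1.9.0", "1.10.0"]))
```

The second call shows why the tuple conversion matters: compared as plain strings, "1.9.0" would sort above "1.10.0".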

Multi-Provider Support

from parsec.models.adapters import OpenAIAdapter, AnthropicAdapter

# Switch between providers easily
openai_adapter = OpenAIAdapter(api_key=openai_key, model="gpt-4o-mini")
anthropic_adapter = AnthropicAdapter(api_key=anthropic_key, model="claude-3-5-sonnet-20241022")

# Same enforcement code works with any adapter
engine = EnforcementEngine(anthropic_adapter, validator)
result = await engine.enforce(prompt, schema)
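
Swapping providers works when every adapter exposes the same interface. One way to express such a contract in Python is a `typing.Protocol`, sketched below. `AdapterSketch`, its `generate` method, and the two stub adapters are hypothetical; parsec's actual adapter base class and method names may differ.

```python
import asyncio
from typing import Protocol


class AdapterSketch(Protocol):
    """Hypothetical provider contract: take a prompt, return raw model text."""

    async def generate(self, prompt: str) -> str: ...


class CannedAdapter:
    """Stub provider that always returns a fixed reply."""

    def __init__(self, reply: str) -> None:
        self.reply = reply

    async def generate(self, prompt: str) -> str:
        return self.reply


class EchoAdapter:
    """Stub provider that wraps the prompt in a JSON-looking string."""

    async def generate(self, prompt: str) -> str:
        return '{"echo": "' + prompt + '"}'


async def run_with(adapter: AdapterSketch, prompt: str) -> str:
    # The calling code depends only on the protocol, not on a concrete provider.
    return await adapter.generate(prompt)


outputs = [
    asyncio.run(run_with(adapter, "hi"))
    for adapter in (CannedAdapter('{"ok": true}'), EchoAdapter())
]
print(outputs)
```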

Roadmap

  • Core enforcement engine with retry logic
  • Multiple LLM providers (OpenAI, Anthropic, Gemini)
  • JSON and Pydantic validation
  • LRU caching with TTL
  • Prompt template system with versioning
  • Dataset collection for training
  • Streaming support for real-time output
  • Batch processing with rate limiting
  • Cost tracking and analytics
  • A/B testing for prompt variants
  • Output post-processing pipeline

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Notes

  • Examples with real API calls will incur costs — use test/development API keys
  • The framework is intentionally modular — extend adapters and validators as needed
  • Template system supports version control via YAML files for team collaboration

License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2025 Oliver Kwun-Morfitt
