parsec

⚡ Lightweight orchestration toolkit to generate, validate, repair and enforce structured output from large language models (LLMs). The project provides a provider-agnostic adapter interface, validators (JSON/Pydantic), prompt template management with versioning, caching, dataset collection, and an enforcement engine that retries and repairs LLM output until it conforms to a schema.

This repository contains:

  • Adapter abstractions for OpenAI, Anthropic, and Google Gemini.
  • Validation and repair utilities for JSON and Pydantic schemas.
  • An EnforcementEngine that generates, validates, repairs, and retries.
  • Prompt template system with versioning and YAML persistence.
  • LRU caching to reduce redundant API calls and costs.
  • Dataset collection for training and fine-tuning.
  • Examples and comprehensive test suite.

Features

Core Enforcement

  • Provider-agnostic adapters: OpenAI, Anthropic (Claude), Google Gemini
  • Multiple validators: JSON Schema, Pydantic models
  • Automatic repair: Schema-based heuristics fix common formatting issues
  • Retry loop: Progressive feedback to model for iterative repair
  • Dataset collection: Capture and export training data (JSONL, JSON, CSV)
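The generate, validate, repair, and retry cycle listed above can be sketched without any provider at all. The following is a minimal illustration, not parsec's implementation: `SketchResult`, `validate`, `repair`, `enforce`, and `fake_model` are hypothetical names, the validator checks only required keys and a few primitive types, and the repair step is a crude regex heuristic.

```python
import asyncio
import json
import re
from dataclasses import dataclass
from typing import Any, Awaitable, Callable, Optional


@dataclass
class SketchResult:
    """Hypothetical result type, loosely modelled on the fields this README prints."""
    data: Optional[dict]
    success: bool
    retry_count: int


def validate(data: Any, schema: dict) -> list[str]:
    """Tiny validator: checks only required keys and a few primitive types."""
    types = {"string": str, "integer": int, "object": dict}
    if not isinstance(data, dict):
        return ["top-level value is not an object"]
    errors = [f"missing required field '{k}'"
              for k in schema.get("required", []) if k not in data]
    for key, spec in schema.get("properties", {}).items():
        expected = types.get(spec.get("type"))
        if key in data and expected and not isinstance(data[key], expected):
            errors.append(f"field '{key}' should be {spec['type']}")
    return errors


def repair(text: str) -> str:
    """Crude heuristics: keep the outermost {...} span and drop trailing commas."""
    start, end = text.find("{"), text.rfind("}")
    if start != -1 and end > start:
        text = text[start:end + 1]
    return re.sub(r",\s*([}\]])", r"\1", text)


async def enforce(generate: Callable[[str], Awaitable[str]], prompt: str,
                  schema: dict, max_retries: int = 3) -> SketchResult:
    feedback = ""
    for attempt in range(max_retries + 1):
        raw = await generate(prompt + feedback)
        try:
            data = json.loads(repair(raw))
            errors = validate(data, schema)
        except json.JSONDecodeError as exc:
            data, errors = None, [f"invalid JSON: {exc.msg}"]
        if not errors:
            return SketchResult(data, True, attempt)
        # Feed the validation errors back to the model on the next attempt.
        feedback = "\n\nYour previous answer was rejected: " + "; ".join(errors)
    return SketchResult(None, False, max_retries)


# Stand-in for a real adapter: the first reply is incomplete, the second is
# correct but wrapped in prose with a trailing comma.
replies = iter(['{"name": "John Doe"}',
                'Sure! {"name": "John Doe", "age": 30,}'])


async def fake_model(prompt: str) -> str:
    return next(replies)


schema = {
    "type": "object",
    "properties": {"name": {"type": "string"}, "age": {"type": "integer"}},
    "required": ["name", "age"],
}
result = asyncio.run(enforce(fake_model, "Extract: John Doe is 30 years old", schema))
print(result)
```

In this sketch the first reply fails validation, the error list is appended to the prompt, and the second reply passes after repair, so the result reports one retry.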

Prompt Management

  • Template system: Type-safe variable substitution with validation
  • Version control: Semantic versioning (1.0.0, 2.0.0, etc.)
  • YAML persistence: Save/load templates from files
  • Template registry: Centralized management of all templates
  • Template manager: One-line API for template + enforcement
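To make "type-safe variable substitution with validation" concrete, here is a small standalone sketch. It assumes a template declares its variables with Python types, a list of required names, and optional defaults, mirroring the arguments shown in this README; the class `SketchTemplate` and its `render` method are hypothetical and are not parsec's `PromptTemplate`.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SketchTemplate:
    """Hypothetical stand-in for a prompt template with typed variables."""
    template: str
    variables: dict[str, type]
    required: list[str] = field(default_factory=list)
    defaults: dict[str, Any] = field(default_factory=dict)

    def render(self, **values: Any) -> str:
        # Caller-supplied values override defaults.
        merged = {**self.defaults, **values}
        missing = [name for name in self.required if name not in merged]
        if missing:
            raise ValueError(f"missing required variables: {missing}")
        for name, value in merged.items():
            expected = self.variables.get(name)
            if expected is None:
                raise ValueError(f"unknown variable: {name}")
            if not isinstance(value, expected):
                raise TypeError(f"{name} must be {expected.__name__}")
        return self.template.format(**merged)


person_template = SketchTemplate(
    template="Extract person info from: {text}\n\nReturn as JSON.",
    variables={"text": str},
    required=["text"],
)
rendered = person_template.render(text="John Doe, age 30")
print(rendered)
```

Checking types before calling `str.format` means a wrong argument fails with a clear error instead of silently producing a malformed prompt.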

Performance & Caching

  • LRU cache: In-memory caching with TTL support
  • Cost reduction: Avoid redundant API calls for identical requests
  • Cache integration: Seamless integration with enforcement engine
  • Statistics tracking: Monitor cache hits, misses, and hit rates
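The combination of LRU eviction, per-entry TTL, and hit/miss statistics can be illustrated with the standard library alone. This is a sketch under stated assumptions rather than the code behind `InMemoryCache`: the class name `SketchLRUCache`, its `get`/`set` methods, and the injectable `clock` parameter are hypothetical; only `max_size`, `default_ttl`, and `get_stats` echo names used in this README.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class SketchLRUCache:
    """Hypothetical in-memory cache: least-recently-used eviction plus expiry."""

    def __init__(self, max_size: int = 100, default_ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_size = max_size
        self.default_ttl = default_ttl
        self._clock = clock
        self._items: "OrderedDict[str, tuple[float, Any]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[Any]:
        entry = self._items.get(key)
        if entry is None or entry[0] <= self._clock():
            # Absent or expired: drop any stale entry and count a miss.
            self._items.pop(key, None)
            self.misses += 1
            return None
        self._items.move_to_end(key)  # mark as most recently used
        self.hits += 1
        return entry[1]

    def set(self, key: str, value: Any, ttl: Optional[float] = None) -> None:
        expires_at = self._clock() + (self.default_ttl if ttl is None else ttl)
        self._items[key] = (expires_at, value)
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            self._items.popitem(last=False)  # evict the least recently used

    def get_stats(self) -> dict:
        total = self.hits + self.misses
        rate = self.hits / total if total else 0.0
        return {"hits": self.hits, "misses": self.misses, "hit_rate": f"{rate:.2%}"}


cache = SketchLRUCache(max_size=2)
cache.get("a")               # miss: nothing stored yet
cache.set("a", {"age": 30})
cache.get("a")               # hit
stats = cache.get_stats()
print(stats)

cache.set("b", 1)
cache.set("c", 2)            # exceeds max_size, so "a" (oldest use) is evicted
```

Passing the clock in as a parameter is a design choice for testability: expiry can be exercised with a fake clock instead of sleeping.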

Installation

pip install parsec-llm

Or for development:

git clone https://github.com/olliekm/parsec.git
cd parsec
pip install -e ".[dev]"

Quick Start

Basic Usage

import asyncio

from parsec.models.adapters import OpenAIAdapter
from parsec.validators import JSONValidator
from parsec.enforcement import EnforcementEngine

# Set up components
adapter = OpenAIAdapter(api_key="your-api-key", model="gpt-4o-mini")
validator = JSONValidator()
engine = EnforcementEngine(adapter, validator, max_retries=3)

# Define your schema
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"}
    },
    "required": ["name", "age"]
}


async def main():
    # Enforce structured output (enforce() is awaited, so it runs inside a coroutine)
    result = await engine.enforce(
        "Extract: John Doe is 30 years old",
        schema
    )

    print(result.data)  # {"name": "John Doe", "age": 30}
    print(result.success)  # True
    print(result.retry_count)  # 0


asyncio.run(main())

With Caching

from parsec.cache import InMemoryCache

# Add cache to reduce redundant API calls
cache = InMemoryCache(max_size=100, default_ttl=3600)
engine = EnforcementEngine(adapter, validator, cache=cache)

# First call hits the API (run these awaits inside an async function; `prompt` is any prompt string)
result1 = await engine.enforce(prompt, schema)

# Second identical call returns cached result (no API call!)
result2 = await engine.enforce(prompt, schema)

# Check cache performance
stats = cache.get_stats()
print(stats)  # {'hits': 1, 'misses': 1, 'hit_rate': '50.00%'}
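This README does not say how two requests are judged identical. One plausible approach, shown here purely as an assumption, is to hash a canonical JSON encoding of everything that affects the output. The function name `cache_key` and the choice of prompt, schema, and model as key inputs are hypothetical.

```python
import hashlib
import json


def cache_key(prompt: str, schema: dict, model: str) -> str:
    """Hypothetical key: SHA-256 over a canonical JSON encoding of the request."""
    payload = json.dumps(
        {"prompt": prompt, "schema": schema, "model": model},
        sort_keys=True,             # key order in the schema must not matter
        separators=(",", ":"),      # no whitespace differences
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


schema_a = {"type": "object", "required": ["name"]}
schema_b = {"required": ["name"], "type": "object"}  # same schema, different key order

key_1 = cache_key("Extract: John Doe", schema_a, "gpt-4o-mini")
key_2 = cache_key("Extract: John Doe", schema_b, "gpt-4o-mini")
key_3 = cache_key("Extract: Jane Doe", schema_a, "gpt-4o-mini")
print(key_1 == key_2, key_1 == key_3)
```

Sorting keys before hashing keeps logically equal schemas from producing different cache entries.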

With Prompt Templates

from parsec.prompts import PromptTemplate, TemplateRegistry, TemplateManager

# Create a reusable template
template = PromptTemplate(
    name="extract_person",
    template="Extract person info from: {text}\n\nReturn as JSON.",
    variables={"text": str},
    required=["text"]
)

# Register with version
registry = TemplateRegistry()
registry.register(template, "1.0.0")

# Use with enforcement
manager = TemplateManager(registry, engine)
result = await manager.enforce_with_template(
    template_name="extract_person",
    variables={"text": "John Doe, age 30"},
    schema=schema
)

# Save templates to file
registry.save_to_disk("templates.yaml")

# Load templates later
registry.load_from_disk("templates.yaml")

With Pydantic Models

from pydantic import BaseModel
from parsec.validators import PydanticValidator

class Person(BaseModel):
    name: str
    age: int
    email: str

validator = PydanticValidator()
engine = EnforcementEngine(adapter, validator)

result = await engine.enforce(
    "Extract: John Doe, 30 years old, john@example.com",
    Person
)

print(result.data)  # {"name": "John Doe", "age": 30, "email": "john@example.com"}

Development Setup

Requirements: Python 3.9+

  1. Install dependencies:
pip install -e ".[dev]"
  2. Run tests:
poetry run pytest -q
  3. Run the OpenAI example (requires OPENAI_API_KEY):
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o-mini"  # optional
poetry run python examples/run_with_openai.py

The example demonstrates using OpenAIAdapter, JSONValidator and EnforcementEngine to extract structured data using a JSON schema.
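A script driven by these two environment variables might read them as follows. This is a sketch, not the contents of examples/run_with_openai.py: the helper `load_openai_settings` is hypothetical, and falling back to "gpt-4o-mini" when OPENAI_MODEL is unset is an assumption.

```python
from typing import Mapping


def load_openai_settings(env: Mapping[str, str]) -> dict:
    """Hypothetical helper: a required API key plus an optional model override."""
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    # Assumed default; the real example may choose a different fallback.
    return {"api_key": api_key, "model": env.get("OPENAI_MODEL", "gpt-4o-mini")}


# In a real script, pass os.environ instead of a literal dict.
settings = load_openai_settings({"OPENAI_API_KEY": "sk-test"})
print(settings["model"])
```

Failing fast on a missing key avoids a less obvious authentication error later, at the first API call.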

Code Structure

  • src/parsec/core/ — Core abstractions and schemas
  • src/parsec/models/ — LLM provider adapters (OpenAI, Anthropic, Gemini)
  • src/parsec/validators/ — Validator implementations (JSON, Pydantic)
  • src/parsec/enforcement/ — Enforcement and orchestration engine
  • src/parsec/prompts/ — Prompt template system with versioning
  • src/parsec/cache/ — Caching implementations (InMemoryCache)
  • src/parsec/training/ — Dataset collection for fine-tuning
  • src/parsec/utils/ — Utility functions (partial JSON parsing)
  • examples/ — Working examples with real API calls
  • tests/ — Comprehensive test suite with pytest

Examples

Check out the examples/ directory for complete working examples:

  • basic_usage.py - Simple extraction with JSON schema
  • prompt_template_example.py - Template system with versioning
  • prompt_persistence_example.py - Save/load templates from YAML
  • template_manager_example.py - TemplateManager integration
  • template_manager_live_example.py - Live demo with real API calls
  • streaming_example.py - Streaming support (experimental)

Run any example:

python3 examples/template_manager_live_example.py

Testing

Run the test suite with:

poetry run pytest -q

Advanced Features

Dataset Collection

Collect and export training data for fine-tuning:

from parsec.training import DatasetCollector

collector = DatasetCollector(
    output_path="./training_data",
    format="jsonl",  # or "json", "csv"
    auto_flush=True
)

engine = EnforcementEngine(adapter, validator, collector=collector)

# Data is automatically collected during enforcement
result = await engine.enforce(prompt, schema)

# Export collected data
collector.flush()  # Writes to disk
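To show what buffering and flushing to JSONL can look like, here is a standalone sketch using only the standard library. It is not `DatasetCollector`: the class `SketchCollector`, its `record` method, and the row fields (`prompt`, `output`, `success`) are hypothetical.

```python
import json
import tempfile
from pathlib import Path


class SketchCollector:
    """Hypothetical collector: buffers rows in memory, appends them as JSON Lines."""

    def __init__(self, output_path: Path, auto_flush: bool = False) -> None:
        self.path = Path(output_path)
        self.auto_flush = auto_flush
        self._buffer: list[dict] = []

    def record(self, prompt: str, output: dict, success: bool) -> None:
        self._buffer.append({"prompt": prompt, "output": output, "success": success})
        if self.auto_flush:
            self.flush()

    def flush(self) -> int:
        """Write buffered rows to disk, one JSON object per line; return the count."""
        if not self._buffer:
            return 0
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with self.path.open("a", encoding="utf-8") as handle:
            for row in self._buffer:
                handle.write(json.dumps(row, ensure_ascii=False) + "\n")
        written = len(self._buffer)
        self._buffer.clear()
        return written


out_file = Path(tempfile.mkdtemp()) / "training_data" / "samples.jsonl"
sketch_collector = SketchCollector(out_file)
sketch_collector.record("Extract: John Doe is 30", {"name": "John Doe", "age": 30}, True)
sketch_collector.record("Extract: ???", {}, False)
written = sketch_collector.flush()
lines = out_file.read_text(encoding="utf-8").splitlines()
print(written, len(lines))
```

JSON Lines suits this use because each row is appended independently, so a crash mid-run loses at most the rows still in the buffer.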

Template Versioning Workflow

# v1.0.0 - Initial template
template_v1 = PromptTemplate(
    name="extract_person",
    template="Extract: {text}",
    variables={"text": str},
    required=["text"]
)
registry.register(template_v1, "1.0.0")

# v2.0.0 - Improved with validation rules
template_v2 = PromptTemplate(
    name="extract_person",
    template="Extract: {text}\n\nValidation: {rules}",
    variables={"text": str, "rules": str},
    required=["text"],
    defaults={"rules": "Strict validation"}
)
registry.register(template_v2, "2.0.0")

# Use specific version
result = await manager.enforce_with_template(
    template_name="extract_person",
    version="2.0.0",  # Explicit version
    variables={"text": "John Doe, 30"}
)

# Or use latest automatically
result = await manager.enforce_with_template(
    template_name="extract_person",  # Gets v2.0.0
    variables={"text": "John Doe, 30"}
)
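Resolving "latest" correctly requires comparing versions numerically rather than as strings (otherwise "10.0.0" sorts before "2.0.0"). The sketch below illustrates that idea with a hypothetical `SketchRegistry`; it handles only plain MAJOR.MINOR.PATCH strings and is not parsec's `TemplateRegistry`.

```python
from typing import Optional


def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into integers; pre-release tags are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


class SketchRegistry:
    """Hypothetical registry mapping template name -> version -> template text."""

    def __init__(self) -> None:
        self._templates: dict[str, dict[str, str]] = {}

    def register(self, name: str, template: str, version: str) -> None:
        parse_version(version)  # raises ValueError on malformed versions
        self._templates.setdefault(name, {})[version] = template

    def get(self, name: str, version: Optional[str] = None) -> str:
        versions = self._templates[name]
        if version is None:
            # Latest means the numerically greatest version.
            version = max(versions, key=parse_version)
        return versions[version]


sketch_registry = SketchRegistry()
sketch_registry.register("extract_person", "Extract: {text}", "1.0.0")
sketch_registry.register("extract_person", "Extract: {text}\n\nValidation: {rules}", "2.0.0")
sketch_registry.register("extract_person", "Extract (v10): {text}", "10.0.0")

latest = sketch_registry.get("extract_person")
pinned = sketch_registry.get("extract_person", "1.0.0")
print(latest)
print(pinned)
```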

Multi-Provider Support

from parsec.models.adapters import OpenAIAdapter, AnthropicAdapter

# Switch between providers easily
openai_adapter = OpenAIAdapter(api_key=openai_key, model="gpt-4o-mini")
anthropic_adapter = AnthropicAdapter(api_key=anthropic_key, model="claude-3-5-sonnet-20241022")

# Same enforcement code works with any adapter
engine = EnforcementEngine(anthropic_adapter, validator)
result = await engine.enforce(prompt, schema)
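The reason the same enforcement code can work with any adapter is that each adapter satisfies one shared interface. A minimal sketch of that pattern using `typing.Protocol` follows; the method name `generate` and the classes `SketchAdapter`, `ProviderA`, and `ProviderB` are hypothetical and stand in for real provider clients.

```python
import asyncio
from typing import Protocol


class SketchAdapter(Protocol):
    """Hypothetical adapter contract: one async method returning raw model text."""

    async def generate(self, prompt: str) -> str: ...


class ProviderA:
    async def generate(self, prompt: str) -> str:
        # A real adapter would call a provider SDK here.
        return '{"source": "a", "echo": "%s"}' % prompt


class ProviderB:
    async def generate(self, prompt: str) -> str:
        return '{"source": "b", "echo": "%s"}' % prompt


async def extract(adapter: SketchAdapter, prompt: str) -> str:
    # Calling code depends only on the protocol, never on a concrete provider.
    return await adapter.generate(prompt)


outputs = [asyncio.run(extract(adapter, "hello")) for adapter in (ProviderA(), ProviderB())]
print(outputs)
```

Because `Protocol` uses structural typing, a new provider needs no base class; implementing the method is enough.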

Roadmap

  • Core enforcement engine with retry logic
  • Multiple LLM providers (OpenAI, Anthropic, Gemini)
  • JSON and Pydantic validation
  • LRU caching with TTL
  • Prompt template system with versioning
  • Dataset collection for training
  • Streaming support for real-time output
  • Batch processing with rate limiting
  • Cost tracking and analytics
  • A/B testing for prompt variants
  • Output post-processing pipeline

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Notes

  • Examples with real API calls will incur costs — use test/development API keys
  • The framework is intentionally modular — extend adapters and validators as needed
  • Template system supports version control via YAML files for team collaboration

License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2025 Oliver Kwun-Morfitt


