Lightweight orchestration toolkit to generate, validate, repair and enforce structured output from LLMs
Project description
parsec
⚡ Lightweight orchestration toolkit to generate, validate, repair and enforce structured output from large language models (LLMs). The project provides a provider-agnostic adapter interface, validators (JSON/Pydantic), prompt template management with versioning, caching, dataset collection, and an enforcement engine that retries and repairs LLM output until it conforms to a schema.
This repository contains:
- Adapter abstractions for OpenAI, Anthropic, and Google Gemini.
- Validation and repair utilities for JSON and Pydantic schemas.
- An
EnforcementEnginethat generates, validates, repairs, and retries. - Prompt template system with versioning and YAML persistence.
- LRU caching to reduce redundant API calls and costs.
- Dataset collection for training and fine-tuning.
- Examples and comprehensive test suite.
Features
Core Enforcement
- Provider-agnostic adapters: OpenAI, Anthropic (Claude), Google Gemini
- Multiple validators: JSON Schema, Pydantic models
- Automatic repair: Schema-based heuristics fix common formatting issues
- Retry loop: Progressive feedback to model for iterative repair
- Dataset collection: Capture and export training data (JSONL, JSON, CSV)
Prompt Management
- Template system: Type-safe variable substitution with validation
- Version control: Semantic versioning (1.0.0, 2.0.0, etc.)
- YAML persistence: Save/load templates from files
- Template registry: Centralized management of all templates
- Template manager: One-line API for template + enforcement
Performance & Caching
- LRU cache: In-memory caching with TTL support
- Cost reduction: Avoid redundant API calls for identical requests
- Cache integration: Seamless integration with enforcement engine
- Statistics tracking: Monitor cache hits, misses, and hit rates
Installation
pip install parsec-llm
Or for development:
git clone https://github.com/olliekm/parsec.git
cd parsec
pip install -e ".[dev]"
Quick Start
Basic Usage
from parsec.models.adapters import OpenAIAdapter
from parsec.validators import JSONValidator
from parsec.enforcement import EnforcementEngine
# Set up components
adapter = OpenAIAdapter(api_key="your-api-key", model="gpt-4o-mini")
validator = JSONValidator()
engine = EnforcementEngine(adapter, validator, max_retries=3)
# Define your schema
schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "integer"}
},
"required": ["name", "age"]
}
# Enforce structured output
result = await engine.enforce(
"Extract: John Doe is 30 years old",
schema
)
print(result.data) # {"name": "John Doe", "age": 30}
print(result.success) # True
print(result.retry_count) # 0
With Caching
from parsec.cache import InMemoryCache
# Add cache to reduce redundant API calls
cache = InMemoryCache(max_size=100, default_ttl=3600)
engine = EnforcementEngine(adapter, validator, cache=cache)
# First call hits API
result1 = await engine.enforce(prompt, schema)
# Second identical call returns cached result (no API call!)
result2 = await engine.enforce(prompt, schema)
# Check cache performance
stats = cache.get_stats()
print(stats) # {'hits': 1, 'misses': 1, 'hit_rate': '50.00%'}
With Prompt Templates
from parsec.prompts import PromptTemplate, TemplateRegistry, TemplateManager
# Create a reusable template
template = PromptTemplate(
name="extract_person",
template="Extract person info from: {text}\n\nReturn as JSON.",
variables={"text": str},
required=["text"]
)
# Register with version
registry = TemplateRegistry()
registry.register(template, "1.0.0")
# Use with enforcement
manager = TemplateManager(registry, engine)
result = await manager.enforce_with_template(
template_name="extract_person",
variables={"text": "John Doe, age 30"},
schema=schema
)
# Save templates to file
registry.save_to_disk("templates.yaml")
# Load templates later
registry.load_from_disk("templates.yaml")
With Pydantic Models
from pydantic import BaseModel
from parsec.validators import PydanticValidator
class Person(BaseModel):
name: str
age: int
email: str
validator = PydanticValidator()
engine = EnforcementEngine(adapter, validator)
result = await engine.enforce(
"Extract: John Doe, 30 years old, john@example.com",
Person
)
print(result.data) # {"name": "John Doe", "age": 30, "email": "john@example.com"}
Development Setup
Requirements: Python 3.9+
- Install dependencies:
pip install -e ".[dev]"
- Run tests:
poetry run pytest -q
- Run the OpenAI example (requires
OPENAI_API_KEY):
export OPENAI_API_KEY="sk-..."
export OPENAI_MODEL="gpt-4o-mini" # optional
poetry run python examples/run_with_openai.py
The example demonstrates using OpenAIAdapter, JSONValidator and
EnforcementEngine to extract structured data using a JSON schema.
Code Structure
src/parsec/core/— Core abstractions and schemassrc/parsec/models/— LLM provider adapters (OpenAI, Anthropic, Gemini)src/parsec/validators/— Validator implementations (JSON, Pydantic)src/parsec/enforcement/— Enforcement and orchestration enginesrc/parsec/prompts/— Prompt template system with versioningsrc/parsec/cache/— Caching implementations (InMemoryCache)src/parsec/training/— Dataset collection for fine-tuningsrc/parsec/utils/— Utility functions (partial JSON parsing)examples/— Working examples with real API callstests/— Comprehensive test suite with pytest
Examples
Check out the examples/ directory for complete working examples:
basic_usage.py- Simple extraction with JSON schemaprompt_template_example.py- Template system with versioningprompt_persistence_example.py- Save/load templates from YAMLtemplate_manager_example.py- TemplateManager integrationtemplate_manager_live_example.py- Live demo with real API callsstreaming_example.py- Streaming support (experimental)
Run any example:
python3 examples/template_manager_live_example.py
Testing
Run the test suite with:
poetry run pytest -q
Advanced Features
Dataset Collection
Collect and export training data for fine-tuning:
from parsec.training import DatasetCollector
collector = DatasetCollector(
output_path="./training_data",
format="jsonl", # or "json", "csv"
auto_flush=True
)
engine = EnforcementEngine(adapter, validator, collector=collector)
# Data is automatically collected during enforcement
result = await engine.enforce(prompt, schema)
# Export collected data
collector.flush() # Writes to disk
Template Versioning Workflow
# v1.0.0 - Initial template
template_v1 = PromptTemplate(
name="extract_person",
template="Extract: {text}",
variables={"text": str},
required=["text"]
)
registry.register(template_v1, "1.0.0")
# v2.0.0 - Improved with validation rules
template_v2 = PromptTemplate(
name="extract_person",
template="Extract: {text}\n\nValidation: {rules}",
variables={"text": str, "rules": str},
required=["text"],
defaults={"rules": "Strict validation"}
)
registry.register(template_v2, "2.0.0")
# Use specific version
result = await manager.enforce_with_template(
template_name="extract_person",
version="2.0.0", # Explicit version
variables={"text": "John Doe, 30"}
)
# Or use latest automatically
result = await manager.enforce_with_template(
template_name="extract_person", # Gets v2.0.0
variables={"text": "John Doe, 30"}
)
Multi-Provider Support
from parsec.models.adapters import OpenAIAdapter, AnthropicAdapter
# Switch between providers easily
openai_adapter = OpenAIAdapter(api_key=openai_key, model="gpt-4o-mini")
anthropic_adapter = AnthropicAdapter(api_key=anthropic_key, model="claude-3-5-sonnet-20241022")
# Same enforcement code works with any adapter
engine = EnforcementEngine(anthropic_adapter, validator)
result = await engine.enforce(prompt, schema)
Roadmap
- Core enforcement engine with retry logic
- Multiple LLM providers (OpenAI, Anthropic, Gemini)
- JSON and Pydantic validation
- LRU caching with TTL
- Prompt template system with versioning
- Dataset collection for training
- Streaming support for real-time output
- Batch processing with rate limiting
- Cost tracking and analytics
- A/B testing for prompt variants
- Output post-processing pipeline
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Notes
- Examples with real API calls will incur costs — use test/development API keys
- The framework is intentionally modular — extend adapters and validators as needed
- Template system supports version control via YAML files for team collaboration
License
This project is licensed under the MIT License - see the LICENSE file for details.
Copyright (c) 2025 Oliver Kwun-Morfitt
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file parsec_llm-0.2.1.tar.gz.
File metadata
- Download URL: parsec_llm-0.2.1.tar.gz
- Upload date:
- Size: 34.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c7ace359f8361ce11a8e2b760b23f88915ab5b23d595d78d5d3b2a89ab27d7d5
|
|
| MD5 |
d3bd7a0bdec9f8407411d7d686287943
|
|
| BLAKE2b-256 |
5be1ea7d262f9a871d76f7c70690bc9985ce62748689dbbadc85bd214ecd0f86
|
File details
Details for the file parsec_llm-0.2.1-py3-none-any.whl.
File metadata
- Download URL: parsec_llm-0.2.1-py3-none-any.whl
- Upload date:
- Size: 42.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
f35108d2e348b81c11d0b75ccf16007df190aec3efd48eff8f5cd33fc85cf1a3
|
|
| MD5 |
3466216ff6193905c93f97a84f8e750f
|
|
| BLAKE2b-256 |
13e5a45a8641157e34d0211cfe1b308521f1f9167422df2240bac79af6acb6e6
|