Optimize Pydantic model field descriptions using DSPy
DSPydantic
Stop manually tuning prompts. Let your data optimize them.
DSPydantic automatically optimizes your Pydantic model prompts and field descriptions using DSPy. Extract structured data from text, images, and PDFs with higher accuracy and less effort.
The Problem
You've defined a Pydantic model. You're using an LLM to extract data. But:
- Your prompts are guesswork—trial and error until something works
- Accuracy varies wildly depending on input phrasing
- Every new use case means more manual prompt engineering
The Solution
DSPydantic takes your examples and automatically finds the best prompts for your use case:
from pydantic import BaseModel, Field
from dspydantic import Prompter, Example

class Invoice(BaseModel):
    vendor: str = Field(description="Company that issued the invoice")
    total: str = Field(description="Total amount due")
    due_date: str = Field(description="Payment due date")

prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini")

# Optimize with examples
result = prompter.optimize(examples=[
    Example(
        text="Invoice from Acme Corp. Total: $1,250.00. Due: March 15, 2024.",
        expected_output={"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
    ),
])

# Extract with optimized prompts
invoice = prompter.run("Consolidated Energy Partners | Invoice Total $3,200 | Due 2024-05-30")
Typical improvement: 10-30% higher accuracy with the same LLM.
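The text above does not say how accuracy is measured. As a rough illustration only, one common choice for structured extraction is field-level exact match: the fraction of expected fields whose extracted value matches. The helper name `field_accuracy` and the string normalization below are assumptions made for this sketch, not DSPydantic's actual metric.

```python
def field_accuracy(expected: dict, extracted: dict) -> float:
    """Fraction of expected fields whose extracted value matches.

    Hypothetical helper for illustration; not part of DSPydantic.
    Values are compared as stripped, case-insensitive strings.
    """
    if not expected:
        return 1.0
    hits = 0
    for key, want in expected.items():
        got = extracted.get(key)
        if got is not None and str(got).strip().lower() == str(want).strip().lower():
            hits += 1
    return hits / len(expected)


expected = {"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
extracted = {"vendor": "acme corp", "total": "$1,250.00", "due_date": "2024-03-15"}

# Two of three fields match; the date differs in format, so it counts as a miss.
score = field_accuracy(expected, extracted)
print(f"{score:.0%}")
```

Exact match is strict about formatting (the date above is semantically right but scored as wrong), which is one reason to keep `expected_output` values in the same format you want the model to produce.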
Installation
pip install dspydantic
Quick Start
Extract Data (No Optimization)
For simple cases, extract immediately:
from pydantic import BaseModel, Field
from dspydantic import Prompter

class Contact(BaseModel):
    name: str = Field(description="Person's full name")
    email: str = Field(description="Email address")

prompter = Prompter(model=Contact, model_id="openai/gpt-4o-mini")
contact = prompter.run("Reach out to Sarah Chen at sarah.chen@techcorp.io")
# Contact(name='Sarah Chen', email='sarah.chen@techcorp.io')
Optimize for Better Accuracy
When accuracy matters, optimize with examples:
from dspydantic import Example
examples = [
    Example(text="...", expected_output={...}),
    # 5-20 examples are typically enough
]
result = prompter.optimize(examples=examples)
print(f"Accuracy: {result.baseline_score:.0%} → {result.optimized_score:.0%}")
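With only 5-20 examples, a score measured on the same examples used for optimization can look better than it really is. A minimal sketch, assuming you want to set a few examples aside for an independent check: `split_examples` is a hypothetical helper, and whether `optimize` already performs its own internal split is not stated here.

```python
import random


def split_examples(examples: list, holdout_fraction: float = 0.25, seed: int = 0):
    """Shuffle deterministically and split into (train, holdout).

    Hypothetical helper; keeps at least one example on each side.
    """
    if len(examples) < 2:
        raise ValueError("need at least two examples to split")
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_holdout = round(len(shuffled) * holdout_fraction)
    n_holdout = max(1, min(len(shuffled) - 1, n_holdout))
    return shuffled[n_holdout:], shuffled[:n_holdout]


# Stand-in items; in practice these would be Example objects.
all_examples = [f"example-{i}" for i in range(12)]
train, holdout = split_examples(all_examples)
print(len(train), len(holdout))
```

The `train` portion would then go to `prompter.optimize(examples=train)`, and the `holdout` portion can be run through `prompter.run(...)` and compared against the expected outputs by hand or with a scoring function of your choice.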
Deploy to Production
# Save optimized prompter
prompter.save("./invoice_prompter")
# Load in production
prompter = Prompter.load("./invoice_prompter", model=Invoice, model_id="openai/gpt-4o-mini")
invoice = prompter.run(new_document)
Why DSPydantic?
| Feature | DSPydantic | Manual Prompting |
|---|---|---|
| Automatic optimization | ✅ Data-driven | ❌ Trial and error |
| Pydantic native | ✅ Full type safety | ⚠️ JSON only |
| Multi-modal | ✅ Text, images, PDFs | ⚠️ Text only |
| Production ready | ✅ Save/load, batch, async | ❌ Manual |
| Confidence scores | ✅ Per-extraction | ❌ No |
Built on: DSPy (Stanford's optimization framework) + Pydantic (Python data validation)
Input Types
# Text
Example(text="Invoice from Acme...", expected_output={...})
# Images
Example(image_path="receipt.png", expected_output={...})
# PDFs
Example(pdf_path="contract.pdf", expected_output={...})
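When examples come from a folder of mixed files, the keyword has to match the file type. A small sketch, assuming the three keywords shown above (`text`, `image_path`, `pdf_path`): the helper `example_kwargs` and the set of image extensions are hypothetical, and the result is a plain dict that could be unpacked into `Example(...)` alongside `expected_output`.

```python
from pathlib import Path

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def example_kwargs(source: str) -> dict:
    """Map a file path or raw text to the matching Example keyword.

    Hypothetical helper: anything that does not end in a known
    PDF or image extension is treated as raw text.
    """
    suffix = Path(source).suffix.lower()
    if suffix == ".pdf":
        return {"pdf_path": source}
    if suffix in IMAGE_SUFFIXES:
        return {"image_path": source}
    return {"text": source}


for item in ["receipt.png", "contract.pdf", "Invoice from Acme Corp. Total: $1,250.00"]:
    print(example_kwargs(item))
```

This routes purely on the file extension, so raw text that happens to end in `.pdf` would be misclassified; checking `Path(source).is_file()` first is one way to tighten it.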
Production Features
# Caching (reduce API costs)
prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini", cache=True)
# Batch processing
invoices = prompter.predict_batch(documents, max_workers=4)
# Async
invoice = await prompter.apredict(document)
# Confidence scores
result = prompter.predict_with_confidence(document)
if result.confidence > 0.9:
    process(result.data)
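The threshold check above extends naturally to a triage step over many documents: accept high-confidence extractions automatically and queue the rest for review. In this sketch, `ExtractionResult` is a stand-in dataclass carrying only the two attributes used above (`data`, `confidence`); it is not the library's class, and 0.9 is just the threshold from the example.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class ExtractionResult:
    """Stand-in with the two attributes used above; not the library's class."""
    data: Any
    confidence: float


def triage(results, threshold: float = 0.9):
    """Split results into (accepted, needs_review) by confidence."""
    accepted, needs_review = [], []
    for r in results:
        if r.confidence > threshold:
            accepted.append(r)
        else:
            needs_review.append(r)
    return accepted, needs_review


results = [
    ExtractionResult({"vendor": "Acme Corp"}, 0.97),
    ExtractionResult({"vendor": "???"}, 0.41),
    ExtractionResult({"vendor": "Consolidated Energy Partners"}, 0.90),
]
accepted, needs_review = triage(results)
print(len(accepted), len(needs_review))
```

The comparison is strict, matching the `>` in the example, so a result at exactly the threshold goes to review.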
Documentation
Full documentation at davidberenstein1957.github.io/dspydantic
- Getting Started - First extraction in 5 minutes
- Use Cases - Real-world examples
- Cookbook - Copy-paste patterns
- API Reference - Full documentation
License
Apache 2.0
Contributing
Contributions welcome! Open an issue or submit a pull request.