Optimize Pydantic model field descriptions using DSPy

These details have not been verified by PyPI

Project description

DSPydantic

Stop manually tuning prompts. Let your data optimize them.

DSPydantic automatically optimizes your Pydantic model prompts and field descriptions using DSPy. Extract structured data from text, images, and PDFs with higher accuracy and less effort.

The Problem

You've defined a Pydantic model. You're using an LLM to extract data. But:

Your prompts are guesswork—trial and error until something works
Accuracy varies wildly depending on input phrasing
Every new use case means more manual prompt engineering

The Solution

DSPydantic takes your examples and automatically finds the best prompts for your use case:

from pydantic import BaseModel, Field
from dspydantic import Prompter, Example

class Invoice(BaseModel):
    vendor: str = Field(description="Company that issued the invoice")
    total: str = Field(description="Total amount due")
    due_date: str = Field(description="Payment due date")

prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini")

# Optimize with examples
result = prompter.optimize(examples=[
    Example(
        text="Invoice from Acme Corp. Total: $1,250.00. Due: March 15, 2024.",
        expected_output={"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
    ),
])

# Extract with optimized prompts
invoice = prompter.run("Consolidated Energy Partners | Invoice Total $3,200 | Due 2024-05-30")

Typical improvement: 10-30% higher accuracy with the same LLM.
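The field descriptions in the model above are the text the optimizer works on, so it can help to see where Pydantic stores them. A minimal sketch, assuming Pydantic v2 (`model_fields` is a v2 attribute); the `field_descriptions` helper is illustrative and not part of DSPydantic:

```python
from typing import Dict, Optional, Type

from pydantic import BaseModel, Field


class Invoice(BaseModel):
    vendor: str = Field(description="Company that issued the invoice")
    total: str = Field(description="Total amount due")
    due_date: str = Field(description="Payment due date")


def field_descriptions(model: Type[BaseModel]) -> Dict[str, Optional[str]]:
    """Map each top-level field name to its current description (Pydantic v2)."""
    return {name: info.description for name, info in model.model_fields.items()}


descriptions = field_descriptions(Invoice)
for name, text in descriptions.items():
    print(f"{name}: {text}")
```

Printing the descriptions before and after an optimization run is one simple way to review what changed.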

Installation

pip install dspydantic

Quick Start

Extract Data (No Optimization)

For simple cases, extract immediately:

from pydantic import BaseModel, Field
from dspydantic import Prompter

class Contact(BaseModel):
    name: str = Field(description="Person's full name")
    email: str = Field(description="Email address")

prompter = Prompter(model=Contact, model_id="openai/gpt-4o-mini")

contact = prompter.run("Reach out to Sarah Chen at sarah.chen@techcorp.io")
# Contact(name='Sarah Chen', email='sarah.chen@techcorp.io')

Optimize for Better Accuracy

When accuracy matters, optimize with examples:

from dspydantic import Example

examples = [
    Example(text="...", expected_output={...}),
    # 5-20 examples are typically enough
]

result = prompter.optimize(examples=examples)
print(f"Accuracy: {result.baseline_score:.0%} → {result.optimized_score:.0%}")

By default, optimization runs in sequential mode: each field description is optimized independently (deepest-nested fields first), and the prompts are optimized afterwards. This shrinks the search space and often yields better results.
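To make the "deepest-nested first" ordering concrete, here is a rough sketch that orders field paths by nesting depth. It uses plain dicts as a stand-in for nested models and only illustrates the idea; `field_paths` and `deepest_first` are hypothetical helpers, and DSPydantic's actual traversal (for example, how it orders fields at the same depth) may differ:

```python
from typing import Any, Iterator, List, Mapping, Tuple


def field_paths(
    schema: Mapping[str, Any], prefix: Tuple[str, ...] = ()
) -> Iterator[Tuple[str, ...]]:
    """Yield the path of every field, including fields that hold nested models."""
    for name, value in schema.items():
        path = prefix + (name,)
        yield path
        if isinstance(value, Mapping):  # a dict value stands for a nested model
            yield from field_paths(value, path)


def deepest_first(schema: Mapping[str, Any]) -> List[str]:
    """Order dotted field paths so the most deeply nested fields come first."""
    # sorted() is stable, so fields at the same depth keep declaration order.
    ordered = sorted(field_paths(schema), key=len, reverse=True)
    return [".".join(path) for path in ordered]


# Hypothetical nested invoice: string values are leaf field descriptions.
invoice_schema = {
    "vendor": {
        "name": "Company name",
        "address": {"city": "City", "country": "Country"},
    },
    "total": "Total amount due",
}

order = deepest_first(invoice_schema)
print(order)
```

Optimizing one field at a time in this order means each step searches over a single description instead of every description at once.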

Deploy to Production

# Save optimized prompter
prompter.save("./invoice_prompter")

# Load in production
prompter = Prompter.load("./invoice_prompter", model=Invoice, model_id="openai/gpt-4o-mini")
invoice = prompter.run(new_document)

Why DSPydantic?

Feature	DSPydantic	Manual Prompting
Automatic optimization	✅ Data-driven	❌ Trial and error
Pydantic native	✅ Full type safety	⚠️ JSON only
Multi-modal	✅ Text, images, PDFs	⚠️ Text only
Production ready	✅ Save/load, batch, async	❌ Manual
Confidence scores	✅ Per-extraction	❌ No

Built on: DSPy (Stanford's optimization framework) + Pydantic (Python data validation)

Input Types

# Text
Example(text="Invoice from Acme...", expected_output={...})

# Images
Example(image_path="receipt.png", expected_output={...})

# PDFs
Example(pdf_path="contract.pdf", expected_output={...})
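When examples come from a folder of mixed files, the right keyword has to be chosen per file. A small sketch of that dispatch, assuming only the `image_path` and `pdf_path` keywords shown above; the `path_keyword` helper and the set of image suffixes are illustrative assumptions, not part of DSPydantic:

```python
from pathlib import Path

# Assumed image suffixes for this sketch; adjust to the formats you use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def path_keyword(path: str) -> str:
    """Return the Example keyword ('image_path' or 'pdf_path') for a file path."""
    suffix = Path(path).suffix.lower()
    if suffix == ".pdf":
        return "pdf_path"
    if suffix in IMAGE_SUFFIXES:
        return "image_path"
    raise ValueError(f"Unsupported file type: {suffix or '(no suffix)'}")


for file_path in ["receipt.png", "contract.PDF"]:
    print(file_path, "->", path_keyword(file_path))
```

The returned name can then be used as a keyword argument, for example `Example(**{path_keyword(p): p}, expected_output=...)`.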

Optimization Options

# Focus on specific fields only
result = prompter.optimize(
    examples=examples,
    include_fields=["address", "total"],  # Only optimize these
)

# Exclude fields from scoring (still extracted)
result = prompter.optimize(
    examples=examples,
    exclude_fields=["metadata", "timestamp"],
)

# Single-pass mode (all fields at once, legacy behavior)
result = prompter.optimize(
    examples=examples,
    sequential=False,
)
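To show what excluding fields from scoring means in practice, here is a sketch of a field-level exact-match score that skips excluded fields. The metric is an assumption made for illustration; this README does not state how DSPydantic computes its scores, and `field_accuracy` is a hypothetical helper:

```python
from typing import Any, Iterable, Mapping


def field_accuracy(
    predicted: Mapping[str, Any],
    expected: Mapping[str, Any],
    exclude_fields: Iterable[str] = (),
) -> float:
    """Fraction of scored fields whose predicted value equals the expected value."""
    excluded = set(exclude_fields)
    scored = [name for name in expected if name not in excluded]
    if not scored:
        return 0.0  # nothing left to score
    hits = sum(1 for name in scored if predicted.get(name) == expected[name])
    return hits / len(scored)


# Made-up example data; the timestamp is still extracted but may be ignored.
expected = {"vendor": "Acme Corp", "total": "$1,250.00", "timestamp": "2024-03-01T09:00:00Z"}
predicted = {"vendor": "Acme Corp", "total": "$1,250.00", "timestamp": "unknown"}

score_all = field_accuracy(predicted, expected)
score_excl = field_accuracy(predicted, expected, exclude_fields=["timestamp"])
print(f"all fields: {score_all:.0%}, excluding timestamp: {score_excl:.0%}")
```

Excluding a noisy field like this keeps it from dragging the score down while the optimizer focuses on the fields that matter.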

Production Features

# Caching (reduce API costs)
prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini", cache=True)

# Batch processing
invoices = prompter.predict_batch(documents, max_workers=4)

# Async (call from inside a coroutine)
invoice = await prompter.apredict(document)

# Confidence scores
result = prompter.predict_with_confidence(document)
if result.confidence > 0.9:
    process(result.data)
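Batch processing and confidence thresholds are often combined: extract many documents in parallel, then send low-confidence results to manual review. The sketch below shows that pattern with the standard library only. `Scored`, `route`, and `fake_extract` are hypothetical stand-ins, and the confidence values are made up for the demo:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class Scored:
    """Hypothetical result shape: extracted data plus a confidence in [0, 1]."""
    data: dict
    confidence: float


def route(
    documents: Sequence[str],
    extract: Callable[[str], Scored],
    threshold: float = 0.9,
    max_workers: int = 4,
) -> Tuple[List[Scored], List[Scored]]:
    """Run extract over documents in parallel and split results by confidence."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(extract, documents))  # map keeps input order
    accepted = [r for r in results if r.confidence > threshold]
    review = [r for r in results if r.confidence <= threshold]
    return accepted, review


def fake_extract(document: str) -> Scored:
    """Stand-in for a real extraction call; the confidence is invented."""
    has_total = "Total" in document
    return Scored(data={"text": document}, confidence=0.95 if has_total else 0.4)


accepted, review = route(["Invoice A. Total: $10", "Blurry scan"], fake_extract)
print(len(accepted), "accepted;", len(review), "sent to review")
```

In practice `extract` could wrap a call such as `prompter.predict_with_confidence`, assuming it returns an object with `data` and `confidence` attributes as the snippet above suggests.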

Documentation

Full documentation at davidberenstein1957.github.io/dspydantic

Getting Started - First extraction in 5 minutes
Configure Optimizations - Optimizers, sequential mode, threads
Field Inclusion & Exclusion - Focus optimization on specific fields
API Reference - Full documentation

License

Apache 2.0

Contributing

Contributions welcome! Open an issue or submit a pull request.

Project details


Release history

Version	Released
0.1.6	Mar 20, 2026
0.1.5	Mar 20, 2026
0.1.4	Mar 18, 2026
0.1.3	Mar 18, 2026
0.1.2 (this version)	Mar 16, 2026
0.1.1	Jan 30, 2026
0.1	Jan 27, 2026
0.0.7	Dec 10, 2025
0.0.6	Dec 9, 2025
0.0.5	Dec 9, 2025
0.0.4	Dec 6, 2025
0.0.3	Dec 5, 2025
0.0.2	Dec 5, 2025
0.0.1	Dec 5, 2025

Download files

Source Distribution

dspydantic-0.1.2.tar.gz (5.4 MB)

Uploaded Mar 16, 2026 Source

Built Distribution

dspydantic-0.1.2-py3-none-any.whl (61.1 kB)

Uploaded Mar 16, 2026 Python 3

File details

Details for the file dspydantic-0.1.2.tar.gz.

File metadata

Download URL: dspydantic-0.1.2.tar.gz
Upload date: Mar 16, 2026
Size: 5.4 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.17

File hashes

Hashes for dspydantic-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`a3d9b97f937526194a7f88453283fd266f957ddbaf2a02b64ffe559916579e43`
MD5	`127ea119ab4961608c014a2026ea9834`
BLAKE2b-256	`f041ebd1a8941cbb9a968eac4184435a9fe0b80531f8f04c47b75dad3bd26406`

File details

Details for the file dspydantic-0.1.2-py3-none-any.whl.

File metadata

Download URL: dspydantic-0.1.2-py3-none-any.whl
Upload date: Mar 16, 2026
Size: 61.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.9.17

File hashes

Hashes for dspydantic-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b56f3873e16c057d3d2681f3a27ea9479335f08a3ba24536b301116a43c49c30`
MD5	`19b0aa454518bad10e2ece6fb1f7fa50`
BLAKE2b-256	`3d7d0e486f23bad6d5a2bc35f2a721a07f8a707cec0a0070e1d43e50886df753`
