Optimize Pydantic model field descriptions using DSPy

DSPydantic

Stop manually tuning prompts. Let your data optimize them.

DSPydantic automatically optimizes your Pydantic model prompts and field descriptions using DSPy. Extract structured data from text, images, and PDFs with higher accuracy and less effort.

The Problem

You've defined a Pydantic model. You're using an LLM to extract data. But:

  • Your prompts are guesswork: trial and error until something works
  • Accuracy varies wildly depending on input phrasing
  • Every new use case means more manual prompt engineering

The Solution

DSPydantic takes your examples and automatically finds the best prompts for your use case:

from pydantic import BaseModel, Field
from dspydantic import Prompter, Example

class Invoice(BaseModel):
    vendor: str = Field(description="Company that issued the invoice")
    total: str = Field(description="Total amount due")
    due_date: str = Field(description="Payment due date")

prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini")

# Optimize with examples
result = prompter.optimize(examples=[
    Example(
        text="Invoice from Acme Corp. Total: $1,250.00. Due: March 15, 2024.",
        expected_output={"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
    ),
])

# Extract with optimized prompts
invoice = prompter.run("Consolidated Energy Partners | Invoice Total $3,200 | Due 2024-05-30")

Typical improvement: 10-30% higher accuracy with the same LLM.

Installation

pip install dspydantic

Quick Start

Extract Data (No Optimization)

For simple cases, extract immediately:

from pydantic import BaseModel, Field
from dspydantic import Prompter

class Contact(BaseModel):
    name: str = Field(description="Person's full name")
    email: str = Field(description="Email address")

prompter = Prompter(model=Contact, model_id="openai/gpt-4o-mini")

contact = prompter.run("Reach out to Sarah Chen at sarah.chen@techcorp.io")
# Contact(name='Sarah Chen', email='sarah.chen@techcorp.io')

Optimize for Better Accuracy

When accuracy matters, optimize with examples:

from dspydantic import Example

examples = [
    Example(text="...", expected_output={...}),
    # 5-20 examples typically enough
]

result = prompter.optimize(examples=examples)
print(f"Accuracy: {result.baseline_score:.0%} -> {result.optimized_score:.0%}")
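How the baseline and optimized scores are computed is not described here. As a rough, library-independent sketch, one plausible metric is field-level exact match averaged over the examples. The helper names `field_accuracy` and `mean_accuracy` and the sample predictions below are hypothetical and are not part of DSPydantic:

```python
from typing import Any


def field_accuracy(predicted: dict[str, Any], expected: dict[str, Any]) -> float:
    """Fraction of expected fields whose predicted value matches exactly."""
    if not expected:
        return 1.0
    hits = sum(1 for name, value in expected.items() if predicted.get(name) == value)
    return hits / len(expected)


def mean_accuracy(pairs: list[tuple[dict[str, Any], dict[str, Any]]]) -> float:
    """Average field accuracy over (predicted, expected) pairs."""
    if not pairs:
        return 0.0
    return sum(field_accuracy(p, e) for p, e in pairs) / len(pairs)


# Hypothetical outputs before and after optimization (illustrative data only).
expected = {"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
baseline = {"vendor": "Acme", "total": "$1,250.00", "due_date": "2024-03-15"}
optimized = {"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}

baseline_score = mean_accuracy([(baseline, expected)])
optimized_score = mean_accuracy([(optimized, expected)])
print(f"Accuracy: {baseline_score:.0%} -> {optimized_score:.0%}")
```

Exact match is strict: a correct date in a different format counts as a miss, so the expected outputs should use the same formatting you want the model to produce.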

By default, optimization runs in sequential mode: each field description is optimized independently, starting with the deepest-nested fields, and the prompts are optimized afterward. Optimizing one piece at a time shrinks the search space and often yields better results.
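To make the deepest-first ordering concrete, here is a minimal sketch that flattens a nested schema into dotted field paths and sorts them by depth. The dict-based schema and the function names are assumptions made for illustration; the library's internal traversal may differ:

```python
def field_paths(schema: dict, prefix: str = "") -> list[str]:
    """Flatten a nested schema into dotted field paths, including parents."""
    paths = []
    for name, child in schema.items():
        path = f"{prefix}.{name}" if prefix else name
        paths.append(path)
        if isinstance(child, dict):
            paths.extend(field_paths(child, path))
    return paths


def optimization_order(schema: dict) -> list[str]:
    """Order field paths so the deepest-nested fields come first."""
    # sorted() is stable, so fields at the same depth keep declaration order.
    return sorted(field_paths(schema), key=lambda p: p.count("."), reverse=True)


# Hypothetical schema: leaf fields map to None, nested models map to dicts.
invoice_schema = {
    "vendor": {"name": None, "address": {"city": None, "country": None}},
    "total": None,
}

order = optimization_order(invoice_schema)
print(order)
```

Working from the leaves upward means that when a parent field's description is tuned, the descriptions of its children are already fixed.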

Deploy to Production

# Save optimized prompter
prompter.save("./invoice_prompter")

# Load in production
prompter = Prompter.load("./invoice_prompter", model=Invoice, model_id="openai/gpt-4o-mini")
invoice = prompter.run(new_document)

Why DSPydantic?

Feature                | DSPydantic               | Manual Prompting
Automatic optimization | ✅ Data-driven            | ❌ Trial and error
Pydantic native        | ✅ Full type safety       | ⚠️ JSON only
Multi-modal            | ✅ Text, images, PDFs     | ⚠️ Text only
Production ready       | ✅ Save/load, batch, async | ❌ Manual
Confidence scores      | ✅ Per-extraction         | ❌ No

Built on: DSPy (Stanford's optimization framework) + Pydantic (Python data validation)

Input Types

# Text
Example(text="Invoice from Acme...", expected_output={...})

# Images
Example(image_path="receipt.png", expected_output={...})

# PDFs
Example(pdf_path="contract.pdf", expected_output={...})

Optimization Options

# Focus on specific fields only
result = prompter.optimize(
    examples=examples,
    include_fields=["address", "total"],  # Only optimize these
)

# Exclude fields from scoring (still extracted)
result = prompter.optimize(
    examples=examples,
    exclude_fields=["metadata", "timestamp"],
)

# Single-pass mode (all fields at once, legacy behavior)
result = prompter.optimize(
    examples=examples,
    sequential=False,
)
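The effect of field selection on a score can be illustrated without the library. The sketch below assumes an exact-match metric and treats `include` and `exclude` as simple filters on the set of scored fields; `select_fields` and `exact_match_score` are hypothetical helpers, not DSPydantic functions:

```python
from typing import Any, Optional, Sequence


def select_fields(
    all_fields: Sequence[str],
    include: Optional[Sequence[str]] = None,
    exclude: Optional[Sequence[str]] = None,
) -> list[str]:
    """Keep fields listed in `include` (if given) and drop those in `exclude`."""
    kept = [f for f in all_fields if include is None or f in include]
    return [f for f in kept if exclude is None or f not in exclude]


def exact_match_score(
    predicted: dict[str, Any], expected: dict[str, Any], fields: Sequence[str]
) -> float:
    """Fraction of the selected fields whose values match exactly."""
    if not fields:
        return 0.0
    hits = sum(1 for f in fields if predicted.get(f) == expected.get(f))
    return hits / len(fields)


# Illustrative data: the timestamp is noisy and not worth scoring.
expected = {"address": "1 Main St", "total": "$10.00", "timestamp": "2024-01-01"}
predicted = {"address": "1 Main St", "total": "$10.00", "timestamp": "unknown"}
all_fields = list(expected)

full_score = exact_match_score(predicted, expected, select_fields(all_fields))
filtered_score = exact_match_score(
    predicted, expected, select_fields(all_fields, exclude=["timestamp"])
)
print(f"All fields: {full_score:.0%}, without timestamp: {filtered_score:.0%}")
```

Leaving volatile fields such as timestamps out of the score keeps them from dragging down the metric and steering the optimizer toward values it cannot reliably reproduce.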

Production Features

# Caching (reduce API costs)
prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini", cache=True)

# Batch processing
invoices = prompter.predict_batch(documents, max_workers=4)

# Async (call from inside an async function)
invoice = await prompter.apredict(document)

# Confidence scores
result = prompter.predict_with_confidence(document)
if result.confidence > 0.9:
    process(result.data)
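A common way to combine batch processing with confidence scores is to auto-accept high-confidence results and send the rest to manual review. The sketch below shows that pattern with a thread pool and a stand-in extractor. `Extraction`, `fake_extract`, and `route` are hypothetical and do not call DSPydantic or any LLM:

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Extraction:
    data: dict
    confidence: float


def fake_extract(document: str) -> Extraction:
    """Stand-in for an LLM call: confident only when a total is present."""
    has_total = "Total" in document
    return Extraction(data={"text": document}, confidence=0.95 if has_total else 0.4)


def route(documents: list[str], threshold: float = 0.9, max_workers: int = 4):
    """Extract in parallel, then split results by confidence threshold."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input documents.
        results = list(pool.map(fake_extract, documents))
    accepted = [r for r in results if r.confidence > threshold]
    needs_review = [r for r in results if r.confidence <= threshold]
    return accepted, needs_review


documents = ["Invoice A. Total: $10", "Handwritten note", "Invoice B. Total: $99"]
accepted, needs_review = route(documents)
print(len(accepted), "accepted;", len(needs_review), "for review")
```

Threads suit this workload because each extraction spends most of its time waiting on a network call. The threshold is a business decision: a higher value sends more documents to review.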

Documentation

Full documentation at davidberenstein1957.github.io/dspydantic

License

Apache 2.0

Contributing

Contributions welcome! Open an issue or submit a pull request.
