Optimize Pydantic model field descriptions using DSPy

DSPydantic

Stop manually tuning prompts. Let your data optimize them.

DSPydantic automatically optimizes your Pydantic model prompts and field descriptions using DSPy. Extract structured data from text, images, and PDFs with higher accuracy and less effort.

The Problem

You've defined a Pydantic model. You're using an LLM to extract data. But:

  • Your prompts are guesswork—trial and error until something works
  • Accuracy varies wildly depending on input phrasing
  • Every new use case means more manual prompt engineering

The Solution

DSPydantic takes your examples and automatically finds the best prompts for your use case:

from pydantic import BaseModel, Field
from dspydantic import Prompter, Example

class Invoice(BaseModel):
    vendor: str = Field(description="Company that issued the invoice")
    total: str = Field(description="Total amount due")
    due_date: str = Field(description="Payment due date")

prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini")

# Optimize with examples
result = prompter.optimize(examples=[
    Example(
        text="Invoice from Acme Corp. Total: $1,250.00. Due: March 15, 2024.",
        expected_output={"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
    ),
])

# Extract with optimized prompts
invoice = prompter.run("Consolidated Energy Partners | Invoice Total $3,200 | Due 2024-05-30")

Typical improvement: 10-30% higher accuracy with the same LLM.
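
To make "accuracy" concrete, here is a minimal sketch of one way to score an extraction against its expected output: the fraction of expected fields that match exactly. The helper names `field_accuracy` and `mean_accuracy` are hypothetical and are not part of DSPydantic; the metric the library actually uses may differ.

```python
from typing import Any


def field_accuracy(expected: dict[str, Any], predicted: dict[str, Any]) -> float:
    """Fraction of expected fields whose predicted value matches exactly."""
    if not expected:
        return 1.0
    hits = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return hits / len(expected)


def mean_accuracy(pairs: list[tuple[dict[str, Any], dict[str, Any]]]) -> float:
    """Average field accuracy over (expected, predicted) pairs."""
    if not pairs:
        return 0.0
    return sum(field_accuracy(e, p) for e, p in pairs) / len(pairs)


# Illustrative data: the date is extracted in a different format, so it counts as a miss.
expected = {"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
predicted = {"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "2024-03-15"}

score = field_accuracy(expected, predicted)
print(f"Field accuracy: {score:.0%}")
```

Exact match is strict on purpose: a correct date in the wrong format still counts as an error, which is the kind of mismatch that clearer field descriptions are meant to reduce.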

Installation

pip install dspydantic

Quick Start

Extract Data (No Optimization)

For simple cases, extract immediately:

from pydantic import BaseModel, Field
from dspydantic import Prompter

class Contact(BaseModel):
    name: str = Field(description="Person's full name")
    email: str = Field(description="Email address")

prompter = Prompter(model=Contact, model_id="openai/gpt-4o-mini")

contact = prompter.run("Reach out to Sarah Chen at sarah.chen@techcorp.io")
# Contact(name='Sarah Chen', email='sarah.chen@techcorp.io')

Optimize for Better Accuracy

When accuracy matters, optimize with examples:

from dspydantic import Example

examples = [
    Example(text="...", expected_output={...}),
    # 5-20 examples are typically enough
]

result = prompter.optimize(examples=examples, verbose=True)
print(f"Accuracy: {result.baseline_score:.0%} → {result.optimized_score:.0%}")

Set verbose=True to monitor progress in real time. The output includes:

  • Rich-formatted optimization progress
  • Actual optimized descriptions after each field optimization
  • Final summary with scores, API calls, and token usage

By default, optimization runs in single-pass mode: one DSPy compile covers all fields, with reduced demo budgets for maximum speed. For better quality at the cost of more API calls, pass sequential=True to optimize each field description independently (deepest-nested first), followed by the prompts. In sequential mode, parallel_fields=True (the default) optimizes fields in parallel for speed.
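
The "deepest-nested first" ordering can be pictured with a small sketch. It uses standard-library dataclasses as a stand-in for Pydantic models, and the helpers `field_paths` and `deepest_first` are hypothetical names for illustration; this is not DSPydantic's internal implementation.

```python
from dataclasses import dataclass, fields, is_dataclass


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Customer:
    name: str
    address: Address


@dataclass
class Order:
    order_id: str
    customer: Customer


def field_paths(model: type, prefix: str = "") -> list[tuple[str, int]]:
    """Return (dotted_path, depth) for every field, recursing into nested dataclasses."""
    paths = []
    for f in fields(model):
        path = f"{prefix}{f.name}"
        paths.append((path, path.count(".")))
        if is_dataclass(f.type):
            paths.extend(field_paths(f.type, prefix=f"{path}."))
    return paths


def deepest_first(model: type) -> list[str]:
    """Order field paths so the most deeply nested fields come first."""
    ordered = sorted(field_paths(model), key=lambda item: -item[1])
    return [path for path, _ in ordered]


schedule = deepest_first(Order)
for path in schedule:
    print(path)
```

Handling leaf fields such as `customer.address.street` before their parents means that, in a scheme like this, a parent field would be tuned against children that have already been processed.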

Deploy to Production

# Save optimized prompter
prompter.save("./invoice_prompter")

# Load in production
prompter = Prompter.load("./invoice_prompter", model=Invoice, model_id="openai/gpt-4o-mini")
invoice = prompter.run(new_document)

Why DSPydantic?

| Feature                | DSPydantic               | Manual Prompting   |
|------------------------|--------------------------|--------------------|
| Automatic optimization | ✅ Data-driven           | ❌ Trial and error |
| Pydantic native        | ✅ Full type safety      | ⚠️ JSON only       |
| Multi-modal            | ✅ Text, images, PDFs    | ⚠️ Text only       |
| Production ready       | ✅ Save/load, batch, async | ❌ Manual        |
| Confidence scores      | ✅ Per-extraction        | ❌ No              |

Built on: DSPy (Stanford's optimization framework) + Pydantic (Python data validation)

Input Types

# Text
Example(text="Invoice from Acme...", expected_output={...})

# Images
Example(image_path="receipt.png", expected_output={...})

# PDFs
Example(pdf_path="contract.pdf", expected_output={...})
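
When examples come from a mixed folder of files and raw strings, a small dispatcher can choose which keyword to pass. The sketch below is a hypothetical helper, not part of DSPydantic: it only builds the keyword arguments (`text`, `image_path`, `pdf_path`, as shown above), and the set of image suffixes is an assumption.

```python
from pathlib import Path

# Assumed set of image suffixes; adjust to the formats you actually use.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def example_kwargs(source: str) -> dict[str, str]:
    """Map a source string to the Example keyword that matches its input type."""
    suffix = Path(source).suffix.lower()
    if suffix == ".pdf":
        return {"pdf_path": source}
    if suffix in IMAGE_SUFFIXES:
        return {"image_path": source}
    # Anything else is treated as raw text.
    return {"text": source}


for source in ["receipt.png", "contract.pdf", "Invoice from Acme Corp, total due $1,250"]:
    print(example_kwargs(source))
```

The result can then be unpacked into the constructor, for example `Example(**example_kwargs(source), expected_output=...)`.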

Optimization Options

# Focus on specific fields only
result = prompter.optimize(
    examples=examples,
    include_fields=["address", "total"],  # Only optimize these
)

# Exclude fields from scoring (still extracted)
result = prompter.optimize(
    examples=examples,
    exclude_fields=["metadata", "timestamp"],
)

# Sequential mode (field-by-field optimization)
result = prompter.optimize(
    examples=examples,
    sequential=True,
)

# Parallel field optimization (sequential mode with parallelization)
result = prompter.optimize(
    examples=examples,
    sequential=True,
    parallel_fields=True,
)

# Reduce validation set size for faster optimization
result = prompter.optimize(
    examples=examples,
    max_val_examples=5,
)
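
To illustrate what two of these options mean conceptually, the sketch below leaves certain fields out of scoring and caps the size of a validation set. The helpers `score_example` and `cap_validation` are hypothetical, and the seeded random sampling is an assumption; DSPydantic may select validation examples differently.

```python
import random
from typing import Any, Iterable, Optional


def score_example(
    expected: dict[str, Any],
    predicted: dict[str, Any],
    exclude_fields: Optional[Iterable[str]] = None,
) -> float:
    """Exact-match score over expected fields, skipping any excluded ones."""
    excluded = set(exclude_fields or ())
    scored = [name for name in expected if name not in excluded]
    if not scored:
        return 1.0
    hits = sum(1 for name in scored if predicted.get(name) == expected[name])
    return hits / len(scored)


def cap_validation(examples: list, max_val_examples: Optional[int] = None, seed: int = 0) -> list:
    """Return at most max_val_examples items, sampled reproducibly."""
    if max_val_examples is None or len(examples) <= max_val_examples:
        return list(examples)
    return random.Random(seed).sample(examples, max_val_examples)


expected = {"vendor": "Acme Corp", "total": "$1,250.00", "timestamp": "2024-03-01T10:00:00"}
predicted = {"vendor": "Acme Corp", "total": "$1,250.00", "timestamp": "unknown"}

full = score_example(expected, predicted)
trimmed = score_example(expected, predicted, exclude_fields=["timestamp"])
subset = cap_validation(list(range(20)), max_val_examples=5)
print(f"all fields: {full:.2f}, without timestamp: {trimmed:.2f}, validation size: {len(subset)}")
```

Excluding a noisy field such as a timestamp keeps it from dragging the score down, while the field is still present in the extracted output.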

Production Features

# Caching (reduce API costs)
prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini", cache=True)

# Batch processing
invoices = prompter.predict_batch(documents, max_workers=4)

# Async (call from inside a coroutine)
invoice = await prompter.apredict(document)

# Confidence scores
result = prompter.predict_with_confidence(document)
if result.confidence > 0.9:
    process(result.data)
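
The batch and confidence features combine naturally into a pipeline that accepts confident extractions and routes the rest to manual review. The sketch below shows the pattern with the standard library only: `fake_extract` is a hypothetical stand-in for an LLM call with a toy confidence heuristic, and `run_batch` and `route` are illustrative names, not DSPydantic functions.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Scored:
    data: dict
    confidence: float


def fake_extract(document: str) -> Scored:
    """Hypothetical stand-in for an extraction call; the confidence is a toy heuristic."""
    has_total = "Total" in document
    return Scored(data={"raw": document}, confidence=0.95 if has_total else 0.5)


def run_batch(documents: list[str], max_workers: int = 4) -> list[Scored]:
    """Extract documents concurrently; pool.map preserves input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_extract, documents))


def route(results: list[Scored], threshold: float = 0.9) -> tuple[list[Scored], list[Scored]]:
    """Split results into accepted and needs-review by confidence."""
    accepted = [r for r in results if r.confidence > threshold]
    review = [r for r in results if r.confidence <= threshold]
    return accepted, review


documents = ["Invoice A. Total: $10", "Scanned page, unreadable", "Invoice B. Total: $25"]
results = run_batch(documents)
accepted, review = route(results)
print(f"accepted: {len(accepted)}, needs review: {len(review)}")
```

Threads suit this workload because extraction calls are I/O-bound; the 0.9 threshold mirrors the snippet above and should be tuned against labeled data.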

Documentation

Full documentation at davidberenstein1957.github.io/dspydantic

License

Apache 2.0

Contributing

Contributions welcome! Open an issue or submit a pull request.

