Optimize Pydantic model field descriptions using DSPy

DSPydantic

Stop manually tuning prompts. Let your data optimize them.

DSPydantic automatically optimizes your Pydantic model prompts and field descriptions using DSPy. Extract structured data from text, images, and PDFs with higher accuracy and less effort.

The Problem

You've defined a Pydantic model. You're using an LLM to extract data. But:

  • Your prompts are guesswork—trial and error until something works
  • Accuracy varies wildly depending on input phrasing
  • Every new use case means more manual prompt engineering

The Solution

DSPydantic takes your examples and automatically finds the best prompts for your use case:

from pydantic import BaseModel, Field
from dspydantic import Prompter, Example

class Invoice(BaseModel):
    vendor: str = Field(description="Company that issued the invoice")
    total: str = Field(description="Total amount due")
    due_date: str = Field(description="Payment due date")

prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini")

# Optimize with examples
result = prompter.optimize(examples=[
    Example(
        text="Invoice from Acme Corp. Total: $1,250.00. Due: March 15, 2024.",
        expected_output={"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
    ),
])

# Extract with optimized prompts
invoice = prompter.run("Consolidated Energy Partners | Invoice Total $3,200 | Due 2024-05-30")

Typical improvement: 10-30% higher accuracy with the same LLM.

Installation

pip install dspydantic

Quick Start

Extract Data (No Optimization)

For simple cases, extract immediately:

from pydantic import BaseModel, Field
from dspydantic import Prompter

class Contact(BaseModel):
    name: str = Field(description="Person's full name")
    email: str = Field(description="Email address")

prompter = Prompter(model=Contact, model_id="openai/gpt-4o-mini")

contact = prompter.run("Reach out to Sarah Chen at sarah.chen@techcorp.io")
# Contact(name='Sarah Chen', email='sarah.chen@techcorp.io')

Optimize for Better Accuracy

When accuracy matters, optimize with examples:

from dspydantic import Example

examples = [
    Example(text="...", expected_output={...}),
    # 5-20 examples typically enough
]

result = prompter.optimize(examples=examples)
print(f"Accuracy: {result.baseline_score:.0%} -> {result.optimized_score:.0%}")
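
To make the before/after scores above more concrete, here is a minimal sketch of one way an extraction accuracy could be computed: the fraction of expected fields that the prediction matches exactly. This is an assumption for illustration only. The helpers `field_accuracy` and `mean_accuracy` are hypothetical names, and the metric DSPydantic uses internally may differ.

```python
from typing import Any, Mapping


def field_accuracy(expected: Mapping[str, Any], predicted: Mapping[str, Any]) -> float:
    """Fraction of expected fields whose predicted value matches exactly."""
    if not expected:
        return 1.0
    hits = sum(1 for key, value in expected.items() if predicted.get(key) == value)
    return hits / len(expected)


def mean_accuracy(pairs: list[tuple[Mapping[str, Any], Mapping[str, Any]]]) -> float:
    """Average field accuracy over (expected, predicted) pairs."""
    if not pairs:
        return 0.0
    return sum(field_accuracy(e, p) for e, p in pairs) / len(pairs)


# Hypothetical prediction: vendor and total match, due_date uses another format.
expected = {"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "March 15, 2024"}
predicted = {"vendor": "Acme Corp", "total": "$1,250.00", "due_date": "2024-03-15"}

score = field_accuracy(expected, predicted)
print(f"Field accuracy: {score:.0%}")  # 2 of 3 fields match
```

Exact matching is strict: the reformatted date counts as a miss, which is one reason to write `expected_output` values in the same format you want the model to return.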

Deploy to Production

# Save optimized prompter
prompter.save("./invoice_prompter")

# Load in production
prompter = Prompter.load("./invoice_prompter", model=Invoice, model_id="openai/gpt-4o-mini")
invoice = prompter.run(new_document)

Why DSPydantic?

Feature | DSPydantic | Manual Prompting
Automatic optimization | ✅ Data-driven | ❌ Trial and error
Pydantic native | ✅ Full type safety | ⚠️ JSON only
Multi-modal | ✅ Text, images, PDFs | ⚠️ Text only
Production ready | ✅ Save/load, batch, async | ❌ Manual
Confidence scores | ✅ Per-extraction | ❌ No

Built on: DSPy (Stanford's optimization framework) + Pydantic (Python data validation)

Input Types

# Text
Example(text="Invoice from Acme...", expected_output={...})

# Images
Example(image_path="receipt.png", expected_output={...})

# PDFs
Example(pdf_path="contract.pdf", expected_output={...})
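
When a dataset mixes raw text, images, and PDFs, choosing the right keyword by hand gets repetitive. The sketch below picks one of the three keywords shown above (`text`, `image_path`, `pdf_path`) from the file extension. The helper `example_kwargs` and the `IMAGE_SUFFIXES` set are hypothetical and not part of DSPydantic. The helper returns a plain dict so it can be tried without the library installed.

```python
from pathlib import Path
from typing import Any

# Assumed set of image extensions; adjust to whatever your data contains.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def example_kwargs(source: str, expected_output: dict[str, Any]) -> dict[str, Any]:
    """Build keyword arguments for an Example from a path or a raw text string."""
    suffix = Path(source).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        key = "image_path"
    elif suffix == ".pdf":
        key = "pdf_path"
    else:
        # Anything without a known file extension is treated as raw text.
        key = "text"
    return {key: source, "expected_output": expected_output}


kwargs = example_kwargs("receipt.png", {"total": "$12.40"})
print(kwargs)
```

With the library installed, the result could be unpacked as `Example(**kwargs)`.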

Production Features

# Caching (reduce API costs)
prompter = Prompter(model=Invoice, model_id="openai/gpt-4o-mini", cache=True)

# Batch processing
invoices = prompter.predict_batch(documents, max_workers=4)

# Async
invoice = await prompter.apredict(document)

# Confidence scores
result = prompter.predict_with_confidence(document)
if result.confidence > 0.9:
    process(result.data)
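
The confidence check above extends naturally to a batch: accept results above a threshold and queue the rest for manual review. The following is a minimal sketch under stated assumptions. `Scored` is a hypothetical stand-in for whatever `predict_with_confidence` returns (only the `.data` and `.confidence` attributes shown above are assumed), and `route` is an illustrative helper, not a DSPydantic function.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Scored:
    """Hypothetical stand-in for a result exposing .data and .confidence."""
    data: Any
    confidence: float


def route(results: list[Scored], threshold: float = 0.9) -> tuple[list[Any], list[Any]]:
    """Split results into (accepted, needs_review) using a confidence threshold."""
    accepted: list[Any] = []
    needs_review: list[Any] = []
    for result in results:
        if result.confidence > threshold:
            accepted.append(result.data)
        else:
            needs_review.append(result.data)
    return accepted, needs_review


accepted, needs_review = route([Scored("invoice A", 0.97), Scored("invoice B", 0.62)])
print(accepted, needs_review)
```

The 0.9 threshold mirrors the snippet above. In practice it is worth tuning against a labeled sample, since the right cutoff depends on how costly a wrong extraction is.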

Documentation

Full documentation at davidberenstein1957.github.io/dspydantic

License

Apache 2.0

Contributing

Contributions welcome! Open an issue or submit a pull request.

