Optimize Pydantic model field descriptions using DSPy

🚀 DSPydantic: Auto-Optimize Your Pydantic Models with DSPy

Automatically optimize Pydantic model field descriptions and prompts using DSPy. Get better structured data extraction from LLMs with less manual tuning.

✨ What It Does

Instead of spending hours crafting the perfect field descriptions for your Pydantic models, DSPydantic uses DSPy's optimization algorithms to automatically find the best descriptions based on your examples. Just provide a few examples, and watch your extraction accuracy improve.

🎯 Quick Start

from pydantic import BaseModel, Field
from typing import Literal
from dspydantic import PydanticOptimizer, Example, create_optimized_model

# 1. Define your model (any Pydantic model works)
class TransactionRecord(BaseModel):
    broker: str = Field(description="Financial institution or brokerage firm")
    amount: str = Field(description="Transaction amount with currency")
    security: str = Field(description="Stock, bond, or financial instrument")
    date: str = Field(description="Transaction date")
    transaction_type: Literal["equity", "bond", "option", "future", "forex"] = Field(
        description="Type of financial instrument"
    )

# 2. Provide examples (just input text + expected output)
examples = [
    Example(
        text="Transaction Report: Goldman Sachs processed a $2.5M equity trade for Tesla Inc. on March 15, 2024.",
        expected_output=TransactionRecord(
            broker="Goldman Sachs",
            amount="$2.5M",
            security="Tesla Inc.",
            date="March 15, 2024",
            transaction_type="equity"
        )
    ),
    Example(
        text="JPMorgan executed $500K bond purchase for Apple Corp dated 2024-03-20.",
        expected_output=TransactionRecord(
            broker="JPMorgan",
            amount="$500K",
            security="Apple Corp",
            date="2024-03-20",
            transaction_type="bond"
        )
    ),
]

# 3. Optimize and use
optimizer = PydanticOptimizer(
    model=TransactionRecord,
    examples=examples,
    model_id="gpt-4o",
    system_prompt="You are a financial document analysis assistant.",
    instruction_prompt="Extract transaction details from the financial report.",
)
result = optimizer.optimize()

OptimizedTransactionRecord = create_optimized_model(
    TransactionRecord,
    result.optimized_descriptions
)
print(result.optimized_descriptions)
print(result.optimized_system_prompt)
print(result.optimized_instruction_prompt)
# Use OptimizedTransactionRecord just like your original model, but with better accuracy!

That's it! Your model now has optimized descriptions that extract data more accurately.

📦 Installation

pip install dspydantic

Or with uv:

uv pip install dspydantic

🌟 Key Features

Auto-optimization: Finds best field descriptions automatically
Simple input: Just examples (text/images/PDFs) + your Pydantic model
Better output: Optimized model ready to use with improved accuracy
Template prompts: Dynamic prompts with {placeholders} for context-aware extraction
Enum & Literal support: Optimize classification models
Multiple formats: Text, images, PDFs—works with any input type
Smart defaults: Auto-selects best optimizer, no configuration needed

📚 Examples

Check out the examples directory for complete working examples:

Veterinary EHR extraction: Extract diseases, ICD-11 labels, and anonymized entities from clinical narratives—real-world medical data extraction
Image classification: Classify MNIST handwritten digits using Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]—demonstrates vision capabilities and Literal type optimization
Text classification: Classify IMDB movie review sentiment with Literal["positive", "negative"] and template prompts—shows dynamic prompt formatting with {review} placeholders
Human-in-the-loop: Interactive evaluation with GUI—get human feedback during optimization

Basic Usage

1. Define Your Pydantic Model

from pydantic import BaseModel, Field
from typing import Literal

class ProductInfo(BaseModel):
    name: str = Field(description="Full product name and model")
    storage: str = Field(description="Storage capacity like 256GB or 1TB")
    processor: str = Field(description="Chip or processor information")
    price: str = Field(description="Product price with currency")
    colors: list[str] = Field(description="Available color options")
    availability: Literal["in_stock", "pre_order", "sold_out"] = Field(
        description="Current availability status"
    )

2. Create Examples

Simple input format—just text + expected output:

from dspydantic import Example

# Plain text input
examples = [
    Example(
        text="iPhone 15 Pro Max with 256GB storage, A17 Pro chip, priced at $1199. Available in titanium and black colors.",
        expected_output=ProductInfo(
            name="iPhone 15 Pro Max",
            storage="256GB",
            processor="A17 Pro chip",
            price="$1199",
            colors=["titanium", "black"],
            availability="in_stock"
        )
    ),
    Example(
        text="MacBook Air M3, 512GB SSD, M3 processor, $1299. Colors: space gray, silver. Currently on pre-order.",
        expected_output=ProductInfo(
            name="MacBook Air M3",
            storage="512GB SSD",
            processor="M3 processor",
            price="$1299",
            colors=["space gray", "silver"],
            availability="pre_order"
        )
    ),
]

# Or use dictionaries for template prompts (see the Optimizing Prompt Templates section)
# Or use images: Example(image_path="product.png", expected_output=...)
# Or use PDFs: Example(pdf_path="catalog.pdf", expected_output=...)

3. Optimize

from dspydantic import PydanticOptimizer

optimizer = PydanticOptimizer(
    model=ProductInfo,
    examples=examples,
    model_id="gpt-4o"
)

result = optimizer.optimize()  # Returns optimized descriptions

# Access optimized results
result.optimized_descriptions        # dict[str, str] - optimized field descriptions
result.optimized_system_prompt      # str | None - optimized system prompt
result.optimized_instruction_prompt # str | None - optimized instruction prompt

Template Formatting: When an example's text is a dictionary, placeholders such as {key} in the instruction prompt template are filled with the values from that example's text dict, so each example gets its own prompt. See the Optimizing Prompt Templates section for a complete example.

4. Use Your Optimized Model

Simple output—just use the optimized model like your original:

from dspydantic import create_optimized_model
from openai import OpenAI

# Create optimized model (drop-in replacement)
OptimizedProductInfo = create_optimized_model(
    ProductInfo,
    result.optimized_descriptions
)

# Use with OpenAI structured outputs
client = OpenAI()
messages = []

# Add optimized system prompt if available
if result.optimized_system_prompt:
    messages.append({
        "role": "system",
        "content": result.optimized_system_prompt
    })

# Prepare user content with optimized instruction prompt
user_content = (
    "Samsung Galaxy S24 Ultra, 1TB storage, Snapdragon 8 Gen 3 processor, "
    "$1299. Available in titanium black, titanium gray, and titanium violet. In stock now."
)
if result.optimized_instruction_prompt:
    user_content = f"{result.optimized_instruction_prompt}\n\n{user_content}"

messages.append({
    "role": "user",
    "content": user_content
})

# Call the OpenAI API with the optimized model.
# Passing a Pydantic class as response_format requires the SDK's parse() helper;
# newer SDK versions also expose it as client.chat.completions.parse().
response = client.beta.chat.completions.parse(
    model="gpt-4o",
    messages=messages,
    response_format=OptimizedProductInfo
)

# The parsed result is already an OptimizedProductInfo instance
product = response.choices[0].message.parsed

That's it! The optimized model is a drop-in replacement for the original, so the rest of your code stays the same.

🏭 Real-World Usage Scenarios

Financial Document Processing

from pydantic import BaseModel, Field
from typing import Literal
from dspydantic import Example

class Transaction(BaseModel):
    broker: str = Field(description="Financial institution")
    amount: str = Field(description="Transaction amount")
    security: str = Field(description="Financial instrument")
    transaction_type: Literal["equity", "bond", "option"] = Field(description="Transaction type")

examples = [
    Example(
        text="Goldman Sachs processed a $2.5M equity trade for Tesla Inc. on March 15, 2024.",
        expected_output=Transaction(broker="Goldman Sachs", amount="$2.5M", security="Tesla Inc.", transaction_type="equity")
    ),
    Example(
        text="JPMorgan executed $500K bond purchase for Apple Corp dated 2024-03-20.",
        expected_output=Transaction(broker="JPMorgan", amount="$500K", security="Apple Corp", transaction_type="bond")
    ),
]

Healthcare Information Extraction

from pydantic import BaseModel, Field
from dspydantic import Example

class MedicalRecord(BaseModel):
    patient_name: str = Field(description="Patient name")
    symptoms: list[str] = Field(description="Symptoms")
    medications: list[str] = Field(description="Prescribed medications")

examples = [
    Example(
        text="Patient: Sarah Johnson, 34. Symptoms: chest pain, shortness of breath. Prescribed: Lisinopril 10mg daily.",
        expected_output=MedicalRecord(
            patient_name="Sarah Johnson",
            symptoms=["chest pain", "shortness of breath"],
            medications=["Lisinopril 10mg daily"]
        )
    ),
    Example(
        text="Patient: Michael Chen, 45. Symptoms: headache, fatigue. Prescribed: Ibuprofen 400mg twice daily.",
        expected_output=MedicalRecord(
            patient_name="Michael Chen",
            symptoms=["headache", "fatigue"],
            medications=["Ibuprofen 400mg twice daily"]
        )
    ),
]

Legal Contract Analysis

from pydantic import BaseModel, Field
from typing import Literal
from dspydantic import Example

class ContractAnalysis(BaseModel):
    parties: list[str] = Field(description="Contracting parties")
    effective_date: str = Field(description="Effective date")
    monthly_fee: str = Field(description="Monthly fee")
    contract_type: Literal["service", "employment", "nda"] = Field(description="Contract type")

examples = [
    Example(
        text="Service Agreement between TechCorp LLC and DataSystems Inc., effective January 1, 2024. Monthly fee: $15,000.",
        expected_output=ContractAnalysis(
            parties=["TechCorp LLC", "DataSystems Inc."],
            effective_date="January 1, 2024",
            monthly_fee="$15,000",
            contract_type="service"
        )
    ),
]

Advanced Usage

Other modalities

Working with Images

Just provide image paths + expected output:

from pydantic import BaseModel, Field
from typing import Literal
from dspydantic import Example

class DigitClassification(BaseModel):
    digit: Literal[0, 1, 2, 3, 4, 5, 6, 7, 8, 9] = Field(description="Digit 0-9")

examples = [
    Example(
        image_path="digit_5.png",
        expected_output=DigitClassification(digit=5)
    ),
    Example(
        image_path="digit_3.png",
        expected_output=DigitClassification(digit=3)
    ),
]

Working with PDFs

Just provide PDF paths + expected output:

from pydantic import BaseModel, Field
from dspydantic import Example

class Invoice(BaseModel):
    invoice_number: str = Field(description="Invoice number")
    total_amount: float = Field(description="Total amount of the invoice")
    vendor: str = Field(description="Vendor or supplier name")
    date: str = Field(description="Invoice date")

examples = [
    Example(
        pdf_path="invoice_001.pdf",
        expected_output=Invoice(
            invoice_number="INV-2024-001",
            total_amount=1234.56,
            vendor="Acme Corporation",
            date="2024-03-15"
        )
    ),
]

Optimizing Prompt Templates

Optional system_prompt and instruction_prompt are optimized along with field descriptions. Use template prompts with placeholders that are automatically filled from example data dictionaries:

from pydantic import BaseModel, Field
from typing import Literal
from dspydantic import Example, PydanticOptimizer

class ProductReview(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"] = Field(description="Review sentiment")
    rating: int = Field(description="Rating 1-5")
    aspects: list[Literal["camera", "performance", "battery"]] = Field(description="Product aspects")

examples = [
    Example(
        text={"review": "Amazing camera quality and fast performance!", "product": "iPhone 15 Pro", "category": "smartphone"},
        expected_output=ProductReview(sentiment="positive", rating=4, aspects=["camera", "performance"])
    ),
    Example(
        text={"review": "Poor battery life and overpriced.", "product": "Samsung Galaxy S24", "category": "smartphone"},
        expected_output=ProductReview(sentiment="negative", rating=2, aspects=["battery"])
    ),
]

# Template prompts with {placeholders} are automatically filled from dict keys
optimizer = PydanticOptimizer(
    model=ProductReview,
    examples=examples,
    system_prompt="You are an expert analyst specializing in {category} reviews.",
    instruction_prompt="Analyze the {category} review about {product}: {review}",
    model_id="gpt-4o"
)
result = optimizer.optimize()
# Access: result.optimized_system_prompt, result.optimized_instruction_prompt, result.optimized_descriptions

Placeholders like {category}, {product}, {review} are automatically filled from each example's text dictionary. Both prompts are optimized along with field descriptions.
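The placeholder filling described above can be pictured with plain Python string formatting. This is a minimal sketch, not DSPydantic's implementation: `fill_template` is a hypothetical helper, and it assumes the library behaves like `str.format` with the example's text dict as keyword arguments.

```python
# Sketch only: fill {placeholders} in a prompt template from an example's text dict.
# fill_template is a hypothetical helper, not part of the dspydantic API.
def fill_template(template: str, values: dict[str, str]) -> str:
    """Substitute each {key} placeholder with the matching dict value."""
    return template.format(**values)


text = {
    "review": "Amazing camera quality and fast performance!",
    "product": "iPhone 15 Pro",
    "category": "smartphone",
}

system = fill_template(
    "You are an expert analyst specializing in {category} reviews.", text
)
instruction = fill_template(
    "Analyze the {category} review about {product}: {review}", text
)
print(system)
print(instruction)
```

Keys that a template does not mention are simply ignored, which is why one dict can serve both the system prompt and the instruction prompt.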

Working with Enums and Literals

Literal and Enum types work automatically and are taken into account for optimization.

from pydantic import BaseModel, Field
from typing import Literal
from dspydantic import Example

class ReviewAnalysis(BaseModel):
    sentiment: Literal["positive", "negative", "neutral"] = Field(description="Review sentiment")
    aspects: list[Literal["camera", "performance", "battery", "display", "price"]] = Field(description="Product aspects")
    rating: int = Field(description="Rating 1-5")

examples = [
    Example(
        text="Great camera quality and amazing performance. Overall 4/5.",
        expected_output=ReviewAnalysis(sentiment="positive", aspects=["camera", "performance"], rating=4)
    ),
    Example(
        text="Poor display quality and overpriced. Not worth it. Rating: 2 stars.",
        expected_output=ReviewAnalysis(sentiment="negative", aspects=["display", "price"], rating=2)
    ),
]

Excluding Fields from Evaluation

If you have fields that shouldn't affect the evaluation score (e.g., metadata, timestamps, or fields you're not optimizing), you can exclude them:

from pydantic import BaseModel, Field
from typing import Literal
from dspydantic import PydanticOptimizer

class PatientRecord(BaseModel):
    patient_name: str = Field(description="Patient full name")
    urgency: Literal["low", "medium", "high", "critical"] = Field(
        description="Urgency level of the case"
    )
    diagnosis: str = Field(description="Primary diagnosis")
    metadata: str = Field(description="Internal metadata")  # Not important for evaluation
    timestamp: str = Field(description="Record timestamp")  # Not important for evaluation

optimizer = PydanticOptimizer(
    model=PatientRecord,
    examples=examples,
    model_id="gpt-4o",
    exclude_fields=["metadata", "timestamp"],  # These fields won't affect scoring
)
result = optimizer.optimize()

Excluded fields will still be extracted by the model, but they won't be included in the evaluation score calculation. This is useful when you have fields that are not critical for optimization or that you don't want to optimize for.
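To make the effect of excluding fields concrete, here is a small sketch of an exact-match score that skips excluded fields. It is an illustration under stated assumptions: `exact_match_score` is a hypothetical function, the per-field averaging is assumed rather than taken from the library, and the record values are made up.

```python
from collections.abc import Iterable


def exact_match_score(
    expected: dict, predicted: dict, exclude_fields: Iterable[str] = ()
) -> float:
    """Fraction of non-excluded expected fields whose predicted value matches exactly."""
    excluded = set(exclude_fields)
    scored = [name for name in expected if name not in excluded]
    if not scored:
        return 1.0  # nothing left to score; this convention is an assumption
    hits = sum(1 for name in scored if predicted.get(name) == expected[name])
    return hits / len(scored)


# Made-up values, for illustration only.
expected = {
    "patient_name": "Sarah Johnson",
    "urgency": "high",
    "diagnosis": "angina",
    "metadata": "batch-7",
    "timestamp": "2024-03-15T10:00",
}
predicted = {
    "patient_name": "Sarah Johnson",
    "urgency": "high",
    "diagnosis": "angina",
    "metadata": "",
    "timestamp": "unknown",
}

score_all = exact_match_score(expected, predicted)
score_core = exact_match_score(expected, predicted, ["metadata", "timestamp"])
print(score_all, score_core)
```

With every field scored, the two noisy fields pull the score down; with them excluded, only the three fields that matter are counted.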

Nested Models

Nested models are optimized too. Fields of a nested model are addressed by dotted paths such as "address.street", so you do not need to flatten the model yourself:

from pydantic import BaseModel, Field
from dspydantic import Example

class Address(BaseModel):
    street: str = Field(description="Street")
    city: str = Field(description="City")
    zip_code: str = Field(description="ZIP code")

class Customer(BaseModel):
    name: str = Field(description="Name")
    address: Address = Field(description="Address")

examples = [
    Example(
        text="Jane Smith, 456 Oak Ave, San Francisco, CA 94102",
        expected_output=Customer(
            name="Jane Smith",
            address=Address(street="456 Oak Ave", city="San Francisco", zip_code="94102")
        )
    ),
]
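The dotted field paths mentioned above can be illustrated by flattening a nested dict. This is only a sketch of the naming scheme: `flatten_fields` is a hypothetical helper, and how DSPydantic builds these paths internally is not shown in this README.

```python
def flatten_fields(data: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single dict keyed by dotted paths."""
    flat = {}
    for key, value in data.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into the nested model, extending the path.
            flat.update(flatten_fields(value, path))
        else:
            flat[path] = value
    return flat


customer = {
    "name": "Jane Smith",
    "address": {
        "street": "456 Oak Ave",
        "city": "San Francisco",
        "zip_code": "94102",
    },
}
paths = flatten_fields(customer)
print(sorted(paths))
```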

Evaluation Options

Built-in Evaluation

Use built-in options: "exact", "levenshtein", "exact-hitl", "levenshtein-hitl":

from dspydantic import PydanticOptimizer

optimizer = PydanticOptimizer(
    model=PatientRecord,
    examples=examples,
    evaluate_fn="exact",  # or "levenshtein" for fuzzy matching
    model_id="gpt-4o"
)
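For intuition about what fuzzy matching with "levenshtein" means, the sketch below computes an edit distance and turns it into a similarity between 0.0 and 1.0. The normalization by the longer string's length is an assumption made for illustration; the library's exact scoring formula is not documented here.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1.0 for identical strings, approaching 0.0 as they diverge (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))
print(similarity("$2.5M", "$2.5 M"))
```

An exact comparison would score "$2.5M" against "$2.5 M" as a miss, while a similarity like this one gives partial credit for the near match.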

Custom Evaluation Function

from dspydantic import PydanticOptimizer

def evaluate(example, optimized_descriptions, system_prompt, instruction_prompt) -> float:
    # Placeholder: score the extraction produced with these descriptions and prompts,
    # returning a value from 0.0 to 1.0
    return 0.85

optimizer = PydanticOptimizer(model=Customer, examples=examples, evaluate_fn=evaluate, model_id="gpt-4o")
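A fuller evaluation function might look like the sketch below, which keeps the four-argument signature shown above. Several things here are assumptions: that the example object exposes `.text` and `.expected_output` attributes, that the expected output is a flat dict, and `make_evaluate` and `stub_extract` are hypothetical names. The stub stands in for a real LLM extraction call so the sketch runs offline.

```python
from types import SimpleNamespace


def make_evaluate(extract):
    """Build an evaluate function; `extract` is a hypothetical callable returning a dict."""

    def evaluate(example, optimized_descriptions, system_prompt, instruction_prompt) -> float:
        predicted = extract(
            example.text, optimized_descriptions, system_prompt, instruction_prompt
        )
        expected = example.expected_output
        if not expected:
            return 0.0
        # Case-insensitive, whitespace-insensitive comparison per field.
        hits = sum(
            1
            for name, value in expected.items()
            if str(predicted.get(name, "")).strip().lower() == str(value).strip().lower()
        )
        return hits / len(expected)

    return evaluate


def stub_extract(text, descriptions, system_prompt, instruction_prompt):
    # Stand-in for an LLM call; returns a fixed, partly wrong extraction.
    return {"name": "jane smith", "city": "Oakland"}


# SimpleNamespace stands in for a dspydantic Example (attribute names are assumed).
example = SimpleNamespace(
    text="Jane Smith, 456 Oak Ave, San Francisco, CA 94102",
    expected_output={"name": "Jane Smith", "city": "San Francisco"},
)
evaluate = make_evaluate(stub_extract)
score = evaluate(example, {}, None, None)
print(score)
```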

LLM Judge (No Expected Output)

When expected_output is None, use an LLM as a judge for unlabeled data:

from pydantic import BaseModel, Field
from typing import Literal
from dspydantic import Example, PydanticOptimizer
import dspy

class Transaction(BaseModel):
    broker: str = Field(description="Financial institution")
    amount: str = Field(description="Transaction amount")
    security: str = Field(description="Financial instrument")
    transaction_type: Literal["equity", "bond", "option"] = Field(description="Transaction type")

examples = [
    Example(text="Goldman Sachs processed a $2.5M equity trade for Tesla Inc.", expected_output=None),
    Example(text="JPMorgan executed $500K bond purchase for Apple Corp.", expected_output=None),
]

# Uses model_id LLM as judge by default
optimizer = PydanticOptimizer(model=Transaction, examples=examples, model_id="gpt-4o")

# Or use a separate judge LLM
judge_lm = dspy.LM("gpt-4", api_key="your-api-key")
optimizer = PydanticOptimizer(model=Transaction, examples=examples, evaluate_fn=judge_lm, model_id="gpt-4o-mini")

Optimizer Selection

The optimizer is selected automatically based on dataset size, or you can specify one manually:

Auto (default): < 20 examples → BootstrapFewShot, >= 20 examples → BootstrapFewShotWithRandomSearch
Manual: Pass string ("miprov2", "gepa", "copro", etc.) or Teleprompter instance

from dspydantic import PydanticOptimizer
from dspy.teleprompt import MIPROv2

# Auto-select (default)
optimizer = PydanticOptimizer(model=PatientRecord, examples=examples, model_id="gpt-4o")

# Manual selection
optimizer = PydanticOptimizer(model=PatientRecord, examples=examples, optimizer="miprov2", model_id="gpt-4o")

# Custom optimizer instance (my_metric is your own DSPy metric function, defined elsewhere)
custom_optimizer = MIPROv2(metric=my_metric, num_threads=8)
optimizer = PydanticOptimizer(model=PatientRecord, examples=examples, optimizer=custom_optimizer)
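The auto-selection rule stated above reduces to a single threshold check. The sketch below restates it as code; `pick_optimizer_name` is a hypothetical function written for illustration and is not part of the dspydantic API.

```python
def pick_optimizer_name(num_examples: int) -> str:
    """Mirror the documented rule: fewer than 20 examples vs. 20 or more."""
    if num_examples < 20:
        return "BootstrapFewShot"
    return "BootstrapFewShotWithRandomSearch"


for count in (5, 19, 20, 100):
    print(count, pick_optimizer_name(count))
```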

API Reference

`PydanticOptimizer`

Main optimizer class.

Parameters:

model (type[BaseModel]): Pydantic model class to optimize
examples (list[Example]): Examples for optimization (typically 5-20)
evaluate_fn (Callable | dspy.LM | str | None): Evaluation function, built-in ("exact", "levenshtein", "exact-hitl", "levenshtein-hitl"), or dspy.LM instance
system_prompt (str | None): Optional system prompt to optimize
instruction_prompt (str | None): Optional instruction prompt to optimize (supports {placeholders})
lm (dspy.LM | None): Optional DSPy LM instance (overrides model_id/api_key)
model_id (str): LLM model ID (default: "gpt-4o")
api_key (str | None): API key (default: from OPENAI_API_KEY env var)
api_base (str | None): API base URL (for Azure OpenAI)
api_version (str | None): API version (for Azure OpenAI)
num_threads (int): Optimization threads (default: 4)
init_temperature (float): Initial temperature (default: 1.0)
verbose (bool): Print progress (default: False)
optimizer (str | Teleprompter | None): Optimizer name or instance (auto-selects if None)
train_split (float): Training split fraction (default: 0.8)
optimizer_kwargs (dict[str, Any] | None): Additional kwargs for optimizer
exclude_fields (list[str] | None): Field names to exclude from evaluation

Returns: OptimizationResult with optimized descriptions, prompts, and metrics
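To give a feel for what train_split=0.8 means for a small dataset, here is a sketch that divides a list of examples into training and validation parts. It assumes a simple ordered cut with no shuffling; `split_examples` is a hypothetical helper and the library may split differently.

```python
def split_examples(examples: list, train_split: float = 0.8) -> tuple[list, list]:
    """Split examples into (train, validation) by a fraction; no shuffling (assumed)."""
    if not 0.0 < train_split <= 1.0:
        raise ValueError("train_split must be in the range (0, 1]")
    cut = int(len(examples) * train_split)
    return examples[:cut], examples[cut:]


train, validation = split_examples(list(range(10)))
print(len(train), len(validation))
```

With only 5 examples and the default fraction, a cut like this leaves a single example for validation, which is one reason to provide more examples when you can.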

`Example`

Example data for optimization.

Parameters:

expected_output (dict | BaseModel | None): Expected output (Pydantic model or dict). If None, uses LLM judge
text (str | dict | None): Plain text input or dict for template prompts
image_path (str | Path | None): Path to image file
image_base64 (str | None): Base64-encoded image
pdf_path (str | Path | None): Path to PDF file
pdf_dpi (int): DPI for PDF conversion (default: 300)

Examples:

# Text input
Example(text="Goldman Sachs processed $2.5M equity trade", expected_output=Transaction(...))

# Image input
Example(image_path="report.png", expected_output=Transaction(...))

# PDF input
Example(pdf_path="statement.pdf", expected_output=Transaction(...))

# Template prompt (dict)
Example(text={"report": "...", "date": "..."}, expected_output=Transaction(...))

# LLM judge (no expected_output)
Example(text="...", expected_output=None)
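If you already hold image bytes rather than a file path, the image_base64 parameter takes an encoded string. The sketch below shows one way to produce such a string with the standard library. Whether the parameter expects raw base64 (as produced here) or a data URI is not stated in this README, so treat that as an assumption; `encode_image` is a hypothetical helper.

```python
import base64
import tempfile
from pathlib import Path


def encode_image(path) -> str:
    """Read a file and return its contents as an ASCII base64 string."""
    return base64.b64encode(Path(path).read_bytes()).decode("ascii")


# Demo with a throwaway file containing only the 8-byte PNG signature,
# not a real image, so the sketch runs without any assets.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "digit_5.png"
    sample.write_bytes(b"\x89PNG\r\n\x1a\n")
    encoded = encode_image(sample)

print(encoded)
```

The resulting string could then be passed as Example(image_base64=encoded, expected_output=...).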

`create_optimized_model(model, optimized_descriptions)`

Create a new Pydantic model class with optimized descriptions.

Parameters:

model (type[BaseModel]): Original Pydantic model
optimized_descriptions (dict[str, str]): From result.optimized_descriptions

Returns: type[BaseModel] - New model class with optimized descriptions

Example:

OptimizedTransaction = create_optimized_model(Transaction, result.optimized_descriptions)
# A Pydantic class as response_format requires the OpenAI SDK's parse() helper
response = client.beta.chat.completions.parse(model="gpt-4o", messages=messages, response_format=OptimizedTransaction)

`apply_optimized_descriptions(model, optimized_descriptions)`

Get optimized JSON schema without creating a new model class.

Parameters:

model (type[BaseModel]): Original Pydantic model
optimized_descriptions (dict[str, str]): From result.optimized_descriptions

Returns: dict - JSON schema with optimized descriptions

Example:

optimized_schema = apply_optimized_descriptions(ProductInfo, result.optimized_descriptions)
response = client.chat.completions.create(
    model="gpt-4o",
    messages=messages,
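    # OpenAI's json_schema response format requires a "name" alongside the schema
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "ProductInfo", "schema": optimized_schema}
    }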
)
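Conceptually, producing an optimized schema amounts to overwriting the description entries of a JSON schema. The sketch below does this on a plain dict. It is not the library's implementation: `patch_descriptions` is a hypothetical function, and it only follows inline nested properties, whereas schemas generated by Pydantic usually reference nested models through $defs and $ref.

```python
import copy


def patch_descriptions(schema: dict, descriptions: dict[str, str]) -> dict:
    """Return a copy of the schema with descriptions replaced at the given dotted paths."""
    patched = copy.deepcopy(schema)  # leave the caller's schema untouched
    for path, text in descriptions.items():
        node = patched
        try:
            for part in path.split("."):
                node = node["properties"][part]
        except KeyError:
            continue  # unknown field path: skip it
        node["description"] = text
    return patched


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string", "description": "Name"},
        "address": {
            "type": "object",
            "properties": {
                "street": {"type": "string", "description": "Street"},
            },
        },
    },
}
patched = patch_descriptions(
    schema,
    {
        "name": "Customer's full name",
        "address.street": "Street number and street name",
        "missing_field": "ignored",
    },
)
print(patched["properties"]["name"]["description"])
```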

License

Apache 2.0

Contributing

Contributions are welcome! Please open an issue or submit a pull request.
