A library for optimizing PydanticAI agent prompts through iterative improvement and evaluation, built on top of PydanticAI + Pydantic Evals.
PydanticAI Optimizers
⚠️ Super Opinionated: This library is specifically built on top of PydanticAI + Pydantic Evals. If you don't use both together, this is useless to you.
A Python library for systematically improving PydanticAI agent prompts through iterative optimization. Heavily inspired by the GEPA paper with practical extensions for prompt optimization when switching model classes or providers.
Acknowledgments
This work builds upon the excellent research in "GEPA: Reflective Prompt Evolution Can Outperform Reinforcement Learning" by Agrawal et al. We're grateful for their foundational work on reflective prompt evolution and have adapted (some of) their methodology with several practical tweaks for the PydanticAI ecosystem.
Why this exists: Every time you switch model classes (GPT-4.1 → GPT-5 → Claude Sonnet 4) or providers, your prompting needs change. Instead of manually tweaking prompts each time, this automates the optimization process for your existing PydanticAI agents with minimal effort.
What It Does
This library optimizes prompts with three mechanisms:
- Mini-batch Testing: Each candidate prompt is tested against a small subset of cases to see if it beats its parent before full evaluation
- Individual Case Tracking: Performance on each test case is tracked, enabling weighted sampling that favors prompts that win on more individual cases
- Memory for Failed Attempts: When optimization gets stuck (children keep failing mini-batch tests), the system provides previous failed attempts to the reflection agent with the message: "You've tried these approaches and they didn't work - think outside the box!"
The core insight is that learning carries over between iterations: failed attempts are remembered rather than discarded, and weighted sampling based on per-case win rates steers the search toward more diverse and effective prompt variations.
Quick Start
Installation
uv sync
Or run an example directly from the project root:
Run the Chef Example
uv run examples/chef/optimize.py
Run the Customer Support Example
uv run examples/customer_support/optimize.py
Repository Structure
.
├── src/pydantic_ai_optimizers/
│   ├── agents/
│   │   └── reflection_agent.py
│   ├── optimizer.py
│   ├── config.py
│   └── cli.py
├── examples/
│   ├── chef/
│   └── customer_support/
├── tests/
└── docs/
Running the chef example optimizes a chef assistant prompt that helps users find recipes while avoiding allergens. The run prints real-time feedback as it progresses and ends with the best prompt found.
Basic Usage in Your Project
from pathlib import Path

from pydantic_ai_optimizers import Optimizer, make_reflection_agent
from your_domain import create_your_agent, build_dataset, YourInputType, YourOutputType

# CRITICAL: Define your run_case function with the correct signature
async def run_case(prompt_file: str, user_input: YourInputType) -> YourOutputType:
    """
    Run your agent with a specific prompt file and user input.

    Args:
        prompt_file: ABSOLUTE path to the prompt file (e.g., "/path/to/prompts/candidate_001.txt")
        user_input: The input from your dataset cases

    Returns:
        The agent's output that will be evaluated
    """
    # Load the prompt and create agent
    agent = create_your_agent(prompt_file=prompt_file, model="your-model")
    result = await agent.run(user_input.message)  # Or however you pass inputs
    return result.output

# Set up your dataset
dataset = build_dataset("your_cases.json")

# Optional: Customize the reflection agent
reflection_agent = make_reflection_agent(
    model="openai:gpt-5-mini",  # Use a different model
    special_instructions="Focus on conciseness and clarity",  # Add custom instructions
)
# Or use the default: reflection_agent = None (will use make_reflection_agent() internally)

# Create optimizer
optimizer = Optimizer(
    dataset=dataset,
    run_case=run_case,  # Your async function with the signature above
    reflection_agent=reflection_agent,  # Optional, uses default if None
)

# Run optimization (await must run inside an async function, e.g. one started with asyncio.run)
best = await optimizer.optimize(
    seed_prompt_file=Path("prompts/seed.txt"),
    full_validation_budget=20,
)
print(f"Best prompt: {best.prompt_path}")
How It Works
1. Start with a Seed Prompt
The optimizer begins with your initial prompt and evaluates it on all test cases.
2. Mini-batch Gating (Key Innovation #1)
- Select a parent prompt using weighted sampling (prompts that win more individual cases are more likely to be selected)
- Generate a new candidate through reflection on failed cases
- Test the candidate on a small mini-batch of cases
- Only if it beats the parent on the mini-batch does it get added to the candidate pool
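The gating step above can be sketched in a few lines. This is a minimal illustration, not the library's implementation: `passes_minibatch_gate`, the `Scorer` type, and the toy scorer are hypothetical names, and the strict "child must score higher than parent" rule is an assumption about how ties are handled.

```python
import random
from statistics import mean
from typing import Callable, Optional, Sequence

# Hypothetical scorer type: (prompt, case) -> score in [0, 1].
Scorer = Callable[[str, str], float]


def passes_minibatch_gate(
    parent: str,
    child: str,
    cases: Sequence[str],
    score: Scorer,
    minibatch_size: int = 4,
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if the child beats the parent on a random mini-batch."""
    rng = rng or random.Random()
    batch = rng.sample(list(cases), k=min(minibatch_size, len(cases)))
    parent_score = mean(score(parent, case) for case in batch)
    child_score = mean(score(child, case) for case in batch)
    # Assumption: a tie is not enough to earn a full evaluation.
    return child_score > parent_score


# Toy scorer for illustration only: a prompt "solves" a case if it mentions it.
def toy_score(prompt: str, case: str) -> float:
    return 1.0 if case in prompt else 0.0


cases = ["nuts", "dairy", "gluten", "soy"]
parent_prompt = "Avoid nuts."
child_prompt = "Avoid nuts, dairy, gluten, and soy."
accepted = passes_minibatch_gate(parent_prompt, child_prompt, cases, toy_score)
```

In a real run the scorer would call your agent and evaluators, so the mini-batch keeps the number of agent calls per rejected candidate small.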
3. Individual Case Performance Tracking (Key Innovation #2)
- Track which prompt wins each individual test case
- Use this for Pareto-efficient weighted sampling of parents
- This encourages diverse exploration and reduces the risk of getting stuck in local optima
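As a rough sketch of per-case tracking, the snippet below counts how many cases each prompt wins and samples a parent in proportion to that count. The function names, the score table, and the choice to credit ties to every tied prompt are all assumptions made for illustration; the library's actual weighting may differ.

```python
import random
from typing import Dict, List, Optional


def count_case_wins(scores: Dict[str, List[float]]) -> Dict[str, int]:
    """Count, for each prompt, the cases on which it holds (or ties) the best score."""
    n_cases = len(next(iter(scores.values())))
    wins = {prompt_id: 0 for prompt_id in scores}
    for i in range(n_cases):
        best = max(case_scores[i] for case_scores in scores.values())
        for prompt_id, case_scores in scores.items():
            if case_scores[i] == best:
                wins[prompt_id] += 1
    return wins


def sample_parent(wins: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Pick a parent with probability proportional to its number of case wins."""
    rng = rng or random.Random()
    # Prompts that win no case are never selected as parents.
    candidates = [prompt_id for prompt_id, w in wins.items() if w > 0]
    weights = [wins[prompt_id] for prompt_id in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


# Hypothetical score table: scores[prompt_id][i] is the score on case i.
scores = {
    "seed": [1.0, 0.0, 0.5, 0.0],
    "cand_001": [0.5, 1.0, 0.5, 1.0],
    "cand_002": [0.0, 0.0, 0.0, 0.5],
}
wins = count_case_wins(scores)
parent = sample_parent(wins, random.Random(0))
```

Note that `seed` stays eligible even though `cand_001` has the higher average: it is still the only prompt that wins the first case, which is what keeps the pool diverse.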
4. Memory for Failed Attempts (Our Addition)
- When candidates keep failing mini-batch tests, record the failed attempts
- Provide these to the reflection agent as context: "Here's what you've tried that didn't work"
- This increases pressure over time to try more creative approaches when stuck
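One way to picture the failed-attempt memory is a small bounded store that renders itself as extra context for the reflection agent. `FailedAttemptMemory` and its `max_items` cap are hypothetical; only the message text comes from the description above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailedAttemptMemory:
    """Hypothetical store of candidate prompts that failed the mini-batch gate."""

    max_items: int = 5
    attempts: List[str] = field(default_factory=list)

    def record(self, prompt: str) -> None:
        self.attempts.append(prompt)
        # Keep only the most recent attempts so the context stays short.
        self.attempts = self.attempts[-self.max_items:]

    def clear(self) -> None:
        # Assumption: memory resets once a candidate finally passes the gate.
        self.attempts.clear()

    def as_context(self) -> str:
        """Render the failures as text to append to the reflection prompt."""
        if not self.attempts:
            return ""
        lines = ["You've tried these approaches and they didn't work - think outside the box!"]
        lines += [f"{i}. {p}" for i, p in enumerate(self.attempts, start=1)]
        return "\n".join(lines)


memory = FailedAttemptMemory(max_items=2)
for failed in ["Be concise.", "List allergens first.", "Ask a clarifying question."]:
    memory.record(failed)
context = memory.as_context()
```

Because the list grows only while candidates keep failing, the reflection agent sees more rejected ideas the longer the search is stuck.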
Creating Your Own Optimization
1. Set Up Your Domain
Copy the examples/chef/ structure:
your_domain/
├── agent.py # Your complete agent (tools, setup, everything)
├── optimize.py # Your evaluation logic + optimization loop
├── data/ # Your domain data
└── prompts/ # Seed prompt and reflection instructions
2. Implement Required Functions
Agent (agent.py):
# CRITICAL: Your run_case function must have this exact signature
async def run_case(prompt_file: str, user_input: YourInputType) -> YourOutputType:
    """
    Run your agent with a specific prompt file and user input.

    Args:
        prompt_file: ABSOLUTE path to the prompt file (optimizer passes full paths)
        user_input: Input from your dataset cases (your domain-specific type)

    Returns:
        Agent output that matches your evaluators' expectations
    """
    # Example implementation:
    agent = create_your_agent(prompt_file=prompt_file, model="gpt-4")
    result = await agent.run(user_input.message)
    return result.output

# Optional: Customize the reflection agent
# If you don't provide one, the optimizer uses make_reflection_agent() internally
def create_custom_reflection_agent():
    from pydantic_ai_optimizers import make_reflection_agent

    return make_reflection_agent(
        model="openai:gpt-4o",  # Your preferred model for reflection
        # Custom instructions for prompt improvement
        special_instructions="""
        Focus on:
        - Brevity and clarity
        - Domain-specific accuracy
        - Better error handling
        """,
    )
Optimization (optimize.py):
import asyncio
from pathlib import Path

from pydantic_ai_optimizers import Optimizer, make_reflection_agent

from agent import run_case  # Your run_case function from agent.py (defined above)

def build_dataset(cases_file):
    # Load test cases and evaluators using pydantic-evals
    # Return dataset that can evaluate your agent's outputs
    pass

async def main():
    # Set up dataset
    dataset = build_dataset("cases.yaml")

    # No need to wrap run_case - pass it directly

    # Optional: Use custom reflection agent
    reflection_agent = make_reflection_agent(
        model="openai:gpt-4o",
        special_instructions="Focus on accuracy and brevity",
    )
    # Or use default: reflection_agent = None

    # Create optimizer
    optimizer = Optimizer(
        dataset=dataset,
        run_case=run_case,  # Your async function
        reflection_agent=reflection_agent,  # Optional
        pool_dir=Path("prompt_pool"),
        minibatch_size=4,
        max_pool_size=16,
    )

    # Run optimization
    best = await optimizer.optimize(
        seed_prompt_file=Path("prompts/seed.txt"),
        full_validation_budget=20,
    )
    print(f"Best prompt saved to: {best.prompt_path}")

if __name__ == "__main__":
    asyncio.run(main())
3. Run Optimization
python optimize.py
Key Integrations
This library is designed to work seamlessly with:
textprompts
Makes it easy to use standard text files with placeholders for prompt evolution. Perfect for diffing prompts and version control:
# In your prompt file:
"You are a {role}. Your task is to {task}..."
# textprompts handles loading and placeholder substitution
prompt = textprompts.load_prompt("my_prompt.txt", role="chef", task="find recipes")
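To make the "text file with placeholders" idea concrete without depending on any particular loader, here is a standard-library sketch. It is not the textprompts API: `render_prompt` is a hypothetical helper that reads a file and fills `{placeholders}` with `str.format`.

```python
import tempfile
from pathlib import Path


def render_prompt(path: Path, **values: str) -> str:
    """Read a text prompt and fill its {placeholders} using str.format."""
    template = path.read_text(encoding="utf-8")
    # A missing placeholder value raises KeyError, which surfaces typos early.
    return template.format(**values)


with tempfile.TemporaryDirectory() as tmp:
    prompt_file = Path(tmp) / "my_prompt.txt"
    prompt_file.write_text("You are a {role}. Your task is to {task}.", encoding="utf-8")
    rendered = render_prompt(prompt_file, role="chef", task="find recipes")
```

Keeping each candidate prompt in its own plain text file is what makes the prompt pool easy to diff and commit.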
pydantic-ai-helpers
Provides utilities that make PydanticAI much more convenient:
- Quick tool parsing and setup
- Simple evaluation comparisons between outputs and expected results
- Streamlined agent configuration
These integrations save significant development time when building optimization pipelines.
Reflection Agent Options
The optimizer uses a reflection agent to generate improved prompts based on evaluation feedback. You have several options:
Use Default Reflection Agent
# Pass None or omit the parameter - uses make_reflection_agent() with defaults
optimizer = Optimizer(
    dataset=dataset,
    run_case=run_case,
    # reflection_agent=None,  # Uses default
)
Customize the Model
from pydantic_ai_optimizers import make_reflection_agent

# Use a different model for reflection
reflection_agent = make_reflection_agent(model="openai:gpt-5-mini")
optimizer = Optimizer(
    dataset=dataset,
    run_case=run_case,
    reflection_agent=reflection_agent,
)
Add Special Instructions (e.g., GPT-5 prompting tips)
from pathlib import Path

import textprompts
from pydantic_ai_optimizers import make_reflection_agent

# Load GPT-5 prompting tips from a file and pass to the reflection agent
tips = str(textprompts.load_prompt(
    Path("examples/customer_support/prompts/gpt5_tips.txt")
))
reflection_agent = make_reflection_agent(
    model="openai:gpt-5-mini",
    special_instructions=tips,
)
optimizer = Optimizer(
    dataset=dataset,
    run_case=run_case,
    reflection_agent=reflection_agent,
)
Bring Your Own Reflection Agent
from pydantic_ai import Agent

# Create completely custom reflection agent
reflection_agent = Agent(
    model="your-model",
    instructions="Your custom reflection instructions...",
)
optimizer = Optimizer(
    dataset=dataset,
    run_case=run_case,
    reflection_agent=reflection_agent,
)
Configuration
Set up through environment variables or configuration files:
export OPENAI_API_KEY="your-key"
export REFLECTION_MODEL="openai:gpt-5"
export AGENT_MODEL="openai:gpt-5-nano"
export VALIDATION_BUDGET=20
export MAX_POOL_SIZE=16
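If you want to read these variables in your own optimize.py, a small settings loader along these lines would work. This is a sketch under assumptions: `OptimizerSettings` and `load_settings` are hypothetical names, the fallback values simply mirror the exports above, and the library's own config.py may parse them differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class OptimizerSettings:
    """Hypothetical container for the environment variables listed above."""

    reflection_model: str
    agent_model: str
    validation_budget: int
    max_pool_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> OptimizerSettings:
    # Fallbacks are assumptions that mirror the example exports, not library defaults.
    return OptimizerSettings(
        reflection_model=env.get("REFLECTION_MODEL", "openai:gpt-5"),
        agent_model=env.get("AGENT_MODEL", "openai:gpt-5-nano"),
        validation_budget=int(env.get("VALIDATION_BUDGET", "20")),
        max_pool_size=int(env.get("MAX_POOL_SIZE", "16")),
    )


# Passing an explicit mapping keeps the example independent of your real environment.
settings = load_settings({"VALIDATION_BUDGET": "10"})
```

Converting the numeric values with `int` at load time means a malformed `VALIDATION_BUDGET` fails immediately rather than partway through a run.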
Development
# Install with dev dependencies
uv pip install -e ".[dev]"
# Run tests
make test
# Format and lint
make format && make lint
# Type check
make type-check
Why This Approach Works
The combination of mini-batch gating and individual case tracking targets two common optimization problems:
- Expensive Evaluation: Mini-batches mean you only do full evaluation on promising candidates
- Premature Convergence: Weighted sampling based on individual case wins maintains diversity
The memory system addresses a key weakness in memoryless optimization: when you get stuck, the system learns from its failures and tries more creative approaches.
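The cost argument can be made concrete with some back-of-the-envelope arithmetic. The numbers below are purely illustrative, and the cost model is an assumption: each proposal costs one mini-batch of candidate runs, only accepted candidates get a full evaluation, and the parent's scores are reused from its earlier full evaluation.

```python
def evaluation_cost(n_cases: int, minibatch_size: int, n_proposals: int, n_accepted: int):
    """Compare agent runs needed with and without mini-batch gating."""
    # Gated: every proposal runs the mini-batch; only accepted ones run all cases.
    gated = n_proposals * minibatch_size + n_accepted * n_cases
    # Ungated: every proposal is evaluated on all cases.
    ungated = n_proposals * n_cases
    return gated, ungated


# Illustrative values only: 40 cases, mini-batches of 4, 30 proposals, 6 accepted.
gated, ungated = evaluation_cost(n_cases=40, minibatch_size=4, n_proposals=30, n_accepted=6)
```

Under these assumed numbers, gating needs 360 agent runs instead of 1200; the saving grows as the acceptance rate drops or the dataset gets larger.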
License
MIT License