Python-native prompt evaluation tool using PydanticAI
Promptdev
promptdev is a prompt evaluation framework that provides comprehensive testing for AI agents across multiple providers.
[!WARNING]
promptdev is in preview and is not ready for production use.
We're working hard to make it stable and feature-complete, but until then, expect to encounter bugs, missing features, and fatal errors.
Features
- 🔒 Type Safe - Full Pydantic validation for inputs, outputs, and configurations
- 🤖 PydanticAI Integration - Built on PydanticAI's evaluation framework, with native support for PydanticAI agents in progress
- 📊 Multi-Provider Testing - Test across OpenAI, Together.ai, Ollama, Bedrock, and more
- ⚡ Performance Optimized - File-based caching with TTL for faster repeated evaluations
- 📈 Rich Reporting - Beautiful console output with detailed failure analysis and provider comparisons
- 🧪 Promptfoo Compatible - Works with (some) existing promptfoo YAML configs and datasets
- 🎯 Comprehensive Assertions - Built-in evaluators plus custom Python assertion support
Quick Start
Installation
From PyPI (alpha version)
pip install promptdev --pre
From Source
git clone https://github.com/artefactop/promptdev.git
cd promptdev
pip install -e .
For Development
git clone https://github.com/artefactop/promptdev.git
cd promptdev
uv sync
uv run promptdev --help
Basic Usage
If installed via pip:
# Run evaluation (simple demo)
promptdev eval examples/demo/config.yaml
# Run evaluation (advanced example)
promptdev eval examples/calendar_event_summary/config.yaml
# Disable caching for a run
promptdev eval examples/demo/config.yaml --no-cache
# Export results
promptdev eval examples/demo/config.yaml --output json
promptdev eval examples/demo/config.yaml --output html
# Validate configuration
promptdev validate examples/demo/config.yaml
# Cache management
promptdev cache stats
promptdev cache clear
If running from source:
uv run promptdev --help
Assertion Types
Promptdev supports a comprehensive set of evaluators for different testing scenarios:
| Type | Description |
|---|---|
| `equals` | Checks if the output exactly equals the provided value |
| `contains` | Checks if the output contains the expected value |
| `is_instance` | Checks if the output is an instance of a type with the given name |
| `max_duration` | Checks if the execution time is under the specified maximum |
| `is_json` | Checks if the output is a valid JSON string (optional JSON schema validation) |
| `contains_json` | Checks if the output contains valid JSON (optional JSON schema validation) |
| `python` | Promptfoo compatible. Runs a custom Python function that validates the LLM output |
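The difference between `is_json` and `contains_json` can be illustrated with a short standard-library sketch. This is not promptdev's implementation: the helper names `is_json` and `find_first_json` below are hypothetical, and schema validation is left out.

```python
import json
from typing import Any, Optional


def is_json(text: str) -> bool:
    """Return True only if the whole string parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


def find_first_json(text: str) -> Optional[Any]:
    """Return the first JSON object or array embedded in text, or None.

    Hypothetical helper: scans for '{' or '[' and tries to decode from there.
    """
    decoder = json.JSONDecoder()
    for index, char in enumerate(text):
        if char not in "{[":
            continue
        try:
            value, _end = decoder.raw_decode(text, index)
        except json.JSONDecodeError:
            continue  # not valid JSON at this position, keep scanning
        return value
    return None
```

An output such as `Result: {"a": 1}` would fail the strict whole-string check but still contain a decodable JSON object, which is the case a `contains_json`-style assertion is meant to cover.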
Configuration
Promptdev uses YAML configuration files compatible with the promptfoo format, but only a subset of that format is supported for now.
Promptfoo Compatibility
Promptdev maintains compatibility with promptfoo configurations to ease migration:
- YAML configs - Most promptfoo YAML configs work with minimal changes
- JSONL datasets - Existing test datasets are fully supported
- Python assertions - Custom `get_assert` functions work without modification
- JSON schemas - Schema validation uses the same format

To migrate provider IDs written as `provider:chat|completion:model`, remove the middle part so they read `provider:model`; promptdev only supports chat. Some provider names also change, for example `togetherai` is now `together`. Refer to the pydantic_ai models documentation for the full list.
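The provider ID migration described above can be sketched as a small helper. `migrate_provider_id` and its `renames` parameter are hypothetical names used for illustration only; promptdev does not ship this function, and the rename mapping has to be supplied from the pydantic_ai model list.

```python
from typing import Mapping, Optional


def migrate_provider_id(
    provider_id: str, renames: Optional[Mapping[str, str]] = None
) -> str:
    """Convert 'provider:chat:model' or 'provider:completion:model' to 'provider:model'.

    Hypothetical helper. IDs already in 'provider:model' form are returned
    unchanged apart from any provider rename.
    """
    renames = renames or {}
    # Split at most twice so model names that contain ':' stay intact.
    parts = provider_id.split(":", 2)
    if len(parts) == 3 and parts[1] in {"chat", "completion"}:
        provider, model = parts[0], parts[2]
    elif len(parts) >= 2:
        provider, model = parts[0], provider_id.split(":", 1)[1]
    else:
        return provider_id  # no provider prefix, nothing to migrate
    provider = renames.get(provider, provider)
    return f"{provider}:{model}"
```

For example, `migrate_provider_id("openai:chat:gpt-4o")` returns `"openai:gpt-4o"`. A `completion` ID is rewritten the same way, but it will run as a chat model because promptdev only supports chat.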
[!WARNING]
Promptdev can run custom Python assertions. While powerful, running arbitrary Python code always carries security risks. Use this feature only with code you trust.
Example of a Python assertion:
# tests/data/python_assert.py
from typing import Any


def get_assert(output: str, context: dict) -> bool | float | dict[str, Any]:
    """Test assertion that checks if output contains 'success'."""
    return "success" in str(output).lower()
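The return annotation above also allows a float or a dict. In promptfoo, a dict result conventionally carries `pass`, `score`, and `reason` keys. The sketch below follows that convention; whether promptdev reads all three keys is an assumption, and the `required` keyword list is made up for the example.

```python
from __future__ import annotations

from typing import Any


def get_assert(output: str, context: dict) -> bool | float | dict[str, Any]:
    """Score the output by how many required keywords it mentions."""
    text = str(output).lower()
    required = ["success", "completed"]  # illustrative keywords, not from promptdev
    missing = [word for word in required if word not in text]
    score = 1 - len(missing) / len(required)
    return {
        "pass": not missing,  # passes only when every keyword is present
        "score": score,       # fraction of keywords found, between 0 and 1
        "reason": "all keywords found" if not missing else f"missing: {', '.join(missing)}",
    }
```

Returning a reason alongside the score makes failures easier to interpret in a report than a bare `False`.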
Development
# Setup development environment
uv sync
# Run tests
uv run pytest
# Format and lint code
uv run ruff check . --fix
uv run ruff format .
# Type checking
uv run ty check
Roadmap
- Core evaluation engine with PydanticAI integration
- Multi-provider support for major AI platforms
- YAML configuration loading with promptfoo compatibility
- Comprehensive assertion types (JSON schema, Python, LLM-based)
- File-based caching system with TTL support
- Rich console reporting with failure analysis
- Simple file disk cache
- Better integration with PydanticAI, do not reinvent the wheel
- Concurrent execution using PydanticAI natively, for faster large-scale evaluations
- Code cleanup
- Testing
- Testing promptfoo files
- Native support for PydanticAI agents
- Add support to run multiple config files with one command
- CI/CD integration helpers with change detection
- SQLite persistence for evaluation history and analytics
- Performance benchmarking and regression detection
Contributing
We welcome contributions! Here's how to get started:
- Fork the repository
- Create a feature branch: `git checkout -b feature/amazing-feature`
- Install development dependencies: `uv sync`
- Make your changes and add tests
- Run tests: `uv run pytest`
- Commit your changes: `git commit -m 'Add amazing feature'`
- Push to the branch: `git push origin feature/amazing-feature`
- Open a Pull Request
Code Style
We use ruff for code formatting and linting, ty for type checking, and pytest for testing. Please ensure your code follows these standards:
uv run ruff check . # Lint code
uv run ruff format . # Format code
uv run ty check # Type checking
uv run pytest # Run tests
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built on PydanticAI for type-safe AI agent development
- Inspired by promptfoo for evaluation concepts
- Uses Rich for beautiful console output