Python-native prompt evaluation tool using PydanticAI

Promptdev

promptdev is a prompt evaluation framework that provides comprehensive testing for AI agents across multiple providers.

Warning: promptdev is in preview and is not ready for production use. We're working hard to make it stable and feature-complete, but until then, expect to encounter bugs, missing features, and fatal errors.

Features

🔒 Type Safe - Full Pydantic validation for inputs, outputs, and configurations
🤖 PydanticAI Integration - Native support for the PydanticAI evaluation framework and, in progress, for PydanticAI agents
📊 Multi-Provider Testing - Test across OpenAI, Together.ai, Ollama, Bedrock, and more
⚡ Performance Optimized - File-based caching with TTL for faster repeated evaluations
📈 Rich Reporting - Beautiful console output with detailed failure analysis and provider comparisons
🧪 Promptfoo Compatible - Works with (some) existing promptfoo YAML configs and datasets
🎯 Comprehensive Assertions - Built-in evaluators plus custom Python assertion support
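
As a rough illustration of the file-based caching with TTL mentioned above, here is a minimal sketch using only the standard library. The class name `FileTTLCache`, its methods, and the on-disk JSON layout are hypothetical and chosen for this example; promptdev's actual cache may be organized quite differently.

```python
import hashlib
import json
import time
from pathlib import Path
from typing import Any, Optional


class FileTTLCache:
    """Hypothetical file-backed cache with expiry; not promptdev's real implementation."""

    def __init__(self, directory: Path, ttl_seconds: float) -> None:
        self.directory = directory
        self.ttl_seconds = ttl_seconds
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Hash the key so any string (prompt, provider id, ...) maps to a safe file name.
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        return self.directory / f"{digest}.json"

    def set(self, key: str, value: Any) -> None:
        # Store the value together with the time it was written.
        record = {"stored_at": time.time(), "value": value}
        self._path(key).write_text(json.dumps(record), encoding="utf-8")

    def get(self, key: str) -> Optional[Any]:
        path = self._path(key)
        if not path.exists():
            return None
        record = json.loads(path.read_text(encoding="utf-8"))
        if time.time() - record["stored_at"] > self.ttl_seconds:
            path.unlink()  # Entry is older than the TTL: drop it and report a miss.
            return None
        return record["value"]
```

A repeated evaluation would call `get` first and only contact the provider on a miss, which is where the speed-up for unchanged prompts would come from.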

Quick Start

Installation

From PyPI (alpha version)

pip install promptdev --pre

From Source

git clone https://github.com/artefactop/promptdev.git
cd promptdev
pip install -e .

For Development

git clone https://github.com/artefactop/promptdev.git
cd promptdev
uv sync
uv run promptdev --help

Basic Usage

If installed via pip:

# Run evaluation (simple demo)
promptdev eval examples/demo/config.yaml

# Run evaluation (advanced example)
promptdev eval examples/calendar_event_summary/config.yaml

# Disable caching for a run
promptdev eval examples/demo/config.yaml --no-cache

# Export results
promptdev eval examples/demo/config.yaml --output json
promptdev eval examples/demo/config.yaml --output html

# Validate configuration
promptdev validate examples/demo/config.yaml

# Cache management
promptdev cache stats
promptdev cache clear

If running from source:

uv run promptdev --help

Assertion Types

Promptdev supports a comprehensive set of evaluators for different testing scenarios:

Type	Description
`equals`	Checks if the output exactly equals the provided value
`contains`	Checks if the output contains the expected value
`is_instance`	Checks if the output is an instance of a type with the given name
`max_duration`	Checks if the execution time is under the specified maximum
`is_json`	Checks if the output is a valid JSON string (optional JSON Schema validation)
`contains_json`	Checks if the output contains valid JSON (optional JSON Schema validation)
`python`	Promptfoo-compatible; lets you provide a custom Python function to validate the LLM output
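
To make the difference between `is_json` and `contains_json` concrete, the sketch below shows one way such checks could be written with the standard library. It is an illustration only, not promptdev's evaluator code: the function names mirror the assertion types for readability, the "contains" check only looks for JSON objects and arrays (an assumption), and the optional schema validation is left out.

```python
import json


def is_json(output: str) -> bool:
    """Return True when the whole output parses as JSON."""
    try:
        json.loads(output)
    except (TypeError, ValueError):
        return False
    return True


def contains_json(output: str) -> bool:
    """Return True when a JSON object or array starts somewhere in the output."""
    decoder = json.JSONDecoder()
    for index, char in enumerate(output):
        if char not in "{[":
            continue
        try:
            # raw_decode parses one JSON value starting at index and ignores what follows.
            decoder.raw_decode(output, index)
        except ValueError:
            continue
        return True
    return False
```

With these definitions, a reply such as `Result: {"a": 1}` fails the first check and passes the second, which is the usual reason to choose `contains_json` for models that wrap JSON in prose.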

Configuration

Promptdev uses YAML configuration files compatible with the promptfoo format. Only a subset of that format is supported for now.

Promptfoo Compatibility

Promptdev maintains compatibility with promptfoo configurations to ease migration:

YAML configs - Most promptfoo YAML configs work with minimal changes
JSONL datasets - Existing test datasets are fully supported
Python assertions - Custom get_assert functions work without modification
JSON schemas - Schema validation uses the same format

When migrating, note two differences in provider ids:

If your ids use the format provider:chat|completion:model, remove the middle part so they read provider:model; promptdev only supports chat.
Some provider names differ; for example, togetherai is now together. Refer to the pydantic_ai models documentation for the full list.
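
The provider id change described in this section can be sketched as a small helper. `migrate_provider_id` and its rename table are hypothetical and exist only for this example; promptdev does not necessarily ship such a function, and the only rename listed is the togetherai one named in this section.

```python
def migrate_provider_id(provider_id: str) -> str:
    """Convert a promptfoo-style id (provider:chat:model) to provider:model."""
    parts = provider_id.split(":")
    # Drop the middle "chat" or "completion" segment when it is present.
    if len(parts) >= 3 and parts[1] in {"chat", "completion"}:
        parts = [parts[0], *parts[2:]]
    # Assumed rename table; extend it from the pydantic_ai models list as needed.
    renames = {"togetherai": "together"}
    parts[0] = renames.get(parts[0], parts[0])
    return ":".join(parts)
```

Ids that already use the two-part form pass through unchanged, and any colons inside the model name itself are preserved.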

Warning: Promptdev can run custom Python assertions. While powerful, running arbitrary Python code always carries security risks. Use this feature only with code you trust.

Example of a Python assertion:

# tests/data/python_assert.py
from typing import Any


def get_assert(output: str, context: dict) -> bool | float | dict[str, Any]:
    """Test assertion that checks if output contains 'success'."""
    return "success" in str(output).lower()
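
The return annotation also allows a float or a dict. The variant below sketches a dict result with partial credit. The `pass`, `score`, and `reason` keys follow promptfoo's grading-result convention; whether promptdev reads exactly these keys is an assumption, and the required-word list is made up for the example.

```python
from typing import Any


def get_assert(output: str, context: dict) -> dict[str, Any]:
    """Score the output by how many required words it contains."""
    text = str(output).lower()
    required = ["success", "done"]  # Illustrative words, not from a real test suite.
    found = [word for word in required if word in text]
    score = len(found) / len(required)
    return {
        "pass": score == 1.0,
        "score": score,
        "reason": f"found {len(found)} of {len(required)} required words",
    }
```

Returning a reason alongside the score makes failures easier to read in a report than a bare `False`.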

Development

# Setup development environment
uv sync

# Run tests
uv run pytest

# Format and lint code
uv run ruff check . --fix
uv run ruff format .

# Type checking
uv run ty check

Roadmap

Core evaluation engine with PydanticAI integration
Multi-provider support for major AI platforms
YAML configuration loading with promptfoo compatibility
Comprehensive assertion types (JSON schema, Python, LLM-based)
File-based caching system with TTL support
Rich console reporting with failure analysis
Simple file disk cache
Better integration with PydanticAI, do not reinvent the wheel
Concurrent execution using PydanticAI natively, for faster large-scale evaluations
Code cleanup
Testing
Testing promptfoo files
Native support for PydanticAI agents
Add support to run multiple config files with one command
CI/CD integration helpers with change detection
SQLite persistence for evaluation history and analytics
Performance benchmarking and regression detection

Contributing

We welcome contributions! Here's how to get started:

1. Fork the repository
2. Create a feature branch: git checkout -b feature/amazing-feature
3. Install development dependencies: uv sync
4. Make your changes and add tests
5. Run tests: uv run pytest
6. Commit your changes: git commit -m 'Add amazing feature'
7. Push to the branch: git push origin feature/amazing-feature
8. Open a Pull Request

Code Style

We use ruff for code formatting and linting, ty for type checking, and pytest for testing. Please ensure your code follows these standards:

uv run ruff check .       # Lint code
uv run ruff format .      # Format code
uv run ty check           # Type checking
uv run pytest             # Run tests

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Built on PydanticAI for type-safe AI agent development
Inspired by promptfoo for evaluation concepts
Uses Rich for beautiful console output

Release history

0.0.2a1 (pre-release, this version): Sep 22, 2025
0.0.1: Sep 5, 2025

Download files

Source Distribution

promptdev-0.0.2a1.tar.gz (21.4 kB)

Uploaded Sep 22, 2025 Source

Built Distribution

promptdev-0.0.2a1-py3-none-any.whl (25.6 kB)

Uploaded Sep 22, 2025 Python 3

File details

Details for the file promptdev-0.0.2a1.tar.gz.

File metadata

Download URL: promptdev-0.0.2a1.tar.gz
Upload date: Sep 22, 2025
Size: 21.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for promptdev-0.0.2a1.tar.gz
Algorithm	Hash digest
SHA256	`d70f3cb85d8c02c758c5c1392a5022d03877561ac6f3e01358ae07414bff1b3e`
MD5	`dea0f7f35f34cc041ee0ff01ebacac68`
BLAKE2b-256	`f728a0fc83d4cca65cbf076a8db4957e05eeef7641b9ae190246a71cf5bbe5ad`
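
A downloaded file can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch: the helper names are made up for the example, and it assumes the source distribution has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for promptdev-0.0.2a1.tar.gz.
EXPECTED_SHA256 = "d70f3cb85d8c02c758c5c1392a5022d03877561ac6f3e01358ae07414bff1b3e"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string, ignoring case."""
    return sha256_of(path) == expected.lower()
```

Calling `matches_expected(Path("promptdev-0.0.2a1.tar.gz"))` on the real download should return `True` if the file is intact.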

Provenance

The following attestation bundles were made for promptdev-0.0.2a1.tar.gz:

Publisher: release.yml on artefactop/promptdev

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: promptdev-0.0.2a1.tar.gz
- Subject digest: d70f3cb85d8c02c758c5c1392a5022d03877561ac6f3e01358ae07414bff1b3e
- Sigstore transparency entry: 547403550
- Sigstore integration time: Sep 22, 2025
Source repository:
- Permalink: artefactop/promptdev@d84d4b35e720e88967d97152b276dacccf0f3246
- Branch / Tag: refs/tags/0.0.2a1
- Owner: https://github.com/artefactop
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d84d4b35e720e88967d97152b276dacccf0f3246
- Trigger Event: release

File details

Details for the file promptdev-0.0.2a1-py3-none-any.whl.

File metadata

Download URL: promptdev-0.0.2a1-py3-none-any.whl
Upload date: Sep 22, 2025
Size: 25.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for promptdev-0.0.2a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`92a7384ee8fc7a5d167840a64874d52c94ed7ec79e9f611086a49ef6ba812dbb`
MD5	`a01374057b7bed9b3aff6faae0dd5175`
BLAKE2b-256	`3069246fd3cd297ad73a348fecdcfd1a1854a8c9ff9dbf3b7756b403751dfecb`

Provenance

The following attestation bundles were made for promptdev-0.0.2a1-py3-none-any.whl:

Publisher: release.yml on artefactop/promptdev

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: promptdev-0.0.2a1-py3-none-any.whl
- Subject digest: 92a7384ee8fc7a5d167840a64874d52c94ed7ec79e9f611086a49ef6ba812dbb
- Sigstore transparency entry: 547403577
- Sigstore integration time: Sep 22, 2025
Source repository:
- Permalink: artefactop/promptdev@d84d4b35e720e88967d97152b276dacccf0f3246
- Branch / Tag: refs/tags/0.0.2a1
- Owner: https://github.com/artefactop
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@d84d4b35e720e88967d97152b276dacccf0f3246
- Trigger Event: release
