Framework for evaluating stochastic code execution, especially code making use of LLMs

These details have been verified by PyPI

Project links

Owner

Pydantic

GitHub Statistics

Maintainers

dmontagu

These details have not been verified by PyPI

Project links

Project description

Pydantic Evals

This is a library for evaluating non-deterministic (or "stochastic") functions in Python. It provides a simple, Pythonic interface for defining and running stochastic functions, and analyzing the results of running those functions.

While this library is developed as part of Pydantic AI, it only uses Pydantic AI for a small subset of generative functionality internally, and it is designed to be used with arbitrary "stochastic function" implementations. In particular, it can be used with other (non-Pydantic AI) AI libraries, agent frameworks, etc.

As with Pydantic AI, this library prioritizes type safety and use of common Python syntax over esoteric, domain-specific use of Python syntax.

Full documentation is available at ai.pydantic.dev/evals.

Example

While you'd typically use Pydantic Evals with more complex functions (such as Pydantic AI agents or graphs), here's a quick example that evaluates a simple function against a test case using both custom and built-in evaluators:

from pydantic_evals import Case, Dataset
from pydantic_evals.evaluators import Evaluator, EvaluatorContext, IsInstance

# Define a test case with inputs and expected output
case = Case(
    name='capital_question',
    inputs='What is the capital of France?',
    expected_output='Paris',
)

# Define a custom evaluator
class MatchAnswer(Evaluator[str, str]):
    def evaluate(self, ctx: EvaluatorContext[str, str]) -> float:
        if ctx.output == ctx.expected_output:
            return 1.0
        elif isinstance(ctx.output, str) and ctx.expected_output is not None and ctx.expected_output.lower() in ctx.output.lower():
            return 0.8
        return 0.0

# Create a dataset with the test case and evaluators
dataset = Dataset(
    name='capital_eval',
    cases=[case],
    evaluators=[IsInstance(type_name='str'), MatchAnswer()],
)

# Define the function to evaluate
async def answer_question(question: str) -> str:
    return 'Paris'

# Run the evaluation
report = dataset.evaluate_sync(answer_question)
report.print(include_input=True, include_output=True)
"""
                                    Evaluation Summary: answer_question
┏━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Case ID          ┃ Inputs                         ┃ Outputs ┃ Scores            ┃ Assertions ┃ Duration ┃
┡━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━┩
│ capital_question │ What is the capital of France? │ Paris   │ MatchAnswer: 1.00 │ ✔          │     10ms │
├──────────────────┼────────────────────────────────┼─────────┼───────────────────┼────────────┼──────────┤
│ Averages         │                                │         │ MatchAnswer: 1.00 │ 100.0% ✔   │     10ms │
└──────────────────┴────────────────────────────────┴─────────┴───────────────────┴────────────┴──────────┘
"""

Using the library with more complex functions, such as Pydantic AI agents, is similar — all you need to do is define a task function wrapping the function you want to evaluate, with a signature that matches the inputs and outputs of your test cases.
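
As a rough sketch of such a wrapper, the snippet below adapts an agent-like object to the `str -> str` signature used by the test case above. `FakeAgent` and `FakeAgentResult` are hypothetical stand-ins invented for this illustration and are not part of Pydantic Evals or Pydantic AI; a real agent would expose its own run method and result type.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FakeAgentResult:
    """Hypothetical result object; real agent frameworks define their own."""
    output: str


class FakeAgent:
    """Hypothetical stand-in for an agent, used only for illustration."""

    async def run(self, prompt: str) -> FakeAgentResult:
        # A real agent would call a model here; this one returns a fixed answer.
        return FakeAgentResult(output='Paris')


agent = FakeAgent()


async def answer_question(question: str) -> str:
    """Task function: takes the case input type (str), returns the output type (str)."""
    result = await agent.run(question)
    return result.output


print(asyncio.run(answer_question('What is the capital of France?')))
```

Because the wrapper takes and returns the same types as the cases' inputs and expected outputs, it could be passed to `dataset.evaluate_sync` in the same way as `answer_question` in the earlier example.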

Logfire Integration

Pydantic Evals uses OpenTelemetry to record traces for each case in your evaluations.

You can send these traces to any OpenTelemetry-compatible backend. For the best experience, we recommend Pydantic Logfire, which includes custom views for evals.

You'll see full details about the inputs, outputs, token usage, execution durations, etc. You'll also have access to the full trace for each case, which is ideal for debugging, writing path-aware evaluators, or running similar evaluations against production traces.

Basic setup:

import logfire

logfire.configure(
    send_to_logfire='if-token-present',
    environment='development',
    service_name='evals',
)

...

my_dataset.evaluate_sync(my_task)

Project details

Release history

2.0.0b7 pre-release

Jun 10, 2026

2.0.0b6 pre-release

Jun 5, 2026

2.0.0b5 pre-release

Jun 2, 2026

2.0.0b4 pre-release

May 29, 2026

2.0.0b3 pre-release

May 23, 2026

This version

2.0.0b2 pre-release

May 22, 2026

2.0.0b1 pre-release

May 21, 2026

1.107.0

Jun 10, 2026

1.106.0

Jun 5, 2026

1.105.0

Jun 2, 2026

1.104.0

May 29, 2026

1.103.0

May 27, 2026

1.102.0

May 23, 2026

1.101.0

May 22, 2026

1.100.0

May 21, 2026

1.99.0

May 20, 2026

1.98.0

May 19, 2026

1.97.0

May 15, 2026

1.96.1

May 15, 2026

1.96.0

May 14, 2026

1.95.1

May 13, 2026

1.95.0

May 13, 2026

1.94.0

May 12, 2026

1.93.0

May 9, 2026

1.92.0

May 8, 2026

1.91.0

May 7, 2026

1.90.0

May 5, 2026

1.89.1

May 1, 2026

1.89.0

May 1, 2026

1.88.0

Apr 29, 2026

1.87.0

Apr 25, 2026

1.86.1

Apr 24, 2026

1.86.0

Apr 23, 2026

1.85.1

Apr 22, 2026

1.85.0

Apr 21, 2026

1.84.1

Apr 18, 2026

1.84.0

Apr 17, 2026

1.83.0

Apr 16, 2026

1.82.0

Apr 15, 2026

1.81.0

Apr 14, 2026

1.80.0

Apr 10, 2026

1.79.0

Apr 10, 2026

1.78.0

Apr 8, 2026

1.77.0

Apr 3, 2026

1.76.0

Apr 2, 2026

1.75.0

Apr 1, 2026

1.74.0

Mar 31, 2026

1.73.0

Mar 27, 2026

1.72.0

Mar 26, 2026

1.71.0

Mar 24, 2026

1.70.0

Mar 18, 2026

1.69.0

Mar 17, 2026

1.68.0

Mar 13, 2026

1.67.0

Mar 6, 2026

1.66.0

Mar 5, 2026

1.65.0

Mar 3, 2026

1.64.0

Mar 2, 2026

1.63.0

Feb 23, 2026

1.62.0

Feb 19, 2026

1.61.0

Feb 18, 2026

1.60.0

Feb 17, 2026

1.59.0

Feb 14, 2026

1.58.0

Feb 11, 2026

1.57.0

Feb 10, 2026

1.56.0

Feb 6, 2026

1.55.0

Feb 5, 2026

1.54.0

Feb 4, 2026

1.53.0

Feb 4, 2026

1.52.0

Feb 3, 2026

1.51.0

Jan 31, 2026

1.50.0

Jan 30, 2026

1.49.0

Jan 29, 2026

1.48.0

Jan 28, 2026

1.47.0

Jan 24, 2026

1.46.0

Jan 23, 2026

1.44.0

Jan 17, 2026

1.43.0

Jan 16, 2026

1.42.0

Jan 14, 2026

1.41.0

Jan 10, 2026

1.40.0

Jan 7, 2026

1.39.1

Jan 6, 2026

1.39.0

Dec 24, 2025

1.38.0

Dec 23, 2025

1.37.0

Dec 20, 2025

1.36.0

Dec 19, 2025

1.35.0

Dec 18, 2025

1.34.0

Dec 17, 2025

1.33.0

Dec 16, 2025

1.32.0

Dec 13, 2025

1.31.0

Dec 12, 2025

1.30.1

Dec 11, 2025

1.30.0

Dec 11, 2025

1.29.0

Dec 10, 2025

1.28.0

Dec 9, 2025

1.27.0

Dec 5, 2025

1.26.0

Dec 3, 2025

1.25.1

Nov 28, 2025

1.25.0

Nov 28, 2025

1.24.0

Nov 27, 2025

1.23.0

Nov 26, 2025

1.22.0

Nov 22, 2025

1.21.0

Nov 21, 2025

1.20.0

Nov 19, 2025

1.19.0

Nov 18, 2025

1.18.0

Nov 15, 2025

1.17.0

Nov 14, 2025

1.16.0

Nov 13, 2025

1.15.0

Nov 13, 2025

1.14.1

Nov 12, 2025

1.14.0

Nov 10, 2025

1.13.0

Nov 10, 2025

1.12.0

Nov 7, 2025

1.11.1

Nov 6, 2025

1.11.0

Nov 5, 2025

1.10.0

Nov 4, 2025

1.9.1

Oct 31, 2025

1.9.0

Oct 29, 2025

1.8.0

Oct 29, 2025

1.7.0

Oct 28, 2025

1.6.0

Oct 24, 2025

1.5.0

Oct 24, 2025

1.4.0

Oct 24, 2025

1.3.0

Oct 23, 2025

1.2.1

Oct 20, 2025

1.2.0

Oct 20, 2025

1.1.0

Oct 15, 2025

1.0.18

Oct 13, 2025

1.0.17

Oct 9, 2025

1.0.16

Oct 8, 2025

1.0.15

Oct 3, 2025

1.0.14

Oct 3, 2025

1.0.13

Oct 2, 2025

1.0.12

Oct 1, 2025

1.0.11

Sep 30, 2025

1.0.10

Sep 20, 2025

1.0.9

Sep 18, 2025

1.0.8

Sep 17, 2025

1.0.7

Sep 15, 2025

1.0.6

Sep 12, 2025

1.0.5

Sep 12, 2025

1.0.4

Sep 11, 2025

1.0.3

Sep 11, 2025

1.0.2

Sep 9, 2025

1.0.1

Sep 5, 2025

1.0.0

Sep 5, 2025

1.0.0b1 pre-release

Aug 30, 2025

0.8.1

Aug 29, 2025

0.8.0

Aug 26, 2025

0.7.6

Aug 26, 2025

0.7.5

Aug 25, 2025

0.7.4

Aug 20, 2025

0.7.3

Aug 19, 2025

0.7.2

Aug 14, 2025

0.7.1

Aug 13, 2025

0.7.0

Aug 12, 2025

0.6.2

Aug 7, 2025

0.6.1

Aug 7, 2025

0.6.0

Aug 6, 2025

0.5.1

Aug 6, 2025

0.5.0

Aug 4, 2025

0.4.11

Aug 2, 2025

0.4.10

Jul 30, 2025

0.4.9

Jul 28, 2025

0.4.8

Jul 28, 2025

0.4.7

Jul 24, 2025

0.4.6

Jul 23, 2025

0.4.5

Jul 22, 2025

0.4.4

Jul 18, 2025

0.4.3

Jul 16, 2025

0.4.2

Jul 10, 2025

0.4.1

Jul 10, 2025

0.4.0

Jul 8, 2025

0.3.7

Jul 7, 2025

0.3.6

Jul 4, 2025

0.3.5

Jun 30, 2025

0.3.4

Jun 26, 2025

0.3.3

Jun 24, 2025

0.3.2

Jun 21, 2025

0.3.1

Jun 18, 2025

0.3.0

Jun 18, 2025

0.2.20

Jun 18, 2025

0.2.19

Jun 17, 2025

0.2.18

Jun 13, 2025

0.2.17

Jun 12, 2025

0.2.16

Jun 8, 2025

0.2.15

Jun 5, 2025

0.2.14

Jun 3, 2025

0.2.13

Jun 3, 2025

0.2.12

May 29, 2025

0.2.11

May 28, 2025

0.2.10

May 27, 2025

0.2.9

May 26, 2025

0.2.8

May 25, 2025

0.2.7

May 24, 2025

0.2.6

May 21, 2025

0.2.5

May 20, 2025

0.2.4

May 14, 2025

0.2.3

May 13, 2025

0.2.2

May 13, 2025

0.2.1

May 13, 2025

0.2.0

May 12, 2025

0.1.12 yanked

May 12, 2025

Reason this release was yanked:

should have been 0.2.0

0.1.11

May 10, 2025

0.1.10

May 6, 2025

0.1.9

May 2, 2025

0.1.8

Apr 28, 2025

0.1.7

Apr 28, 2025

0.1.6

Apr 25, 2025

0.1.5

Apr 25, 2025

0.1.4

Apr 24, 2025

0.1.3

Apr 18, 2025

0.1.2

Apr 17, 2025

0.1.1

Apr 16, 2025

0.1.0

Apr 15, 2025

0.0.55

Apr 9, 2025

0.0.54

Apr 9, 2025

0.0.53

Apr 7, 2025

0.0.52

Apr 3, 2025

0.0.51

Apr 3, 2025

0.0.50

Apr 3, 2025

0.0.49

Apr 1, 2025

0.0.48

Mar 31, 2025

0.0.47

Mar 31, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pydantic_evals-2.0.0b2.tar.gz (75.4 kB)

Uploaded May 22, 2026 Source

Built Distribution

pydantic_evals-2.0.0b2-py3-none-any.whl (89.9 kB)

Uploaded May 22, 2026 Python 3

File details

Details for the file pydantic_evals-2.0.0b2.tar.gz.

File metadata

Download URL: pydantic_evals-2.0.0b2.tar.gz
Upload date: May 22, 2026
Size: 75.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pydantic_evals-2.0.0b2.tar.gz
Algorithm	Hash digest
SHA256	`d68bd397cfa95c181e49e91a82f6ea55379f6c057b8d414ba0dab37c5c2dc869`
MD5	`32017b35ac2f89bac192c36a036cfad3`
BLAKE2b-256	`0d34e065e4d3255069e3c567a8ec9d2507de8199bb01b63975f525b2b4b457eb`

See more details on using hashes here.
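
One way to use the digests above is to recompute the SHA256 of a downloaded file and compare it with the published value. The following is a minimal sketch using only the standard library; the helper names `sha256_of` and `matches_expected`, and the assumption that the archive sits in the current directory, are illustrative choices rather than anything prescribed by PyPI.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for pydantic_evals-2.0.0b2.tar.gz
EXPECTED_SHA256 = 'd68bd397cfa95c181e49e91a82f6ea55379f6c057b8d414ba0dab37c5c2dc869'


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open('rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring letter case."""
    return sha256_of(path) == expected_hex.lower()


# Assumed local path; adjust to wherever the archive was downloaded.
archive = Path('pydantic_evals-2.0.0b2.tar.gz')
if archive.exists():
    print('hash OK' if matches_expected(archive, EXPECTED_SHA256) else 'hash MISMATCH')
else:
    print(f'{archive} not found; download it first')
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for a 75 kB archive but makes the helper reusable for larger downloads.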

Provenance

The following attestation bundles were made for pydantic_evals-2.0.0b2.tar.gz:

Publisher: ci.yml on pydantic/pydantic-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pydantic_evals-2.0.0b2.tar.gz
- Subject digest: d68bd397cfa95c181e49e91a82f6ea55379f6c057b8d414ba0dab37c5c2dc869
- Sigstore transparency entry: 1599477555
- Sigstore integration time: May 22, 2026
Source repository:
- Permalink: pydantic/pydantic-ai@b8dea15e8c8acabcce4a0ef70b0fb1b480413571
- Branch / Tag: refs/tags/v2.0.0b2
- Owner: https://github.com/pydantic
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@b8dea15e8c8acabcce4a0ef70b0fb1b480413571
- Trigger Event: push

File details

Details for the file pydantic_evals-2.0.0b2-py3-none-any.whl.

File metadata

Download URL: pydantic_evals-2.0.0b2-py3-none-any.whl
Upload date: May 22, 2026
Size: 89.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for pydantic_evals-2.0.0b2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`aba1718b75c2a7c722a43cd96b0bb28988c045c3a5967df7ba55bc272d5cd898`
MD5	`2b62308c433d67a841fe0da6b34927bd`
BLAKE2b-256	`314d1ada9e2b085191aa42493db619f9b75596b677f09d424b2c1203b711a83b`

See more details on using hashes here.

Provenance

The following attestation bundles were made for pydantic_evals-2.0.0b2-py3-none-any.whl:

Publisher: ci.yml on pydantic/pydantic-ai

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pydantic_evals-2.0.0b2-py3-none-any.whl
- Subject digest: aba1718b75c2a7c722a43cd96b0bb28988c045c3a5967df7ba55bc272d5cd898
- Sigstore transparency entry: 1599478167
- Sigstore integration time: May 22, 2026
Source repository:
- Permalink: pydantic/pydantic-ai@b8dea15e8c8acabcce4a0ef70b0fb1b480413571
- Branch / Tag: refs/tags/v2.0.0b2
- Owner: https://github.com/pydantic
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@b8dea15e8c8acabcce4a0ef70b0fb1b480413571
- Trigger Event: push

pydantic-evals 2.0.0b2
