A pytest plugin for LLM evaluation tests with threshold-based pass/fail

pytest-agent-eval


LLM evaluation tests that actually mean something. A pytest plugin for testing LLM agents with threshold-based pass/fail scoring, multi-turn transcripts, and LLM-as-judge rubrics — without blowing up your CI bill.

Highlights

  • 🎯 Threshold-based pass/fail — run each test N times, pass when ≥ threshold% succeed
  • 📝 YAML or Python transcripts — pick the authoring style your team prefers
  • 🔍 YAML auto-discovery — drop *.yaml files in any configured directory and they become pytest tests automatically
  • 🛡 CI-safe by default — eval tests skip unless --agent-eval-live or EVAL_LIVE=1
  • ⚡ Parallel-ready — pytest -n auto (via pytest-xdist) just works
  • 📄 Markdown reports — full per-run trace with --agent-eval-report=eval.md
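
The threshold gate in the first highlight amounts to a simple ratio check. A sketch of the semantics (not the plugin's internals — the function name here is illustrative only):

```python
def meets_threshold(passed: int, runs: int, threshold: float) -> bool:
    """Pass when the fraction of successful runs reaches the threshold."""
    return runs > 0 and passed / runs >= threshold

# With runs=3 and threshold=0.8, 2/3 ≈ 0.67 falls short,
# so all three runs must succeed for the test to pass.
print(meets_threshold(2, 3, 0.8))  # False
print(meets_threshold(3, 3, 0.8))  # True
```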

Installation

# pip
pip install pytest-agent-eval

# uv
uv add pytest-agent-eval

Supported frameworks

pytest-agent-eval ships first-class adapters for the major Python agent frameworks. Each is an optional extra so you only install what you use.

| Framework | Extra | Adapter |
| --- | --- | --- |
| pydantic-ai | (default) | pytest_agent_eval.adapters.pydantic_ai.PydanticAIAdapter |
| LangChain / LangGraph | langchain | pytest_agent_eval.adapters.langchain.LangChainAdapter |
| OpenAI SDK | openai | pytest_agent_eval.adapters.openai.OpenAIAdapter |
| smolagents | smolagents | pytest_agent_eval.adapters.smolagents.SmolagentsAdapter |

pip install "pytest-agent-eval[langchain]"
pip install "pytest-agent-eval[openai]"
pip install "pytest-agent-eval[smolagents]"
# or with uv:
uv add "pytest-agent-eval[langchain]"
uv add "pytest-agent-eval[openai]"
uv add "pytest-agent-eval[smolagents]"

Bringing your own framework? Any async def agent(messages) -> (reply, tool_calls) callable works directly — no base class needed.
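
For example, a bring-your-own-framework agent is just an async callable mapping messages to a (reply, tool_calls) pair. The message shape (role/content dicts) and the tool-call structure below are assumptions for illustration; adapt them to what your stack produces:

```python
import asyncio

async def echo_agent(messages):
    """A toy agent matching the async (messages) -> (reply, tool_calls) contract."""
    # Pull the most recent user message out of the transcript.
    last_user = next(m["content"] for m in reversed(messages) if m["role"] == "user")
    reply = f"Booking confirmed for: {last_user}"
    # Report which tools were "called" so ToolCallEvaluator-style checks can run.
    tool_calls = [{"name": "create_booking", "args": {"request": last_user}}]
    return reply, tool_calls

# Quick sanity check outside pytest:
reply, calls = asyncio.run(echo_agent([{"role": "user", "content": "10am tomorrow"}]))
print(reply)  # Booking confirmed for: 10am tomorrow
```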

What you can test

pytest-agent-eval separates the kinds of checks you might want into composable evaluators:

  • Deterministic checks — ContainsEvaluator(any_of=["confirmed", "booked"]) for substring/regex assertions over the agent reply.
  • Tool-call assertions — ToolCallEvaluator(must_include=["create_booking"], ordered=True) to verify that the agent called the right tools, in the right order.
  • LLM-as-judge — JudgeEvaluator(rubric="Reply must be friendly, include a date, and confirm the booking.") for open-ended quality checks the agent under test should meet.

Mix and match per turn — every evaluator participates in the threshold score.
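
To make the deterministic layer concrete, a ContainsEvaluator-style substring/regex check reduces to roughly the following. This is a sketch under assumed semantics (case-insensitive matching, any-of logic), not the plugin's actual code:

```python
import re

def contains_any(reply: str, any_of: list[str], use_regex: bool = False) -> bool:
    """True if the reply matches at least one of the given patterns."""
    if use_regex:
        return any(re.search(p, reply, re.IGNORECASE) for p in any_of)
    return any(p.lower() in reply.lower() for p in any_of)

print(contains_any("Your slot is Booked!", ["confirmed", "booked"]))  # True
print(contains_any("ref #42", [r"#\d+"], use_regex=True))             # True
```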

Quick start

import pytest
from pytest_agent_eval import Turn, Expect, ContainsEvaluator, ToolCallEvaluator, JudgeEvaluator

@pytest.mark.agent_eval(threshold=0.8, runs=3)
async def test_booking(agent_eval):
    result = await agent_eval.run(
        agent=my_agent,
        turns=[
            Turn(
                user="Book me a slot tomorrow at 10am",
                expect=Expect(evaluators=[
                    ContainsEvaluator(any_of=["confirmed", "booked"]),
                    ToolCallEvaluator(must_include=["create_booking"]),
                    JudgeEvaluator(rubric="Reply must include a reference number."),
                ]),
            )
        ],
    )
    result.assert_threshold()

By default the test is skipped in CI; run it live with:

pytest --agent-eval-live

See the full documentation for the YAML authoring style, configuration, and reporting options.

License

MIT — see LICENSE.
