
pytest-agents


Pytest plugin for testing AI agents — mock LLMs, assert tool calls, track token usage, and regression-test prompt changes.

```shell
pip install pytest-agents
```

Why?

Every team building AI agents needs testing, but there's no standard way to:

  • Mock LLM responses deterministically (record/replay)
  • Assert tool call sequences ("did the agent call search before summarize?")
  • Track token costs per test ("this test costs $0.03")
  • Regression-test prompts ("did this prompt change break behavior?")
  • Set budgets ("fail if any test exceeds 5000 tokens")

pytest-agents solves all of these as a single pytest plugin. It works with any framework: LangChain, CrewAI, OpenAI, Anthropic, LiteLLM, or raw HTTP.

Quick Start

1. Mock LLM responses

```python
from pytest_agents import mock_llm, LLMResponse

def test_agent_greeting(mock_llm):
    """Mock returns deterministic responses."""
    mock_llm.add_response(LLMResponse(
        content="Hello! How can I help you today?",
        model="gpt-4o",
        tokens={"prompt": 10, "completion": 8},
    ))

    # Your agent code calls the LLM...
    result = my_agent.run("Hi there")

    assert "help" in result.lower()
    assert mock_llm.call_count == 1
```
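Conceptually, a deterministic LLM mock like this is just a FIFO queue of canned responses that fails loudly when it runs dry. A minimal self-contained sketch of that idea (class names here are illustrative, not the plugin's internals):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class CannedResponse:
    """Illustrative stand-in for an LLMResponse-style record."""
    content: str
    model: str = "gpt-4o"
    tokens: dict = field(default_factory=dict)

class QueueMockLLM:
    """Pops one canned response per call; raises when the queue is exhausted."""
    def __init__(self):
        self._queue = deque()
        self.call_count = 0

    def add_response(self, response):
        self._queue.append(response)

    def complete(self, prompt):
        self.call_count += 1
        if not self._queue:
            raise AssertionError(f"no canned response left for prompt {prompt!r}")
        return self._queue.popleft()

mock = QueueMockLLM()
mock.add_response(CannedResponse(content="Hello! How can I help you today?",
                                 tokens={"prompt": 10, "completion": 8}))
reply = mock.complete("Hi there")
assert "help" in reply.content.lower()
assert mock.call_count == 1
```

Failing on an empty queue (rather than returning a default) is what makes the mock catch unexpected extra LLM calls in a test.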

2. Assert tool call sequences

```python
from pytest_agents import AgentTracer

def test_agent_uses_correct_tools():
    """Verify the agent calls tools in the expected order."""
    tracer = AgentTracer()

    with tracer.trace():
        result = my_agent.run("What's the weather in Berlin?")

    tracer.assert_tools_called(["geocode", "weather_api"])
    tracer.assert_tool_called_with("geocode", city="Berlin")
    assert tracer.tool_count == 2
```
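A tool-call tracer of this kind reduces to a context manager that records `(tool_name, kwargs)` pairs while active. A rough sketch with illustrative names (not the plugin's actual implementation):

```python
from contextlib import contextmanager

class ToolTracer:
    """Records (tool_name, kwargs) pairs while tracing is active."""
    def __init__(self):
        self.calls = []
        self._active = False

    @contextmanager
    def trace(self):
        self._active = True
        try:
            yield self
        finally:
            self._active = False

    def record(self, tool, **kwargs):
        # Instrumented tool wrappers would call this on every invocation.
        if self._active:
            self.calls.append((tool, kwargs))

    def assert_tools_called(self, expected):
        names = [name for name, _ in self.calls]
        assert names == expected, f"expected {expected}, got {names}"

    def assert_tool_called_with(self, tool, **kwargs):
        assert (tool, kwargs) in self.calls, f"{tool} not called with {kwargs}"

tracer = ToolTracer()
with tracer.trace():
    tracer.record("geocode", city="Berlin")
    tracer.record("weather_api", lat=52.52, lon=13.41)
tracer.assert_tools_called(["geocode", "weather_api"])
tracer.assert_tool_called_with("geocode", city="Berlin")
```

Recording the exact call order (not just a set of names) is what lets the "search before summarize" style of assertion work.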

3. Track token costs

```python
import pytest

@pytest.mark.max_tokens(5000)
def test_agent_is_efficient():
    """Fail if the agent uses more than 5000 tokens."""
    result = my_agent.run("Summarize this document")
    assert result is not None

@pytest.mark.max_cost_usd(0.05)
def test_agent_cost_budget():
    """Fail if this test costs more than $0.05."""
    result = my_agent.run("Complex analysis task")
    assert "analysis" in result.lower()
```
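Under the hood, budget markers like these amount to comparing a test's accumulated usage against a limit when the test finishes. A sketch of that check (the per-1K prices below are made-up placeholders, not real rates):

```python
# Placeholder price table: USD per 1K tokens, split by prompt/completion.
PRICE_PER_1K_TOKENS = {"gpt-4o": {"prompt": 0.0025, "completion": 0.0100}}

def cost_usd(model, prompt_tokens, completion_tokens):
    rates = PRICE_PER_1K_TOKENS[model]
    return (prompt_tokens * rates["prompt"]
            + completion_tokens * rates["completion"]) / 1000.0

def enforce_budgets(prompt_tokens, completion_tokens, model="gpt-4o",
                    max_tokens=None, max_cost_usd=None):
    """Raise AssertionError if either the token or the dollar budget is exceeded."""
    total = prompt_tokens + completion_tokens
    if max_tokens is not None and total > max_tokens:
        raise AssertionError(f"token budget exceeded: {total} > {max_tokens}")
    cost = cost_usd(model, prompt_tokens, completion_tokens)
    if max_cost_usd is not None and cost > max_cost_usd:
        raise AssertionError(f"cost budget exceeded: ${cost:.4f} > ${max_cost_usd}")
    return total, cost

total, cost = enforce_budgets(4000, 800, max_tokens=5000, max_cost_usd=0.05)
assert total == 4800
assert abs(cost - 0.018) < 1e-9
```

Raising `AssertionError` (rather than logging) is what turns a budget overrun into an ordinary test failure.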

4. Record and replay LLM calls

```python
from pytest_agents import record_llm, replay_llm

# First run: records real LLM responses to fixtures/
@record_llm("fixtures/greeting_test.json")
def test_greeting_record():
    result = my_agent.run("Hello")
    assert "hello" in result.lower()

# Subsequent runs: replays from fixtures (no API calls, free, fast)
@replay_llm("fixtures/greeting_test.json")
def test_greeting_replay():
    result = my_agent.run("Hello")
    assert "hello" in result.lower()
```
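Record/replay here is the VCR pattern: the first run serializes responses to a JSON "cassette", later runs read them back in order without touching the network. A minimal sketch of the mechanic (the file layout is assumed, not the plugin's actual format):

```python
import json
import os
import tempfile

def record_cassette(path, responses):
    """First run: persist the captured responses as a JSON list."""
    with open(path, "w") as f:
        json.dump(responses, f, indent=2)

def replay_cassette(path):
    """Later runs: yield the recorded responses in order, no API calls."""
    with open(path) as f:
        yield from json.load(f)

path = os.path.join(tempfile.mkdtemp(), "greeting_test.json")
record_cassette(path, [{"content": "Hello there!",
                        "tokens": {"prompt": 3, "completion": 4}}])
replayed = list(replay_cassette(path))
assert replayed[0]["content"] == "Hello there!"
```

Replaying in recorded order keeps multi-call agent tests deterministic: the Nth LLM call always gets the Nth recorded response.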

5. Regression-test prompt changes

```python
from pytest_agents import prompt_snapshot

@prompt_snapshot("agent_system_prompt")
def test_system_prompt_unchanged():
    """Fails if the system prompt changed since last snapshot."""
    return my_agent.system_prompt

# Run with --snapshot-update to accept new prompt versions
# pytest --snapshot-update
```
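Snapshot testing reduces to comparing the current prompt against a stored copy, rewriting the copy only on the first run or when explicitly updating. A self-contained sketch (the `.snap` directory layout is assumed for illustration):

```python
import os
import tempfile

def check_snapshot(name, value, snapshot_dir, update=False):
    """Return True if value matches the stored snapshot.

    Writes the snapshot on the first run or when update=True
    (the moral equivalent of `pytest --snapshot-update`).
    """
    path = os.path.join(snapshot_dir, f"{name}.snap")
    if update or not os.path.exists(path):
        with open(path, "w") as f:
            f.write(value)
        return True
    with open(path) as f:
        return f.read() == value

snap_dir = tempfile.mkdtemp()
assert check_snapshot("agent_system_prompt", "You are a helpful assistant.", snap_dir)       # first run: writes
assert check_snapshot("agent_system_prompt", "You are a helpful assistant.", snap_dir)       # unchanged: passes
assert not check_snapshot("agent_system_prompt", "You are a terse assistant.", snap_dir)     # changed: fails
assert check_snapshot("agent_system_prompt", "You are a terse assistant.", snap_dir, update=True)
```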

Fixtures

| Fixture | Description |
| --- | --- |
| `mock_llm` | Pre-configured LLM mock with a response queue |
| `agent_tracer` | Tool call tracer (auto-starts/stops per test) |
| `token_tracker` | Token usage tracker for the current test |
| `llm_cassette` | VCR-style record/replay for LLM calls |

Markers

| Marker | Description |
| --- | --- |
| `@pytest.mark.agent` | Tag a test as an agent test (for filtering) |
| `@pytest.mark.max_tokens(n)` | Fail if the test exceeds n tokens |
| `@pytest.mark.max_cost_usd(n)` | Fail if the test exceeds $n |
| `@pytest.mark.slow_agent` | Mark slow agent tests (skip with `-m "not slow_agent"`) |

CLI Options

```shell
pytest --agent-report          # Print token/cost summary after test run
pytest --snapshot-update       # Update prompt snapshots
pytest -m agent                # Run only agent tests
pytest -m "not slow_agent"     # Skip slow agent tests
```

Architecture

```text
pytest-agents/
├── plugin.py          # Pytest plugin entry point (hooks + fixtures)
├── mock_llm.py        # LLM mock with response queue
├── tracer.py          # Tool call tracing and assertions
├── tokens.py          # Token counting and cost tracking
├── recorder.py        # Record/replay LLM calls (cassette)
├── snapshot.py        # Prompt snapshot regression testing
└── markers.py         # Custom pytest markers
```
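The plugin entry point follows the standard pytest pattern: register CLI options and markers via hooks. A sketch of what such hooks typically look like (`pytest_addoption` and `pytest_configure` are real pytest hook names; the specific option and marker strings mirror this README, not necessarily the plugin's source):

```python
def pytest_addoption(parser):
    """Register the --agent-report flag (pytest calls this at startup)."""
    parser.addoption("--agent-report", action="store_true", default=False,
                     help="print token/cost summary after the test run")

def pytest_configure(config):
    """Register custom markers so pytest doesn't warn about unknown marks."""
    config.addinivalue_line("markers", "agent: tag a test as an agent test")
    config.addinivalue_line("markers", "max_tokens(n): fail if test exceeds n tokens")
    config.addinivalue_line("markers", "max_cost_usd(n): fail if test exceeds $n")
```

Declaring markers in `pytest_configure` matters because pytest emits `PytestUnknownMarkWarning` for unregistered marks under strict settings.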

Framework Compatibility

| Framework | Notes |
| --- | --- |
| OpenAI SDK | Patches `openai.ChatCompletion.create` |
| Anthropic SDK | Patches `anthropic.Anthropic.messages.create` |
| LiteLLM | Patches `litellm.completion` |
| LangChain | Works via the underlying LLM patches |
| Raw HTTP | Use the `mock_llm` fixture directly |
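Each of these integrations boils down to monkeypatching the SDK's completion entry point for the duration of a test. A generic illustration with `unittest.mock` on a stand-in SDK object (in practice the patch target would be the dotted path from the table above):

```python
from unittest import mock

class FakeSDK:
    """Stand-in for a provider client; the real thing would hit the network."""
    @staticmethod
    def completion(prompt):
        raise RuntimeError("network call attempted during test")

def canned_completion(prompt):
    # Deterministic replacement used while the patch is active.
    return {"content": "canned reply", "tokens": {"prompt": 2, "completion": 2}}

with mock.patch.object(FakeSDK, "completion", side_effect=canned_completion):
    out = FakeSDK.completion("hello")

assert out["content"] == "canned reply"
# Outside the `with` block, the original (network-calling) function is restored.
```

Scoping the patch to a context manager (or a fixture) guarantees the real client is restored even if the test fails midway.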

Contributing

```shell
git clone https://github.com/naveenkumarbaskaran/pytest-agents.git
cd pytest-agents
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
```

License

MIT
