
pytest-agents


Pytest plugin for testing AI agents — mock LLMs, assert tool calls, track token usage, and regression-test prompt changes.

```shell
pip install pytest-agents
```

Why?

Every team building AI agents needs testing, but there's no standard way to:

  • Mock LLM responses deterministically (record/replay)
  • Assert tool call sequences ("did the agent call search before summarize?")
  • Track token costs per test ("this test costs $0.03")
  • Regression-test prompts ("did this prompt change break behavior?")
  • Set budgets ("fail if any test exceeds 5000 tokens")

pytest-agents solves all of these as a single pytest plugin. It works with any framework: LangChain, CrewAI, OpenAI, Anthropic, LiteLLM, or raw HTTP.

Quick Start

1. Mock LLM responses

```python
from pytest_agents import mock_llm, LLMResponse

def test_agent_greeting(mock_llm):
    """Mock returns deterministic responses."""
    mock_llm.add_response(LLMResponse(
        content="Hello! How can I help you today?",
        model="gpt-4o",
        tokens={"prompt": 10, "completion": 8},
    ))

    # Your agent code calls the LLM...
    result = my_agent.run("Hi there")

    assert "help" in result.lower()
    assert mock_llm.call_count == 1
```
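Conceptually, a deterministic LLM mock like this is just a FIFO queue of canned responses that fails loudly when it runs dry. A minimal self-contained sketch of that idea (class names here are illustrative, not the plugin's internals):

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class CannedResponse:
    """Illustrative stand-in for an LLMResponse-style record."""
    content: str
    model: str = "gpt-4o"
    tokens: dict = field(default_factory=dict)

class QueueMockLLM:
    """Pops one canned response per call; raises when the queue is exhausted."""
    def __init__(self):
        self._queue = deque()
        self.call_count = 0

    def add_response(self, response):
        self._queue.append(response)

    def complete(self, prompt):
        self.call_count += 1
        if not self._queue:
            raise AssertionError(f"no canned response left for prompt {prompt!r}")
        return self._queue.popleft()

mock = QueueMockLLM()
mock.add_response(CannedResponse(content="Hello! How can I help you today?",
                                 tokens={"prompt": 10, "completion": 8}))
reply = mock.complete("Hi there")
assert "help" in reply.content.lower()
assert mock.call_count == 1
```

Failing on an empty queue (rather than returning a default) is what makes the mock catch unexpected extra LLM calls in a test.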

2. Assert tool call sequences

```python
from pytest_agents import AgentTracer

def test_agent_uses_correct_tools():
    """Verify the agent calls tools in the expected order."""
    tracer = AgentTracer()

    with tracer.trace():
        result = my_agent.run("What's the weather in Berlin?")

    tracer.assert_tools_called(["geocode", "weather_api"])
    tracer.assert_tool_called_with("geocode", city="Berlin")
    assert tracer.tool_count == 2
```
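A tool-call tracer of this kind reduces to a context manager that records `(tool_name, kwargs)` pairs while active. A rough sketch with illustrative names (not the plugin's actual implementation):

```python
from contextlib import contextmanager

class ToolTracer:
    """Records (tool_name, kwargs) pairs while tracing is active."""
    def __init__(self):
        self.calls = []
        self._active = False

    @contextmanager
    def trace(self):
        self._active = True
        try:
            yield self
        finally:
            self._active = False

    def record(self, tool, **kwargs):
        # Instrumented tool wrappers would call this on every invocation.
        if self._active:
            self.calls.append((tool, kwargs))

    def assert_tools_called(self, expected):
        names = [name for name, _ in self.calls]
        assert names == expected, f"expected {expected}, got {names}"

    def assert_tool_called_with(self, tool, **kwargs):
        assert (tool, kwargs) in self.calls, f"{tool} not called with {kwargs}"

tracer = ToolTracer()
with tracer.trace():
    tracer.record("geocode", city="Berlin")
    tracer.record("weather_api", lat=52.52, lon=13.41)
tracer.assert_tools_called(["geocode", "weather_api"])
tracer.assert_tool_called_with("geocode", city="Berlin")
```

Recording the exact call order (not just a set of names) is what lets the "search before summarize" style of assertion work.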

3. Track token costs

```python
import pytest

@pytest.mark.max_tokens(5000)
def test_agent_is_efficient():
    """Fail if the agent uses more than 5000 tokens."""
    result = my_agent.run("Summarize this document")
    assert result is not None

@pytest.mark.max_cost_usd(0.05)
def test_agent_cost_budget():
    """Fail if this test costs more than $0.05."""
    result = my_agent.run("Complex analysis task")
    assert "analysis" in result.lower()
```
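Under the hood, budget markers like these amount to comparing a test's accumulated usage against a limit when the test finishes. A sketch of that check (the per-1K prices below are made-up placeholders, not real rates):

```python
# Placeholder price table: USD per 1K tokens, split by prompt/completion.
PRICE_PER_1K_TOKENS = {"gpt-4o": {"prompt": 0.0025, "completion": 0.0100}}

def cost_usd(model, prompt_tokens, completion_tokens):
    rates = PRICE_PER_1K_TOKENS[model]
    return (prompt_tokens * rates["prompt"]
            + completion_tokens * rates["completion"]) / 1000.0

def enforce_budgets(prompt_tokens, completion_tokens, model="gpt-4o",
                    max_tokens=None, max_cost_usd=None):
    """Raise AssertionError if either the token or the dollar budget is exceeded."""
    total = prompt_tokens + completion_tokens
    if max_tokens is not None and total > max_tokens:
        raise AssertionError(f"token budget exceeded: {total} > {max_tokens}")
    cost = cost_usd(model, prompt_tokens, completion_tokens)
    if max_cost_usd is not None and cost > max_cost_usd:
        raise AssertionError(f"cost budget exceeded: ${cost:.4f} > ${max_cost_usd}")
    return total, cost

total, cost = enforce_budgets(4000, 800, max_tokens=5000, max_cost_usd=0.05)
assert total == 4800
assert abs(cost - 0.018) < 1e-9
```

Raising `AssertionError` (rather than logging) is what turns a budget overrun into an ordinary test failure.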

4. Record and replay LLM calls

```python
from pytest_agents import record_llm, replay_llm

# First run: records real LLM responses to fixtures/
@record_llm("fixtures/greeting_test.json")
def test_greeting_record():
    result = my_agent.run("Hello")
    assert "hello" in result.lower()

# Subsequent runs: replays from fixtures (no API calls, free, fast)
@replay_llm("fixtures/greeting_test.json")
def test_greeting_replay():
    result = my_agent.run("Hello")
    assert "hello" in result.lower()
```
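Record/replay here is the VCR pattern: the first run serializes responses to a JSON "cassette", later runs read them back in order without touching the network. A minimal sketch of the mechanic (the file layout is assumed, not the plugin's actual format):

```python
import json
import os
import tempfile

def record_cassette(path, responses):
    """First run: persist the captured responses as a JSON list."""
    with open(path, "w") as f:
        json.dump(responses, f, indent=2)

def replay_cassette(path):
    """Later runs: yield the recorded responses in order, no API calls."""
    with open(path) as f:
        yield from json.load(f)

path = os.path.join(tempfile.mkdtemp(), "greeting_test.json")
record_cassette(path, [{"content": "Hello there!",
                        "tokens": {"prompt": 3, "completion": 4}}])
replayed = list(replay_cassette(path))
assert replayed[0]["content"] == "Hello there!"
```

Replaying in recorded order keeps multi-call agent tests deterministic: the Nth LLM call always gets the Nth recorded response.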

5. Regression-test prompt changes

```python
from pytest_agents import prompt_snapshot

@prompt_snapshot("agent_system_prompt")
def test_system_prompt_unchanged():
    """Fails if the system prompt changed since last snapshot."""
    return my_agent.system_prompt

# Run with --snapshot-update to accept new prompt versions
# pytest --snapshot-update
```
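Snapshot testing reduces to comparing the current prompt against a stored copy, rewriting the copy only on the first run or when explicitly updating. A self-contained sketch (the `.snap` directory layout is assumed for illustration):

```python
import os
import tempfile

def check_snapshot(name, value, snapshot_dir, update=False):
    """Return True if value matches the stored snapshot.

    Writes the snapshot on the first run or when update=True
    (the moral equivalent of `pytest --snapshot-update`).
    """
    path = os.path.join(snapshot_dir, f"{name}.snap")
    if update or not os.path.exists(path):
        with open(path, "w") as f:
            f.write(value)
        return True
    with open(path) as f:
        return f.read() == value

snap_dir = tempfile.mkdtemp()
assert check_snapshot("agent_system_prompt", "You are a helpful assistant.", snap_dir)       # first run: writes
assert check_snapshot("agent_system_prompt", "You are a helpful assistant.", snap_dir)       # unchanged: passes
assert not check_snapshot("agent_system_prompt", "You are a terse assistant.", snap_dir)     # changed: fails
assert check_snapshot("agent_system_prompt", "You are a terse assistant.", snap_dir, update=True)
```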

Fixtures

| Fixture | Description |
| --- | --- |
| `mock_llm` | Pre-configured LLM mock with a response queue |
| `agent_tracer` | Tool call tracer (auto-starts/stops per test) |
| `token_tracker` | Token usage tracker for the current test |
| `llm_cassette` | VCR-style record/replay for LLM calls |

Markers

| Marker | Description |
| --- | --- |
| `@pytest.mark.agent` | Tag a test as an agent test (for filtering) |
| `@pytest.mark.max_tokens(n)` | Fail if the test exceeds n tokens |
| `@pytest.mark.max_cost_usd(n)` | Fail if the test exceeds $n |
| `@pytest.mark.slow_agent` | Mark slow agent tests (skip with `-m "not slow_agent"`) |

CLI Options

```shell
pytest --agent-report          # Print token/cost summary after test run
pytest --snapshot-update       # Update prompt snapshots
pytest -m agent                # Run only agent tests
pytest -m "not slow_agent"     # Skip slow agent tests
```

Architecture

```text
pytest-agents/
├── plugin.py          # Pytest plugin entry point (hooks + fixtures)
├── mock_llm.py        # LLM mock with response queue
├── tracer.py          # Tool call tracing and assertions
├── tokens.py          # Token counting and cost tracking
├── recorder.py        # Record/replay LLM calls (cassette)
├── snapshot.py        # Prompt snapshot regression testing
└── markers.py         # Custom pytest markers
```
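The plugin entry point follows the standard pytest pattern: register CLI options and markers via hooks. A sketch of what such hooks typically look like (`pytest_addoption` and `pytest_configure` are real pytest hook names; the specific option and marker strings mirror this README, not necessarily the plugin's source):

```python
def pytest_addoption(parser):
    """Register the --agent-report flag (pytest calls this at startup)."""
    parser.addoption("--agent-report", action="store_true", default=False,
                     help="print token/cost summary after the test run")

def pytest_configure(config):
    """Register custom markers so pytest doesn't warn about unknown marks."""
    config.addinivalue_line("markers", "agent: tag a test as an agent test")
    config.addinivalue_line("markers", "max_tokens(n): fail if test exceeds n tokens")
    config.addinivalue_line("markers", "max_cost_usd(n): fail if test exceeds $n")
```

Declaring markers in `pytest_configure` matters because pytest emits `PytestUnknownMarkWarning` for unregistered marks under strict settings.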

Framework Compatibility

| Framework | Notes |
| --- | --- |
| OpenAI SDK | Patches `openai.ChatCompletion.create` |
| Anthropic SDK | Patches `anthropic.Anthropic.messages.create` |
| LiteLLM | Patches `litellm.completion` |
| LangChain | Works via the underlying LLM patches |
| Raw HTTP | Use the `mock_llm` fixture directly |
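Each of these integrations boils down to monkeypatching the SDK's completion entry point for the duration of a test. A generic illustration with `unittest.mock` on a stand-in SDK object (in practice the patch target would be the dotted path from the table above):

```python
from unittest import mock

class FakeSDK:
    """Stand-in for a provider client; the real thing would hit the network."""
    @staticmethod
    def completion(prompt):
        raise RuntimeError("network call attempted during test")

def canned_completion(prompt):
    # Deterministic replacement used while the patch is active.
    return {"content": "canned reply", "tokens": {"prompt": 2, "completion": 2}}

with mock.patch.object(FakeSDK, "completion", side_effect=canned_completion):
    out = FakeSDK.completion("hello")

assert out["content"] == "canned reply"
# Outside the `with` block, the original (network-calling) function is restored.
```

Scoping the patch to a context manager (or a fixture) guarantees the real client is restored even if the test fails midway.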

Contributing

```shell
git clone https://github.com/naveenkumarbaskaran/pytest-agents.git
cd pytest-agents
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
```

License

MIT
