pytest-compatible test harness for AI agents — deterministic record & replay for Anthropic Claude

These details have not been verified by PyPI

Project links

Project description

agentprobe

pytest-compatible test harness for AI agents — deterministic record & replay for Anthropic Claude.

Test your Claude agents in CI without hitting the real API on every run. Record once, replay forever — zero cost, zero flakiness.

def test_agent_uses_bash(agentprobe):
    with agentprobe.replay("tests/fixtures/list_files.jsonl") as probe:
        result = my_agent.run("list files in /tmp")
        probe.assert_tool_called("bash")
        probe.assert_max_iterations(3)
        probe.assert_output_contains("/tmp")

Install

pip install pytest-agentprobe

Requires Python 3.9+ and anthropic>=0.40.0.

How it works

agentprobe intercepts calls to anthropic.Anthropic.messages.create (and the async equivalent) at the class level — no changes to your agent code needed.

Record mode — runs your agent against the real API and saves every request/response pair to a JSONL fixture file.
Replay mode — feeds the saved responses back to your agent instead of making real API calls. Deterministic, instant, free.
Auto mode — records on first run, replays on every subsequent run.

Quick start

1. Record a session

from agentprobe import Session
import anthropic

session = Session()
client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY

with session.record("tests/fixtures/my_agent.jsonl") as probe:
    result = my_agent.run(client, "what files are in /tmp?")
    # assertions are optional during recording
    probe.assert_tool_called("bash")

# fixture is written to disk — commit it to your repo

2. Replay in CI

def test_my_agent(agentprobe):
    client = anthropic.Anthropic(api_key="dummy")  # not used during replay

    with agentprobe.replay("tests/fixtures/my_agent.jsonl") as probe:
        result = my_agent.run(client, "what files are in /tmp?")

        probe.assert_tool_called("bash")
        probe.assert_not_tool_called("web_search")
        probe.assert_tool_called_with("bash", command="ls /tmp")
        probe.assert_max_iterations(4)
        probe.assert_output_contains("/tmp")
        probe.assert_stop_reason("end_turn")
        probe.assert_max_tokens(500)

3. Auto mode (record-on-first-run)

def test_my_agent(agentprobe):
    with agentprobe.auto("tests/fixtures/my_agent.jsonl") as probe:
        result = my_agent.run(client, "what files are in /tmp?")
        probe.assert_tool_called("bash")

Async agents

Full async support via AsyncAnthropic:

import pytest
import anthropic
from agentprobe import Session

@pytest.mark.asyncio
async def test_async_agent():
    session = Session()
    client = anthropic.AsyncAnthropic(api_key="dummy")

    async with session.async_replay("tests/fixtures/my_agent.jsonl") as probe:
        result = await my_async_agent.run(client, "list files in /tmp")
        probe.assert_tool_called("bash")
        probe.assert_max_iterations(3)

Async equivalents: async_record, async_replay, async_auto.

Assertion API

All assertions return self for chaining.

Iteration count

Assertion	Description
`assert_max_iterations(n)`	At most n LLM calls
`assert_min_iterations(n)`	At least n LLM calls
`assert_iteration_count(n)`	Exactly n LLM calls

Tool calls

Assertion	Description
`assert_tool_called(name)`	Tool was called at least once
`assert_not_tool_called(name)`	Tool was never called
`assert_tool_called_with(name, **kwargs)`	Tool was called with these input fields
`assert_tool_called_before(first, second)`	First tool was called before second

Output

Assertion	Description
`assert_output_contains(text)`	Final text response contains text
`assert_output_not_contains(text)`	Final text response does not contain text
`assert_stop_reason(reason)`	Final call stop reason equals reason (e.g. `"end_turn"`)

Token budget

Assertion	Description
`assert_max_tokens(n)`	Total tokens across all calls ≤ n

Introspection

probe.iteration_count       # int — number of LLM calls
probe.tools_called          # list[str] — sorted tool names used
probe.final_output          # str | None — last text block in session
probe.total_tokens          # int — input + output tokens across all calls
probe.total_input_tokens    # int
probe.total_output_tokens   # int

CLI

Inspect and compare fixtures without writing Python:

# Pretty-print a fixture
agentprobe show tests/fixtures/my_agent.jsonl

# Compare two fixtures (exits 1 if differences found)
agentprobe diff tests/fixtures/v1.jsonl tests/fixtures/v2.jsonl

Example show output:

fixture: tests/fixtures/my_agent.jsonl  (2 call(s))

── Call 1/2  model=claude-opus-4-8  stop=tool_use  in=50 out=30  312ms
  [tool_use] bash({"command": "ls /tmp"})

── Call 2/2  model=claude-opus-4-8  stop=end_turn  in=80 out=25  280ms
  [text] The /tmp directory contains: file1.txt, file2.txt, temp.log

total tokens: 185  (130 in + 55 out)

pytest fixture

agentprobe is auto-registered as a pytest plugin. The agentprobe fixture is available in all tests without any conftest.py setup:

def test_something(agentprobe):
    with agentprobe.replay("tests/fixtures/session.jsonl") as probe:
        ...

To use Session directly (e.g. in scripts or non-pytest contexts):

from agentprobe import Session

session = Session()
with session.replay("tests/fixtures/session.jsonl") as probe:
    ...

Fixture format

Fixtures are newline-delimited JSON (.jsonl). Each line is one messages.create call:

{"request": {"model": "...", "messages": [...], "tools": [...]}, "response": {"id": "...", "content": [...], "stop_reason": "tool_use", "usage": {"input_tokens": 50, "output_tokens": 30}}, "timestamp": 1748700000.0, "duration_ms": 312.5}

Fixtures are plain text — safe to commit to git, diff in PRs, and edit by hand.

License

MIT

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.1.0

Jun 1, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pytest_agentprobe-0.1.0.tar.gz (10.2 kB view details)

Uploaded Jun 1, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pytest_agentprobe-0.1.0-py3-none-any.whl (9.8 kB view details)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file pytest_agentprobe-0.1.0.tar.gz.

File metadata

Download URL: pytest_agentprobe-0.1.0.tar.gz
Upload date: Jun 1, 2026
Size: 10.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.17 {"installer":{"name":"uv","version":"0.11.17","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"26.04","id":"resolute","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for pytest_agentprobe-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`dab9f2463ac91924bb131f62348e7f9d6ae6e5f8d5a8a778a0fdca41ae8a3acb`
MD5	`fec69600b3461bc01541c64fbd0cdc6f`
BLAKE2b-256	`0333bf781408119691b0f9c1232bbe2d1300257f9e50e31f4ee70de14d19f42d`

See more details on using hashes here.

File details

Details for the file pytest_agentprobe-0.1.0-py3-none-any.whl.

File metadata

Download URL: pytest_agentprobe-0.1.0-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 9.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.17 {"installer":{"name":"uv","version":"0.11.17","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"26.04","id":"resolute","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for pytest_agentprobe-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`11b3e9970075efe9f57f6df21f51f5bba6dde47a39c1079ae1c44ba3ecc9cb38`
MD5	`f5b08f37e151108e8cdcb1e44cbf48ef`
BLAKE2b-256	`dd1c4fe3f5965874dac62365d7ae80a6556520421648982d6cdd7258fc58e269`

See more details on using hashes here.

pytest-agentprobe 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

agentprobe

Install

How it works

Quick start

1. Record a session

2. Replay in CI

3. Auto mode (record-on-first-run)

Async agents

Assertion API

Iteration count

Tool calls

Output

Token budget

Introspection

CLI

pytest fixture

Fixture format

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes