agentprobe
pytest-compatible test harness for AI agents — deterministic record & replay for Anthropic Claude.
Test your Claude agents in CI without hitting the real API on every run. Record a session once, then replay it on every later run: replays make no API calls, so they cost nothing and return the same responses each time.
def test_agent_uses_bash(agentprobe):
    with agentprobe.replay("tests/fixtures/list_files.jsonl") as probe:
        result = my_agent.run("list files in /tmp")

    probe.assert_tool_called("bash")
    probe.assert_max_iterations(3)
    probe.assert_output_contains("/tmp")
Install
pip install pytest-agentprobe
Requires Python 3.9+ and anthropic>=0.40.0.
How it works
agentprobe intercepts calls to anthropic.Anthropic.messages.create (and the async equivalent) at the class level — no changes to your agent code needed.
- Record mode — runs your agent against the real API and saves every request/response pair to a JSONL fixture file.
- Replay mode — feeds the saved responses back to your agent instead of making real API calls. Deterministic, instant, free.
- Auto mode — records on first run, replays on every subsequent run.
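The three modes above can be illustrated with a small, self-contained sketch. It is not agentprobe's implementation: `FakeMessages` is a hypothetical stand-in for the SDK class, and the `record`, `replay`, and `auto` helpers are illustrative names. It only shows the general idea of patching a method at the class level so that existing client instances are intercepted without code changes.

```python
import json
import os
import tempfile
from contextlib import contextmanager


class FakeMessages:
    """Hypothetical stand-in for an SDK resource class (not the real client)."""

    def create(self, **request):
        # Imagine an expensive network call here.
        return {"content": [{"type": "text", "text": "hello"}],
                "stop_reason": "end_turn"}


@contextmanager
def record(path):
    """Patch FakeMessages.create on the class and append each call to a JSONL file."""
    original = FakeMessages.create

    def wrapper(self, **request):
        response = original(self, **request)
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(json.dumps({"request": request, "response": response}) + "\n")
        return response

    FakeMessages.create = wrapper
    try:
        yield
    finally:
        FakeMessages.create = original  # always restore the real method


@contextmanager
def replay(path):
    """Serve the saved responses in order instead of calling the original method."""
    with open(path, encoding="utf-8") as fh:
        saved = [json.loads(line)["response"] for line in fh if line.strip()]
    responses = iter(saved)
    original = FakeMessages.create

    def wrapper(self, **request):
        return next(responses)

    FakeMessages.create = wrapper
    try:
        yield
    finally:
        FakeMessages.create = original


def auto(path):
    """Record when the fixture file is missing, replay when it exists."""
    return replay(path) if os.path.exists(path) else record(path)


fixture_path = os.path.join(tempfile.mkdtemp(), "demo.jsonl")
client = FakeMessages()  # created before patching, yet still intercepted

with auto(fixture_path):  # first run: no fixture yet, so this records
    first = client.create(model="demo", messages=[])
with auto(fixture_path):  # second run: fixture exists, so this replays
    second = client.create(model="demo", messages=[])
```

Because the patch is applied to the class rather than to an instance, the `client` object built before entering the context manager is still covered, which is the property the "no changes to your agent code" claim relies on.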
Quick start
1. Record a session
from agentprobe import Session
import anthropic

session = Session()
client = anthropic.Anthropic()  # uses ANTHROPIC_API_KEY

with session.record("tests/fixtures/my_agent.jsonl") as probe:
    result = my_agent.run(client, "what files are in /tmp?")

# assertions are optional during recording
probe.assert_tool_called("bash")
# fixture is written to disk; commit it to your repo
2. Replay in CI
import anthropic

def test_my_agent(agentprobe):
    client = anthropic.Anthropic(api_key="dummy")  # not used during replay
    with agentprobe.replay("tests/fixtures/my_agent.jsonl") as probe:
        result = my_agent.run(client, "what files are in /tmp?")

    probe.assert_tool_called("bash")
    probe.assert_not_tool_called("web_search")
    probe.assert_tool_called_with("bash", command="ls /tmp")
    probe.assert_max_iterations(4)
    probe.assert_output_contains("/tmp")
    probe.assert_stop_reason("end_turn")
    probe.assert_max_tokens(500)
3. Auto mode (record-on-first-run)
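import anthropic

def test_my_agent(agentprobe):
    client = anthropic.Anthropic()  # the first run records, so ANTHROPIC_API_KEY must be set
    with agentprobe.auto("tests/fixtures/my_agent.jsonl") as probe:
        result = my_agent.run(client, "what files are in /tmp?")

    probe.assert_tool_called("bash")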
Async agents
Full async support via AsyncAnthropic:
import pytest
import anthropic
from agentprobe import Session

@pytest.mark.asyncio  # requires the pytest-asyncio plugin
async def test_async_agent():
    session = Session()
    client = anthropic.AsyncAnthropic(api_key="dummy")
    async with session.async_replay("tests/fixtures/my_agent.jsonl") as probe:
        result = await my_async_agent.run(client, "list files in /tmp")

    probe.assert_tool_called("bash")
    probe.assert_max_iterations(3)
Async equivalents: async_record, async_replay, async_auto.
Assertion API
All assertions return self for chaining.
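As a rough illustration of what chaining means here, the sketch below uses a hypothetical `MiniProbe` class. It is not the library's probe object. Only the two method names and the "return self" behaviour come from this README; the constructor and internals are assumptions.

```python
class MiniProbe:
    """Hypothetical, simplified stand-in for the probe object."""

    def __init__(self, tool_names, call_count):
        self.tools_called = sorted(set(tool_names))
        self.iteration_count = call_count

    def assert_tool_called(self, name):
        assert name in self.tools_called, f"tool {name!r} was never called"
        return self  # returning self is what enables chaining

    def assert_max_iterations(self, n):
        assert self.iteration_count <= n, (
            f"expected at most {n} LLM calls, got {self.iteration_count}"
        )
        return self


probe = MiniProbe(tool_names=["bash"], call_count=2)

# Each call returns the same object, so checks can be written as one expression.
chained = probe.assert_tool_called("bash").assert_max_iterations(3)
```

A failing check raises `AssertionError` at the first violated link in the chain, so the checks after it are not evaluated.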
Iteration count
| Assertion | Description |
|---|---|
| `assert_max_iterations(n)` | At most n LLM calls |
| `assert_min_iterations(n)` | At least n LLM calls |
| `assert_iteration_count(n)` | Exactly n LLM calls |
Tool calls
| Assertion | Description |
|---|---|
| `assert_tool_called(name)` | Tool was called at least once |
| `assert_not_tool_called(name)` | Tool was never called |
| `assert_tool_called_with(name, **kwargs)` | Tool was called with these input fields |
| `assert_tool_called_before(first, second)` | First tool was called before second |
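To make the tool-call checks concrete, here is a sketch of how such checks could be evaluated against recorded responses. It assumes the responses carry Anthropic-style `tool_use` content blocks with `name` and `input` fields. The helper names are hypothetical, and both the subset-matching rule and the "first occurrence" ordering rule are assumptions rather than documented agentprobe behaviour.

```python
def tool_calls(records):
    """Return (name, input) pairs for every tool_use block, in call order."""
    calls = []
    for record in records:
        for block in record["response"]["content"]:
            if block.get("type") == "tool_use":
                calls.append((block["name"], block["input"]))
    return calls


def called_with(records, name, **expected):
    """True if some call to `name` contains all the expected input fields."""
    return any(
        tool == name and all(k in args and args[k] == v for k, v in expected.items())
        for tool, args in tool_calls(records)
    )


def called_before(records, first, second):
    """True if the first call to `first` precedes the first call to `second`."""
    names = [tool for tool, _ in tool_calls(records)]
    if first not in names or second not in names:
        return False
    return names.index(first) < names.index(second)


# Hypothetical recorded session: two LLM calls, each requesting one tool.
records = [
    {"response": {"content": [
        {"type": "tool_use", "id": "t1", "name": "bash",
         "input": {"command": "ls /tmp"}},
    ]}},
    {"response": {"content": [
        {"type": "tool_use", "id": "t2", "name": "read_file",
         "input": {"path": "/tmp/a.txt"}},
    ]}},
]

matches_ls = called_with(records, "bash", command="ls /tmp")
bash_first = called_before(records, "bash", "read_file")
```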
Output
| Assertion | Description |
|---|---|
| `assert_output_contains(text)` | Final text response contains text |
| `assert_output_not_contains(text)` | Final text response does not contain text |
| `assert_stop_reason(reason)` | Final call stop reason equals reason (e.g. "end_turn") |
Token budget
| Assertion | Description |
|---|---|
| `assert_max_tokens(n)` | Total tokens across all calls ≤ n |
Introspection
probe.iteration_count # int — number of LLM calls
probe.tools_called # list[str] — sorted tool names used
probe.final_output # str | None — last text block in session
probe.total_tokens # int — input + output tokens across all calls
probe.total_input_tokens # int
probe.total_output_tokens # int
CLI
Inspect and compare fixtures without writing Python:
# Pretty-print a fixture
agentprobe show tests/fixtures/my_agent.jsonl
# Compare two fixtures (exits 1 if differences found)
agentprobe diff tests/fixtures/v1.jsonl tests/fixtures/v2.jsonl
Example show output:
fixture: tests/fixtures/my_agent.jsonl (2 call(s))
── Call 1/2 model=claude-opus-4-8 stop=tool_use in=50 out=30 312ms
[tool_use] bash({"command": "ls /tmp"})
── Call 2/2 model=claude-opus-4-8 stop=end_turn in=80 out=25 280ms
[text] The /tmp directory contains: file1.txt, file2.txt, temp.log
total tokens: 185 (130 in + 55 out)
pytest fixture
agentprobe is auto-registered as a pytest plugin. The agentprobe fixture is available in all tests without any conftest.py setup:
def test_something(agentprobe):
    with agentprobe.replay("tests/fixtures/session.jsonl") as probe:
        ...
To use Session directly (e.g. in scripts or non-pytest contexts):
from agentprobe import Session

session = Session()
with session.replay("tests/fixtures/session.jsonl") as probe:
    ...
Fixture format
Fixtures are newline-delimited JSON (.jsonl). Each line is one messages.create call:
{"request": {"model": "...", "messages": [...], "tools": [...]}, "response": {"id": "...", "content": [...], "stop_reason": "tool_use", "usage": {"input_tokens": 50, "output_tokens": 30}}, "timestamp": 1748700000.0, "duration_ms": 312.5}
Fixtures are plain text — safe to commit to git, diff in PRs, and edit by hand.
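Since each line is an independent JSON object, a fixture can be inspected with the standard library alone. The sketch below writes a two-line sample file in the format shown above and then summarizes it. The sample values, the temporary path, and the `load_fixture` and `summarize` helpers are all hypothetical and not part of agentprobe.

```python
import json
import os
import tempfile

# Two hypothetical entries that follow the documented field layout.
sample_entries = [
    {"request": {"model": "demo-model", "messages": [], "tools": []},
     "response": {"id": "msg_1", "content": [], "stop_reason": "tool_use",
                  "usage": {"input_tokens": 50, "output_tokens": 30}},
     "timestamp": 1748700000.0, "duration_ms": 312.5},
    {"request": {"model": "demo-model", "messages": [], "tools": []},
     "response": {"id": "msg_2", "content": [], "stop_reason": "end_turn",
                  "usage": {"input_tokens": 80, "output_tokens": 25}},
     "timestamp": 1748700001.0, "duration_ms": 280.0},
]

path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
with open(path, "w", encoding="utf-8") as fh:
    for entry in sample_entries:
        fh.write(json.dumps(entry) + "\n")


def load_fixture(path):
    """Parse one JSON object per non-empty line."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def summarize(records):
    """Count calls and add up token usage across all recorded calls."""
    usage = [r["response"]["usage"] for r in records]
    total_in = sum(u["input_tokens"] for u in usage)
    total_out = sum(u["output_tokens"] for u in usage)
    return {
        "calls": len(records),
        "input_tokens": total_in,
        "output_tokens": total_out,
        "total_tokens": total_in + total_out,
        "final_stop_reason": records[-1]["response"]["stop_reason"] if records else None,
    }


summary = summarize(load_fixture(path))
print(summary)
```

The same per-line structure is what makes hand edits practical: changing one recorded response means editing one line, and a git diff shows exactly which call changed.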
License
MIT