AgentProof
pytest-based behavioral testing for AI agents.
pip install agentproof
One line. No config. Works with any agent framework.
The Problem
You can observe your agents. You can trace them. You can log every token.
But can you test them?
89% of teams have agent observability. Only 52% have agent evaluation. Zero have behavioral testing in CI.
Your agent calls the wrong tool? Ships to production. Costs $40 on a $0.50 task? Ships to production. Hallucinates the answer? Ships. To. Production.
The Solution
AgentProof brings pytest-style behavioral testing to AI agents. Test what your agent does, not just what it outputs.
# test_booking_agent.py
import agentproof
from agentproof import assert_tool_called, assert_tool_order, assert_max_cost


def test_booking_agent_searches_before_booking(agent_run):
    run = make_booking_run()  # your agent execution

    # Did it use the right tools?
    assert_tool_called(run, "search_flights")
    assert_tool_called(run, "book_flight")

    # Did it use them in the right order?
    assert_tool_order(run, ["search_flights", "compare_prices", "book_flight"])

    # Did it stay within budget?
    assert_max_cost(run, max_usd=0.50)
$ pytest
========================= test session starts =========================
test_booking_agent.py::test_booking_agent_searches_before_booking PASSED
test_booking_agent.py::test_cost_stays_under_budget PASSED
test_booking_agent.py::test_no_hallucination PASSED
========================= 3 passed in 0.04s ============================
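To make the ordering check concrete, here is a minimal sketch of one way a tool-order assertion could work. It treats the expected tools as an ordered subsequence of the calls actually made, so unrelated calls in between are tolerated. That tolerance, along with the `tools_in_order` name and the sample call log, is an assumption for illustration; the source does not say whether `assert_tool_order` requires a contiguous or exact match.

```python
from typing import Iterable, Sequence


def tools_in_order(called: Iterable[str], expected: Sequence[str]) -> bool:
    """Return True if `expected` appears within `called` as an ordered subsequence."""
    remaining = iter(called)
    # `name in remaining` advances the iterator until it finds `name`,
    # so each expected tool must appear after the previous match.
    return all(name in remaining for name in expected)


# Hypothetical tool-call log from a booking run.
calls = ["search_flights", "get_weather", "compare_prices", "book_flight"]

print(tools_in_order(calls, ["search_flights", "compare_prices", "book_flight"]))  # True
print(tools_in_order(calls, ["book_flight", "search_flights"]))  # False
```

A stricter variant would compare the two lists for equality, which also fails the test when the agent makes extra calls.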
Quickstart
1. Install
pip install agentproof
2. Record an agent run
import agentproof


@agentproof.record
def my_agent(prompt: str) -> str:
    # Your agent code here
    agentproof.add_tool_call("search", arguments={"query": prompt})
    agentproof.add_llm_call("gpt-4o", prompt_tokens=500, completion_tokens=200)
    return "The answer is 42"


result = my_agent("What is the meaning of life?")
run = my_agent.last_run
3. Write tests
from agentproof import (
    assert_tool_called,
    assert_tool_order,
    assert_max_cost,
    assert_max_steps,
    assert_no_hallucination,
)


def test_my_agent():
    run = my_agent.last_run
    assert_tool_called(run, "search")
    assert_max_cost(run, 0.10)
    assert_max_steps(run, 5)
    assert_no_hallucination(run, ground_truth="42")
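The recording step in the Quickstart can be pictured with a small stand-in. The sketch below is not AgentProof's implementation: the `Run` class, the module-level `_active_run` variable, and the keyword-style `add_tool_call` signature are all hypothetical, chosen only to show how a decorator might capture tool calls and expose them as `last_run`.

```python
import functools
from dataclasses import dataclass, field
from typing import Any, Callable, Optional


@dataclass
class Run:
    """Hypothetical container for what happened during one agent call."""
    tool_calls: list = field(default_factory=list)
    output: Any = None


_active_run: Optional[Run] = None  # the run currently being recorded, if any


def add_tool_call(name: str, **arguments: Any) -> None:
    """Append a tool call to the active run; a no-op outside a recorded call."""
    if _active_run is not None:
        _active_run.tool_calls.append({"name": name, "arguments": arguments})


def record(func: Callable) -> Callable:
    """Wrap `func` so each call is captured and exposed as `wrapper.last_run`."""
    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        global _active_run
        previous, _active_run = _active_run, Run()
        try:
            _active_run.output = func(*args, **kwargs)
            return _active_run.output
        finally:
            # Publish the finished run and restore any outer recording.
            wrapper.last_run = _active_run
            _active_run = previous
    wrapper.last_run = None
    return wrapper


@record
def my_agent(prompt: str) -> str:
    add_tool_call("search", query=prompt)
    return "The answer is 42"


my_agent("What is the meaning of life?")
print(my_agent.last_run.tool_calls)
```

A module-level variable keeps the sketch short but is not safe across threads or async tasks; `contextvars.ContextVar` would be the usual choice there.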
Core Assertions
| Assertion | What it tests |
|---|---|
| assert_tool_called(run, "search") | Tool was used (optionally N times) |
| assert_tool_order(run, ["a", "b", "c"]) | Tools called in correct sequence |
| assert_max_cost(run, 0.50) | Total cost under budget (200+ models) |
| assert_max_steps(run, 10) | Agent didn't spin out |
| assert_no_hallucination(run, truth) | Output grounded in source (TF-IDF, no API) |
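For intuition on the TF-IDF grounding check, the sketch below scores how much an output overlaps with a ground-truth text using cosine similarity over TF-IDF vectors. It is an illustration under stated assumptions, not the library's code: the tokenizer, the smoothed IDF formula, and the example sentences are hypothetical, and AgentProof's actual weighting and pass/fail threshold are not documented here.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_similarity(output: str, ground_truth: str) -> float:
    """Cosine similarity between TF-IDF vectors of the two texts."""
    docs = [Counter(tokenize(output)), Counter(tokenize(ground_truth))]
    vocab = set(docs[0]) | set(docs[1])
    # Smoothed IDF so terms shared by both texts keep a non-zero weight.
    idf = {
        term: math.log((1 + len(docs)) / (1 + sum(term in d for d in docs))) + 1
        for term in vocab
    }
    vecs = [{t: d[t] * idf[t] for t in d} for d in docs]
    dot = sum(vecs[0][t] * vecs[1].get(t, 0.0) for t in vecs[0])
    norms = [math.sqrt(sum(w * w for w in v.values())) for v in vecs]
    if norms[0] == 0 or norms[1] == 0:
        return 0.0
    return dot / (norms[0] * norms[1])


truth = "The flight to NYC departs at 9am and costs 320 dollars"
grounded = "Your flight to NYC departs at 9am"
fabricated = "The hotel includes free breakfast and a pool"

# The grounded answer shares most of its terms with the source text,
# so it scores higher than the fabricated one.
print(tfidf_similarity(grounded, truth) > tfidf_similarity(fabricated, truth))  # True
```

Because the score only measures term overlap, an answer that reuses the right words with the wrong facts (for example, a different departure time) can still score well, which is why this kind of check catches obvious fabrication rather than subtle errors.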
Framework Adapters
Works with any agent framework. Install the adapter you need:
pip install agentproof[langchain]
pip install agentproof[crewai]
pip install agentproof[openai]
pip install agentproof[otel] # Any OTEL-instrumented framework
# LangChain
from agentproof.adapters.langchain import from_langchain_run
run = from_langchain_run(agent_executor.invoke({"input": "..."}))
# CrewAI
from agentproof.adapters.crewai import from_crewai_result
run = from_crewai_result(crew.kickoff())
# OpenAI Agents SDK
from agentproof.adapters.openai_agents import from_openai_response
# Runner.run is a coroutine; run_sync blocks until the result is ready
run = from_openai_response(Runner.run_sync(agent, "..."))
Replay Testing
Record once, test forever. Capture a real agent run and replay it in CI without making API calls:
from agentproof import assert_max_cost, assert_tool_called, load_trace, save_trace

# Record
save_trace(run, "tests/fixtures/booking_success.jsonl")


# Replay in tests
def test_booking_replay():
    run = load_trace("tests/fixtures/booking_success.jsonl")
    assert_tool_called(run, "book_flight")
    assert_max_cost(run, 0.50)
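The record-and-replay idea rests on a plain JSONL round trip: one JSON object per line, written once and read back in tests. The sketch below shows that mechanism with only the standard library. The `save_events` and `load_events` helpers and the event fields are hypothetical; the actual trace schema used by `save_trace` is not described in this README.

```python
import json
import tempfile
from pathlib import Path


def save_events(events: list[dict], path: Path) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as fh:
        for event in events:
            fh.write(json.dumps(event) + "\n")


def load_events(path: Path) -> list[dict]:
    """Read events back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Hypothetical event schema, for illustration only.
events = [
    {"type": "tool_call", "name": "search_flights", "arguments": {"to": "NYC"}},
    {"type": "llm_call", "model": "gpt-4o", "prompt_tokens": 500, "completion_tokens": 200},
    {"type": "tool_call", "name": "book_flight", "arguments": {"flight": "UA100"}},
]

with tempfile.TemporaryDirectory() as tmp:
    trace_path = Path(tmp) / "booking_success.jsonl"
    save_events(events, trace_path)
    replayed = load_events(trace_path)

tool_names = [e["name"] for e in replayed if e["type"] == "tool_call"]
print(tool_names)  # ['search_flights', 'book_flight']
```

Since the replayed events are read from disk, assertions over them need no network access or API key, which is what makes this pattern suitable for CI.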
Snapshot Testing
Like Jest snapshots but for agent behavior. Detect behavioral regressions automatically:
from agentproof import assert_snapshot


def test_agent_behavior_stable(snapshot_dir):
    run = my_agent("book a flight to NYC")
    assert_snapshot(run, snapshot_dir / "booking_agent.json")
    # First run: creates snapshot
    # Subsequent runs: compares against saved snapshot
Update snapshots: pytest --agentproof-update-snapshots
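The create-then-compare cycle described above can be sketched in a few lines. This is a stand-in rather than AgentProof's implementation: `check_snapshot`, its `update` flag, and the choice to snapshot only tool names and step count are assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path


def check_snapshot(behavior: dict, path: Path, update: bool = False) -> None:
    """Create the snapshot if missing (or when updating); otherwise compare."""
    if update or not path.exists():
        path.write_text(json.dumps(behavior, indent=2, sort_keys=True), encoding="utf-8")
        return
    saved = json.loads(path.read_text(encoding="utf-8"))
    if behavior != saved:
        raise AssertionError(f"behavior changed: expected {saved}, got {behavior}")


with tempfile.TemporaryDirectory() as tmp:
    snap = Path(tmp) / "booking_agent.json"
    baseline = {"tools": ["search_flights", "book_flight"], "steps": 2}

    check_snapshot(baseline, snap)  # first run: writes the snapshot
    check_snapshot(baseline, snap)  # same behavior: passes

    regressed = {"tools": ["book_flight"], "steps": 1}
    try:
        check_snapshot(regressed, snap)
    except AssertionError:
        regression_detected = True
    else:
        regression_detected = False

    check_snapshot(regressed, snap, update=True)  # accept the new behavior
    check_snapshot(regressed, snap)               # now passes

print(regression_detected)  # True
```

Snapshotting structured fields instead of free-form model text keeps the comparison stable when wording varies between runs.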
Bundled Cost Database
200+ LLM models with up-to-date pricing. No API calls needed.
from agentproof import get_model_pricing, calculate_run_cost
pricing = get_model_pricing("gpt-4o")
# {"provider": "openai", "prompt": 2.50, "completion": 10.00}
cost = calculate_run_cost(run)
# CostBreakdown(total_cost_usd=0.0325, ...)
Models: GPT-4o, Claude 4.5, Gemini 2.5, Llama 4, DeepSeek V3, Mistral Large, and 190+ more.
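As a worked example of the arithmetic behind a cost assertion, the sketch below prices a single LLM call from its token counts. It assumes the `prompt` and `completion` figures are USD per 1 million tokens; the README does not state the unit, so treat that as an assumption. `PRICING` and `llm_call_cost` are hypothetical names, not part of AgentProof.

```python
# Assumption: prices are USD per 1 million tokens, matching the gpt-4o
# figures shown above; the unit used by AgentProof's database is not stated.
PRICING = {"gpt-4o": {"prompt": 2.50, "completion": 10.00}}
TOKENS_PER_UNIT = 1_000_000


def llm_call_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of a single LLM call."""
    price = PRICING[model]
    return (
        prompt_tokens * price["prompt"] + completion_tokens * price["completion"]
    ) / TOKENS_PER_UNIT


# 500 prompt tokens and 200 completion tokens, as in the Quickstart example:
# (500 * 2.50 + 200 * 10.00) / 1,000,000 = 0.00325
cost = llm_call_cost("gpt-4o", prompt_tokens=500, completion_tokens=200)
print(f"${cost:.5f}")  # $0.00325
```

Under that assumption, the Quickstart call would sit well below the 0.10 budget used in its test.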
CI/CD Integration
GitHub Actions
- uses: praxiumlabs/agentproof@v1
  with:
    test-path: tests/
Or manually:
- run: |
    pip install agentproof
    pytest tests/ -v
Comparison
| Feature | AgentProof | DeepEval | Braintrust |
|---|---|---|---|
| pytest native | Yes | Yes | No |
| Behavioral assertions | Yes | No | No |
| Tool sequence testing | Yes | No | No |
| Cost assertions | Yes | No | No |
| No API key needed | Yes | No | No |
| JSONL replay | Yes | No | No |
| Snapshot testing | Yes | No | No |
| Framework adapters | 4 | 2 | 1 |
| Package size | <500KB | ~50MB | Cloud |
| Price | Free | Freemium | Paid |
Honest Limitations
- Hallucination detection uses TF-IDF: it catches obvious fabrication, not subtle inaccuracies. For LLM-as-judge evaluation, use DeepEval or Braintrust.
- Token counts must be provided: we don't intercept API calls (by design). Use framework adapters or log tokens manually.
- No real-time monitoring: AgentProof is a testing tool, not an observability platform. Use it alongside your existing tracing.
- Cost DB needs community updates: model pricing changes. Submit a PR to update agentproof/data/models.yaml.
Contributing
git clone https://github.com/praxiumlabs/agentproof
cd agentproof
pip install -e ".[dev]"
pytest
Adding a model to the cost DB:
Edit agentproof/data/models.yaml:
new-model-name:
  provider: provider-name
  prompt: 1.50
  completion: 5.00
License
MIT License. See LICENSE.
Built by Sarak Dahal at Praxium Labs. Star the repo if AgentProof saves your agents.