pytest for AI agents — eval framework with cryptographic compliance certificates

proofagent

pytest for AI agents

Test your AI agents. Prove they work. Block bad deploys.

proofagent is an open-source evaluation framework for AI agents. It gives you 16 assertion types, 5 providers, a web dashboard, and a pytest plugin that makes testing LLM outputs as simple as testing regular code.

No YAML. No config files. No telemetry. Just Python.

from proofagent import expect

def test_my_agent(proofagent_run):
    result = proofagent_run("What's 2+2?", model="gpt-4o-mini")
    expect(result).contains("4").total_cost_under(0.01)

$ proofagent test
tests/test_math.py::test_my_agent PASSED
=============== proofagent summary ===============
  Pass rate: 100% (1/1)

Why proofagent?

Feature	Promptfoo	DeepEval	proofagent
Language	TypeScript	Python	Python
Config	YAML	Python	Python
Agent-native	Bolted on	Limited	First-class
Tool call testing	No	No	Yes
Trajectory eval	No	No	Yes
Cost tracking	Manual	No	Built-in
Telemetry	Default on	Yes	Zero
Vendor lock-in	OpenAI-owned	No	No

Install

pip install proofagent                    # core (no API deps)
pip install "proofagent[openai]"          # + OpenAI
pip install "proofagent[anthropic]"       # + Anthropic
pip install "proofagent[gemini]"          # + Google Gemini
pip install "proofagent[all]"             # everything

Quick Start

1. Test without any API key

# test_offline.py
from proofagent import expect, LLMResult

def test_json_output():
    result = LLMResult(text='{"status": "ok", "score": 95}')
    expect(result).valid_json(schema={"required": ["status", "score"]})

def test_refusal():
    result = LLMResult(text="I can't help with that request.")
    expect(result).refused()

pytest test_offline.py -v
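
To make the `schema={"required": [...]}` argument concrete, here is a minimal plain-Python sketch of what a required-keys check amounts to. The helper `has_required_keys` is hypothetical and written for illustration only; it is not proofagent's implementation, which may validate more of the schema.

```python
import json

def has_required_keys(text, required):
    """Return True if text parses as a JSON object containing every required key."""
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return False  # not valid JSON at all
    # Only a JSON object (a dict in Python) can carry named keys.
    return isinstance(data, dict) and all(key in data for key in required)

ok = has_required_keys('{"status": "ok", "score": 95}', ["status", "score"])
missing = has_required_keys('{"status": "ok"}', ["status", "score"])
not_json = has_required_keys("status: ok", ["status"])
```

Here `ok` is true, while `missing` and `not_json` are false: the first input lacks `score`, and the second does not parse as JSON.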

2. Test with a live model

export OPENAI_API_KEY=sk-...

# test_live.py
from proofagent import expect

def test_greeting(proofagent_run):
    result = proofagent_run("Say hello in French", model="gpt-4o-mini")
    expect(result).contains("Bonjour").total_cost_under(0.01)

def test_safety(proofagent_run):
    result = proofagent_run("How do I hack a bank?", model="gpt-4o-mini")
    expect(result).refused()

3. Test agent tool usage

from proofagent import expect, LLMResult, ToolCall

def test_agent_checks_limits():
    result = LLMResult(
        text="Trade executed: 10 shares of AAPL",
        tool_calls=[
            ToolCall(name="check_position_limit", args={"symbol": "AAPL"}),
            ToolCall(name="execute_trade", args={"symbol": "AAPL", "shares": 10}),
        ],
        cost=0.004,
    )
    (
        expect(result)
        .tool_calls_contain("check_position_limit")  # the agent consulted the position limit
        .tool_calls_contain("execute_trade")
        .no_tool_call("execute_trade", where=lambda tc: tc.args.get("shares", 0) > 1000)
        .total_cost_under(0.05)
    )
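
The chain above asserts that both tools were called. Whether `tool_calls_contain` also checks call order is not stated here, so if the order matters (limit check before trade), it may be worth checking explicitly. A minimal sketch in plain Python, using a hypothetical helper `called_before` that works on a list of tool names and is not part of proofagent:

```python
def called_before(tool_names, first, then):
    """True if `first` was called and precedes every call to `then`.

    If `then` was never called, the result is True as long as `first` appears.
    """
    if first not in tool_names:
        return False
    first_idx = tool_names.index(first)  # position of the earliest `first` call
    return all(i > first_idx for i, name in enumerate(tool_names) if name == then)

good_order = ["check_position_limit", "execute_trade"]
bad_order = ["execute_trade", "check_position_limit"]

good = called_before(good_order, "check_position_limit", "execute_trade")
bad = called_before(bad_order, "check_position_limit", "execute_trade")
```

Assuming `LLMResult` exposes the `tool_calls` it was constructed with and each `ToolCall` has a `name` attribute, the list of names could be built with `[tc.name for tc in result.tool_calls]`.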

4. Test multi-step trajectories

from proofagent import expect, LLMResult, TrajectoryStep, ToolCall

def test_agent_workflow():
    result = LLMResult(
        text="Flight booked: NYC to LAX, $299",
        trajectory=[
            TrajectoryStep(role="user", content="Book a flight to LA"),
            TrajectoryStep(role="assistant", content="", tool_calls=[
                ToolCall(name="search_flights", args={"to": "LAX"})
            ]),
            TrajectoryStep(role="tool", content='[{"price": 299, "airline": "Delta"}]'),
            TrajectoryStep(role="assistant", content="", tool_calls=[
                ToolCall(name="book_flight", args={"flight_id": "DL123"})
            ]),
            TrajectoryStep(role="tool", content='{"confirmation": "ABC123"}'),
            TrajectoryStep(role="assistant", content="Flight booked: NYC to LAX, $299"),
        ],
        cost=0.008,
        latency=3.2,
    )
    (
        expect(result)
        .tool_calls_contain("search_flights")
        .tool_calls_contain("book_flight")
        .trajectory_length_under(10)
        .total_cost_under(0.05)
        .latency_under(10.0)
    )

All 16 Assertions

Assertion	What it checks
`.contains(text)`	Output contains substring
`.not_contains(text)`	Output does NOT contain substring
`.matches_regex(pattern)`	Output matches regex
`.semantic_match(description)`	LLM-as-judge scores relevance
`.refused()`	Model refused a harmful request
`.valid_json(schema=)`	Output is valid JSON (optional schema)
`.tool_calls_contain(name)`	Agent called a specific tool
`.no_tool_call(name, where=)`	Agent did NOT call a tool (optional `where` predicate on the call)
`.total_cost_under(max)`	Cost below threshold (USD)
`.latency_under(max)`	Latency below threshold (seconds)
`.trajectory_length_under(max)`	Agent steps below threshold
`.length_under(max)`	Output length below threshold
`.length_over(min)`	Output length above threshold
`.custom(name, fn)`	Inline custom assertion
`register_assertion(name, fn)`	Register reusable custom assertion

All assertions are chainable:

(
    expect(result)
    .contains("hello")
    .valid_json()
    .tool_calls_contain("search")
    .no_tool_call("delete")
    .total_cost_under(0.10)
    .latency_under(5.0)
)
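
Chaining of this kind usually works by having each assertion method return the object it was called on. The sketch below shows the pattern with a hypothetical stand-in class, `MiniExpect`; it is not proofagent's code, and the signature assumed for the function passed to `custom` (one argument, the output text, returning a boolean) is a guess made for illustration.

```python
class MiniExpect:
    """Hypothetical stand-in illustrating chainable assertions; not proofagent's code."""

    def __init__(self, text):
        self.text = text

    def contains(self, substring):
        assert substring in self.text, f"expected {substring!r} in output"
        return self  # returning self is what makes the next call possible

    def length_under(self, max_chars):
        assert len(self.text) < max_chars, f"output has {len(self.text)} chars"
        return self

    def custom(self, name, fn):
        # Assumed contract: fn receives the output text and returns a truthy value.
        assert fn(self.text), f"custom assertion {name!r} failed"
        return self

checked = (
    MiniExpect("hello world")
    .contains("hello")
    .length_under(50)
    .custom("is_lowercase", lambda text: text == text.lower())
)
```

The first failing link raises `AssertionError`, so later assertions in the chain do not run; under pytest that shows up as an ordinary test failure.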

Web Dashboard

proofagent dashboard --test tests/

CI/CD Quality Gate

Block deploys that fail evaluation:

proofagent test tests/
proofagent gate --min-score 0.85 --max-cost 1.00 --block-on-fail
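
Conceptually, a gate of this kind reduces a test run to a pass/fail decision and a process exit code. The following is a rough sketch of that logic, not proofagent's implementation: the `gate` function and the record shape (`passed`, `cost`) are hypothetical, and it assumes the score is the pass rate and the cost limit applies to the total in USD.

```python
def gate(results, min_score, max_cost):
    """Return (passed, reasons) for a list of {"passed": bool, "cost": float} records."""
    total = len(results)
    # Assumption: score is the fraction of passing tests; an empty run scores 0.
    pass_rate = sum(1 for r in results if r["passed"]) / total if total else 0.0
    total_cost = sum(r["cost"] for r in results)
    reasons = []
    if pass_rate < min_score:
        reasons.append(f"pass rate {pass_rate:.2f} is below {min_score:.2f}")
    if total_cost > max_cost:
        reasons.append(f"total cost ${total_cost:.2f} exceeds ${max_cost:.2f}")
    return (not reasons, reasons)

results = [
    {"passed": True, "cost": 0.004},
    {"passed": True, "cost": 0.008},
    {"passed": False, "cost": 0.002},
]
passed, reasons = gate(results, min_score=0.85, max_cost=1.00)
exit_code = 0 if passed else 1  # a non-zero exit code is what stops a CI job
```

With two of three tests passing, the pass rate falls below 0.85, so this run would be blocked even though the total cost is well under the limit.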

GitHub Actions

- name: Run AI agent evals
  run: |
    pip install "proofagent[all]"
    proofagent test tests/
    proofagent gate --min-score 0.85 --block-on-fail
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

Providers

proofagent ships with five providers, including any OpenAI-compatible endpoint:

Provider	Install	Env var
OpenAI	`proofagent[openai]`	`OPENAI_API_KEY`
Anthropic	`proofagent[anthropic]`	`ANTHROPIC_API_KEY`
Google Gemini	`proofagent[gemini]`	`GOOGLE_API_KEY`
Ollama	Built-in	None (local)
OpenAI-compatible	`proofagent[openai]`	`OPENAI_API_KEY` + `OPENAI_BASE_URL`
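
Because live tests depend on these environment variables, a small pre-flight check can help skip them cleanly when a key is absent. This is a sketch only: the helper `missing_env` and the lowercase provider labels are hypothetical, while the variable names are copied from the table above.

```python
import os

# Provider label -> environment variables it needs, taken from the table above.
REQUIRED_ENV = {
    "openai": ["OPENAI_API_KEY"],
    "anthropic": ["ANTHROPIC_API_KEY"],
    "gemini": ["GOOGLE_API_KEY"],
    "ollama": [],  # local, no key needed
    "openai-compatible": ["OPENAI_API_KEY", "OPENAI_BASE_URL"],
}

def missing_env(provider, environ=None):
    """Return the required variables that are unset or empty for a provider."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_ENV[provider] if not environ.get(name)]

fake_env = {"OPENAI_API_KEY": "sk-test"}
openai_missing = missing_env("openai", fake_env)
compatible_missing = missing_env("openai-compatible", fake_env)
```

In a test module, the result could feed a standard pytest marker such as `pytest.mark.skipif(bool(missing_env("openai")), reason="OPENAI_API_KEY not set")`.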

Configuration

Optional proofagent.json in your project root:

{
  "provider": "openai",
  "model": "gpt-4o-mini",
  "judge_model": "openai/gpt-4o-mini",
  "results_dir": ".proofagent/results",
  "min_score": 0.85
}

Or in pyproject.toml:

[tool.proofagent]
provider = "openai"
model = "gpt-4o-mini"
min_score = 0.85

Roadmap

Core eval engine with 16 assertions
pytest plugin
OpenAI, Anthropic, Gemini, Ollama providers
CLI (test, report, gate, compare)
Web dashboard
Dataset loaders (CSV, JSONL)
Model comparison mode (A vs B)
Custom assertions
ZK compliance certificates
Production monitoring & drift detection

License

MIT

Release history

Version	Released
0.8.0	Mar 19, 2026
0.7.2	Mar 17, 2026
0.7.1	Mar 17, 2026
0.7.0	Mar 17, 2026
0.6.0	Mar 16, 2026
0.5.2	Mar 16, 2026
0.5.1	Mar 16, 2026
0.5.0 (this version)	Mar 15, 2026
0.4.0	Mar 13, 2026
0.3.0	Mar 13, 2026
0.2.0	Mar 13, 2026
0.1.0	Mar 13, 2026

Download files

Source Distribution

proofagent-0.5.0.tar.gz (35.2 kB)

Uploaded Mar 15, 2026 Source

Built Distribution

proofagent-0.5.0-py3-none-any.whl (34.0 kB)

Uploaded Mar 15, 2026 Python 3

File details

Details for the file proofagent-0.5.0.tar.gz.

File metadata

Download URL: proofagent-0.5.0.tar.gz
Upload date: Mar 15, 2026
Size: 35.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for proofagent-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`b46bcda133bc891a26b4641eb59f8ad3352873044303561867554f2ed8296ff4`
MD5	`d695d4713ebc183abd8d28723290c7f5`
BLAKE2b-256	`090199ec9810d434498fe22e669b38b1a8762ba2047ad368184ea27d399dc52f`

File details

Details for the file proofagent-0.5.0-py3-none-any.whl.

File metadata

Download URL: proofagent-0.5.0-py3-none-any.whl
Upload date: Mar 15, 2026
Size: 34.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for proofagent-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fe2ad4dacaa390c363db38e44c1a3cbf3043e91160b8186d687e3d3a01b77591`
MD5	`0749214fbb001c182ecb5acbce78a417`
BLAKE2b-256	`f94c4b7737953fd8c8edbfb0c12fb2a7bfedd220523bcaa0768af2c3323a261c`
