# AgentAssert

**The Behavioral Testing Framework for AI Agents**

> "pytest for AI Agents" — Write tests. Run agents. Ship with confidence.

Quick Start • Features • Documentation • Contributing
## The Problem
Teams building AI agents face a critical gap in their development workflow:
| What Exists | What's Missing |
|---|---|
| Observability (LangSmith, Langfuse) | Behavioral test runners |
| Evaluation dashboards | CI/CD pass/fail gates |
| LLM output quality metrics | Agent execution path testing |
AgentAssert fills this gap: an open-source, framework-agnostic, local-first behavioral testing framework designed specifically for AI agent pipelines.
## Why AgentAssert?
- Framework Agnostic — Works with LangChain, CrewAI, AutoGen, LlamaIndex, or raw API calls
- Local First — No cloud accounts, no dashboards, no external services required
- CI/CD Native — Designed for `git push → test → deploy` workflows
- Deterministic — Seeded execution for reproducible test runs
- Developer Friendly — Familiar pytest-like syntax and workflow
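"Deterministic" here means every source of randomness is seeded, so two runs with the same seed replay the same execution. A tiny illustration of the principle using Python's standard `random` module (the `--seed` flag applies the same idea to agent runs):

```python
import random

def sample_run(seed):
    """Simulate a run whose decisions depend on a seeded RNG."""
    rng = random.Random(seed)  # isolated RNG, unaffected by global state
    return [rng.randint(0, 9) for _ in range(5)]

# Same seed, identical run — the basis of reproducible test results
print(sample_run(12345) == sample_run(12345))  # True
```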
## Quick Start
### Installation

```bash
pip install agentassert
```
### Write Your First Test

```python
# tests/test_my_agent.py
from agentassert import agent_test, expect, mock_tool, contains

@agent_test
def test_research_agent_workflow(agent_harness):
    """Test that the research agent follows the correct tool sequence."""
    # 1. Create mock tools with deterministic responses
    search = mock_tool("web_search", returns={"results": ["AI breakthrough news"]})
    summarize = mock_tool("summarize", returns="Key finding: AI is advancing rapidly")

    # 2. Run your agent under test
    trace = agent_harness.run(
        agent=my_research_agent,
        input="Find the latest AI news",
        tools=[search, summarize],
    )

    # 3. Assert on behavioral expectations
    expect(trace).tool("web_search").was_called()
    expect(trace).tool("web_search").called_before("summarize")
    expect(trace).tool("web_search").called_with(query=contains("AI"))
    expect(trace).completed_within_steps(10)
    expect(trace).output.not_empty()
```
### Run Your Tests

```
$ agentassert run tests/

AgentAssert v0.1.0 — Behavioral Testing Framework for AI Agents

collecting ... 3 tests

tests/test_my_agent.py
  ✓ test_research_agent_workflow (2 steps, $0.002, 0.3s)
  ✓ test_handles_api_failure (1 step, $0.001, 0.1s)
  ✓ test_stays_within_budget (4 steps, $0.008, 0.5s)

════════════════════════════════════════════════════════════
3 passed in 0.9s
════════════════════════════════════════════════════════════
```
## Features
### Mock Tools

Create deterministic tool responses for predictable testing:

```python
# Static response
search = mock_tool("web_search", returns={"results": ["item1", "item2"]})

# Sequential responses
api = mock_tool("api_call", returns_sequence=[
    {"status": "pending"},
    {"status": "processing"},
    {"status": "complete"},
])

# Simulate failures
flaky_api = mock_tool("external_service", raises=ConnectionError("timeout"))

# Rate-limiting simulation
limited_api = mock_tool("rate_limited_api", returns="ok", fail_after=5)
```
### Behavioral Assertions

Assert on how your agent behaves, not just what it outputs:

```python
# Tool invocation assertions
expect(trace).tool("search").was_called()
expect(trace).tool("search").was_not_called()
expect(trace).tool("search").called_exactly(3)
expect(trace).tool("search").called_at_least(1)

# Execution order assertions
expect(trace).tool("fetch_data").called_before("process_data")
expect(trace).tool("cleanup").called_after("main_task")

# Argument matching with flexible matchers
expect(trace).tool("api").called_with(
    query=contains("search term"),
    limit=greater_than(0),
    filters=has_key("category"),
)

# Execution behavior assertions
expect(trace).completed()
expect(trace).completed_within_steps(15)
expect(trace).failed_gracefully()

# Output assertions
expect(trace).output.not_empty()
expect(trace).output.contains("success")
expect(trace).output.is_valid_json()
```
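All of these assertions are queries over a recorded trace of tool invocations. A self-contained sketch of that core idea, assuming a trace is simply an ordered list of (tool, arguments) records — the real `Trace` object is richer than this:

```python
from dataclasses import dataclass, field

@dataclass
class ToolCall:
    name: str
    kwargs: dict

@dataclass
class Trace:
    """Minimal trace: an ordered record of tool invocations."""
    calls: list = field(default_factory=list)

    def record(self, name, **kwargs):
        self.calls.append(ToolCall(name, kwargs))

    def was_called(self, name):
        return any(c.name == name for c in self.calls)

    def called_before(self, first, second):
        # Both tools must appear, and first's earliest call must precede second's
        names = [c.name for c in self.calls]
        return (first in names and second in names
                and names.index(first) < names.index(second))


trace = Trace()
trace.record("web_search", query="AI news")
trace.record("summarize", text="...")
print(trace.was_called("web_search"))                  # True
print(trace.called_before("web_search", "summarize"))  # True
```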
### Rich Matchers

Flexible matchers for complex assertion scenarios:

```python
from agentassert import (
    # String matchers
    contains, matches, starts_with, ends_with, any_string,
    # Numeric matchers
    greater_than, less_than, between,
    # Collection matchers
    has_key, has_length, contains_item,
    # Logical matchers
    all_of, any_of, not_, anything,
)

# Combine matchers for precise assertions
expect(trace).tool("search").called_with(
    query=all_of(
        starts_with("user:"),
        contains("search"),
        not_(contains("admin")),
    )
)
```
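Conceptually, a matcher is just a predicate, and the logical matchers are predicate combinators. A hedged sketch of how `contains`, `starts_with`, `all_of`, and `not_` could be built from closures — illustrative only, not AgentAssert's internals:

```python
def contains(substring):
    return lambda value: substring in value

def starts_with(prefix):
    return lambda value: value.startswith(prefix)

def all_of(*matchers):
    # Passes only if every sub-matcher passes
    return lambda value: all(m(value) for m in matchers)

def not_(matcher):
    # Inverts a matcher
    return lambda value: not matcher(value)


matcher = all_of(starts_with("user:"), contains("search"), not_(contains("admin")))
print(matcher("user: search latest AI news"))  # True
print(matcher("admin: search logs"))           # False
```

Because matchers compose, one `called_with` assertion can express arbitrarily precise constraints on a single argument.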
## CLI Reference

```shell
# Run all tests in current directory
agentassert run

# Run tests in a specific directory
agentassert run tests/

# Run a specific test file
agentassert run tests/test_research_agent.py

# Filter tests by keyword
agentassert run -k "search"

# Verbose output with full tracebacks
agentassert run -v

# Minimal output (dots only)
agentassert run -q

# Set random seed for reproducibility
agentassert run --seed 12345

# Stop on first failure
agentassert run -x
```
## Framework Integration

AgentAssert is designed to work with any agent framework:

**LangChain**

```python
@agent_test
def test_langchain_agent(agent_harness):
    from langchain.agents import AgentExecutor

    trace = agent_harness.run(
        agent=my_langchain_agent,
        input="Analyze this data",
    )
    expect(trace).completed()
```

**Custom Agents**

```python
@agent_test
def test_custom_agent(agent_harness):
    def my_agent(prompt, tools):
        # Your custom logic
        return result

    trace = agent_harness.run(
        agent=my_agent,
        input="Process request",
    )
    expect(trace).completed()
```
## CI/CD Integration

### GitHub Actions

```yaml
name: Agent Tests
on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - name: Install dependencies
        run: |
          pip install agentassert
          pip install -r requirements.txt
      - name: Run agent tests
        run: agentassert run tests/ -v
```
### GitLab CI

```yaml
agent-tests:
  image: python:3.11
  script:
    - pip install agentassert
    - agentassert run tests/
```
## Comparison with Alternatives
| Feature | AgentAssert | LangSmith | Langfuse | DeepEval |
|---|---|---|---|---|
| Local execution | Yes | No | Partial | Yes |
| No account required | Yes | No | No | Yes |
| Framework agnostic | Yes | No | Yes | Yes |
| Behavioral assertions | Yes | No | No | No |
| Tool call testing | Yes | Partial | Partial | No |
| CI/CD native | Yes | Partial | Partial | Yes |
| Deterministic replay | Yes | No | No | No |
## Documentation
- Sample Tests — Working examples to get started
- Contributing Guide — How to contribute to the project
## Contributing
We welcome contributions from the community. Please read our Contributing Guidelines before submitting PRs.
## License
AgentAssert is released under the MIT License. See LICENSE for details.
Built by Kaushik Dhola