Behavioral testing framework for AI agents — pytest for AI agents

AgentAssert

The Behavioral Testing Framework for AI Agents

"pytest for AI Agents" — Write tests. Run agents. Ship with confidence.


The Problem

Teams building AI agents face a critical gap in their development workflow:

What Exists                         | What's Missing
------------------------------------|-----------------------------
Observability (LangSmith, Langfuse) | Behavioral test runners
Evaluation dashboards               | CI/CD pass/fail gates
LLM output quality metrics          | Agent execution path testing

AgentAssert fills this gap. It's an open-source, framework-agnostic, local-first behavioral test framework designed specifically for AI agent pipelines.

Why AgentAssert?

  • Framework Agnostic — Works with LangChain, CrewAI, AutoGen, LlamaIndex, or raw API calls
  • Local First — No cloud accounts, no dashboards, no external services required
  • CI/CD Native — Designed for git push → test → deploy workflows
  • Deterministic — Seeded execution for reproducible test runs
  • Developer Friendly — Familiar pytest-like syntax and workflow

Quick Start

Installation

pip install agentassert

Write Your First Test

# tests/test_my_agent.py
from agentassert import agent_test, expect, mock_tool, contains

@agent_test
def test_research_agent_workflow(agent_harness):
    """Test that the research agent follows the correct tool sequence."""
    
    # 1. Create mock tools with deterministic responses
    search = mock_tool("web_search", returns={"results": ["AI breakthrough news"]})
    summarize = mock_tool("summarize", returns="Key finding: AI is advancing rapidly")

    # 2. Run your agent under test (my_research_agent is your own agent,
    #    defined or imported elsewhere in your project)
    trace = agent_harness.run(
        agent=my_research_agent,
        input="Find the latest AI news",
        tools=[search, summarize]
    )

    # 3. Assert on behavioral expectations
    expect(trace).tool("web_search").was_called()
    expect(trace).tool("web_search").called_before("summarize")
    expect(trace).tool("web_search").called_with(query=contains("AI"))
    expect(trace).completed_within_steps(10)
    expect(trace).output.not_empty()

Run Your Tests

$ agentassert run tests/

AgentAssert v0.1.0  Behavioral Testing Framework for AI Agents
collecting ... 3 tests

tests/test_my_agent.py
   test_research_agent_workflow        (2 steps, $0.002, 0.3s)
   test_handles_api_failure            (1 steps, $0.001, 0.1s)
   test_stays_within_budget            (4 steps, $0.008, 0.5s)

════════════════════════════════════════════════════════════
3 passed in 0.9s
════════════════════════════════════════════════════════════

Features

Mock Tools

Create deterministic tool responses for predictable testing:

from agentassert import mock_tool

# Static response
search = mock_tool("web_search", returns={"results": ["item1", "item2"]})

# Sequential responses
api = mock_tool("api_call", returns_sequence=[
    {"status": "pending"},
    {"status": "processing"},
    {"status": "complete"}
])

# Simulate failures
flaky_api = mock_tool("external_service", raises=ConnectionError("timeout"))

# Rate limiting simulation
limited_api = mock_tool("rate_limited_api", returns="ok", fail_after=5)
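
To illustrate what these options mean, here is a minimal standalone sketch of a mock tool. It is not AgentAssert's implementation: the class name `SketchMockTool`, the choice to repeat the last item once a sequence is exhausted, and the use of `RuntimeError` for `fail_after` are all assumptions made for the example.

```python
from itertools import chain, repeat


class SketchMockTool:
    """Hypothetical stand-in for a mock tool; not AgentAssert's own class."""

    def __init__(self, name, returns=None, returns_sequence=None,
                 raises=None, fail_after=None):
        self.name = name
        self.calls = []  # keyword arguments of every invocation, in order
        self._raises = raises
        self._fail_after = fail_after
        if returns_sequence is not None:
            # Assumption: the sequence is non-empty, and once it is
            # exhausted the last item is returned for all further calls.
            seq = list(returns_sequence)
            self._responses = chain(seq, repeat(seq[-1]))
        else:
            self._responses = repeat(returns)

    def __call__(self, **kwargs):
        self.calls.append(kwargs)
        if self._raises is not None:
            raise self._raises
        if self._fail_after is not None and len(self.calls) > self._fail_after:
            raise RuntimeError(
                f"{self.name}: call limit of {self._fail_after} exceeded"
            )
        return next(self._responses)


# Sequential responses: each call yields the next item.
api = SketchMockTool("api_call", returns_sequence=[
    {"status": "pending"},
    {"status": "complete"},
])
statuses = [api()["status"] for _ in range(3)]

# Rate limiting: the first two calls succeed, the third fails.
limited = SketchMockTool("rate_limited_api", returns="ok", fail_after=2)
for _ in range(2):
    limited()
try:
    limited()
    third_call_failed = False
except RuntimeError:
    third_call_failed = True
```

Because every call is recorded in `calls`, a mock like this can double as the data source for the behavioral assertions described in the next section.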

Behavioral Assertions

Assert on how your agent behaves, not just what it outputs:

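from agentassert import expect, contains, greater_than, has_key

# The lines below are a catalog of available assertions, not one test;
# `trace` is the value returned by agent_harness.run(...)

# Tool invocation assertions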
expect(trace).tool("search").was_called()
expect(trace).tool("search").was_not_called()
expect(trace).tool("search").called_exactly(3)
expect(trace).tool("search").called_at_least(1)

# Execution order assertions
expect(trace).tool("fetch_data").called_before("process_data")
expect(trace).tool("cleanup").called_after("main_task")

# Argument matching with flexible matchers
expect(trace).tool("api").called_with(
    query=contains("search term"),
    limit=greater_than(0),
    filters=has_key("category")
)

# Execution behavior assertions
expect(trace).completed()
expect(trace).completed_within_steps(15)
expect(trace).failed_gracefully()

# Output assertions
expect(trace).output.not_empty()
expect(trace).output.contains("success")
expect(trace).output.is_valid_json()
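
As a rough mental model of the ordering and count assertions above, the sketch below checks them against a plain list of recorded tool calls. The trace format (one dict per call) and the helper names `called_before` and `call_count` are hypothetical, chosen for illustration; AgentAssert's internal trace representation may differ.

```python
from collections import Counter

# Hypothetical trace format: one dict per tool call, in execution order.
trace = [
    {"tool": "fetch_data", "args": {"source": "db"}},
    {"tool": "process_data", "args": {"rows": 10}},
    {"tool": "fetch_data", "args": {"source": "cache"}},
]


def called_before(trace, first, second):
    """True if both tools were called and the earliest call to `first`
    comes before the earliest call to `second`."""
    names = [step["tool"] for step in trace]
    if first not in names or second not in names:
        return False
    return names.index(first) < names.index(second)


def call_count(trace, tool):
    """Number of times `tool` appears in the trace (0 if never called)."""
    return Counter(step["tool"] for step in trace)[tool]


fetch_first = called_before(trace, "fetch_data", "process_data")
process_first = called_before(trace, "process_data", "fetch_data")
```

Comparing earliest calls is one possible definition of "before"; a stricter variant could require every call to the first tool to precede every call to the second.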

Rich Matchers

Flexible matchers for complex assertion scenarios:

from agentassert import (
    # String matchers
    contains, matches, starts_with, ends_with, any_string,
    
    # Numeric matchers
    greater_than, less_than, between,
    
    # Collection matchers
    has_key, has_length, contains_item,
    
    # Logical matchers
    all_of, any_of, not_, anything
)

# Combine matchers for precise assertions
expect(trace).tool("search").called_with(
    query=all_of(
        starts_with("user:"),
        contains("search"),
        not_(contains("admin"))
    )
)
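
One common way to build composable matchers like these is to treat each matcher as a function from a value to a boolean. The sketch below re-implements a few of them locally under that assumption; the functions share names with the imports above only for readability and are not AgentAssert's own code.

```python
# Local toy matchers: each factory returns a predicate (value -> bool).

def contains(fragment):
    return lambda value: isinstance(value, str) and fragment in value


def starts_with(prefix):
    return lambda value: isinstance(value, str) and value.startswith(prefix)


def not_(matcher):
    return lambda value: not matcher(value)


def all_of(*matchers):
    # Passes only if every inner matcher passes.
    return lambda value: all(m(value) for m in matchers)


def any_of(*matchers):
    # Passes if at least one inner matcher passes.
    return lambda value: any(m(value) for m in matchers)


query_ok = all_of(
    starts_with("user:"),
    contains("search"),
    not_(contains("admin")),
)

accepted = query_ok("user: search for cats")
rejected_admin = query_ok("user: search admin panel")
rejected_prefix = query_ok("search for cats")
```

Because every matcher has the same shape, logical combinators such as `all_of` and `not_` can wrap any other matcher, including other combinators.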

CLI Reference

# Run all tests in current directory
agentassert run

# Run tests in a specific directory
agentassert run tests/

# Run a specific test file
agentassert run tests/test_research_agent.py

# Filter tests by keyword
agentassert run -k "search"

# Verbose output with full tracebacks
agentassert run -v

# Minimal output (dots only)
agentassert run -q

# Set random seed for reproducibility
agentassert run --seed 12345

# Stop on first failure
agentassert run -x
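
As a general illustration of why fixing a seed (as with `--seed` above) makes runs repeatable, the sketch below drives a simulated run from a seeded generator. The `simulated_run` function is hypothetical and says nothing about how AgentAssert applies the seed internally; note also that seeding local randomness does not by itself make calls to a live LLM deterministic.

```python
import random


def simulated_run(seed, steps=5):
    """Stand-in for a test run whose random choices depend only on the seed."""
    # A local generator avoids interference from the global random state.
    rng = random.Random(seed)
    return [rng.choice(["search", "summarize", "finish"]) for _ in range(steps)]


first = simulated_run(12345)
second = simulated_run(12345)
same_sequence = first == second  # identical seed, identical choices
```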

Framework Integration

AgentAssert is designed to work with any agent framework:

LangChain

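from agentassert import agent_test, expect

@agent_test
def test_langchain_agent(agent_harness):
    from langchain.agents import AgentExecutor

    # my_langchain_agent is assumed to be an AgentExecutor instance
    # that you have built elsewhere in your project
    trace = agent_harness.run(
        agent=my_langchain_agent,
        input="Analyze this data"
    )
    expect(trace).completed()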

Custom Agents

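from agentassert import agent_test, expect

@agent_test
def test_custom_agent(agent_harness):
    def my_agent(prompt, tools):
        # Your custom logic goes here; this placeholder echoes the prompt
        result = f"Processed: {prompt}"
        return result

    trace = agent_harness.run(
        agent=my_agent,
        input="Process request"
    )
    expect(trace).completed()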
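
To make the `my_agent(prompt, tools)` shape more concrete, here is a minimal standalone sketch of how a harness could record tool calls for a plain-function agent. The `run_with_trace` helper, the choice to pass tools as a dict of callables, and the trace format are assumptions for illustration, not AgentAssert's actual harness.

```python
def run_with_trace(agent, prompt, tools):
    """Run `agent` with call-recording wrappers around `tools`.

    Returns (output, trace), where trace lists each tool call in order.
    """
    trace = []

    def wrap(name, fn):
        def recorded(**kwargs):
            trace.append({"tool": name, "args": kwargs})
            return fn(**kwargs)
        return recorded

    wrapped = {name: wrap(name, fn) for name, fn in tools.items()}
    output = agent(prompt, wrapped)
    return output, trace


def my_agent(prompt, tools):
    # A tiny two-step agent: search first, then summarize the hits.
    hits = tools["web_search"](query=prompt)
    return tools["summarize"](text=" ".join(hits))


output, trace = run_with_trace(
    my_agent,
    "AI news",
    {
        "web_search": lambda query: [f"result for {query}"],
        "summarize": lambda text: f"Summary: {text}",
    },
)
```

The agent itself stays unaware of the recording, which is what lets the same function run unchanged in tests and in production.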

CI/CD Integration

GitHub Actions

name: Agent Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install dependencies
        run: |
          pip install agentassert
          pip install -r requirements.txt
      
      - name: Run agent tests
        run: agentassert run tests/ -v

GitLab CI

agent-tests:
  image: python:3.11
  script:
    - pip install agentassert
    - agentassert run tests/

Comparison with Alternatives

Feature               | AgentAssert | LangSmith | Langfuse | DeepEval
----------------------|-------------|-----------|----------|---------
Local execution       | Yes         | No        | Partial  | Yes
No account required   | Yes         | No        | No       | Yes
Framework agnostic    | Yes         | No        | Yes      | Yes
Behavioral assertions | Yes         | No        | No       | No
Tool call testing     | Yes         | Partial   | Partial  | No
CI/CD native          | Yes         | Partial   | Partial  | Yes
Deterministic replay  | Yes         | No        | No       | No

Documentation


Contributing

We welcome contributions from the community. Please read our Contributing Guidelines before submitting PRs.


License

AgentAssert is released under the MIT License. See LICENSE for details.


Built by Kaushik Dhola
