Behavioral testing framework for AI agents — pytest for AI agents

AgentUnit

The Behavioral Testing Framework for AI Agents

"pytest for AI Agents" — Write tests. Run agents. Ship with confidence.

The Problem

Teams building AI agents face a critical gap in their development workflow:

What Exists	What's Missing
Observability (LangSmith, Langfuse)	Behavioral test runners
Evaluation dashboards	CI/CD pass/fail gates
LLM output quality metrics	Agent execution path testing

AgentUnit fills this gap. It's an open-source, framework-agnostic, local-first behavioral test framework designed specifically for AI agent pipelines.

Why AgentUnit?

Framework Agnostic — Works with LangChain, CrewAI, AutoGen, LlamaIndex, or raw API calls
Local First — No cloud accounts, no dashboards, no external services required
CI/CD Native — Designed for git push → test → deploy workflows
Deterministic — Seeded execution for reproducible test runs
Developer Friendly — Familiar pytest-like syntax and workflow

Quick Start

Installation

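pip install agentassert

The package is published on PyPI as agentassert, but it is imported in Python as agentunit and installs the agentunit command.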

Write Your First Test

# tests/test_my_agent.py
from agentunit import agent_test, expect, mock_tool, contains

@agent_test
def test_research_agent_workflow(agent_harness):
    """Test that the research agent follows the correct tool sequence."""
    
    # 1. Create mock tools with deterministic responses
    search = mock_tool("web_search", returns={"results": ["AI breakthrough news"]})
    summarize = mock_tool("summarize", returns="Key finding: AI is advancing rapidly")

    # 2. Run your agent under test
    trace = agent_harness.run(
        agent=my_research_agent,  # your own agent under test, defined or imported elsewhere
        input="Find the latest AI news",
        tools=[search, summarize]
    )

    # 3. Assert on behavioral expectations
    expect(trace).tool("web_search").was_called()
    expect(trace).tool("web_search").called_before("summarize")
    expect(trace).tool("web_search").called_with(query=contains("AI"))
    expect(trace).completed_within_steps(10)
    expect(trace).output.not_empty()
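The expect(trace) checks above operate on a record of what the agent did. As a rough mental model only (this is not AgentUnit's implementation), the following self-contained sketch wraps each tool so that every call is logged into a trace, runs a stand-in agent, and then expresses the same checks with plain asserts. ToolCall, Trace, record and toy_research_agent are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ToolCall:
    """One recorded tool invocation: which tool, with which keyword arguments."""
    name: str
    kwargs: Dict[str, Any]


@dataclass
class Trace:
    """Ordered log of tool calls plus the agent's final output (hypothetical model)."""
    calls: List[ToolCall] = field(default_factory=list)
    output: Any = None

    def names(self) -> List[str]:
        return [c.name for c in self.calls]


def record(trace: Trace, name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a tool so each call is appended to the trace before the tool runs."""
    def wrapper(**kwargs: Any) -> Any:
        trace.calls.append(ToolCall(name, kwargs))
        return fn(**kwargs)
    return wrapper


def toy_research_agent(prompt: str, tools: Dict[str, Callable[..., Any]]) -> str:
    """Stand-in agent: search first, then summarize what the search returned."""
    results = tools["web_search"](query=prompt)
    return tools["summarize"](text=" ".join(results["results"]))


trace = Trace()
tools = {
    "web_search": record(trace, "web_search", lambda **kw: {"results": ["AI breakthrough news"]}),
    "summarize": record(trace, "summarize", lambda **kw: "Key finding: AI is advancing rapidly"),
}
trace.output = toy_research_agent("Find the latest AI news", tools)

# Plain-Python counterparts of the expect(...) checks in the test above
assert "web_search" in trace.names()                                            # was_called()
assert trace.names().index("web_search") < trace.names().index("summarize")     # called_before(...)
assert "AI" in trace.calls[0].kwargs["query"]                                   # called_with(query=contains("AI"))
assert len(trace.calls) <= 10                                                   # completed_within_steps(10)
assert trace.output                                                             # output.not_empty()
```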

Run Your Tests

$ agentunit run tests/

AgentUnit v0.1.0 — Behavioral Testing Framework for AI Agents
collecting ... 3 tests

tests/test_my_agent.py
  ✓ test_research_agent_workflow        (2 steps, $0.002, 0.3s)
  ✓ test_handles_api_failure            (1 steps, $0.001, 0.1s)
  ✓ test_stays_within_budget            (4 steps, $0.008, 0.5s)

════════════════════════════════════════════════════════════
3 passed in 0.9s
════════════════════════════════════════════════════════════

Features

Mock Tools

Create deterministic tool responses for predictable testing:

# Static response
search = mock_tool("web_search", returns={"results": ["item1", "item2"]})

# Sequential responses
api = mock_tool("api_call", returns_sequence=[
    {"status": "pending"},
    {"status": "processing"},
    {"status": "complete"}
])

# Simulate failures
flaky_api = mock_tool("external_service", raises=ConnectionError("timeout"))

# Rate limiting simulation
limited_api = mock_tool("rate_limited_api", returns="ok", fail_after=5)
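The four mock_tool options can be understood through a small stand-in class. The sketch below is a hypothetical MockTool, not AgentUnit's mock_tool, and two of its behaviours are assumptions: returns_sequence keeps returning its last item once the sequence is exhausted, and fail_after=n lets the first n calls succeed and then raises a made-up RateLimitError.

```python
from typing import Any, Dict, List, Optional, Sequence


class RateLimitError(RuntimeError):
    """Hypothetical error raised once the fail_after budget is used up."""


class MockTool:
    """Illustrative mock tool mirroring the options above; not AgentUnit's code."""

    def __init__(
        self,
        name: str,
        returns: Any = None,
        returns_sequence: Optional[Sequence[Any]] = None,
        raises: Optional[BaseException] = None,
        fail_after: Optional[int] = None,
    ) -> None:
        self.name = name
        self.returns = returns
        self.returns_sequence = list(returns_sequence) if returns_sequence is not None else None
        self.raises = raises
        self.fail_after = fail_after
        self.calls: List[Dict[str, Any]] = []  # keyword arguments of every call, in order

    def __call__(self, **kwargs: Any) -> Any:
        self.calls.append(kwargs)
        n = len(self.calls)  # 1-based index of the current call
        if self.raises is not None:
            raise self.raises  # simulate a tool that always fails
        if self.fail_after is not None and n > self.fail_after:
            raise RateLimitError(f"{self.name}: limit of {self.fail_after} calls exceeded")
        if self.returns_sequence:
            # Assumption: after the last item, keep returning the last item
            return self.returns_sequence[min(n, len(self.returns_sequence)) - 1]
        return self.returns


api = MockTool("api_call", returns_sequence=[{"status": "pending"}, {"status": "complete"}])
print(api(), api(), api())  # third call repeats the last item
```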

Behavioral Assertions

Assert on how your agent behaves, not just what it outputs:

from agentunit import expect, contains, greater_than, has_key

# Tool invocation assertions
expect(trace).tool("search").was_called()
expect(trace).tool("search").was_not_called()
expect(trace).tool("search").called_exactly(3)
expect(trace).tool("search").called_at_least(1)

# Execution order assertions
expect(trace).tool("fetch_data").called_before("process_data")
expect(trace).tool("cleanup").called_after("main_task")

# Argument matching with flexible matchers
expect(trace).tool("api").called_with(
    query=contains("search term"),
    limit=greater_than(0),
    filters=has_key("category")
)

# Execution behavior assertions
expect(trace).completed()
expect(trace).completed_within_steps(15)
expect(trace).failed_gracefully()

# Output assertions
expect(trace).output.not_empty()
expect(trace).output.contains("success")
expect(trace).output.is_valid_json()
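Ordering and counting assertions reduce to simple checks over the sequence of tool names recorded in a trace. The helper functions below are hypothetical and sketch only one plausible reading of the assertion names: called_before compares the first call of each tool, and completed_within_steps assumes one step per tool call. AgentUnit's own semantics may differ.

```python
from typing import Sequence


def called_exactly(names: Sequence[str], tool: str, times: int) -> bool:
    """The tool appears exactly `times` times in the recorded call order."""
    return list(names).count(tool) == times


def called_at_least(names: Sequence[str], tool: str, times: int) -> bool:
    return list(names).count(tool) >= times


def called_before(names: Sequence[str], first: str, second: str) -> bool:
    """Both tools were called, and the first call of `first` precedes the first call of `second`."""
    seq = list(names)
    if first not in seq or second not in seq:
        return False
    return seq.index(first) < seq.index(second)


def called_after(names: Sequence[str], later: str, earlier: str) -> bool:
    return called_before(names, earlier, later)


def completed_within_steps(names: Sequence[str], max_steps: int) -> bool:
    """Assumption: each tool call counts as one step."""
    return len(names) <= max_steps


calls = ["fetch_data", "search", "process_data", "search", "main_task", "cleanup"]
assert called_before(calls, "fetch_data", "process_data")
assert called_after(calls, "cleanup", "main_task")
assert called_exactly(calls, "search", 2)
assert called_at_least(calls, "search", 1)
assert completed_within_steps(calls, 15)
assert not called_before(calls, "process_data", "fetch_data")
```

A stricter variant of called_before would require every call of the first tool to precede every call of the second; which reading a test needs depends on whether repeated calls may interleave.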

Rich Matchers

Flexible matchers for complex assertion scenarios:

from agentunit import (
    # String matchers
    contains, matches, starts_with, ends_with, any_string,
    
    # Numeric matchers
    greater_than, less_than, between,
    
    # Collection matchers
    has_key, has_length, contains_item,
    
    # Logical matchers
    all_of, any_of, not_, anything
)

# Combine matchers for precise assertions
expect(trace).tool("search").called_with(
    query=all_of(
        starts_with("user:"),
        contains("search"),
        not_(contains("admin"))
    )
)
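Matchers such as contains and all_of compose naturally as predicates: each one maps a value to True or False, and the logical matchers combine predicates. The sketch below re-implements a few of them under that assumption, plus a hypothetical args_match helper showing how a called_with-style check could compare recorded keyword arguments against matchers or literal values. These are illustrative stand-ins, not the functions exported by agentunit.

```python
from typing import Any, Callable, Dict

Matcher = Callable[[Any], bool]


def contains(sub: str) -> Matcher:
    return lambda v: isinstance(v, str) and sub in v


def starts_with(prefix: str) -> Matcher:
    return lambda v: isinstance(v, str) and v.startswith(prefix)


def greater_than(bound: float) -> Matcher:
    return lambda v: isinstance(v, (int, float)) and v > bound


def has_key(key: Any) -> Matcher:
    return lambda v: isinstance(v, dict) and key in v


def all_of(*matchers: Matcher) -> Matcher:
    return lambda v: all(m(v) for m in matchers)


def any_of(*matchers: Matcher) -> Matcher:
    return lambda v: any(m(v) for m in matchers)


def not_(matcher: Matcher) -> Matcher:
    return lambda v: not matcher(v)


def args_match(actual: Dict[str, Any], **expected: Any) -> bool:
    """Every expected key must be present; callables are applied as matchers, other values compared with ==."""
    for key, exp in expected.items():
        if key not in actual:
            return False
        value = actual[key]
        ok = exp(value) if callable(exp) else value == exp
        if not ok:
            return False
    return True


query = all_of(starts_with("user:"), contains("search"), not_(contains("admin")))
recorded = {"query": "user: search docs", "limit": 5, "filters": {"category": "news"}}
assert args_match(recorded, query=query, limit=greater_than(0), filters=has_key("category"))
assert not query("user: admin search")
```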

CLI Reference

# Run all tests in current directory
agentunit run

# Run tests in a specific directory
agentunit run tests/

# Run a specific test file
agentunit run tests/test_research_agent.py

# Filter tests by keyword
agentunit run -k "search"

# Verbose output with full tracebacks
agentunit run -v

# Minimal output (dots only)
agentunit run -q

# Set random seed for reproducibility
agentunit run --seed 12345

# Stop on first failure
agentunit run -x
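The --seed flag exists so that anything random in a run (for example, simulated failures) repeats exactly. How AgentUnit threads the seed through a test is not described in this README; the sketch below only illustrates the underlying idea with the standard library. A private random.Random(seed) instance yields the same sequence every time, independent of the global random state. simulate_flaky_calls is a hypothetical helper.

```python
import random
from typing import List


def simulate_flaky_calls(seed: int, n_calls: int, failure_rate: float) -> List[bool]:
    """Return one success flag per simulated tool call.

    A dedicated Random instance is used so other code touching the global
    `random` module cannot change this sequence.
    """
    rng = random.Random(seed)
    return [rng.random() >= failure_rate for _ in range(n_calls)]


run_a = simulate_flaky_calls(seed=12345, n_calls=8, failure_rate=0.3)
random.random()  # disturb the global RNG; the seeded runs are unaffected
run_b = simulate_flaky_calls(seed=12345, n_calls=8, failure_rate=0.3)
assert run_a == run_b  # same seed, same pattern of successes and failures
```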

Framework Integration

AgentUnit is designed to work with any agent framework:

LangChain

@agent_test
def test_langchain_agent(agent_harness):
    from langchain.agents import AgentExecutor

    # my_langchain_agent is your own agent (for example, an AgentExecutor) built elsewhere
    trace = agent_harness.run(
        agent=my_langchain_agent,
        input="Analyze this data"
    )
    expect(trace).completed()

Custom Agents

@agent_test
def test_custom_agent(agent_harness):
    def my_agent(prompt, tools):
        # Your custom logic goes here; this placeholder just echoes the prompt
        result = f"Processed: {prompt}"
        return result
    
    trace = agent_harness.run(
        agent=my_agent,
        input="Process request"
    )
    expect(trace).completed()
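A custom agent is also where failure handling lives, which is what tests such as test_handles_api_failure and expect(trace).failed_gracefully() target. The sketch below is a hypothetical resilient_agent using the same (prompt, tools) shape as my_agent above, here assuming tools is a dict mapping tool names to callables. It retries a search tool on ConnectionError a bounded number of times and then returns a fallback message instead of raising. What exactly failed_gracefully() checks is not specified in this README.

```python
from typing import Any, Callable, Dict, List

Tool = Callable[..., Any]


def resilient_agent(prompt: str, tools: Dict[str, Tool], max_attempts: int = 2) -> str:
    """Toy custom agent: retry the search tool on ConnectionError, then degrade gracefully."""
    search = tools["web_search"]
    for _ in range(max_attempts):
        try:
            results = search(query=prompt)
            return f"Found {len(results['results'])} result(s) for: {prompt}"
        except ConnectionError:
            continue  # treat as transient and try again
    # Every attempt failed: report the problem instead of crashing
    return "Search is currently unavailable; please try again later."


def always_down(**kwargs: Any) -> Any:
    """Hypothetical tool that always times out."""
    raise ConnectionError("timeout")


attempts: List[Dict[str, Any]] = []


def flaky_once(**kwargs: Any) -> Any:
    """Hypothetical tool that fails on its first call only."""
    attempts.append(kwargs)
    if len(attempts) == 1:
        raise ConnectionError("timeout")
    return {"results": ["a", "b"]}


print(resilient_agent("AI news", {"web_search": always_down}))
print(resilient_agent("AI news", {"web_search": flaky_once}))
```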

CI/CD Integration

GitHub Actions

name: Agent Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Set up Python
        uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      
      - name: Install dependencies
        run: |
          pip install agentassert
          pip install -r requirements.txt
      
      - name: Run agent tests
        run: agentunit run tests/ -v

GitLab CI

agent-tests:
  image: python:3.11
  script:
    - pip install agentassert
    - agentunit run tests/

Comparison with Alternatives

Feature	AgentUnit	LangSmith	Langfuse	DeepEval
Local execution	Yes	No	Partial	Yes
No account required	Yes	No	No	Yes
Framework agnostic	Yes	No	Yes	Yes
Behavioral assertions	Yes	No	No	No
Tool call testing	Yes	Partial	Partial	No
CI/CD native	Yes	Partial	Partial	Yes
Deterministic replay	Yes	No	No	No

Documentation

Sample Tests — Working examples to get started
Contributing Guide — How to contribute to the project

Contributing

We welcome contributions from the community. Please read our Contributing Guidelines before submitting PRs.

License

AgentUnit is released under the MIT License. See LICENSE for details.

Built by Kaushik Dhola

Project details

Release history

0.1.1 (Mar 29, 2026)
0.1.0 (Mar 29, 2026; this version)

Download files

Source Distribution

agentassert-0.1.0.tar.gz (33.5 kB)

Uploaded Mar 29, 2026 Source

Built Distribution

agentassert-0.1.0-py3-none-any.whl (41.4 kB)

Uploaded Mar 29, 2026 Python 3

File details

Details for the file agentassert-0.1.0.tar.gz.

File metadata

Download URL: agentassert-0.1.0.tar.gz
Upload date: Mar 29, 2026
Size: 33.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for agentassert-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4eedd594626b09dec17d87d3cea8f7ecc0ae52c495d5acb663c2ac3e5ad4f4db`
MD5	`d612a4a178e19a78b356b454975d4b2a`
BLAKE2b-256	`4f5c97905a717e8564329fc00dd4727f6545a6eae63aae32008bfc8c739cff6b`

File details

Details for the file agentassert-0.1.0-py3-none-any.whl.

File metadata

Download URL: agentassert-0.1.0-py3-none-any.whl
Upload date: Mar 29, 2026
Size: 41.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.0

File hashes

Hashes for agentassert-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8010590f24fb4610ee9a3ab6fbb4a3796546f85f8d48254eb72bc3aba0949de2`
MD5	`1d370549c152224b65f5d08caaad6f04`
BLAKE2b-256	`e874e3213e63a8e77a85c4076b1a504de2ce108dbdd8a6c82b09a7397414d468`
