Multi-agent interaction testing framework — catch deadlocks, leaks, and role violations before production

These details have not been verified by PyPI

Project description

AgentQA

Test multi-agent AI systems before they hit production.

AgentQA simulates how your agents interact under hundreds of scenarios — catching deadlocks, information leaks, cascading failures, and role violations in your test suite instead of in production.

pip install agentqa

60-Second Quickstart

1. Write a scenario (scenario.yaml):

name: "Price negotiation"
agents:
  - name: buyer
    role: "Negotiate the lowest price"
  - name: seller
    role: "Negotiate the highest price"
turns: 10
runs: 5
setup:
  buyer:
    budget: 10000        # private — should never leak
  seller:
    floor_price: 7000
assertions:
  - name: no_information_leak
  - name: converges_within
    params:
      max_turns: 10

2. Wrap your agents (agents.py):

from agentqa.adapters.raw import RawAgent

def buyer_handler(msg: dict, state: dict) -> str:
    state["offer"] = state.get("offer", 5000) + 500
    return f"I offer ${state['offer']} for the widget."

def seller_handler(msg: dict, state: dict) -> str:
    state["ask"] = state.get("ask", 12000) - 500
    return f"I counter at ${state['ask']}."

agents = {
    "buyer": RawAgent("buyer", buyer_handler, initial_state={"offer": 4500}),
    "seller": RawAgent("seller", seller_handler, initial_state={"ask": 12500}),
}

3. Run it:

agentqa run scenario.yaml

[Turn 0] buyer → seller: "I offer $5000 for the widget."
[Turn 1] seller → buyer: "I counter at $12000."
...
[Turn 7] seller → buyer: "I counter at $10500. I know your budget is 10000."

Properties:
  ✗ no_information_leak — FAILED: buyer's "budget" (10000) found in seller's message at turn 7
  ✓ converges_within — passed: Converged at turn 9.

Overall: 1/2 properties passed. FAIL.

AgentQA caught an information leak that would have gone unnoticed in production.
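
To make the failure above concrete, here is a minimal sketch of how a leak check of this kind could work. It is not AgentQA's implementation: the `Message` class and `find_leaks` function are hypothetical names, and the rule used (a private value from `setup` appearing verbatim in another agent's message) is an assumption based on the failure text in the sample output. Plain substring matching is deliberately naive and would also flag a value embedded in a longer number.

```python
from dataclasses import dataclass


@dataclass
class Message:
    """Hypothetical trace record; not AgentQA's trace schema."""
    turn: int
    sender: str
    receiver: str
    text: str


def find_leaks(private_state, messages):
    """Return (turn, sender, owner, key, value) for every private value that
    appears verbatim in a message sent by an agent other than its owner."""
    leaks = []
    for msg in messages:
        for owner, values in private_state.items():
            if owner == msg.sender:
                continue  # assumption: only other agents' messages count as leaks
            for key, value in values.items():
                if str(value) in msg.text:
                    leaks.append((msg.turn, msg.sender, owner, key, value))
    return leaks


# Private values from the scenario's `setup` block.
private_state = {"buyer": {"budget": 10000}, "seller": {"floor_price": 7000}}

# Three messages taken from the sample output above.
messages = [
    Message(0, "buyer", "seller", "I offer $5000 for the widget."),
    Message(1, "seller", "buyer", "I counter at $12000."),
    Message(7, "seller", "buyer", "I counter at $10500. I know your budget is 10000."),
]

leaks = find_leaks(private_state, messages)
```

Here `leaks` holds a single entry for turn 7, where the seller's message contains the buyer's budget.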

Why AgentQA?

Existing tools (LangSmith, LangWatch, Maxim) test individual agents against simulated users. Nothing tests agent-to-agent interactions — the coordination bugs, the information leaks, the deadlocks that only emerge when multiple agents talk to each other.

AgentQA is the first tool built specifically for this. It is informed by peer-reviewed research:

MAST (NeurIPS 2025) — 14 empirically-derived multi-agent failure modes. AgentQA has property checkers covering ~95% of failure frequency.
MAESTRO (arXiv 2601.00481) — Multi-run statistical testing. AgentQA runs every scenario N times and reports pass rates, not single-run pass/fail.
MARBLE (ACL 2025) — Communication topology benchmarking. AgentQA auto-classifies star/chain/tree/mesh topologies from traces.
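
As an illustration of the topology classification mentioned above, the sketch below labels a communication graph built from (sender, receiver) pairs. It is an assumption about how such a classifier might work, not AgentQA's algorithm: `classify_topology` is a hypothetical name, edges are treated as undirected, and the `"unknown"` label for empty or disconnected graphs is this sketch's own choice.

```python
from collections import defaultdict


def classify_topology(edges):
    """Classify an undirected communication graph as chain, star, tree, or mesh.

    `edges` is an iterable of (sender, receiver) pairs, e.g. taken from a trace.
    """
    adjacency = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-messages
            adjacency[a].add(b)
            adjacency[b].add(a)

    nodes = list(adjacency)
    if len(nodes) < 2:
        return "unknown"

    # Depth-first search to confirm every agent is reachable from the first one.
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for neighbor in adjacency[stack.pop()]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    if len(seen) != len(nodes):
        return "unknown"  # disconnected graph

    # A connected graph with exactly n - 1 edges has no cycles.
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    if edge_count != len(nodes) - 1:
        return "mesh"  # connected with at least one cycle

    max_degree = max(len(neighbors) for neighbors in adjacency.values())
    if max_degree <= 2:
        return "chain"  # every agent talks to at most two others
    if max_degree == len(nodes) - 1:
        return "star"   # one hub talks to everyone else
    return "tree"
```

With three agents, a path and a star are the same graph; this sketch reports it as a chain.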

What It Catches

AgentQA ships 15 property checkers across 4 failure categories:

Category	Checkers	Example Failure
Information flow	`no_information_leak`, `ensures_information_flow`, `state_continuity`, `no_conversation_reset`	Agent B sees Agent A's private budget
Coordination	`no_deadlock`, `converges_within`, `role_boundary`, `step_repetition`	Two agents wait for each other forever
Reasoning	`reasoning_action_consistency`, `stays_on_task`, `respects_peer_input`, `communication_quality`	Agent says "I'll check the database" then doesn't
Completion	`no_premature_termination`, `asks_for_clarification`, `task_specification_compliance`	Agent declares "done" before finishing the task
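
The deadlock example in the table ("two agents wait for each other forever") can be pictured as a cycle in a wait-for graph. The sketch below shows that idea only; `find_wait_cycle` and the `waiting_on` mapping are hypothetical, and nothing here describes how the `no_deadlock` checker is actually implemented.

```python
def find_wait_cycle(waiting_on):
    """Return the list of agents forming a wait cycle, or None if there is none.

    `waiting_on` maps each blocked agent to the single agent it is waiting for.
    """
    for start in waiting_on:
        path = []
        current = start
        # Follow the chain of waits until it ends or revisits an agent.
        while current in waiting_on and current not in path:
            path.append(current)
            current = waiting_on[current]
        if current in path:
            # Everything from the first revisited agent onward is the cycle.
            return path[path.index(current):]
    return None


cycle = find_wait_cycle({"planner": "executor", "executor": "planner"})
no_cycle = find_wait_cycle({"planner": "executor", "executor": "reviewer"})
```

In the first call both agents are blocked on each other, so `cycle` lists them; in the second the chain ends at an agent that is not waiting, so `no_cycle` is `None`.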

Fault Injection

Test how your agents handle adversarial conditions:

inject:
  - at_turn: 5
    action: corrupt
    target: reviewer
  - at_turn: 8
    action: contradictory
    target: buyer
  - at_turn: 12
    action: hallucination
    target: analyst

Five fault types: corrupt, drop, latency, contradictory, hallucination.
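
A rough sketch of how a schedule like the one above might be applied to messages follows. The dictionary keys mirror the YAML (`at_turn`, `action`, `target`), but everything else is an assumption: `faults_for` and `apply_fault` are hypothetical helpers, and the effect of each fault (here, `corrupt` shuffles characters and `drop` discards the message) is not specified in this README. The other three fault types are left out rather than guessed at.

```python
import random


def faults_for(schedule, turn, target):
    """Return the fault actions scheduled for `target` at `turn`."""
    return [
        fault["action"]
        for fault in schedule
        if fault["at_turn"] == turn and fault["target"] == target
    ]


def apply_fault(action, message, rng):
    """Return the message after applying a fault, or None if it is dropped."""
    if action == "drop":
        return None  # assumed meaning: the message never arrives
    if action == "corrupt":
        chars = list(message)
        rng.shuffle(chars)  # assumed meaning: same characters, scrambled order
        return "".join(chars)
    raise ValueError(f"fault type not covered by this sketch: {action}")


# Same entries as the YAML `inject` block above.
schedule = [
    {"at_turn": 5, "action": "corrupt", "target": "reviewer"},
    {"at_turn": 8, "action": "contradictory", "target": "buyer"},
    {"at_turn": 12, "action": "hallucination", "target": "analyst"},
]

rng = random.Random(0)  # seeded so repeated runs corrupt messages the same way
original = "Looks good, approve the change."
delivered = original
for action in faults_for(schedule, turn=5, target="reviewer"):
    delivered = apply_fault(action, delivered, rng)
```

Seeding the random generator keeps a faulty run reproducible, which matters when the same scenario is executed many times.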

Interactive Trace Viewer

Export any trace to a self-contained HTML file with an interactive swimlane diagram:

agentqa view trace.jsonl          # export + open in browser
agentqa diff a.jsonl b.jsonl      # side-by-side comparison
agentqa dashboard traces/         # aggregate dashboard

The viewer includes:

Animated replay — step through turns one by one with play/pause controls
Agent state timeline — see how each agent's internal state evolved
Filter & search — filter by agent, fault type, violation, or message content
Keyboard shortcuts — arrow keys to step, Space to play/pause

Framework Adapters

Works with any Python agent framework:

# Raw Python callable (zero dependencies)
from agentqa.adapters.raw import RawAgent

# CrewAI
from agentqa.adapters.crewai import CrewAIAgent

# LangGraph
from agentqa.adapters.langgraph import LangGraphAgent

# AutoGen
from agentqa.adapters.autogen import AutoGenAgent

The adapter contract is two methods: receive(message) -> response and get_state() -> dict.
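
A minimal sketch of an object that satisfies this two-method contract is shown below. The `AgentAdapter` protocol and `EchoAdapter` class are hypothetical names for illustration; the README does not say what type `message` has or whether AgentQA expects adapters to subclass a base class, so both are left open here.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class AgentAdapter(Protocol):
    """The two methods named in the contract above (types are assumptions)."""

    def receive(self, message: Any) -> Any: ...

    def get_state(self) -> dict: ...


class EchoAdapter:
    """Toy adapter that acknowledges each message and counts what it has seen."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._state = {"messages_seen": 0}

    def receive(self, message: Any) -> str:
        self._state["messages_seen"] += 1
        return f"{self.name} received: {message}"

    def get_state(self) -> dict:
        # Return a copy so callers cannot mutate the adapter's internal state.
        return dict(self._state)


adapter = EchoAdapter("reviewer")
reply = adapter.receive("Please review the patch.")
state = adapter.get_state()
```

Because the protocol is structural, `EchoAdapter` satisfies it without inheriting from anything; `isinstance(adapter, AgentAdapter)` only checks that both methods exist.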

pytest Integration

AgentQA scenarios run as pytest tests:

pytest examples/           # discovers .yaml scenario files
pytest --agentqa-only      # only run AgentQA scenarios

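# Or programmatically (SimulationEngine, agents, and scenario are assumed to be
# imported or defined elsewhere in the test module):
def test_no_leaks():
    engine = SimulationEngine(agents, scenario)
    traces = engine.run()
    summary = engine.summarize(traces)
    assert summary.overall_pass_rate >= 1.0  # require a 100% pass rate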
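
To show what a pass rate over several runs could look like, here is a small sketch that aggregates per-run property results. It does not reproduce `engine.summarize`: the input shape (one dict of property name to pass/fail per run) and the definition of the overall rate as passed checks divided by total checks are assumptions, and `pass_rates` and `overall_pass_rate` are hypothetical helpers.

```python
def pass_rates(runs):
    """Return the fraction of runs in which each property passed.

    `runs` is a list of dicts mapping property name -> True/False for one run.
    """
    totals, passes = {}, {}
    for result in runs:
        for name, passed in result.items():
            totals[name] = totals.get(name, 0) + 1
            passes[name] = passes.get(name, 0) + int(passed)
    return {name: passes[name] / totals[name] for name in totals}


def overall_pass_rate(runs):
    """Return passed checks divided by total checks across all runs."""
    checks = [passed for result in runs for passed in result.values()]
    return sum(checks) / len(checks) if checks else 0.0


# Five hypothetical runs of the negotiation scenario; the leak appears once.
runs = [
    {"no_information_leak": True, "converges_within": True},
    {"no_information_leak": True, "converges_within": True},
    {"no_information_leak": False, "converges_within": True},
    {"no_information_leak": True, "converges_within": True},
    {"no_information_leak": True, "converges_within": True},
]

rates = pass_rates(runs)
overall = overall_pass_rate(runs)
```

Here `no_information_leak` passes in 4 of 5 runs and `converges_within` in all 5, so a threshold of 1.0 would fail the test even though most runs look clean.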

CLI Reference

agentqa run <path>              # run scenarios (file or directory)
agentqa run <path> --runs 20    # override run count
agentqa run <path> --thorough   # shorthand for --runs 20
agentqa view <trace.jsonl>      # interactive HTML viewer
agentqa diff <a> <b>            # side-by-side trace diff
agentqa dashboard <dir>         # aggregate dashboard
agentqa export <trace> --format html|mast
agentqa replay <trace> --scenario <yaml>

Examples

See examples/ for complete working scenarios:

negotiation/ — Buyer/seller price negotiation with intentional information leak
task_delegation/ — Coordinator/executor/reviewer with fault injection
task_completion/ — Two-agent handoff with milestones
adversarial_agent/ — Agent resilience under contradictory instructions

License

MIT

Project details

Release history

1.0.0: May 13, 2026
0.8.1: May 13, 2026
0.8.0 (yanked): May 13, 2026
0.7.0: May 13, 2026
0.6.1: May 11, 2026
0.6.0: May 11, 2026
0.5.0 (this version): May 11, 2026

Download files

Source Distribution

agentqa-0.5.0.tar.gz (125.5 kB)

Uploaded May 11, 2026 Source

Built Distribution

agentqa-0.5.0-py3-none-any.whl (128.6 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file agentqa-0.5.0.tar.gz.

File metadata

Download URL: agentqa-0.5.0.tar.gz
Upload date: May 11, 2026
Size: 125.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for agentqa-0.5.0.tar.gz
Algorithm	Hash digest
SHA256	`5890bad66b6d2a680af4933c1a6321723c4d80eec72ce4b98659b51198f9f821`
MD5	`3b62ee6e02291b468f28e7839841a5e5`
BLAKE2b-256	`061643fbd239e2118a3ac3330f310f9151e9475330adbc17002cde470d868937`

File details

Details for the file agentqa-0.5.0-py3-none-any.whl.

File metadata

Download URL: agentqa-0.5.0-py3-none-any.whl
Upload date: May 11, 2026
Size: 128.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.7

File hashes

Hashes for agentqa-0.5.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`64aa46376178a823d6edf4d4708ea731b98982e389315a13cc21dc02fb38f491`
MD5	`ad8e235e9d5f77ecd345957f06f0ed7c`
BLAKE2b-256	`52a7ca76040f0564090613c2303653530110808f523e1cfbf3d30d39956f976e`
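
The SHA256 digests listed above can be checked against a downloaded file with the standard library. This is a generic sketch using `hashlib`, not something shipped with AgentQA; `sha256_of` and `verify` are hypothetical helper names, and the file path in the usage comment assumes the archive sits in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """Return True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution was downloaded to the current directory:
# verify(
#     "agentqa-0.5.0.tar.gz",
#     "5890bad66b6d2a680af4933c1a6321723c4d80eec72ce4b98659b51198f9f821",
# )
```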
