Skip to main content

The first reliability testing framework for multi-agent AI systems

Project description

swarm-test

The first reliability testing framework for multi-agent AI systems.

swarm-test builds a NetworkX interaction graph of your agent swarm and runs 5 automated chaos tests to surface cascade failures, context leakage, intent drift, collusion, and blast radius risks — all from a 3-line API.

CrewAI, LangGraph, AutoGen — one tool.

from swarm_test import SwarmProbe

probe  = SwarmProbe(crew)
report = probe.run_all()
report.print_summary()

Features

Test What it checks
Cascade Failure Which agents, if they fail, bring down the most of the swarm
Context Leakage Sensitive data (credentials, PII) crossing agent boundaries
Intent Drift Agents acting outside their role; prompt injection; goal hijacking
Collusion Detection Dense cliques, echo chambers, orchestrator-bypass cycles
Blast Radius Single points of failure, critical path, redundancy score

Supported Frameworks

Framework Adapter Status
CrewAI CrewAIAdapter Stable
LangGraph LangGraphAdapter Stable
AutoGen AutoGenAdapterGroupChat, GroupChatManager, ConversableAgent Stable
Generic / static graph BaseAdapter Stable

Installation

pip install swarm-test
# or with framework extras:
pip install "swarm-test[crewai]"
pip install "swarm-test[langgraph]"
pip install "swarm-test[langchain]"
pip install "swarm-test[autogen]"

From source:

git clone https://github.com/surajkumar811/swarm-test
cd swarm-test
pip install -e ".[dev]"

Quick Start

With a CrewAI crew

from crewai import Crew, Agent, Task
from swarm_test import SwarmProbe

researcher = Agent(role="researcher", goal="...", backstory="...")
writer     = Agent(role="writer",     goal="...", backstory="...")
crew = Crew(agents=[researcher, writer], tasks=[...])

probe  = SwarmProbe(crew, swarm_name="my-crew")
report = probe.run_all()
report.print_summary()
report.to_html("report.html")   # D3 graph visualization

With a LangGraph workflow

from langgraph.graph import StateGraph
from swarm_test import SwarmProbe

graph = StateGraph(dict)
graph.add_node("researcher", researcher_fn)
graph.add_node("writer", writer_fn)
graph.add_edge("researcher", "writer")
compiled = graph.compile()

probe  = SwarmProbe(compiled, swarm_name="my-langgraph")
report = probe.run_all()
report.print_summary()
report.to_json("report.json")   # Structured JSON with stable finding IDs

With an AutoGen GroupChat

from autogen import ConversableAgent, GroupChat, GroupChatManager
from swarm_test import SwarmProbe

planner  = ConversableAgent(name="Planner",  system_message="...")
coder    = ConversableAgent(name="Coder",    system_message="...")
reviewer = ConversableAgent(name="Reviewer", system_message="...")

groupchat = GroupChat(
    agents=[planner, coder, reviewer],
    allowed_or_disallowed_speaker_transitions={
        planner:  [coder],
        coder:    [reviewer],
        reviewer: [planner],
    },
    speaker_transitions_type="allowed",
)
manager = GroupChatManager(groupchat=groupchat)

probe  = SwarmProbe(manager, swarm_name="my-autogen")
report = probe.run_all()
report.print_summary()

From the CLI:

swarm-test run autogen_app.py            # auto-detects `groupchat` / `manager`

Static graph (no live swarm)

from swarm_test import SwarmProbe, AgentNode, InteractionEvent, EventType

a = AgentNode(name="Fetcher", role="researcher")
b = AgentNode(name="Summarizer", role="writer")

probe = SwarmProbe(
    swarm_name="my-swarm",
    agents=[a, b],
    events=[InteractionEvent(
        source_agent_id=a.id,
        target_agent_id=b.id,
        event_type=EventType.TASK_DELEGATE,
    )],
)
report = probe.run_all()
report.print_summary()

CLI

# Run against a Python script containing a `crew` variable
swarm-test probe my_crew.py --output report.html --fail-on-critical

# Static scan from the command line
swarm-test scan \
  --agents Researcher --agents Analyst --agents Writer \
  --edges "Researcher:Analyst" --edges "Analyst:Writer" \
  --output report.html

Configuration

swarm-test supports a YAML config file for repeatable runs and CI gates. Copy the example and edit it to taste:

cp .swarmtest.example.yml .swarmtest.yml

A minimal .swarmtest.yml:

fail_on_severity: high        # critical | high | medium | low | info | none
max_blast_radius: 0.5         # 0.0 - 1.0 — findings above this threshold fail
disabled_tests:               # skip individual tests
  - collusion
sensitive_patterns:           # extra regexes added to the sensitive-data scanner
  - "INTERNAL-[A-Z0-9]+"
output_format: html           # console | json | markdown | html
output_path: ./swarm.html
quick_scan: false
timeout_seconds: 30
strict: false                 # treat ANY finding as a failure

Run with the new run subcommand:

swarm-test run --config .swarmtest.yml
swarm-test run -a "A,B,C" -e "A>B,B>C" --strict
swarm-test run my_crew.py --config custom-config.yml --output-format json

Auto-discovery. With no --config flag, swarm-test discovers .swarmtest.yml, .swarmtest.yaml, or swarmtest.yml in the project root, falling back to a [tool.swarmtest] table in pyproject.toml.

CLI flags always override config-file values. Exit codes from run: 0 (passed), 1 (findings exceed thresholds), 2 (config or runtime error).


Architecture

swarm_test/
├── core/
│   ├── models.py       # Pydantic models (AgentNode, Finding, SwarmReport, …)
│   ├── graph.py        # NetworkX SwarmGraph
│   ├── interceptor.py  # Monkey-patch agent methods, sensitive-data scanner
│   └── probe.py        # SwarmProbe — main entry point
├── attacks/
│   ├── cascade.py          # Cascade failure simulation
│   ├── context_leakage.py  # Sensitive-data boundary check
│   ├── intent_drift.py     # Role violations + goal hijacking
│   ├── collusion.py        # Clique/echo-chamber/cycle detection
│   └── blast_radius.py     # Topological SPOF + redundancy analysis
├── integrations/
│   ├── base.py                # BaseAdapter
│   ├── crewai_adapter.py      # CrewAI Crew ingestion
│   ├── langgraph_adapter.py   # LangGraph StateGraph / CompiledGraph ingestion
│   └── autogen_adapter.py     # AutoGen GroupChat / ConversableAgent ingestion
├── reporters/
│   ├── console.py          # Rich terminal output
│   └── html.py             # D3 force-directed graph report
└── cli.py                  # Click CLI

Report Output

Terminal (Rich)

─────────────────── SWARM-TEST RELIABILITY REPORT ───────────────────

 Summary
 Swarm: research-crew-demo    Framework: crewai
 Agents: 4   Edges: 6
 Risk Score: 45/100
 Duration: 12ms

╭─────────────────── Test Results ─────────────────────╮
│ Test                  Status   Findings  Critical  High │
│ cascade_failure       FAILED       2         1       1  │
│ context_leakage       PASSED       0         0       0  │
│ intent_drift          PASSED       0         0       0  │
│ collusion_detection   PASSED       0         0       0  │
│ blast_radius          FAILED       1         1       0  │
╰───────────────────────────────────────────────────────╯

HTML Report

Interactive D3.js force-directed graph showing agent nodes, interaction edges, and color-coded findings.


Extending

Custom attack

from swarm_test.attacks.base import BaseAttack
from swarm_test.core.models import Finding, Severity, TestResult

class MyCustomAttack(BaseAttack):
    name = "my_custom_attack"

    def run(self, graph):
        findings = []
        # ... analyze graph.graph, graph.events ...
        return TestResult(test_name=self.name, findings=findings)

Custom adapter

from swarm_test.integrations.base import BaseAdapter

class MyFrameworkAdapter(BaseAdapter):
    framework_name = "my-framework"

    def _ingest_impl(self, swarm, graph):
        for raw_agent in swarm.my_agents:
            node = self._make_agent_node(raw_agent.name, raw_agent.role)
            graph.add_agent(node)

Integrations

swarm-test exports (agent_health scores and structural findings) can feed runtime risk gates. Each integration has its own page under docs/integrations/:

  • Black_Wall — pre-action risk gate; consumes agent_health as a downside-only prior.

Development

pip install -e ".[dev]"
pytest tests/ -v --cov=swarm_test
ruff check swarm_test/
black swarm_test/

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

swarm_test-0.2.7.tar.gz (92.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

swarm_test-0.2.7-py3-none-any.whl (78.0 kB view details)

Uploaded Python 3

File details

Details for the file swarm_test-0.2.7.tar.gz.

File metadata

  • Download URL: swarm_test-0.2.7.tar.gz
  • Upload date:
  • Size: 92.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for swarm_test-0.2.7.tar.gz
Algorithm Hash digest
SHA256 e66eb9ac91de43191a286ff12982e705f3ce7d110a8098aef312e38ba36f3409
MD5 c9effcc6f40cd6b146d46d47436859ac
BLAKE2b-256 aadad8090aa7e37fc8e81fa4c8ed7ed5fc23969f691ee7a0eec4d3cc2b35a649

See more details on using hashes here.

File details

Details for the file swarm_test-0.2.7-py3-none-any.whl.

File metadata

  • Download URL: swarm_test-0.2.7-py3-none-any.whl
  • Upload date:
  • Size: 78.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.14

File hashes

Hashes for swarm_test-0.2.7-py3-none-any.whl
Algorithm Hash digest
SHA256 267d4091e77a09e87f691b695c1e3296b20dad692ee2044c25f7abc16cfe2936
MD5 7cfba8c04864e6fe21407af0fcc30133
BLAKE2b-256 7faa4ed0d2b1cabe791fefc5bf162875d19d0efb41a05a650c6f57d310e25c04

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page