swarm-test
The first reliability testing framework for multi-agent AI systems.
swarm-test builds a NetworkX interaction graph of your agent swarm and runs five automated chaos tests that surface cascade failures, context leakage, intent drift, collusion, and blast-radius risks, all from a 3-line API.
from swarm_test import SwarmProbe
probe = SwarmProbe(crew)
report = probe.run_all()
report.print_summary()
Features
| Test | What it checks |
|---|---|
| Cascade Failure | Which agents would take down the largest share of the swarm if they failed |
| Context Leakage | Sensitive data (credentials, PII) crossing agent boundaries |
| Intent Drift | Agents acting outside their role; prompt injection; goal hijacking |
| Collusion Detection | Dense cliques, echo chambers, orchestrator-bypass cycles |
| Blast Radius | Single points of failure, critical path, redundancy score |
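To make the Cascade Failure and Blast Radius rows more concrete, here is a minimal sketch of one way to rank agents by downstream impact. It is not swarm-test's implementation: it uses a plain adjacency dict instead of NetworkX so that it runs on its own, and the function name `downstream_impact` and the sample agent names are assumptions made for illustration.

```python
from collections import deque


def downstream_impact(edges):
    """Map each agent to the set of agents reachable from it.

    `edges` is a list of (source, target) delegation pairs. If an agent
    fails, every agent reachable downstream of it is assumed affected.
    """
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set())

    impact = {}
    for start in adjacency:
        seen = set()
        queue = deque([start])
        # Breadth-first walk over outgoing edges from `start`.
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen and neighbor != start:
                    seen.add(neighbor)
                    queue.append(neighbor)
        impact[start] = seen
    return impact


edges = [
    ("Researcher", "Analyst"),
    ("Analyst", "Writer"),
    ("Analyst", "Reviewer"),
]
impact = downstream_impact(edges)
# Agents with the largest downstream set come first.
ranked = sorted(impact, key=lambda agent: len(impact[agent]), reverse=True)
print(ranked[0], sorted(impact[ranked[0]]))
```

In this toy graph the agent at the head of the chain ranks first because every other agent depends on its output. A real check would likely also weigh edge types and redundancy, which this sketch ignores.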
Installation
pip install swarm-test
# or with framework extras:
pip install "swarm-test[crewai]"
pip install "swarm-test[langgraph]"
pip install "swarm-test[langchain]"
From source:
git clone https://github.com/surajkumar811/swarm-test
cd swarm-test
pip install -e ".[dev]"
Quick Start
With a CrewAI crew
from crewai import Crew, Agent, Task
from swarm_test import SwarmProbe
researcher = Agent(role="researcher", goal="...", backstory="...")
writer = Agent(role="writer", goal="...", backstory="...")
crew = Crew(agents=[researcher, writer], tasks=[...])
probe = SwarmProbe(crew, swarm_name="my-crew")
report = probe.run_all()
report.print_summary()
report.to_html("report.html") # D3 graph visualization
With a LangGraph workflow
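from langgraph.graph import StateGraph, END
from swarm_test import SwarmProbe

# Placeholder node functions; replace these with your own agent logic.
def researcher_fn(state: dict) -> dict:
    return state

def writer_fn(state: dict) -> dict:
    return state

graph = StateGraph(dict)
graph.add_node("researcher", researcher_fn)
graph.add_node("writer", writer_fn)
graph.set_entry_point("researcher")  # LangGraph needs an entry point to compile
graph.add_edge("researcher", "writer")
graph.add_edge("writer", END)
compiled = graph.compile()
probe = SwarmProbe(compiled, swarm_name="my-langgraph")
report = probe.run_all()
report.print_summary()
report.to_json("report.json")  # Structured JSON with stable finding IDs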
Static graph (no live swarm)
from swarm_test import SwarmProbe, AgentNode, InteractionEvent, EventType
a = AgentNode(name="Fetcher", role="researcher")
b = AgentNode(name="Summarizer", role="writer")
probe = SwarmProbe(
    swarm_name="my-swarm",
    agents=[a, b],
    events=[
        InteractionEvent(
            source_agent_id=a.id,
            target_agent_id=b.id,
            event_type=EventType.TASK_DELEGATE,
        )
    ],
)
report = probe.run_all()
report.print_summary()
CLI
# Run against a Python script containing a `crew` variable
swarm-test probe my_crew.py --output report.html --fail-on-critical
# Static scan from the command line
swarm-test scan \
--agents Researcher --agents Analyst --agents Writer \
--edges "Researcher:Analyst" --edges "Analyst:Writer" \
--output report.html
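The `--edges` values above use a `source:target` form. As a rough sketch of how such specs could be turned into an edge list, consider the helper below. `parse_edge_specs` is a hypothetical name, and swarm-test's own parsing may differ, for example in how it treats agents that were not declared with `--agents`.

```python
def parse_edge_specs(agents, specs):
    """Turn "source:target" strings into (source, target) tuples.

    Raises ValueError for malformed specs or for names not in `agents`.
    """
    known = set(agents)
    edges = []
    for spec in specs:
        source, sep, target = spec.partition(":")
        if not sep or not source or not target:
            raise ValueError(f"expected 'source:target', got {spec!r}")
        for name in (source, target):
            if name not in known:
                raise ValueError(f"unknown agent {name!r} in edge {spec!r}")
        edges.append((source, target))
    return edges


agents = ["Researcher", "Analyst", "Writer"]
edges = parse_edge_specs(agents, ["Researcher:Analyst", "Analyst:Writer"])
print(edges)
```

Rejecting unknown agent names early is a design choice in this sketch; it keeps typos in edge specs from silently creating extra nodes.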
Architecture
swarm_test/
├── core/
│   ├── models.py          # Pydantic models (AgentNode, Finding, SwarmReport, …)
│   ├── graph.py           # NetworkX SwarmGraph
│   ├── interceptor.py     # Monkey-patch agent methods, sensitive-data scanner
│   └── probe.py           # SwarmProbe, the main entry point
├── attacks/
│   ├── cascade.py         # Cascade failure simulation
│   ├── context_leakage.py # Sensitive-data boundary check
│   ├── intent_drift.py    # Role violations + goal hijacking
│   ├── collusion.py       # Clique/echo-chamber/cycle detection
│   └── blast_radius.py    # Topological SPOF + redundancy analysis
├── integrations/
│   ├── base.py            # BaseAdapter
│   └── crewai_adapter.py  # CrewAI Crew ingestion
├── reporters/
│   ├── console.py         # Rich terminal output
│   └── html.py            # D3 force-directed graph report
└── cli.py                 # Click CLI
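The tree describes `interceptor.py` as monkey-patching agent methods and scanning for sensitive data. The sketch below illustrates that general technique under stated assumptions; it is not the module's actual code. `ToyAgent`, `intercept`, `scan_sensitive`, and the two regex patterns are all hypothetical, and the patterns are far too narrow for real use.

```python
import functools
import re

# Illustrative patterns only; a real scanner would need a broader, tuned set.
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
}


def scan_sensitive(text):
    """Return the names of all patterns that match `text`."""
    return [name for name, pattern in SENSITIVE_PATTERNS.items() if pattern.search(text)]


def intercept(obj, method_name, log):
    """Replace obj.<method_name> with a wrapper that records each call."""
    original = getattr(obj, method_name)

    @functools.wraps(original)
    def wrapper(message, *args, **kwargs):
        # Record what crossed the boundary before delegating to the real method.
        log.append({"agent": obj.name, "leaks": scan_sensitive(str(message))})
        return original(message, *args, **kwargs)

    setattr(obj, method_name, wrapper)


class ToyAgent:
    """Stand-in for a framework agent; not a swarm-test or CrewAI class."""

    def __init__(self, name):
        self.name = name

    def receive(self, message):
        return f"{self.name} got {len(message)} chars"


log = []
writer = ToyAgent("Writer")
intercept(writer, "receive", log)
result = writer.receive("Contact alice@example.com, key AKIAABCDEFGHIJKLMNOP")
print(result)
print(log)
```

Patching the instance rather than the class keeps the change local to the agents under test, which makes it easier to undo after a probe run.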
Report Output
Terminal (Rich)
─────────────────── SWARM-TEST RELIABILITY REPORT ───────────────────
Summary
Swarm: research-crew-demo Framework: crewai
Agents: 4 Edges: 6
Risk Score: 45/100
Duration: 12ms
╭────────────────────── Test Results ───────────────────────╮
│ Test                  Status   Findings   Critical   High │
│ cascade_failure       FAILED   2          1          1    │
│ context_leakage       PASSED   0          0          0    │
│ intent_drift          PASSED   0          0          0    │
│ collusion_detection   PASSED   0          0          0    │
│ blast_radius          FAILED   1          1          0    │
╰───────────────────────────────────────────────────────────╯
HTML Report
Interactive D3.js force-directed graph showing agent nodes, interaction edges, and color-coded findings.
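For readers unfamiliar with D3 force layouts, the data they consume is usually a `nodes` list plus a `links` list of `source`/`target` pairs. The sketch below builds that shape from agent names and edges. The schema swarm-test actually embeds in its HTML report is not documented here, so `to_d3_graph` and its field names are assumptions based on the common D3 convention.

```python
import json


def to_d3_graph(agents, edges):
    """Build the nodes/links structure commonly fed to a D3 force layout."""
    return {
        "nodes": [{"id": name} for name in agents],
        "links": [{"source": source, "target": target} for source, target in edges],
    }


data = to_d3_graph(["Researcher", "Writer"], [("Researcher", "Writer")])
print(json.dumps(data, indent=2))
```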
Extending
Custom attack
from swarm_test.attacks.base import BaseAttack
from swarm_test.core.models import Finding, Severity, TestResult
class MyCustomAttack(BaseAttack):
    name = "my_custom_attack"

    def run(self, graph):
        findings = []
        # ... analyze graph.graph, graph.events ...
        return TestResult(test_name=self.name, findings=findings)
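The body of `run` is left open above. As one hypothetical example of the analysis that could go there, the sketch below flags pairs of agents that delegate to each other. It works on plain `(source, target)` tuples rather than the `graph` object, because the attributes of `graph.graph` and the fields of `Finding` are not documented here; wiring it into a real `BaseAttack` subclass is left as an assumption.

```python
def find_mutual_pairs(edges):
    """Return sorted pairs (a, b) where a delegates to b and b delegates to a."""
    edge_set = set(edges)
    pairs = set()
    for source, target in edge_set:
        # Skip self-loops; only report two distinct agents pointing at each other.
        if source != target and (target, source) in edge_set:
            pairs.add(tuple(sorted((source, target))))
    return sorted(pairs)


edges = [
    ("Planner", "Coder"),
    ("Coder", "Planner"),
    ("Coder", "Tester"),
]
mutual = find_mutual_pairs(edges)
print(mutual)
```

Each pair returned here could then be turned into one finding, with a severity chosen by the attack author.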
Custom adapter
from swarm_test.integrations.base import BaseAdapter
class MyFrameworkAdapter(BaseAdapter):
    framework_name = "my-framework"

    def _ingest_impl(self, swarm, graph):
        for raw_agent in swarm.my_agents:
            node = self._make_agent_node(raw_agent.name, raw_agent.role)
            graph.add_agent(node)
Development
pip install -e ".[dev]"
pytest tests/ -v --cov=swarm_test
ruff check swarm_test/
black swarm_test/
License
MIT — see LICENSE.