Detect coordination failures in multi-agent AI systems (CrewAI, LangGraph, and more)
AgentSonar
AgentSonar detects coordination failures in multi-agent AI systems — cycles, repetitive delegation, and runaway throughput — before they burn through your token budget. Two lines of code to integrate.
Install
pip install agentsonar[crewai] # for CrewAI
pip install agentsonar[langgraph] # for LangGraph / LangChain
pip install agentsonar[all] # both
Framework integrations are optional extras — install only what you need.
Usage
CrewAI
from agentsonar import AgentSonarListener
sonar = AgentSonarListener()
# ...run your crew normally. Detection happens automatically.
→ Full runnable example: CrewAI minimal setup.
LangGraph / LangChain
Two equivalent patterns — pick whichever fits your existing code better.
Pattern 1: monitor() wrapper (recommended if you already have callbacks)
from agentsonar import monitor
graph = monitor(graph)
result = graph.invoke(input)
monitor() wraps the compiled graph so every invoke/stream/ainvoke/
astream call auto-injects AgentSonar's callback into your config — without
overriding any callbacks you pass. If you already have your own callbacks:
graph = monitor(graph)
result = graph.invoke(input, config={"callbacks": [my_cb]})
# Both callbacks run: [my_cb, AgentSonarCallback()]
Pattern 2: direct callback injection
from agentsonar import AgentSonarCallback
result = graph.invoke(
input,
config={"callbacks": [AgentSonarCallback()]},
)
Use this pattern when you want explicit control over callback order. You're
responsible for merging AgentSonar with any existing callbacks yourself —
the monitor() wrapper handles that automatically.
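If you carry your config through several call sites, the merge is easy to centralize. A minimal sketch of that bookkeeping — `merge_callbacks` is a hypothetical helper, not part of AgentSonar's API; in real code the second argument would be an `AgentSonarCallback()` instance:

```python
def merge_callbacks(config, sonar_cb):
    """Return a copy of `config` with `sonar_cb` appended to its
    "callbacks" list, preserving any callbacks already present."""
    merged = dict(config or {})
    merged["callbacks"] = list(merged.get("callbacks", [])) + [sonar_cb]
    return merged

# e.g. graph.invoke(input, config=merge_callbacks(my_config, AgentSonarCallback()))
```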
→ Full runnable example: LangGraph minimal setup.
That's the whole API. Zero config required to get started — no API keys, no accounts. Alerts stream to stderr as they fire and land in per-run log files on disk. See Output for everything AgentSonar writes and Configuration for the knobs you can tune.
Output
Every run creates its own session directory under agentsonar_logs/
in your working directory. All artifacts for a single run live together:
your_project/
└── agentsonar_logs/
├── .gitignore # auto-written, contains *
├── latest # plain text: current run dir name
├── run-2026-04-11_07-03-23-ancient-ember/ # most recent run
│ ├── timeline.jsonl # every event, JSONL
│ ├── alerts.log # human-readable signal-only
│ ├── report.json # structured summary report
│ └── report.html # standalone HTML report
└── run-2026-04-11_07-03-13-quiet-blossom/ # previous run
└── ...
Live alerts also stream to stderr as they fire, prefixed with
[SONAR HH:MM:SS.mmm].
Calling shutdown()
The JSON and HTML reports are generated on shutdown() — so how
you end your run determines whether those two files get written.
LangGraph / LangChain — call shutdown() yourself when the run
completes. Both wrapper patterns expose it:
# Pattern 1: monitor() wrapper — shutdown lives on the wrapper
graph = monitor(graph)
graph.invoke(input)
graph.shutdown() # ← writes report.json + report.html, closes log files
# Pattern 2: direct AgentSonarCallback — shutdown lives on the callback
sonar = AgentSonarCallback()
graph.invoke(input, config={"callbacks": [sonar]})
sonar.shutdown() # ← writes report.json + report.html, closes log files
If you forget to call it, timeline.jsonl and the stderr stream are
still captured (they flush event-by-event), but the structured
report.json / report.html summary files won't be generated for that run.
CrewAI — no teardown needed. AgentSonarListener hooks
CrewKickoffCompletedEvent on the CrewAI event bus and runs
shutdown() automatically when crew.kickoff() finishes. You get the
full output set — including report.json and report.html — without
any extra code.
What report.html looks like
The standalone HTML report (report.html) is a self-contained page —
no external CSS or JavaScript, no network requests, safe to email,
archive, or commit as a debugging artifact. Each coordination event
renders as a card with its severity, failure class (with hover
tooltip), summary, fingerprint, and expandable topology / thresholds
blocks. Dark mode respects your system preference and persists
across runs. Open it in your browser — or forward it to a colleague
and they'll see the same interactive view without any install steps.
Realtime tailing
timeline.jsonl is flushed on every event, so you can watch coordination
problems land as they happen — useful when a long-running crew is behaving
oddly and you want to catch the exact delegation that triggered an alert:
# Unix / macOS — tail the current run's timeline
tail -f "agentsonar_logs/$(cat agentsonar_logs/latest)/timeline.jsonl"
# Only the signal-only (alerts) view
tail -f "agentsonar_logs/$(cat agentsonar_logs/latest)/alerts.log"
# Windows PowerShell — -Wait is the tail-follow equivalent
Get-Content "agentsonar_logs/$(Get-Content agentsonar_logs/latest)/timeline.jsonl" -Wait
Each JSONL line is a self-contained record with ts, level, event,
and a data payload — easy to pipe through jq or a custom parser if
you want realtime dashboards during a run.
Clean-run signals
When the run completes without any WARNING or CRITICAL alerts, AgentSonar tells you explicitly in every channel instead of leaving you to wonder:
- HTML report — a green "✓ No coordination failures detected" banner replaces the event cards.
- timeline.jsonl — the final session_end record carries "clean_run": true, "warning_count": 0, "critical_count": 0, and a plain-English message field. Downstream parsers can gate on the single boolean instead of walking the whole alerts list.
- stderr — a green ✓ No coordination failures detected — clean run. line prints right under the summary banner.
Non-clean runs flip clean_run to false and the message includes
the severity breakdown (e.g. "2 CRITICAL, 5 WARNING coordination alert(s) detected during this run.").
Run naming, latest pointer, retention
Run directories are named run-<ISO date>_<time>-<adjective>-<noun>
(e.g. run-2026-04-11_07-03-23-ancient-ember). The timestamp sorts
chronologically; the slug is memorable enough to say out loud and is
deterministic from the session id.
agentsonar_logs/latest is a plain text file pointing at the newest
run directory name. Open the newest HTML report with:
# Unix / macOS
open "agentsonar_logs/$(cat agentsonar_logs/latest)/report.html"
# Windows PowerShell
Invoke-Item "agentsonar_logs/$(Get-Content agentsonar_logs/latest)/report.html"
AgentSonar keeps the 20 most recent runs by default and prunes
older ones on every new session. Configure via AGENTSONAR_KEEP_RUNS
or config={"keep_runs": N}; set to 0 to disable pruning.
The auto-written .gitignore inside agentsonar_logs/ contains *,
so every log and report is git-ignored by default.
Opt-out
If you don't want the JSON/HTML reports, disable them in the config:
sonar = AgentSonarCallback(config={"auto_export_on_shutdown": False})
Manual export
If you want to export at multiple checkpoints during a run, or want the
full timeline view (dedupe=False), call the exporters explicitly:
from agentsonar._output.json_export import export_json
from agentsonar._output.html_report import export_html
events = sonar.engine.get_recent_events()
# Summary view (default): one line per root cause
export_json(events)
export_html(events, title="Checkpoint 1")
# Timeline view: every state transition preserved
export_json(events, dedupe=False)
Configuration reference
Every entry point — monitor(), AgentSonarCallback(), and
AgentSonarListener() — accepts an optional config dict to override
the defaults. Pass only the keys you want to change; everything else
keeps its default.
graph = monitor(graph, config={
# Rate limits — raise these when you're running at demo speeds
# (much faster than any real LLM workload) and don't want the
# circuit breaker to trip before detection has a chance to see
# the full pattern.
"per_edge_limit": 99999, # default 10 events per edge per window
"global_limit": 99999, # default 200 events total per window
"window_size": 180.0, # default 180.0 seconds
# Alert severity thresholds — how many rotations/events before
# WARNING and CRITICAL fire. Lower them for tight tests, raise
# them for noisy production workloads.
"warning_threshold": 5, # default 5
"critical_threshold": 15, # default 15
# Output location and retention
"log_dir": ".", # parent directory for agentsonar_logs/
"keep_runs": 20, # most recent run dirs to keep (0 = no pruning)
"console_output": True, # stream colored alerts to stderr
"file_output": True, # write timeline.jsonl + alerts.log
"auto_export_on_shutdown": True, # write report.json + report.html
})
The same dict works for every entry point:
AgentSonarCallback(config={"per_edge_limit": 99999})
AgentSonarListener(config={"warning_threshold": 3})
The rate-limit knobs are the most common override. AgentSonar's default
circuit breaker is tuned for real LLM workloads (a few hundred ms per
call); scripted demos and unit tests fire events orders of magnitude
faster, which trips the breaker before the downstream cycle/repetition
detectors get a chance to fire. Bumping per_edge_limit and
global_limit to a large value disables that short-circuit for local
runs.
What gets detected
Every detected event carries a failure_class string that names the kind
of problem. The classes AgentSonar currently surfaces:
- cyclic_delegation — Agents are stuck in a loop: A delegates to B, B to C, C back to A. Usually means an exit condition is missing — a reviewer that never approves, a planner that always says "revise".
- repetitive_delegation — One agent keeps calling another without making progress: A → B fires many times in a short window with no B → A return. Usually means A can't make a decision without B and isn't getting what it needs.
- resource_exhaustion — The system is processing events faster than it can sustain: either one edge is being hammered or total throughput is over budget. Indicates a runaway agent or throttling that's too loose.
Coming soon (reserved class names; no-op today):
cascade_failure, authority_violation, deadlock, agent_stall,
token_velocity_anomaly.
The HTML report shows a hover tooltip on every failure class badge with the same plain-English description.
Coordination fingerprint
Every detected failure carries a coordination_fingerprint like
sha256:5c102a66e1104c47. It's a stable ID for the failure pattern:
the same failure (same agents in the same shape) always produces the
same fingerprint, regardless of when it's detected or how many times
it re-escalates. You use it for:
- Dedup. WARNING → CRITICAL escalations share a fingerprint, so the summary view collapses to one row at the highest severity.
- Root-cause grouping. When a cycle is firing, the repetitive-edge alerts on each cycle edge are suppressed in the summary view.
- Cross-run correlation. Grep any log file for the fingerprint to find every time the same pattern fired, across sessions.
You never compute fingerprints yourself — they're just stable IDs you can use to talk about "the same failure" across time.
Host safety
AgentSonar never crashes your app. If anything inside the SDK fails,
detection degrades silently to a no-op and your crew / graph / API
keeps running. The kill switch is AGENTSONAR_DISABLED=1 (also accepts
true, yes, on, enabled, case-insensitive) — set it in the
environment to disable AgentSonar without editing code. When running
in degraded mode, get_summary() returns {"degraded": True, ...}
so you can alert on it from your own dashboards.
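If your own tooling needs to report when the kill switch is set, the accepted values listed above are simple to mirror. A sketch of that parsing (an illustration, not AgentSonar's internal code):

```python
import os

def sonar_disabled(env=None):
    """True when AGENTSONAR_DISABLED is set to one of the accepted
    values (1, true, yes, on, enabled), case-insensitively."""
    env = os.environ if env is None else env
    return env.get("AGENTSONAR_DISABLED", "").strip().lower() in {
        "1", "true", "yes", "on", "enabled",
    }
```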
Full minimal setup
Copy-pasteable starting points. The AgentSonar lines are marked — the rest is plain framework code. Both examples are self-contained and runnable as-is.
LangGraph minimal setup
# pip install agentsonar[langgraph]
import operator
from typing import Literal
from langchain_core.messages import AIMessage, AnyMessage, HumanMessage
from langgraph.graph import END, START, StateGraph
from typing_extensions import Annotated, TypedDict
from agentsonar import monitor # ← AgentSonar
class State(TypedDict):
messages: Annotated[list[AnyMessage], operator.add]
iteration: int
def planner(s): return {"messages": [AIMessage(content="plan")]}
def researcher(s): return {"messages": [AIMessage(content="research")]}
def reviewer(s): return {"messages": [AIMessage(content="revise")],
"iteration": s.get("iteration", 0) + 1}
def loop(s) -> Literal["planner", "__end__"]:
return END if s.get("iteration", 0) >= 8 else "planner"
b = StateGraph(State)
for name, fn in [("planner", planner), ("researcher", researcher), ("reviewer", reviewer)]:
b.add_node(name, fn)
b.add_edge(START, "planner")
b.add_edge("planner", "researcher")
b.add_edge("researcher", "reviewer")
b.add_conditional_edges("reviewer", loop)
graph = monitor(b.compile()) # ← AgentSonar
graph.invoke({"messages": [HumanMessage(content="go")], "iteration": 0},
config={"recursion_limit": 50})
graph.shutdown() # ← AgentSonar
CrewAI minimal setup
# pip install agentsonar[crewai]
# export OPENAI_API_KEY=sk-...
from crewai import Agent, Crew, Process, Task
from agentsonar import AgentSonarListener # ← AgentSonar
sonar = AgentSonarListener() # ← AgentSonar
researcher = Agent(role="Researcher", goal="Gather info on the topic.",
backstory="Senior researcher.", allow_delegation=False)
writer = Agent(role="Writer", goal="Write a short summary.",
backstory="Technical writer.", allow_delegation=False)
manager = Agent(role="Manager", goal="Coordinate researcher and writer.",
backstory="Project manager.", allow_delegation=True)
task = Task(
description="Write a 3-sentence summary of multi-agent coordination. "
"Delegate research, then writing.",
expected_output="A 3-sentence summary.",
agent=manager,
)
# In hierarchical mode the manager goes ONLY in `manager_agent`,
# never in `agents`. Workers go in `agents`.
Crew(agents=[researcher, writer], tasks=[task],
process=Process.hierarchical, manager_agent=manager).kickoff()
# AgentSonarListener auto-shuts down on CrewKickoffCompletedEvent.
After either script finishes, open the HTML report:
open "agentsonar_logs/$(cat agentsonar_logs/latest)/report.html"
Status
Closed beta. Schema, public API, and output formats are stable for design partners.
License
Apache-2.0
File details
Details for the file agentsonar-0.1.4.tar.gz.
File metadata
- Download URL: agentsonar-0.1.4.tar.gz
- Upload date:
- Size: 228.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6efc73c7f8e546c6be4172218a9c21ba20f6896f03f4e59e8ec556b230c2c02 |
| MD5 | 37687f335b3302b0671ca0136e74b9d3 |
| BLAKE2b-256 | f7a8e6d04ff31d22f9b1192aa7748c11deac3c57682ea0686603229335b33997 |
File details
Details for the file agentsonar-0.1.4-py3-none-any.whl.
File metadata
- Download URL: agentsonar-0.1.4-py3-none-any.whl
- Upload date:
- Size: 87.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.2 {"installer":{"name":"uv","version":"0.11.2","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fbb8e844cdab378fa6ece1ff07bc3c559163c2b8998a87d5c72d2e4928961b8d |
| MD5 | b3225b47d9c5823d3184bda6eea27e9f |
| BLAKE2b-256 | 9a4a9bc2148ae635a480d5db7b579559cf564d46af6860da41b0cf426039ad31 |