agent-debug
Diagnose why your AI agent failed. Root cause analysis + concrete fix suggestions — in your terminal or automatically on every PR.
$ agent-debug analyze trace.json
Failure Classification
wrong_tool.scope_confusion severity 3/5 confidence 95%
The agent used web_search instead of read_file for a local file task.
Root Cause · Step 0 (tool_call)
The agent confused tool scope — it assumed 'report.txt' was a web resource
instead of using the local read_file tool available to it.
Fix #1 system_prompt · 92% confidence
Before: "You are a helpful assistant with access to tools."
After: "...always use a local file tool for file tasks, NOT web_search."
Think of it as Sentry for agents: it tells you not just what broke, but why it broke and what to change.
Features
- 🔍 Root cause analysis — pinpoints the exact step where things went wrong
- 🏷️ 15 failure subcategories — not just "it failed", but wrong_tool.scope_confusion or hallucination.missing_retrieval
- 🔧 Before/after fix diffs — copy-paste ready changes to your system prompt or tool definitions
- 🤖 GitHub Action — auto-comments on PRs when your agent tests fail
- 🔌 Multi-provider — bring your own API key: Anthropic, OpenAI, DeepSeek, or Ollama (local/free)
- 📦 Multi-SDK — supports OpenAI, Claude SDK, and LangChain trace formats
Install
pip install agent-debug
Requires Python 3.11+.
Quickstart
1. Set your API key
# Anthropic (Claude)
export ANTHROPIC_API_KEY=sk-ant-...
# OpenAI
export OPENAI_API_KEY=sk-...
export AGENT_DEBUG_PROVIDER=openai
# DeepSeek (cheapest)
export DEEPSEEK_API_KEY=sk-...
export AGENT_DEBUG_PROVIDER=deepseek
# Ollama (local, free)
export AGENT_DEBUG_PROVIDER=ollama # no key needed
2. Save a trace
Export your agent's execution as a JSON file. Three formats are supported:
OpenAI — messages + choices format
{
"trace_id": "my-run-001",
"task_description": "Read report.txt and summarize it",
"system_prompt": "You are a helpful assistant.",
"tool_definitions": [
{
"type": "function",
"function": {
"name": "read_file",
"description": "Read a local file by path",
"parameters": {
"type": "object",
"properties": { "path": { "type": "string" } },
"required": ["path"]
}
}
}
],
"messages": [
{ "role": "system", "content": "You are a helpful assistant." },
{ "role": "user", "content": "Read report.txt and summarize it" },
{
"role": "assistant",
"tool_calls": [{
"id": "call_1",
"type": "function",
"function": { "name": "web_search", "arguments": "{\"query\": \"report.txt\"}" }
}]
},
{ "role": "tool", "tool_call_id": "call_1", "content": "No results found" }
],
"choices": [{ "message": { "role": "assistant", "content": "I couldn't find the file." } }],
"succeeded": false
}
Claude SDK — content blocks format
{
"trace_id": "my-run-002",
"task_description": "List Python files and count lines in each",
"system_prompt": "You are a code analysis assistant.",
"tool_definitions": [{ "name": "bash", "description": "Run a shell command" }],
"messages": [
{ "role": "user", "content": "List Python files and count lines in each" },
{
"role": "assistant",
"content": [{ "type": "tool_use", "id": "tu_1", "name": "bash", "input": { "command": "find /src -name '*.py'" } }]
},
{
"role": "user",
"content": [{ "type": "tool_result", "tool_use_id": "tu_1", "content": "/src/main.py\n/src/utils.py" }]
},
{ "role": "assistant", "content": [{ "type": "text", "text": "Found 2 files." }] }
],
"final_response": { "stop_reason": "end_turn", "content": [], "usage": { "input_tokens": 450, "output_tokens": 35 } },
"succeeded": false
}
LangChain — intermediate_steps format
{
"trace_id": "my-run-003",
"task_description": "Get the current stock price of AAPL",
"input": "Get the current stock price of AAPL",
"intermediate_steps": [
[{ "tool": "get_stock_price", "tool_input": "AAPL stock" }, "Error: Invalid ticker format"]
],
"output": "The price is $182.50",
"succeeded": false
}
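The three formats above differ in their top-level keys, so a trace can usually be told apart by inspection. The following is a minimal sketch of that idea, not agent-debug's actual adapter code; `detect_format` is a hypothetical helper, and the rules are inferred only from the three examples shown here.

```python
def detect_format(trace: dict) -> str:
    """Guess the trace format from the keys used in the examples above."""
    # LangChain traces carry (action, observation) pairs.
    if "intermediate_steps" in trace:
        return "langchain"
    # OpenAI traces carry a "choices" list next to "messages".
    if "choices" in trace:
        return "openai"
    # Claude SDK traces use lists of typed content blocks inside messages.
    for message in trace.get("messages", []):
        if isinstance(message.get("content"), list):
            return "claude"
    raise ValueError("unrecognized trace format")
```

A check like this is handy in a test runner to reject malformed traces before they are written to disk.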
3. Analyze
agent-debug analyze trace.json
# Save report as JSON
agent-debug analyze trace.json --output report.json
# Print raw JSON
agent-debug analyze trace.json --json
Pre-deploy risk scan
Catch problems before your agent runs:
agent-debug scan config.json
The config.json file must contain a system_prompt, tool_definitions, or both.
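A config file for the scan can be assembled from the same fields used in the trace examples. This is a sketch under the assumption that config.json reuses the system_prompt and tool_definitions shapes from the OpenAI trace format; `build_scan_config` is a hypothetical helper and not part of agent-debug.

```python
import json


def build_scan_config(system_prompt=None, tool_definitions=None):
    """Return a config dict holding whichever of the two fields was supplied."""
    config = {}
    if system_prompt is not None:
        config["system_prompt"] = system_prompt
    if tool_definitions is not None:
        config["tool_definitions"] = tool_definitions
    if not config:
        raise ValueError("provide system_prompt, tool_definitions, or both")
    return config


config = build_scan_config(
    system_prompt="You are a helpful assistant.",
    tool_definitions=[{
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read a local file by path",
            "parameters": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        },
    }],
)
# Serialized form; write this text to config.json before running the scan.
config_json = json.dumps(config, indent=2)
```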
GitHub Action
Auto-diagnose agent failures on every PR — no manual trace export needed.
Setup
Step 1. Save your agent's trace to agent_traces/ during tests:
# In your test / agent runner:
import json, pathlib
trace = {
"task_description": "...",
"messages": [...],
"choices": [...],
"succeeded": False,
}
pathlib.Path("agent_traces").mkdir(exist_ok=True)
pathlib.Path("agent_traces/my_agent.trace.json").write_text(json.dumps(trace))
Step 2. Add to .github/workflows/agent-test.yml:
name: Agent Tests
on: [push, pull_request]
jobs:
test:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run agent tests
run: pytest tests/
continue-on-error: true # let agent-debug run even if tests fail
- name: Diagnose agent failures
uses: Viktorsdb/agent-debug@main
with:
api_key: ${{ secrets.ANTHROPIC_API_KEY }}
provider: anthropic # or: openai, deepseek
traces_dir: agent_traces
Step 3. Add your API key as a GitHub secret:
Settings → Secrets → Actions → New repository secret
What you get on every PR
🤖 agent-debug diagnosis
File: agent_traces/my_agent.trace.json
🟠 wrong_tool.scope_confusion — severity 3/5 (Medium)
> The agent used web_search instead of read_file for a local file task.
Confidence: 95%
🔍 Root Cause
Failing step: Step 0 (tool_call)
The agent confused tool scope...
🔧 Suggested Fixes
Fix #1 — system_prompt 🟢 92% confidence
Before: "You are a helpful assistant..."
After: "...always use read_file for local files, not web_search."
Action inputs
| Input | Required | Default | Description |
|---|---|---|---|
| api_key | ✅ | - | Your LLM API key |
| provider | - | anthropic | anthropic / openai / deepseek / ollama |
| model | - | provider default | Override model name |
| base_url | - | - | Custom API endpoint (for proxies) |
| traces_dir | - | agent_traces | Directory with trace JSON files |
| fail_on_severity | - | 3 | Fail CI if severity ≥ this (0 = never fail) |
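The fail_on_severity rule can be read as a simple threshold check. The sketch below illustrates that reading only; `should_fail_ci` is a hypothetical function, and treating the run as failed when any single trace reaches the threshold is an assumption about how multiple traces are combined.

```python
def should_fail_ci(severities, fail_on_severity=3):
    """Return True when any diagnosed severity reaches the threshold.

    A threshold of 0 means the check never fails the build.
    """
    if fail_on_severity == 0:
        return False
    return any(severity >= fail_on_severity for severity in severities)
```

With the default of 3, a severity 3/5 diagnosis such as the wrong_tool.scope_confusion example would fail the build, while a severity 2 diagnosis would only be reported.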
Multi-Provider Support
agent-debug uses Claude / GPT-4o / DeepSeek / Ollama to run the analysis. Bring your own key.
# Anthropic (default)
ANTHROPIC_API_KEY=sk-ant-... agent-debug analyze trace.json
# OpenAI
OPENAI_API_KEY=sk-... AGENT_DEBUG_PROVIDER=openai agent-debug analyze trace.json
# DeepSeek — cheapest option (~$0.001/analysis)
DEEPSEEK_API_KEY=sk-... AGENT_DEBUG_PROVIDER=deepseek agent-debug analyze trace.json
# Ollama — completely free, runs locally
AGENT_DEBUG_PROVIDER=ollama agent-debug analyze trace.json
# Third-party relay (any OpenAI-compatible endpoint)
OPENAI_API_KEY=sk-... OPENAI_BASE_URL=https://my-proxy.com/v1 \
AGENT_DEBUG_PROVIDER=openai agent-debug analyze trace.json
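Before invoking the CLI it can help to confirm that the environment matches the chosen provider. The sketch below is not part of agent-debug; `KEY_FOR_PROVIDER` and `missing_key` are hypothetical names, and the mapping only restates the environment variables listed above, assuming the provider falls back to anthropic when AGENT_DEBUG_PROVIDER is unset.

```python
import os

# Environment variable each provider expects, as listed above.
# Ollama runs locally and needs no key.
KEY_FOR_PROVIDER = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "ollama": None,
}


def missing_key(env=None):
    """Return the name of the missing key variable, or None if nothing is missing."""
    env = os.environ if env is None else env
    provider = env.get("AGENT_DEBUG_PROVIDER", "anthropic")
    if provider not in KEY_FOR_PROVIDER:
        raise ValueError(f"unknown provider: {provider}")
    key_name = KEY_FOR_PROVIDER[provider]
    if key_name is None or env.get(key_name):
        return None
    return key_name
```

Calling `missing_key()` with no argument inspects the real process environment; passing a dict makes the check easy to unit test.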
Python API with custom provider
import json
import pathlib
from agent_debug import DiagnosisPipeline
from agent_debug.providers import get_provider
# Load a trace exported in one of the supported formats
raw_trace_dict = json.loads(pathlib.Path("trace.json").read_text())
# Use any provider
pipeline = DiagnosisPipeline(provider=get_provider("deepseek"))
report = pipeline.run(raw_trace_dict)
print(report["classification"]["subcategory"]) # "wrong_tool.scope_confusion"
print(report["severity"]["severity"]) # 3
print(report["root_cause"]["root_cause_explanation"])
# Print each suggested fix as a before/after pair
for fix in report["fixes"]["suggestions"]:
    print(f"[{fix['target']}]\nBefore: {fix['before']}\nAfter: {fix['after']}\n")
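The report fields accessed above are enough to build a compact summary for logs or chat notifications. The following sketch assumes only the keys used in that example; `format_report` is a hypothetical helper, and `sample_report` is hand-written illustration data, not real agent-debug output.

```python
def format_report(report: dict) -> str:
    """Render the report fields shown above as a short plain-text summary."""
    lines = [
        f"{report['classification']['subcategory']} "
        f"(severity {report['severity']['severity']}/5)",
        report["root_cause"]["root_cause_explanation"],
    ]
    for number, fix in enumerate(report["fixes"]["suggestions"], start=1):
        lines.append(f"Fix #{number} [{fix['target']}]")
        lines.append(f"  Before: {fix['before']}")
        lines.append(f"  After: {fix['after']}")
    return "\n".join(lines)


# Hand-written sample with the same keys as the report above; not real output.
sample_report = {
    "classification": {"subcategory": "wrong_tool.scope_confusion"},
    "severity": {"severity": 3},
    "root_cause": {"root_cause_explanation": "The agent confused tool scope."},
    "fixes": {"suggestions": [{
        "target": "system_prompt",
        "before": "You are a helpful assistant with access to tools.",
        "after": "...always use a local file tool for file tasks, NOT web_search.",
    }]},
}
summary = format_report(sample_report)
```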
Failure Taxonomy
15 subcategories across 6 categories:
| Category | Subcategories |
|---|---|
| wrong_tool | similar_name · missing_guidance · scope_confusion |
| hallucination | missing_retrieval · domain_gap · format_pressure |
| premature_stop | ambiguous_done · error_avoidance · max_steps_hit |
| context_overflow | long_conversation · large_tool_output |
| tool_misinterpretation | schema_mismatch · error_ignored · partial_result |
| prompt_ambiguity | conflicting_instructions · underspecified_scope |
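Labels such as wrong_tool.scope_confusion combine a category and a subcategory from the table above, which makes them easy to validate when filtering or aggregating reports. This is a sketch only: `TAXONOMY` copies the table as printed here, and `parse_label` is a hypothetical helper rather than an agent-debug function.

```python
# Categories and subcategories copied from the table above.
TAXONOMY = {
    "wrong_tool": {"similar_name", "missing_guidance", "scope_confusion"},
    "hallucination": {"missing_retrieval", "domain_gap", "format_pressure"},
    "premature_stop": {"ambiguous_done", "error_avoidance", "max_steps_hit"},
    "context_overflow": {"long_conversation", "large_tool_output"},
    "tool_misinterpretation": {"schema_mismatch", "error_ignored", "partial_result"},
    "prompt_ambiguity": {"conflicting_instructions", "underspecified_scope"},
}


def parse_label(label: str) -> tuple[str, str]:
    """Split 'category.subcategory' and check both parts against the table."""
    category, _, subcategory = label.partition(".")
    if subcategory not in TAXONOMY.get(category, set()):
        raise ValueError(f"unknown failure label: {label}")
    return category, subcategory
```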
How It Works
trace.json
│
▼
[Adapter] Detect format (OpenAI / Claude / LangChain) → normalize
│
▼
[PatternClassifier] Classify into 1 of 15 subcategories
│
▼
[SeverityEstimator] Rate severity 1–5
│
▼
[RootCauseAnalyst] Pinpoint the exact failing step + explain why
│
▼
[FixGenerator] Generate before/after diffs for prompt/tool fixes
│
▼
DiagnosisReport → terminal / JSON / GitHub PR comment
Typical runtime: ~15 seconds. Typical cost: $0.01–$0.05 per trace.
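The adapter stage in the diagram can be pictured as turning each trace format into one common list of steps that the later stages inspect. The sketch below shows that idea for two of the formats; it is not agent-debug's implementation, and the function names and the normalized step shape (tool, input, output) are assumptions made for illustration.

```python
import json


def normalize_langchain(trace: dict) -> list[dict]:
    """Turn each (action, observation) pair into one step."""
    return [
        {"tool": action["tool"], "input": action["tool_input"], "output": observation}
        for action, observation in trace["intermediate_steps"]
    ]


def normalize_openai(trace: dict) -> list[dict]:
    """Pair each assistant tool call with the tool message that answers it."""
    # Index tool results by the id of the call they respond to.
    outputs = {
        message["tool_call_id"]: message["content"]
        for message in trace["messages"]
        if message["role"] == "tool"
    }
    steps = []
    for message in trace["messages"]:
        for call in message.get("tool_calls", []):
            steps.append({
                "tool": call["function"]["name"],
                # OpenAI encodes tool arguments as a JSON string.
                "input": json.loads(call["function"]["arguments"]),
                "output": outputs.get(call["id"]),
            })
    return steps
```

Once every format yields the same step list, downstream stages can refer to a failing step by its position in that list regardless of which SDK produced the trace.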
Development
git clone https://github.com/Viktorsdb/agent-debug
cd agent-debug
uv sync
uv run pytest tests/ -v
Tests run without any API key — adapters and base agent logic are fully deterministic.
License
MIT