
Diagnose why your AI agent failed. Root cause analysis + fix suggestions.

Project description

agent-debug

Diagnose why your AI agent failed. Root cause analysis + concrete fix suggestions, in your terminal.

$ agent-debug analyze trace.json

  Failure Classification
  wrong_tool.scope_confusion  severity 3/5  confidence 95%
  The agent used web_search instead of read_file for a local file task.

  Root Cause
  Step 0 (tool_call)
  The agent confused the scope of available tools by incorrectly assuming
  'report.txt' was a web-accessible resource...

  Fix #1  target: system_prompt  confidence 92%
  Before: "You are a helpful assistant with access to tools."
  After:  "...When given a task involving a specific file (e.g., 'read report.txt'),
           always use a local file tool, NOT a web search."

Why

Debugging a failed agent trace is painful. You stare at 50 lines of tool calls trying to figure out why it went wrong and what to change. agent-debug runs a multi-agent analysis pipeline on the trace and gives you:

  • What failed — one of 15 precise subcategories (not just "it broke")
  • Why it failed — root cause explanation pointing to the exact step
  • How to fix it — before/after diffs for your system prompt or tool definitions

Think of it as Sentry for agents.


Install

pip install agent-debug

Requires Python 3.11+ and an Anthropic API key.


Quick Start

1. Capture a trace

Save your agent's execution as a JSON file. Supported formats: OpenAI, Claude SDK, LangChain.

OpenAI format
{
  "task_description": "Read report.txt and summarize it",
  "system_prompt": "You are a helpful assistant.",
  "tool_definitions": [],
  "messages": [],
  "choices": [],
  "succeeded": false
}
Claude SDK format
{
  "task_description": "List Python files and count lines",
  "system_prompt": "You are a code analysis assistant.",
  "tool_definitions": [],
  "messages": [],
  "final_response": { "stop_reason": "end_turn", "content": [], "usage": {} },
  "succeeded": false
}
LangChain format
{
  "task_description": "Get the stock price of AAPL",
  "input": "Get the stock price of AAPL",
  "intermediate_steps": [
    [{"tool": "get_stock_price", "tool_input": "AAPL stock"}, "Error: Invalid ticker"]
  ],
  "output": "The price is $182.50",
  "succeeded": false
}
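For a LangChain agent, a trace in this shape can be assembled from the run's intermediate steps. This is a minimal sketch that assumes each step is available as a `(tool, tool_input, observation)` tuple. `save_langchain_trace` is a hypothetical helper and is not part of agent-debug. The field names are copied from the LangChain example above.

```python
import json
from pathlib import Path


def save_langchain_trace(path, task, steps, output, succeeded):
    """Write a trace using the LangChain-format field names shown above.

    Hypothetical helper (not part of agent-debug). `steps` is a list of
    (tool, tool_input, observation) tuples.
    """
    trace = {
        "task_description": task,
        "input": task,  # assumption: the agent input is the task text itself
        # Each intermediate step is an [action, observation] pair
        "intermediate_steps": [
            [{"tool": tool, "tool_input": tool_input}, observation]
            for tool, tool_input, observation in steps
        ],
        "output": output,
        "succeeded": succeeded,
    }
    Path(path).write_text(json.dumps(trace, indent=2))
    return trace


save_langchain_trace(
    "trace.json",
    task="Get the stock price of AAPL",
    steps=[("get_stock_price", "AAPL stock", "Error: Invalid ticker")],
    output="The price is $182.50",
    succeeded=False,
)
```

The resulting file can then be passed to `agent-debug analyze trace.json`.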

2. Analyze

export ANTHROPIC_API_KEY=sk-...

agent-debug analyze trace.json

Save the report to a file:

agent-debug analyze trace.json --output report.json
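To analyze a whole folder of traces, the CLI can be driven from Python. This minimal sketch uses only the `analyze` command and the `--output` flag shown above. The `<name>.report.json` naming scheme and the helper names are our own assumptions.

```python
import subprocess
from pathlib import Path


def build_commands(trace_dir):
    """Build one `agent-debug analyze` command per *.json trace in trace_dir.

    Each report is written next to its trace as <name>.report.json
    (a naming convention chosen for this sketch).
    """
    commands = []
    for trace in sorted(Path(trace_dir).glob("*.json")):
        if trace.name.endswith(".report.json"):
            continue  # skip reports left over from a previous run
        report = trace.with_name(trace.stem + ".report.json")
        commands.append(
            ["agent-debug", "analyze", str(trace), "--output", str(report)]
        )
    return commands


def analyze_all(trace_dir):
    # One trace at a time; each invocation is a separate analysis with its own API cost
    for cmd in build_commands(trace_dir):
        subprocess.run(cmd, check=True)
```

Calling `analyze_all("traces/")` with `ANTHROPIC_API_KEY` exported runs the CLI once per trace file.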

3. Pre-deploy risk scan

Catch problems before your agent runs:

agent-debug scan config.json

Here, config.json contains your agent's system_prompt and tool_definitions.
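The following minimal sketch produces config.json from Python. It assumes the scanner reads top-level `system_prompt` and `tool_definitions` keys, as described above. The `read_file` tool and its Anthropic-style schema (`name`, `description`, `input_schema`) are illustrative only. Whether `scan` expects exactly this tool shape is an assumption.

```python
import json
from pathlib import Path

# Hypothetical agent configuration for `agent-debug scan config.json`.
# The tool entry follows Anthropic's tool-definition shape; that the scanner
# expects this exact shape is an assumption.
config = {
    "system_prompt": "You are a helpful assistant with access to tools.",
    "tool_definitions": [
        {
            "name": "read_file",
            "description": "Read a local text file and return its contents.",
            "input_schema": {
                "type": "object",
                "properties": {"path": {"type": "string"}},
                "required": ["path"],
            },
        }
    ],
}

Path("config.json").write_text(json.dumps(config, indent=2))
```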


Failure Taxonomy

agent-debug classifies failures into 15 subcategories across 6 categories:

Category                 Subcategories
wrong_tool               similar_name · missing_guidance · scope_confusion
hallucination            missing_retrieval · domain_gap · format_pressure
premature_stop           ambiguous_done · error_avoidance · max_steps_hit
context_overflow         long_conversation · large_tool_output
tool_misinterpretation   schema_mismatch · error_ignored · partial_result
prompt_ambiguity         conflicting_instructions · underspecified_scope
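Classification labels use `category.subcategory` form, for example `wrong_tool.scope_confusion`. This minimal sketch shows how to validate or group those labels in your own tooling. `TAXONOMY` and `split_label` are hypothetical names, and the dict was transcribed by hand from the table above rather than imported from agent-debug.

```python
# Taxonomy table transcribed by hand (hypothetical constant, not from agent-debug)
TAXONOMY = {
    "wrong_tool": {"similar_name", "missing_guidance", "scope_confusion"},
    "hallucination": {"missing_retrieval", "domain_gap", "format_pressure"},
    "premature_stop": {"ambiguous_done", "error_avoidance", "max_steps_hit"},
    "context_overflow": {"long_conversation", "large_tool_output"},
    "tool_misinterpretation": {"schema_mismatch", "error_ignored", "partial_result"},
    "prompt_ambiguity": {"conflicting_instructions", "underspecified_scope"},
}


def split_label(label):
    """Split "category.subcategory" into a tuple; raise ValueError if unknown."""
    category, _, subcategory = label.partition(".")
    if subcategory not in TAXONOMY.get(category, set()):
        raise ValueError(f"unknown failure label: {label!r}")
    return category, subcategory
```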

How It Works

After the adapter normalizes the trace, 4 Claude agents run on it in sequence:

trace.json
    │
    ▼
[Adapter]             Normalize OpenAI / Claude / LangChain → common format
    │
    ▼
[PatternClassifier]   Classify into 1 of 15 subcategories
    │
    ▼
[SeverityEstimator]   Rate severity 1–5
    │
    ▼
[RootCauseAnalyst]    Pinpoint the exact failing step + explain why
    │
    ▼
[FixGenerator]        Generate before/after diffs for prompt/tool fixes
    │
    ▼
DiagnosisReport
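A plausible reason to run the stages in sequence is that each stage can build on the results of the stages before it. The toy sketch below illustrates that pattern using stub functions in place of Claude calls. It is not agent-debug's implementation. The stage functions are hypothetical, and detecting the input format from distinguishing keys in the Quick Start examples (`choices`, `final_response`, `intermediate_steps`) is our assumption about what an adapter could do.

```python
# Toy illustration of the sequential design in the diagram above.
# Every stage is a stub standing in for a Claude call; the names and the
# key-based format detection are assumptions, not agent-debug internals.


def adapt(raw):
    """Guess the source format from keys that appear in the Quick Start examples."""
    if "intermediate_steps" in raw:
        fmt = "langchain"
    elif "final_response" in raw:
        fmt = "claude_sdk"
    elif "choices" in raw:
        fmt = "openai"
    else:
        raise ValueError("unrecognized trace format")
    return {"format": fmt, "task": raw["task_description"], "succeeded": raw["succeeded"]}


def classify(trace, report):
    return {"subcategory": "wrong_tool.scope_confusion"}  # stub


def estimate_severity(trace, report):
    # A later stage can read what earlier stages wrote
    return {"severity": 3, "based_on": report["classification"]["subcategory"]}


def find_root_cause(trace, report):
    return {"step": 0}  # stub


def generate_fixes(trace, report):
    return {"suggestions": []}  # stub


STAGES = [
    ("classification", classify),
    ("severity", estimate_severity),
    ("root_cause", find_root_cause),
    ("fixes", generate_fixes),
]


def run(raw):
    trace = adapt(raw)
    report = {}
    for key, stage in STAGES:
        # Each stage sees the normalized trace plus all earlier results
        report[key] = stage(trace, report)
    return report
```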

Typical runtime: ~15 seconds. Typical cost: $0.02–$0.05 per trace.
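To budget a batch run, scale the per-trace figures linearly. This minimal sketch assumes traces are analyzed one at a time and that each one falls within the typical range quoted above. For 100 traces, that works out to roughly $2 to $5 and about 25 minutes.

```python
# Back-of-the-envelope batch estimate from the typical figures quoted above.
# Actual cost depends on trace length and model pricing.
SECONDS_PER_TRACE = 15
COST_PER_TRACE_USD = (0.02, 0.05)


def estimate_batch(n_traces):
    """Return (min_cost_usd, max_cost_usd, minutes) for n traces run sequentially."""
    low, high = COST_PER_TRACE_USD
    minutes = n_traces * SECONDS_PER_TRACE / 60
    return round(n_traces * low, 2), round(n_traces * high, 2), minutes


print(estimate_batch(100))  # → (2.0, 5.0, 25.0)
```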


Python API

import json

from agent_debug import DiagnosisPipeline

# Load a captured trace (any supported format) as a plain dict
with open("trace.json") as f:
    raw_trace = json.load(f)

pipeline = DiagnosisPipeline()
report = pipeline.run(raw_trace)

print(report["classification"]["subcategory"])   # e.g. "wrong_tool.scope_confusion"
print(report["severity"]["severity"])            # e.g. 3
print(report["root_cause"]["root_cause_explanation"])
for fix in report["fixes"]["suggestions"]:
    print(fix["before"], "→", fix["after"])
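Because the report is indexed like a plain dict, it is easy to pass to other tooling. The following minimal sketch shows a CI check. It assumes the report has exactly the keys used in the example above. `summarize`, `gate`, and the threshold of 4 are our own choices, not part of agent-debug.

```python
# Hypothetical CI gate built on the report keys shown above.
# The threshold is an arbitrary example on the 1-5 severity scale.
SEVERITY_THRESHOLD = 4


def summarize(report):
    """One-line summary: label, severity and number of fix suggestions."""
    label = report["classification"]["subcategory"]
    severity = report["severity"]["severity"]
    n_fixes = len(report["fixes"]["suggestions"])
    return f"{label} (severity {severity}/5, {n_fixes} fix suggestion(s))"


def gate(report, threshold=SEVERITY_THRESHOLD):
    """Return a process exit code: 1 if severity reaches the threshold, else 0."""
    return 1 if report["severity"]["severity"] >= threshold else 0
```

In a CI script, `print(summarize(report))` followed by `sys.exit(gate(report))` fails the job only for severe diagnoses.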

Custom base URL (third-party Claude API)

import json

import anthropic
from agent_debug import DiagnosisPipeline

client = anthropic.Anthropic(
    api_key="sk-...",
    base_url="https://your-proxy.example.com/claude",
)
pipeline = DiagnosisPipeline(client=client)

# Load and analyze a captured trace
with open("trace.json") as f:
    report = pipeline.run(json.load(f))

Or via environment variables:

export ANTHROPIC_API_KEY=sk-...
export ANTHROPIC_BASE_URL=https://your-proxy.example.com/claude
agent-debug analyze trace.json

Development

git clone https://github.com/Viktorsdb/agent-debug
cd agent-debug
uv sync
uv run pytest tests/test_adapters/ tests/test_agents/ -v

Tests run without an API key (adapters and base agent logic are fully deterministic).


License

MIT



Download files


Source Distribution

agent_debug-0.1.0.tar.gz (54.6 kB)

Built Distribution


agent_debug-0.1.0-py3-none-any.whl (26.5 kB)

File details

Details for the file agent_debug-0.1.0.tar.gz.

File metadata

  • Download URL: agent_debug-0.1.0.tar.gz
  • Size: 54.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for agent_debug-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        97c8450bb58dbd1fc1b6110f0123bef5b09a31601e7796a30968dda5478b0d31
MD5           5b2574ba1be3142065e160817e1b30d7
BLAKE2b-256   bcba40a2cda3049b59d14a68a1ee0c5a976c6ed31ec9b51893c94f2ba5ac4f04


File details

Details for the file agent_debug-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: agent_debug-0.1.0-py3-none-any.whl
  • Size: 26.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for agent_debug-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        80e45c4f620fad3246dba2f9d0a884273942ea4a08e007bc462feffaa178bca1
MD5           2913a382c6961867d664291ddbddcf1b
BLAKE2b-256   3845b21adf3ee4dc8551250e35eb367eb4771127a290e86c1bce0ea97c24feb1
