Diagnose why your AI agent failed. Root cause analysis + fix suggestions.

agent-debug

Diagnose why your AI agent failed. Root cause analysis + concrete fix suggestions, in your terminal.

$ agent-debug analyze trace.json

  Failure Classification
  wrong_tool.scope_confusion  severity 3/5  confidence 95%
  The agent used web_search instead of read_file for a local file task.

  Root Cause
  Step 0 (tool_call)
  The agent confused the scope of available tools by incorrectly assuming
  'report.txt' was a web-accessible resource...

  Fix #1  target: system_prompt  confidence 92%
  Before: "You are a helpful assistant with access to tools."
  After:  "...When given a task involving a specific file (e.g., 'read report.txt'),
           always use a local file tool, NOT a web search."

Why

Debugging a failed agent trace is painful. You stare at 50 lines of tool calls trying to figure out why it went wrong and what to change. agent-debug runs a multi-agent analysis pipeline on the trace and gives you:

What failed: one of 16 precise subcategories (not just "it broke")
Why it failed: a root cause explanation pointing to the exact step
How to fix it: before/after diffs for your system prompt or tool definitions

Think of it as Sentry for agents.

Install

pip install agent-debug

Requires Python 3.11+ and an Anthropic API key.

Quick Start

1. Capture a trace

Save your agent's execution as a JSON file. Supported formats: OpenAI, Claude SDK, LangChain.

OpenAI format

{
  "task_description": "Read report.txt and summarize it",
  "system_prompt": "You are a helpful assistant.",
  "tool_definitions": [],
  "messages": [],
  "choices": [],
  "succeeded": false
}

Claude SDK format

{
  "task_description": "List Python files and count lines",
  "system_prompt": "You are a code analysis assistant.",
  "tool_definitions": [],
  "messages": [],
  "final_response": { "stop_reason": "end_turn", "content": [], "usage": {} },
  "succeeded": false
}

LangChain format

{
  "task_description": "Get the stock price of AAPL",
  "input": "Get the stock price of AAPL",
  "intermediate_steps": [
    [{"tool": "get_stock_price", "tool_input": "AAPL stock"}, "Error: Invalid ticker"]
  ],
  "output": "The price is $182.50",
  "succeeded": false
}
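As a minimal sketch of the capture step, the snippet below assembles a LangChain-format trace from recorded tool calls. The field names are taken from the example above; the helpers `build_langchain_trace` and `save_trace` are hypothetical and not part of agent-debug, and setting `input` equal to the task description is an assumption based on that single example.

```python
import json
from pathlib import Path


def build_langchain_trace(task, steps, output, succeeded):
    """Assemble a trace dict using the LangChain-format keys shown above.

    `steps` is a list of (tool_name, tool_input, observation) tuples.
    """
    return {
        "task_description": task,
        # Assumption: the agent input is the task text, as in the example.
        "input": task,
        "intermediate_steps": [
            [{"tool": tool, "tool_input": tool_input}, observation]
            for tool, tool_input, observation in steps
        ],
        "output": output,
        "succeeded": succeeded,
    }


def save_trace(trace, path):
    """Write the trace as JSON (hypothetical helper, not an agent-debug API)."""
    Path(path).write_text(json.dumps(trace, indent=2), encoding="utf-8")


trace = build_langchain_trace(
    task="Get the stock price of AAPL",
    steps=[("get_stock_price", "AAPL stock", "Error: Invalid ticker")],
    output="The price is $182.50",
    succeeded=False,
)
```

Calling `save_trace(trace, "trace.json")` would then produce a file suitable for the analyze step.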

2. Analyze

export ANTHROPIC_API_KEY=sk-...

agent-debug analyze trace.json

Save report to file:

agent-debug analyze trace.json --output report.json
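For a folder of traces, the same command can be driven from a script. The sketch below uses only the `analyze` subcommand and `--output` flag shown above; the helper names, the `.report.json` naming scheme, and the assumption that the CLI exits non-zero on failure are all hypothetical.

```python
import subprocess
from pathlib import Path


def build_analyze_command(trace_path, report_dir):
    """Build the argv for one `agent-debug analyze ... --output ...` call."""
    trace_path = Path(trace_path)
    report_path = Path(report_dir) / f"{trace_path.stem}.report.json"
    return ["agent-debug", "analyze", str(trace_path), "--output", str(report_path)]


def analyze_all(trace_dir, report_dir):
    """Run the CLI once per *.json trace; requires agent-debug on PATH."""
    Path(report_dir).mkdir(parents=True, exist_ok=True)
    for trace_path in sorted(Path(trace_dir).glob("*.json")):
        # check=True raises CalledProcessError if the command exits non-zero.
        subprocess.run(build_analyze_command(trace_path, report_dir), check=True)


print(build_analyze_command("traces/run1.json", "reports"))
```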

3. Pre-deploy risk scan

Catch problems before your agent runs:

agent-debug scan config.json

Here, config.json contains your system_prompt and tool_definitions.
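A scan config might be generated like this. Only the two keys `system_prompt` and `tool_definitions` come from the text above; the shape of each tool definition (an Anthropic-style `name` / `description` / `input_schema` object) is an assumption, since the expected schema is not specified here, and `build_scan_config` is a hypothetical helper.

```python
import json


def build_scan_config(system_prompt, tool_definitions):
    """Return a dict with the two keys the scan command is described as reading."""
    return {
        "system_prompt": system_prompt,
        "tool_definitions": list(tool_definitions),
    }


# Assumed tool-definition shape (Anthropic-style); adjust to what your agent uses.
read_file_tool = {
    "name": "read_file",
    "description": "Read a file from the local filesystem.",
    "input_schema": {
        "type": "object",
        "properties": {"path": {"type": "string"}},
        "required": ["path"],
    },
}

config = build_scan_config(
    "You are a helpful assistant with access to tools.",
    [read_file_tool],
)
config_json = json.dumps(config, indent=2)
print(config_json)
```

Writing `config_json` to `config.json` gives a file in the form the scan step describes.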

Failure Taxonomy

agent-debug classifies failures into 16 subcategories across 6 categories:

Category	Subcategories
`wrong_tool`	`similar_name` · `missing_guidance` · `scope_confusion`
`hallucination`	`missing_retrieval` · `domain_gap` · `format_pressure`
`premature_stop`	`ambiguous_done` · `error_avoidance` · `max_steps_hit`
`context_overflow`	`long_conversation` · `large_tool_output`
`tool_misinterpretation`	`schema_mismatch` · `error_ignored` · `partial_result`
`prompt_ambiguity`	`conflicting_instructions` · `underspecified_scope`
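If you post-process reports, it can help to have the table as data. The sketch below copies the categories and subcategories from the table above and validates labels in the `category.subcategory` form seen in the sample output; `TAXONOMY` and `split_label` are hypothetical names, not part of the package.

```python
# Categories and subcategories copied from the table above.
TAXONOMY = {
    "wrong_tool": ["similar_name", "missing_guidance", "scope_confusion"],
    "hallucination": ["missing_retrieval", "domain_gap", "format_pressure"],
    "premature_stop": ["ambiguous_done", "error_avoidance", "max_steps_hit"],
    "context_overflow": ["long_conversation", "large_tool_output"],
    "tool_misinterpretation": ["schema_mismatch", "error_ignored", "partial_result"],
    "prompt_ambiguity": ["conflicting_instructions", "underspecified_scope"],
}


def split_label(label):
    """Split 'category.subcategory' and check it against TAXONOMY.

    Raises ValueError for labels that are not listed in the table.
    """
    category, sep, subcategory = label.partition(".")
    if not sep or subcategory not in TAXONOMY.get(category, []):
        raise ValueError(f"unknown failure label: {label!r}")
    return category, subcategory


print(split_label("wrong_tool.scope_confusion"))
```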

How It Works

After a deterministic adapter normalizes your trace, 4 Claude agents run on it in sequence:

trace.json
    │
    ▼
[Adapter]             Normalize OpenAI / Claude / LangChain → common format
    │
    ▼
[PatternClassifier]   Classify into 1 of 16 subcategories
    │
    ▼
[SeverityEstimator]   Rate severity 1–5
    │
    ▼
[RootCauseAnalyst]    Pinpoint the exact failing step + explain why
    │
    ▼
[FixGenerator]        Generate before/after diffs for prompt/tool fixes
    │
    ▼
DiagnosisReport
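To make the flow concrete, here is an illustrative sketch with the same shape: a deterministic adapter step followed by four stages, each reading the accumulated state and adding its own result. Everything in it is hypothetical. The key-based format detection is guessed from the three example traces, and the stage functions return stubbed values in place of real LLM calls; none of this is agent-debug's actual implementation.

```python
def detect_format(raw):
    """Guess the trace format from a distinguishing key (an assumption
    based on the example traces, not the library's actual logic)."""
    if "intermediate_steps" in raw:
        return "langchain"
    if "final_response" in raw:
        return "claude_sdk"
    if "choices" in raw:
        return "openai"
    raise ValueError("unrecognized trace format")


# Stub stages: each takes the shared state and returns its contribution.
def classify(state):
    return {"subcategory": "wrong_tool.scope_confusion"}


def estimate_severity(state):
    return {"severity": 3}


def analyze_root_cause(state):
    # Later stages can build on earlier results, here the classification.
    return {"step": 0, "label": state["classification"]["subcategory"]}


def generate_fixes(state):
    return {"suggestions": [{"target": "system_prompt"}]}


STAGES = [
    ("classification", classify),
    ("severity", estimate_severity),
    ("root_cause", analyze_root_cause),
    ("fixes", generate_fixes),
]


def run_pipeline(raw):
    """Run the stages in order, storing each result under its key."""
    state = {"format": detect_format(raw), "trace": raw}
    for key, stage in STAGES:
        state[key] = stage(state)
    return state


report = run_pipeline({"intermediate_steps": [], "succeeded": False})
print(report["format"], report["root_cause"]["label"])
```

Running the stages strictly in order is what lets the root-cause stub read the classification result; a parallel design would need a different way to share that context.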

Typical runtime: ~15 seconds. Typical cost: $0.02–$0.05 per trace.
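As rough arithmetic on those figures, the sketch below scales them to a batch. It assumes traces are analyzed one after another and that the quoted per-trace numbers hold on average; `estimate_batch` is a hypothetical helper.

```python
def estimate_batch(n_traces, seconds_per_trace=15.0, cost_low=0.02, cost_high=0.05):
    """Scale the quoted per-trace figures to a batch of traces run sequentially."""
    return {
        "minutes": n_traces * seconds_per_trace / 60,
        "cost_low_usd": round(n_traces * cost_low, 2),
        "cost_high_usd": round(n_traces * cost_high, 2),
    }


print(estimate_batch(200))
```

For 200 traces this works out to about 50 minutes and between $4 and $10.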

Python API

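import json

from agent_debug import DiagnosisPipeline

# Load a captured trace (any of the supported formats) as a plain dict.
with open("trace.json", encoding="utf-8") as f:
    raw_trace_dict = json.load(f)

pipeline = DiagnosisPipeline()
report = pipeline.run(raw_trace_dict)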

print(report["classification"]["subcategory"])   # e.g. "wrong_tool.scope_confusion"
print(report["severity"]["severity"])            # e.g. 3
print(report["root_cause"]["root_cause_explanation"])
for fix in report["fixes"]["suggestions"]:
    print(fix["before"], "→", fix["after"])
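When reports are consumed by scripts, a small helper that tolerates missing keys can be handy. This sketch relies only on the keys used in the example above; `summarize_report` is a hypothetical helper, and the sample dict is made-up data for illustration rather than real tool output.

```python
def summarize_report(report):
    """Return a short multi-line summary of a report dict.

    Uses only the keys shown in the example above and falls back to
    placeholders when a key is absent.
    """
    subcategory = report.get("classification", {}).get("subcategory", "unknown")
    severity = report.get("severity", {}).get("severity", "?")
    cause = report.get("root_cause", {}).get("root_cause_explanation", "")
    fixes = report.get("fixes", {}).get("suggestions", [])

    lines = [f"{subcategory} (severity {severity}/5)"]
    if cause:
        lines.append(f"cause: {cause}")
    for i, fix in enumerate(fixes, start=1):
        before = fix.get("before", "")
        after = fix.get("after", "")
        lines.append(f"fix #{i}: {before!r} -> {after!r}")
    return "\n".join(lines)


# Made-up report for illustration only.
sample = {
    "classification": {"subcategory": "wrong_tool.scope_confusion"},
    "severity": {"severity": 3},
    "root_cause": {"root_cause_explanation": "web_search used for a local file"},
    "fixes": {"suggestions": [{"before": "old prompt", "after": "new prompt"}]},
}
print(summarize_report(sample))
```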

Custom base URL (third-party Claude API)

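import json

import anthropic
from agent_debug import DiagnosisPipeline

# Point the Anthropic client at a proxy or other compatible endpoint.
client = anthropic.Anthropic(
    api_key="sk-...",
    base_url="https://your-proxy.example.com/claude",
)

with open("trace.json", encoding="utf-8") as f:
    trace = json.load(f)

pipeline = DiagnosisPipeline(client=client)
report = pipeline.run(trace)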

Or via environment variables:

export ANTHROPIC_API_KEY=sk-...
export ANTHROPIC_BASE_URL=https://your-proxy.example.com/claude
agent-debug analyze trace.json
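Before launching a long batch, a quick preflight check of these two variables can save a failed run. The sketch below is a hypothetical helper, not part of agent-debug; it only inspects the variable names shown above, and the URL-scheme check is an assumed sanity rule.

```python
import os


def check_anthropic_env(env=None):
    """Return a list of problems with the environment settings (empty if none)."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set")
    base_url = env.get("ANTHROPIC_BASE_URL")
    # The base URL is optional; only validate it when one is provided.
    if base_url and not base_url.startswith(("http://", "https://")):
        problems.append("ANTHROPIC_BASE_URL should start with http:// or https://")
    return problems


for problem in check_anthropic_env():
    print("warning:", problem)
```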

Development

git clone https://github.com/Viktorsdb/agent-debug
cd agent-debug
uv sync
uv run pytest tests/test_adapters/ tests/test_agents/ -v

Tests run without an API key (adapters and base agent logic are fully deterministic).

License

MIT

Release history

Version	Released
0.4.0	Apr 15, 2026
0.3.0	Apr 15, 2026
0.2.0	Apr 14, 2026
0.1.0 (this version)	Apr 14, 2026

Download files

Source Distribution

agent_debug-0.1.0.tar.gz (54.6 kB), uploaded Apr 14, 2026

Built Distribution

agent_debug-0.1.0-py3-none-any.whl (26.5 kB), uploaded Apr 14, 2026, Python 3

File details

Details for the file agent_debug-0.1.0.tar.gz.

File metadata

Download URL: agent_debug-0.1.0.tar.gz
Upload date: Apr 14, 2026
Size: 54.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for agent_debug-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`97c8450bb58dbd1fc1b6110f0123bef5b09a31601e7796a30968dda5478b0d31`
MD5	`5b2574ba1be3142065e160817e1b30d7`
BLAKE2b-256	`bcba40a2cda3049b59d14a68a1ee0c5a976c6ed31ec9b51893c94f2ba5ac4f04`


File details

Details for the file agent_debug-0.1.0-py3-none-any.whl.

File metadata

Download URL: agent_debug-0.1.0-py3-none-any.whl
Upload date: Apr 14, 2026
Size: 26.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for agent_debug-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`80e45c4f620fad3246dba2f9d0a884273942ea4a08e007bc462feffaa178bca1`
MD5	`2913a382c6961867d664291ddbddcf1b`
BLAKE2b-256	`3845b21adf3ee4dc8551250e35eb367eb4771127a290e86c1bce0ea97c24feb1`

