agentdx

Open-source diagnostic SDK for detecting failure pathologies in AI agent systems.

agentdx detects operational failure modes in AI agent systems: failures that observability tools miss because they occur at the reasoning level rather than the infrastructure level. It analyses agent execution traces and produces structured diagnostic reports that identify specific pathologies.

Why agentdx?

Existing agent observability tools (LangSmith, Arize, Datadog LLM Observability) excel at tracing: they show you what happened. agentdx does diagnosis: it tells you what went wrong and why.

Tool Type	What it answers	Examples
Tracing / Observability	"What calls did the agent make?"	LangSmith, Arize, Datadog
Diagnostics	"Why did the agent fail?"	agentdx

The Seven Pathologies

agentdx detects seven operational failure modes, designed to align with the OWASP Top 10 for Agentic Applications and UC Berkeley's MAST framework:

Pathology	What it means
Context Erosion	Agent loses critical context over long conversations or multi-step tasks
Tool Thrashing	Agent repeatedly calls tools with ineffective or contradictory parameters
Instruction Drift	Agent gradually deviates from its original instructions or mandate
Recovery Blindness	Agent fails to detect or recover from errors in its own execution
Hallucinated Tool Success	Agent treats a failed tool call as successful and proceeds on false premises
Goal Hijacking	Agent's objective is altered by adversarial input or environmental manipulation
Silent Degradation	Agent's output quality deteriorates without triggering explicit errors

Quick Start

pip install agentdx

from agentdx import Diagnoser, JSONParser

# Parse an agent execution trace (file path, dict, or message list)
parser = JSONParser()
trace = parser.parse("path/to/trace.json")

# Run all detectors
diagnoser = Diagnoser()
report = diagnoser.diagnose(trace)

# View results
print(report.summary())
report.to_json("diagnostic_report.json")
report.to_markdown("diagnostic_report.md")

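The trace file is a JSON object with a messages array. Each message has a role, content, and an optional tool_calls list; in the example below, each tool call records a tool_name, its arguments, the result, and a success flag: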

{
  "trace_id": "my-trace",
  "messages": [
    {"role": "user", "content": "Find the best Python testing framework."},
    {
      "role": "assistant",
      "content": "Let me search for that.",
      "tool_calls": [
        {
          "tool_name": "web_search",
          "arguments": {"query": "best Python testing framework"},
          "result": "No relevant results found.",
          "success": true
        }
      ]
    }
  ]
}
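A trace in this shape can be produced with the standard library alone. The sketch below is an illustration and not part of agentdx: the helper names make_tool_call, make_trace, and write_trace are hypothetical, and the only fields used are those shown in the example above.

```python
import json
from pathlib import Path


def make_tool_call(tool_name, arguments, result, success):
    """Build one tool-call record with the four fields shown in the example trace."""
    return {
        "tool_name": tool_name,
        "arguments": arguments,
        "result": result,
        "success": success,
    }


def make_trace(trace_id, messages):
    """Assemble a trace dict, checking that every message has a role and content."""
    for index, message in enumerate(messages):
        missing = {"role", "content"} - message.keys()
        if missing:
            raise ValueError(f"message {index} is missing {sorted(missing)}")
    return {"trace_id": trace_id, "messages": list(messages)}


def write_trace(trace, path):
    """Serialise the trace as indented JSON and return the path written."""
    path = Path(path)
    path.write_text(json.dumps(trace, indent=2), encoding="utf-8")
    return path


# Rebuild the example trace from this README (hypothetical helpers, real field names).
trace = make_trace(
    "my-trace",
    [
        {"role": "user", "content": "Find the best Python testing framework."},
        {
            "role": "assistant",
            "content": "Let me search for that.",
            "tool_calls": [
                make_tool_call(
                    "web_search",
                    {"query": "best Python testing framework"},
                    "No relevant results found.",
                    True,
                )
            ],
        },
    ],
)
```

Calling write_trace(trace, "trace.json") then gives a file that can be handed to JSONParser().parse(...) as in the Quick Start. That round trip through agentdx itself is assumed here, not verified.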

Architecture

agentdx is designed around a three-tier detection architecture. Only Tier 1 ships in v0.1.0; Tiers 2 and 3 are planned:

Trace Input → Parser → Normalised Trace → Detectors → Diagnostic Report
                                              │
                                   ┌──────────┼──────────┐
                                   │          │          │
                              Rule-Based   ML Model   LLM-as-Judge
                              (< 10ms)    (< 100ms)   (async)

Tier 1 — Rule-Based (v0.1.0): Deterministic pattern matching on trace structure. Fast, interpretable, zero external dependencies.

Tier 2 — ML Classifier (planned): Trained classifiers for pathologies that require statistical pattern recognition.

Tier 3 — LLM-as-Judge (planned): Asynchronous LLM evaluation for complex, context-dependent pathology detection.
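To make "deterministic pattern matching on trace structure" concrete, here is a toy rule in the spirit of Tier 1. It is not agentdx's detector: the function name find_repeated_tool_calls and the threshold of three identical calls are assumptions chosen for illustration, and it operates on a plain dict in the trace format shown in the Quick Start.

```python
import json
from collections import Counter


def find_repeated_tool_calls(trace, threshold=3):
    """Return (tool_name, arguments_json, count) for identical calls seen >= threshold times.

    A crude signal for Tool Thrashing: the same tool invoked repeatedly
    with exactly the same arguments.
    """
    counts = Counter()
    for message in trace.get("messages", []):
        for call in message.get("tool_calls") or []:
            # Serialise arguments with sorted keys so equal dicts produce equal keys.
            arguments_json = json.dumps(call.get("arguments", {}), sort_keys=True)
            counts[(call.get("tool_name"), arguments_json)] += 1
    return [
        (tool_name, arguments_json, count)
        for (tool_name, arguments_json), count in counts.items()
        if count >= threshold
    ]


# Hypothetical trace: three identical searches, then one different search.
search = {"tool_name": "web_search", "arguments": {"query": "x"}, "result": "", "success": True}
example = {
    "trace_id": "thrash-demo",
    "messages": [
        {"role": "assistant", "content": "Searching.", "tool_calls": [dict(search)]},
        {"role": "assistant", "content": "Searching again.", "tool_calls": [dict(search)]},
        {"role": "assistant", "content": "Once more.", "tool_calls": [dict(search)]},
        {
            "role": "assistant",
            "content": "Trying another query.",
            "tool_calls": [{**search, "arguments": {"query": "y"}}],
        },
    ],
}
findings = find_repeated_tool_calls(example)
```

A rule like this is fast and easy to explain, and it has the limitation the Evaluation section describes: it only sees patterns that are spelled out in the trace structure.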

Evaluation

We validated the Tier 1 detectors against 42 synthetic agent traces covering all 7 pathologies. The evaluation is a development validation — it confirms detectors fire on intended patterns, not that they generalise to production traces.

What the evaluation shows:

Detectors correctly identify the target pathology in traces designed to exhibit it
Cross-detector interference reveals that agent failures cluster (e.g., Recovery Blindness and Hallucinated Tool Success co-fire on failed tool calls)
One known false negative (gh_03): a cooking assistant answering quantum physics questions — topic shifts without injection keywords are invisible to rule-based detection

What it does not show:

Performance on production or adversarial traces (all traces are synthetic)
Statistical significance (3–6 traces per detector; 95% CI for 3/3 recall is [0.29, 1.00])
Comparison against baselines or alternative approaches
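The interval quoted for 3/3 recall is consistent with an exact (Clopper-Pearson) binomial interval. That the evaluation used this method is an assumption; the sketch below only shows that it reproduces the stated figure, using the standard library and simple bisection.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_increasing(f, target, iterations=60):
    """Find p in [0, 1] with f(p) == target, for f increasing in p, by bisection."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(successes, trials, alpha=0.05):
    """Exact two-sided (1 - alpha) confidence interval for a binomial proportion."""
    if successes == 0:
        lower = 0.0
    else:
        # Lower bound: the p at which P(X >= successes) equals alpha / 2.
        lower = _solve_increasing(
            lambda p: 1 - binom_cdf(successes - 1, trials, p), alpha / 2
        )
    if successes == trials:
        upper = 1.0
    else:
        # Upper bound: the p at which P(X <= successes) equals alpha / 2.
        upper = _solve_increasing(
            lambda p: 1 - binom_cdf(successes, trials, p), 1 - alpha / 2
        )
    return lower, upper


lower, upper = clopper_pearson(3, 3)
print(f"[{lower:.2f}, {upper:.2f}]")  # prints [0.29, 1.00], matching the figure above
```

With all three traces detected, the lower bound reduces to 0.025^(1/3) ≈ 0.29, so a perfect score on three traces is compatible with a true recall anywhere above roughly 29%.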

agentdx is best used as a debugging aid during agent development. See case_study/walkthrough.ipynb for the full evaluation with dual metrics, confidence intervals, and per-detector analysis.

Supported Frameworks

Framework	Parser	Status
Raw JSON traces	`JSONParser`	v0.1.0
LangChain / LangGraph	`LangChainParser`	Planned
CrewAI	`CrewAIParser`	Planned
AutoGen / AG2	`AutoGenParser`	Planned
OpenAI Agents SDK	`OpenAIAgentParser`	Planned

Output Formats

JSON — machine-readable diagnostic report
Markdown — human-readable summary with severity ratings

Research

agentdx is developed by MLDeep Systems, an AI and data consulting firm specialising in agent reliability. See mldeep.io/research for related publications.

The failure pathology taxonomy draws on:

OWASP GenAI Security Project. (2025). Top 10 for Agentic Applications. genai.owasp.org
Cemri, M., Pan, M., et al. (2025). Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657. arxiv.org

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

Citation

If you use agentdx in your research, please cite:

@software{mldeepsystemsagentdx,
  author    = {Parimoo, Anmol},
  title     = {agentdx: An Open-Source Diagnostic SDK for AI Agent Reliability},
  year      = {2026},
  url       = {https://github.com/mldeepsystems/agentdx},
  license   = {MIT}
}

License

MIT. See LICENSE for details.

Release history

Version	Released
0.1.0a3 (pre-release, this version)	Mar 17, 2026
0.1.0a2 (pre-release)	Mar 16, 2026
0.1.0a1 (pre-release)	Mar 16, 2026
