Open-source diagnostic SDK for detecting failure pathologies in AI agent systems

agentdx

License: MIT · Python 3.10+


agentdx detects operational failure modes in AI agent systems: failures that observability tools miss because they occur at the reasoning level rather than the infrastructure level. It analyses agent execution traces and produces structured diagnostic reports that identify specific pathologies.

Why agentdx?

Existing agent observability tools (LangSmith, Arize, Datadog LLM Observability) excel at tracing: they show you what happened. agentdx does diagnosis: it tells you what went wrong and why.

| Tool type | What it answers | Examples |
| --- | --- | --- |
| Tracing / Observability | "What calls did the agent make?" | LangSmith, Arize, Datadog |
| Diagnostics | "Why did the agent fail?" | agentdx |

The Seven Pathologies

agentdx detects seven operational failure modes. The taxonomy is designed to align with the OWASP Top 10 for Agentic Applications and UC Berkeley's MAST (Multi-Agent System Failure Taxonomy) framework:

| Pathology | What it means |
| --- | --- |
| Context Erosion | Agent loses critical context over long conversations or multi-step tasks |
| Tool Thrashing | Agent repeatedly calls tools with ineffective or contradictory parameters |
| Instruction Drift | Agent gradually deviates from its original instructions or mandate |
| Recovery Blindness | Agent fails to detect or recover from errors in its own execution |
| Hallucinated Tool Success | Agent treats a failed tool call as successful and proceeds on false premises |
| Goal Hijacking | Agent's objective is altered by adversarial input or environmental manipulation |
| Silent Degradation | Agent's output quality deteriorates without triggering explicit errors |
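To make one row of the table concrete, here is a minimal sketch of how a narrow form of Tool Thrashing (the same tool called several times in a row with identical arguments) could be flagged from the trace format described under Quick Start. The function name `find_tool_thrashing`, the finding fields, and the default threshold of 3 are illustrative assumptions, not agentdx's API or detection logic; the sketch also does not attempt to catch contradictory parameters.

```python
import json
from itertools import groupby
from typing import Any


def find_tool_thrashing(messages: list[dict[str, Any]], threshold: int = 3) -> list[dict[str, Any]]:
    """Hypothetical rule: flag runs of `threshold`+ consecutive identical tool calls."""
    # Flatten every tool call in trace order; sort_keys makes argument order irrelevant.
    calls = [
        (call["tool_name"], json.dumps(call.get("arguments", {}), sort_keys=True))
        for message in messages
        for call in message.get("tool_calls") or []
    ]
    findings = []
    # groupby only merges *adjacent* equal keys, so each group is one consecutive run.
    for (tool_name, arguments), run in groupby(calls):
        repeats = sum(1 for _ in run)
        if repeats >= threshold:
            findings.append({
                "pathology": "tool_thrashing",
                "tool_name": tool_name,
                "arguments": json.loads(arguments),
                "repeats": repeats,
            })
    return findings


# Example: the same search issued three times in a row.
retry_turn = {
    "role": "assistant",
    "content": "Let me search again.",
    "tool_calls": [{
        "tool_name": "web_search",
        "arguments": {"query": "best Python testing framework"},
        "result": "No relevant results found.",
        "success": True,
    }],
}
messages = [{"role": "user", "content": "Find the best Python testing framework."}] + [retry_turn] * 3
print(find_tool_thrashing(messages))
# [{'pathology': 'tool_thrashing', 'tool_name': 'web_search', 'arguments': {'query': 'best Python testing framework'}, 'repeats': 3}]
```

Canonicalising arguments with `sort_keys=True` means `{"a": 1, "b": 2}` and `{"b": 2, "a": 1}` count as the same call.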

Quick Start

pip install agentdx

from agentdx import Diagnoser, JSONParser

# Parse an agent execution trace (file path, dict, or message list)
parser = JSONParser()
trace = parser.parse("path/to/trace.json")

# Run all detectors
diagnoser = Diagnoser()
report = diagnoser.diagnose(trace)

# View results
print(report.summary())
report.to_json("diagnostic_report.json")
report.to_markdown("diagnostic_report.md")

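The trace file is a JSON object with a messages array. Each message has a role and content, plus an optional tool_calls list; in the example below, each tool call records its tool_name, arguments, result, and a success flag: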

{
  "trace_id": "my-trace",
  "messages": [
    {"role": "user", "content": "Find the best Python testing framework."},
    {
      "role": "assistant",
      "content": "Let me search for that.",
      "tool_calls": [
        {
          "tool_name": "web_search",
          "arguments": {"query": "best Python testing framework"},
          "result": "No relevant results found.",
          "success": true
        }
      ]
    }
  ]
}
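`JSONParser` already handles this format. For readers building their own tooling around the same files, the sketch below shows one way to validate the shape above and load it into typed records. The `Trace`, `Message`, and `ToolCall` dataclasses and the `load_trace` helper are hypothetical (they are not agentdx's internal types), and treating `trace_id` and `success` as optional is an assumption.

```python
from __future__ import annotations

import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ToolCall:
    tool_name: str
    arguments: dict[str, Any]
    result: Any = None
    success: bool | None = None  # None when the trace omits the flag


@dataclass
class Message:
    role: str
    content: str
    tool_calls: list[ToolCall] = field(default_factory=list)


@dataclass
class Trace:
    trace_id: str
    messages: list[Message]


def load_trace(data: str | dict[str, Any]) -> Trace:
    """Hypothetical loader: validate the documented trace shape (JSON text or dict)."""
    raw = json.loads(data) if isinstance(data, str) else data
    if not isinstance(raw.get("messages"), list):
        raise ValueError("trace must contain a 'messages' array")
    messages = []
    for i, msg in enumerate(raw["messages"]):
        if "role" not in msg or "content" not in msg:
            raise ValueError(f"message {i} is missing 'role' or 'content'")
        calls = []
        for call in msg.get("tool_calls") or []:
            if "tool_name" not in call:
                raise ValueError(f"a tool call in message {i} is missing 'tool_name'")
            calls.append(ToolCall(
                tool_name=call["tool_name"],
                arguments=call.get("arguments", {}),
                result=call.get("result"),
                success=call.get("success"),
            ))
        messages.append(Message(role=msg["role"], content=msg["content"], tool_calls=calls))
    return Trace(trace_id=raw.get("trace_id", ""), messages=messages)


example = """
{"trace_id": "my-trace",
 "messages": [
   {"role": "user", "content": "Find the best Python testing framework."},
   {"role": "assistant", "content": "Let me search for that.",
    "tool_calls": [{"tool_name": "web_search",
                    "arguments": {"query": "best Python testing framework"},
                    "result": "No relevant results found.", "success": true}]}]}
"""
trace = load_trace(example)
first_call = trace.messages[1].tool_calls[0]
print(first_call.tool_name, first_call.success)  # web_search True
```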

Architecture

agentdx is built around a three-tier detection architecture; only the rule-based tier is available in v0.1.0:

Trace Input → Parser → Normalised Trace → Detectors → Diagnostic Report
                                              │
                                   ┌──────────┼──────────┐
                                   │          │          │
                              Rule-Based   ML Model   LLM-as-Judge
                              (< 10ms)    (< 100ms)   (async)

Tier 1 — Rule-Based (v0.1.0): Deterministic pattern matching on trace structure. Fast, interpretable, zero external dependencies.

Tier 2 — ML Classifier (planned): Trained classifiers for pathologies that require statistical pattern recognition.

Tier 3 — LLM-as-Judge (planned): Asynchronous LLM evaluation for complex, context-dependent pathology detection.
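As an illustration of what a Tier 1 rule might look like, the sketch below flags possible Hallucinated Tool Success: a tool call recorded with `success: false` whose next assistant turn neither retries that tool nor mentions a problem. The keyword pattern, the next-turn heuristic, and all names are assumptions rather than agentdx's actual rules. It also trusts the trace's `success` flag, so it cannot catch a call marked successful despite a useless result.

```python
import re
from typing import Any

# Hypothetical phrases that suggest the assistant noticed a problem.
ACKNOWLEDGEMENT = re.compile(
    r"\b(error|fail(?:s|ed|ure|ing)?|unable|could not|couldn't|sorry)\b", re.IGNORECASE
)


def find_unacknowledged_failures(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Hypothetical rule: flag failed tool calls that the next assistant turn glosses over."""
    findings = []
    for i, message in enumerate(messages):
        for call in message.get("tool_calls") or []:
            if call.get("success") is not False:
                continue  # only calls explicitly recorded as failed are checked
            follow_up = next(
                (m for m in messages[i + 1:] if m.get("role") == "assistant"), None
            )
            if follow_up is None:
                continue  # the trace ends before the agent reacts
            retried = any(
                c.get("tool_name") == call.get("tool_name")
                for c in follow_up.get("tool_calls") or []
            )
            acknowledged = ACKNOWLEDGEMENT.search(follow_up.get("content") or "") is not None
            if not (retried or acknowledged):
                findings.append({
                    "pathology": "hallucinated_tool_success",
                    "message_index": i,
                    "tool_name": call.get("tool_name"),
                })
    return findings


messages = [
    {"role": "user", "content": "Book a table for two at 7pm."},
    {"role": "assistant", "content": "Booking now.", "tool_calls": [{
        "tool_name": "book_table", "arguments": {"party_size": 2},
        "result": "HTTP 503 Service Unavailable", "success": False}]},
    {"role": "assistant", "content": "Your table for two is booked for 7pm."},
]
print(find_unacknowledged_failures(messages))
# [{'pathology': 'hallucinated_tool_success', 'message_index': 1, 'tool_name': 'book_table'}]
```

A rule like this is fast and easy to explain, but keyword matching is brittle; context-dependent cases are the kind the planned judgment-based tiers are described as addressing.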

Supported Frameworks

| Framework | Parser | Status |
| --- | --- | --- |
| Raw JSON traces | JSONParser | v0.1.0 |
| LangChain / LangGraph | LangChainParser | Planned |
| CrewAI | CrewAIParser | Planned |
| AutoGen / AG2 | AutoGenParser | Planned |
| OpenAI Agents SDK | OpenAIAgentParser | Planned |
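Until the framework-specific parsers are available, one option is to convert logs into the raw JSON trace shape yourself. The sketch below assumes OpenAI Chat Completions-style messages (assistant `tool_calls` whose `function.arguments` is a JSON string, with results in separate `role: "tool"` messages linked by `tool_call_id`) and plain string `content`. `from_chat_completions` is a hypothetical helper, not `OpenAIAgentParser`. Because Chat Completions records no success flag, it marks a call as successful only when a matching tool result exists; pass your own `judge_success` to change that.

```python
from __future__ import annotations

import json
from typing import Any, Callable


def from_chat_completions(
    messages: list[dict[str, Any]],
    trace_id: str,
    judge_success: Callable[[str | None], bool] = lambda result: result is not None,
) -> dict[str, Any]:
    """Hypothetical adapter: Chat Completions-style messages -> the raw JSON trace shape."""
    # Tool outputs arrive as separate role="tool" messages keyed by tool_call_id.
    results = {m["tool_call_id"]: m.get("content") for m in messages if m.get("role") == "tool"}
    converted = []
    for m in messages:
        if m.get("role") == "tool":
            continue  # folded into the matching assistant tool call below
        out: dict[str, Any] = {"role": m["role"], "content": m.get("content") or ""}
        calls = []
        for call in m.get("tool_calls") or []:
            result = results.get(call["id"])
            calls.append({
                "tool_name": call["function"]["name"],
                # Chat Completions encodes arguments as a JSON string, not an object.
                "arguments": json.loads(call["function"].get("arguments") or "{}"),
                "result": result,
                "success": judge_success(result),
            })
        if calls:
            out["tool_calls"] = calls
        converted.append(out)
    return {"trace_id": trace_id, "messages": converted}


chat = [
    {"role": "user", "content": "Find the best Python testing framework."},
    {"role": "assistant", "content": None, "tool_calls": [{
        "id": "call_1", "type": "function",
        "function": {"name": "web_search",
                     "arguments": "{\"query\": \"best Python testing framework\"}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "No relevant results found."},
]
print(json.dumps(from_chat_completions(chat, "my-trace"), indent=2))
```

Since the Quick Start notes that the parser accepts a dict as well as a file path, the returned object could be passed to it directly or written to a file first.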

Output Formats

  • JSON — machine-readable diagnostic report
  • Markdown — human-readable summary with severity ratings
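The report schema is not documented on this page, so the following is only a sketch of how findings with severity ratings might be serialised to the two formats. The `Finding` record, the low/medium/high severity scale, and the `to_json`/`to_markdown` functions are hypothetical stand-ins; with agentdx itself, use `report.to_json(...)` and `report.to_markdown(...)` as shown in Quick Start.

```python
import json
from dataclasses import asdict, dataclass

SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}  # assumed severity scale


@dataclass
class Finding:
    pathology: str
    severity: str  # one of SEVERITY_ORDER's keys (an assumption)
    message_index: int
    evidence: str


def to_json(trace_id: str, findings: list[Finding]) -> str:
    """Machine-readable form: one object per finding."""
    return json.dumps({"trace_id": trace_id, "findings": [asdict(f) for f in findings]}, indent=2)


def to_markdown(trace_id: str, findings: list[Finding]) -> str:
    """Human-readable form: a table sorted from most to least severe."""
    lines = [f"# Diagnostic report: {trace_id}", ""]
    if not findings:
        lines.append("No pathologies detected.")
        return "\n".join(lines)
    lines += ["| Pathology | Severity | Message | Evidence |", "| --- | --- | --- | --- |"]
    ranked = sorted(findings, key=lambda item: SEVERITY_ORDER.get(item.severity, len(SEVERITY_ORDER)))
    for f in ranked:
        evidence = f.evidence.replace("|", "\\|")  # escape pipes so table cells stay intact
        lines.append(f"| {f.pathology} | {f.severity} | {f.message_index} | {evidence} |")
    return "\n".join(lines)


findings = [
    Finding("tool_thrashing", "medium", 3, "web_search called 3 times with identical arguments"),
    Finding("hallucinated_tool_success", "high", 1, "book_table failed; next turn reports success"),
]
print(to_markdown("my-trace", findings))
```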

Research

agentdx is developed by MLDeep Systems, an AI and data consulting firm specialising in agent reliability.

Related publications:

  • Polara, A. (2026). The Case for a Global Agent Identity Regime. MLDeep Systems. mldeep.io/research

The failure pathology taxonomy draws on:

  • OWASP GenAI Security Project. (2025). Top 10 for Agentic Applications. genai.owasp.org
  • Cemri, M., Pan, M., et al. (2025). Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657. arxiv.org

Contributing

We welcome contributions. See CONTRIBUTING.md for guidelines.

Citation

If you use agentdx in your research, please cite:

@software{mldeepsystemsagentdx,
  author    = {Parimoo, Anmol},
  title     = {agentdx: An Open-Source Diagnostic SDK for AI Agent Reliability},
  year      = {2026},
  url       = {https://github.com/mldeepsystems/agentdx},
  license   = {MIT}
}

License

MIT. See LICENSE for details.

Download files

Source Distribution

agentdx-0.1.0a1.tar.gz (43.9 kB)

Built Distribution

agentdx-0.1.0a1-py3-none-any.whl (30.7 kB)

File details

Details for the file agentdx-0.1.0a1.tar.gz.

File metadata

  • Download URL: agentdx-0.1.0a1.tar.gz
  • Size: 43.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for agentdx-0.1.0a1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 45ac3ac90dce7151aad3d609cac74e538066e323c070caa13b9bcde433401718 |
| MD5 | 1695f572ea92f4672f3d745fd1bedfd1 |
| BLAKE2b-256 | e200cb1bc00df754a53a7f0c3ad529aac0549c5fe5fce33ce3e089e2ade68f4f |

File details

Details for the file agentdx-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: agentdx-0.1.0a1-py3-none-any.whl
  • Size: 30.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for agentdx-0.1.0a1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | eaadb881d9a81af2c55f58f5bf5a82be478750c7b45df6c8d78bfc45530d71a8 |
| MD5 | fe63719329171768b6e3060e39e99ed4 |
| BLAKE2b-256 | b733ee8c772c0320aa22955255f8f2847327f08b6792139b622fec04e4406641 |
