Open-source diagnostic SDK for detecting failure pathologies in AI agent systems
agentdx
agentdx detects operational failure modes in AI agent systems: failures that observability tools miss because they occur at the reasoning level rather than the infrastructure level. It analyses agent execution traces and produces structured diagnostic reports that identify specific pathologies.
Why agentdx?
Existing agent observability tools (LangSmith, Arize, Datadog LLM) excel at tracing: they show you what happened. agentdx does diagnosis: it tells you what went wrong and why.
| Tool Type | What it answers | Examples |
|---|---|---|
| Tracing / Observability | "What calls did the agent make?" | LangSmith, Arize, Datadog |
| Diagnostics | "Why did the agent fail?" | agentdx |
The Seven Pathologies
agentdx detects seven operational failure modes, designed to align with the OWASP Top 10 for Agentic Applications and UC Berkeley's MAST framework:
| Pathology | What it means |
|---|---|
| Context Erosion | Agent loses critical context over long conversations or multi-step tasks |
| Tool Thrashing | Agent repeatedly calls tools with ineffective or contradictory parameters |
| Instruction Drift | Agent gradually deviates from its original instructions or mandate |
| Recovery Blindness | Agent fails to detect or recover from errors in its own execution |
| Hallucinated Tool Success | Agent treats a failed tool call as successful and proceeds on false premises |
| Goal Hijacking | Agent's objective is altered by adversarial input or environmental manipulation |
| Silent Degradation | Agent's output quality deteriorates without triggering explicit errors |
Quick Start
```bash
pip install agentdx
```

```python
from agentdx import Diagnoser, JSONParser

# Parse an agent execution trace (file path, dict, or message list)
parser = JSONParser()
trace = parser.parse("path/to/trace.json")

# Run all detectors
diagnoser = Diagnoser()
report = diagnoser.diagnose(trace)

# View results
print(report.summary())
report.to_json("diagnostic_report.json")
report.to_markdown("diagnostic_report.md")
```
The trace file is a JSON object with a messages array. Each message has a role, content, and optional tool_calls:
```json
{
  "trace_id": "my-trace",
  "messages": [
    {"role": "user", "content": "Find the best Python testing framework."},
    {
      "role": "assistant",
      "content": "Let me search for that.",
      "tool_calls": [
        {
          "tool_name": "web_search",
          "arguments": {"query": "best Python testing framework"},
          "result": "No relevant results found.",
          "success": true
        }
      ]
    }
  ]
}
```
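Before handing a trace to the parser, it can help to check that it has the shape described above. The following is a minimal sketch using only the standard library; `validate_trace` is a hypothetical helper, not part of agentdx, and treating `tool_name` as required on every tool call is an assumption based on the example rather than a documented rule.

```python
REQUIRED_MESSAGE_KEYS = ("role", "content")


def validate_trace(trace: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape looks right."""
    messages = trace.get("messages")
    if not isinstance(messages, list):
        return ["'messages' must be a list"]

    problems = []
    for i, message in enumerate(messages):
        if not isinstance(message, dict):
            problems.append(f"messages[{i}] must be an object")
            continue
        for key in REQUIRED_MESSAGE_KEYS:
            if key not in message:
                problems.append(f"messages[{i}] is missing '{key}'")
        # tool_calls is optional; when present, each call should name its tool
        # (assumption: the README example is the only guide to required keys)
        for j, call in enumerate(message.get("tool_calls", [])):
            if "tool_name" not in call:
                problems.append(f"messages[{i}].tool_calls[{j}] is missing 'tool_name'")
    return problems


example_trace = {
    "trace_id": "my-trace",
    "messages": [
        {"role": "user", "content": "Find the best Python testing framework."},
        {
            "role": "assistant",
            "content": "Let me search for that.",
            "tool_calls": [
                {
                    "tool_name": "web_search",
                    "arguments": {"query": "best Python testing framework"},
                    "result": "No relevant results found.",
                    "success": True,
                }
            ],
        },
    ],
}

print(validate_trace(example_trace))
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a malformed trace at once.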
Architecture
agentdx uses a three-tier detection architecture:
```text
Trace Input → Parser → Normalised Trace → Detectors → Diagnostic Report
                                              │
                                   ┌──────────┼──────────┐
                                   │          │          │
                              Rule-Based  ML Model LLM-as-Judge
                               (< 10ms)   (< 100ms)   (async)
```
Tier 1 — Rule-Based (v0.1.0): Deterministic pattern matching on trace structure. Fast, interpretable, zero external dependencies.
Tier 2 — ML Classifier (planned): Trained classifiers for pathologies that require statistical pattern recognition.
Tier 3 — LLM-as-Judge (planned): Asynchronous LLM evaluation for complex, context-dependent pathology detection.
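To illustrate what deterministic pattern matching on trace structure might look like, here is a minimal sketch of a Tier 1 style rule for Tool Thrashing. It is not agentdx's actual detector: the function name `find_repeated_tool_calls`, the default `threshold` of 3, and the choice to treat identical arguments as the only thrashing signal are all assumptions made for illustration.

```python
import json
from collections import Counter


def find_repeated_tool_calls(trace: dict, threshold: int = 3) -> list[dict]:
    """Flag (tool, arguments) pairs that occur at least `threshold` times."""
    counts = Counter()
    for message in trace.get("messages", []):
        for call in message.get("tool_calls", []):
            # Serialise arguments with sorted keys so equal dicts compare equal
            args_key = json.dumps(call.get("arguments", {}), sort_keys=True)
            counts[(call.get("tool_name"), args_key)] += 1

    return [
        {"tool_name": name, "arguments": json.loads(args_key), "count": n}
        for (name, args_key), n in counts.items()
        if n >= threshold
    ]


# A trace in which the agent issues the same failing search three times
repeated_call = {
    "tool_name": "web_search",
    "arguments": {"query": "best Python testing framework"},
    "result": "No relevant results found.",
    "success": True,
}
thrashing_trace = {
    "messages": [
        {"role": "assistant", "content": "Searching.", "tool_calls": [repeated_call]}
        for _ in range(3)
    ]
}

thrashing_findings = find_repeated_tool_calls(thrashing_trace)
print(thrashing_findings)
```

A rule like this needs only a single pass over the trace and no external dependencies, which is what makes rule-based checks fast and interpretable.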
Supported Frameworks
| Framework | Parser | Status |
|---|---|---|
| Raw JSON traces | JSONParser | v0.1.0 |
| LangChain / LangGraph | LangChainParser | Planned |
| CrewAI | CrewAIParser | Planned |
| AutoGen / AG2 | AutoGenParser | Planned |
| OpenAI Agents SDK | OpenAIAgentParser | Planned |
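Until a framework-specific parser is available, traces from other sources can be converted to the raw JSON shape shown in Quick Start and fed through JSONParser. The sketch below converts OpenAI Chat Completions style messages as one possible approach. It is not the planned OpenAIAgentParser; `normalise_openai_messages` is a hypothetical helper, and deriving `success` from whether a tool result exists is a crude assumption that real traces may need to refine.

```python
import json


def normalise_openai_messages(messages: list[dict], trace_id: str = "converted") -> dict:
    """Convert OpenAI-style chat messages into the raw JSON trace shape."""
    # Index tool outputs by the id of the call they answer
    results = {
        m["tool_call_id"]: m.get("content")
        for m in messages
        if m.get("role") == "tool"
    }

    normalised = []
    for m in messages:
        if m.get("role") == "tool":
            continue  # folded into the assistant message that issued the call
        entry = {"role": m["role"], "content": m.get("content") or ""}
        calls = m.get("tool_calls") or []
        if calls:
            entry["tool_calls"] = [
                {
                    "tool_name": c["function"]["name"],
                    # The Chat Completions format encodes arguments as a JSON string
                    "arguments": json.loads(c["function"]["arguments"]),
                    "result": results.get(c["id"]),
                    # Assumption: a call counts as successful if a result came back
                    "success": c["id"] in results,
                }
                for c in calls
            ]
        normalised.append(entry)
    return {"trace_id": trace_id, "messages": normalised}


openai_messages = [
    {"role": "user", "content": "Find the best Python testing framework."},
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [
            {
                "id": "call_1",
                "type": "function",
                "function": {"name": "web_search", "arguments": '{"query": "pytest"}'},
            }
        ],
    },
    {"role": "tool", "tool_call_id": "call_1", "content": "No relevant results found."},
]

converted = normalise_openai_messages(openai_messages)
print(json.dumps(converted, indent=2))
```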
Output Formats
- JSON — machine-readable diagnostic report
- Markdown — human-readable summary with severity ratings
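As a rough illustration of how one set of findings can serve both formats, the sketch below renders the same records as JSON and as a Markdown table. The `Finding` dataclass and its fields (`pathology`, `severity`, `evidence`) are hypothetical and do not reflect agentdx's actual report schema.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Finding:
    # Hypothetical fields, chosen for illustration only
    pathology: str
    severity: str
    evidence: str


def findings_to_json(findings: list[Finding]) -> str:
    """Machine-readable form: a JSON object with a list of findings."""
    return json.dumps({"findings": [asdict(f) for f in findings]}, indent=2)


def findings_to_markdown(findings: list[Finding]) -> str:
    """Human-readable form: a Markdown table with one row per finding."""
    lines = ["| Pathology | Severity | Evidence |", "|---|---|---|"]
    lines += [f"| {f.pathology} | {f.severity} | {f.evidence} |" for f in findings]
    return "\n".join(lines)


sample_findings = [
    Finding("Tool Thrashing", "high", "web_search called 3 times with identical arguments"),
]

print(findings_to_json(sample_findings))
print(findings_to_markdown(sample_findings))
```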
Research
agentdx is developed by MLDeep Systems, an AI and data consulting firm specialising in agent reliability.
Related publications:
- Polara, A. (2026). The Case for a Global Agent Identity Regime. MLDeep Systems. mldeep.io/research
The failure pathology taxonomy draws on:
- OWASP GenAI Security Project. (2025). Top 10 for Agentic Applications. genai.owasp.org
- Cemri, M., Pan, M., et al. (2025). Why Do Multi-Agent LLM Systems Fail? arXiv:2503.13657. arxiv.org
Contributing
We welcome contributions. See CONTRIBUTING.md for guidelines.
Citation
If you use agentdx in your research, please cite:
```bibtex
@software{mldeepsystemsagentdx,
  author  = {Parimoo, Anmol},
  title   = {agentdx: An Open-Source Diagnostic SDK for AI Agent Reliability},
  year    = {2026},
  url     = {https://github.com/mldeepsystems/agentdx},
  license = {MIT}
}
```
License
MIT. See LICENSE for details.