
agentslow

AI agent diagnostics CLI — diagnose, benchmark, and auto-fix AI agent performance.

Part of A.I Shovels — tools that dig into AI infrastructure problems.

pip install agentslow

What it does

One command tells you why your AI agent is slow, expensive, or unreliable — and fixes it.

agentslow diagnose trace.yaml

agentslow classifies your agent into one of four performance regimes (Context-Bound, Reasoning-Bound, Tool-Bound, IO-Bound), computes novel metrics like Token Efficiency Ratio and Tool Re-entry Rate, then prescribes auto-applicable fixes with machine-readable config patches.

Actual CLI Output (not marketing — run it yourself)

$ agentslow diagnose examples/openclaw_research_agent.yaml --entropy --fidelity \
    --config examples/agent_config_baseline.yaml --dry-run

═══ agentslow v0.8.0 ═══
Agent: openclaw_research_v2 (langgraph)
Task: Research competitor pricing for SaaS product and compile report
Status: ✓ Success

── REGIME CLASSIFICATION ──
  Primary: CONTEXT-BOUND
  Confidence: 90%

── KEY METRICS ──
  Token Efficiency Ratio (TER): 0.0147
  Tool Re-entry Rate:           0.2500
  Time-to-First-Action:         1800ms
  Reasoning Ratio:              0.0983
  Total Cost:                   $0.5403
  Total Duration:               32850ms

── PRESCRIPTIVE FIXES ──
  1. [CRITICAL] [AUTO-FIX] Implement context compaction (summarization)
     Token Efficiency Ratio is 0.015 (healthy: >0.15). Your context is bloated
     with irrelevant tokens. Add a summarization step every N turns to compress
     conversation history.
     → Expected: 50-70% token reduction, major cost savings

  2. [HIGH] [AUTO-FIX] Enable prompt/prefix caching
     Total input tokens: 110,100. Enable prefix caching to avoid re-processing
     the same system prompt on every LLM call.
     → Expected: 30-50% latency reduction on repeated calls

  3. [MEDIUM] Tune RAG retrieval — retrieve less, retrieve better
     You're likely stuffing too many documents into context. Reduce top_k, add
     re-ranking, or switch to semantic chunking.
     → Expected: Fewer tokens = lower cost + faster inference

═══ CONTEXT ENTROPY ANALYSIS ═══
Session: openclaw_research_v2
Total Turns: 7

── ENTROPY METRICS ──
  Average Entropy:         0.9374
  Max Entropy:             1.0000
  Entropy Trend:           INCREASING
  Semantic Drift Ratio:    0.1034
  Noise Ratio:             0.2857
  Compaction Integrity:    1.0000

── VERDICT ──
  ✗ CRITICAL — Context is critical

═══ DRY-RUN: Implement context compaction (summarization) ═══
  context_compaction.enabled: false → true  [CAUTION]
  context_compaction.strategy: (none) → summarize_every_n  [CAUTION]
  context_compaction.n_turns: (none) → 5  [CAUTION]
  Rollback: agentslow rollback --fix-id context-001

═══ DRY-RUN: Enable prompt/prefix caching ═══
  enable_prompt_caching: false → true  [SAFE]
  Rollback: agentslow rollback --fix-id context-002

═══ GOLDEN SET FIDELITY TEST ═══
Tests Run: 15
Passed: 15 | Failed: 0
Overall Fidelity: 1.0000

── VERDICT ──
  ✓ PASSED — Safe to apply in production.

That's one command. Regime classification + metrics + fixes + entropy audit + dry-run diffs + fidelity verification (15 golden cases covering all 4 regimes).

Benchmark: Before/After Proof

$ agentslow benchmark examples/agent_config_baseline.yaml --compare --tasks 5

═══ BENCHMARK COMPARISON ═══
Tasks: 5

── BEFORE (baseline) ──
  Tokens:  42,576 avg
  Cost:    $0.31 avg
  Success: 80%

── AFTER (optimized) ──
  Tokens:  23,218 avg  (-45.5%)
  Cost:    $0.22 avg   (-30.2%)
  Success: 80%

Fixes Applied: 3
MICRO-EVAL GUARD: ALL CLEAR

P99 Jitter Audit

$ agentslow benchmark examples/agent_config_baseline.yaml --jitter-audit --jitter-runs 5

═══ P99 JITTER AUDIT ═══
Runs: 5

── STABILITY ──
  Overall: STABLE
  Worst Jitter: 1.26x (p99/p50)

Production-ready: variance is within acceptable bounds.
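The p99/p50 ratio reported above is straightforward to reproduce from raw run latencies. A minimal sketch, assuming agentslow uses standard inclusive percentiles (the exact percentile method is not documented here):

```python
# Hypothetical sketch of the p99/p50 jitter ratio over repeated runs.
# The percentile method ("inclusive") is an assumption for illustration.
import statistics

def jitter_ratio(latencies_ms: list[float]) -> float:
    """Worst-case jitter as p99 latency divided by median latency."""
    qs = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    p50, p99 = qs[49], qs[98]
    return p99 / p50
```

A ratio near 1.0 means runs are tightly clustered; a value like the 1.26x above says the slowest tail is only 26% worse than the median.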

CI/CD Integration

# Fails pipeline on CRITICAL entropy
agentslow diagnose trace.yaml --entropy --ci > junit_report.xml
echo $?  # exit code 1 on CRITICAL

JUnit XML output integrates directly with GitHub Actions, GitLab CI, Jenkins.

Key Concepts

Performance Regimes

Regime           Symptom                              Auto-Fix
Context-Bound    Low TER (<0.15), bloated context     Context compaction, prompt caching
Reasoning-Bound  High reasoning ratio (>0.5)          Reasoning budget limits, task decomposition
Tool-Bound       High tool re-entry rate (>0.3)       Tool call batching, result caching
IO-Bound         High TTA (>2000ms)                   Parallel tool execution, connection pooling
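The thresholds in this table compose into a simple first-pass classifier. This is a hypothetical sketch of the decision logic, not agentslow's actual implementation; the function name and metric keys are assumptions:

```python
# Hypothetical sketch of threshold-based regime classification.
# Thresholds come from the table above; the code is illustrative only.

def classify_regime(metrics: dict) -> str:
    """Return the primary performance regime for a set of trace metrics."""
    if metrics["ter"] < 0.15:
        return "CONTEXT-BOUND"    # bloated context the model mostly ignores
    if metrics["reasoning_ratio"] > 0.5:
        return "REASONING-BOUND"  # most tokens spent thinking, not acting
    if metrics["tool_reentry_rate"] > 0.3:
        return "TOOL-BOUND"       # agent is retrying or looping on tools
    if metrics["tta_ms"] > 2000:
        return "IO-BOUND"         # slow to reach the first tool call
    return "HEALTHY"

# The metrics from the diagnose output above land in CONTEXT-BOUND:
print(classify_regime({"ter": 0.0147, "reasoning_ratio": 0.0983,
                       "tool_reentry_rate": 0.25, "tta_ms": 1800}))
# → CONTEXT-BOUND
```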

Novel Metrics

  • Token Efficiency Ratio (TER): output_tokens / input_tokens. Healthy: >0.15. Below that, you're paying for context the model ignores.
  • Tool Re-entry Rate: Fraction of steps that re-call the same tool. High = your agent is retrying or looping.
  • Time-to-First-Action (TTA): Milliseconds from prompt to first tool call. Measures reasoning overhead.
  • Reasoning Ratio: reasoning_tokens / total_tokens. How much compute goes to thinking vs. acting.
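These definitions fall out of a trace directly. A minimal sketch, assuming a simplified step schema (token counts, a "tool" name, and a "t_ms" offset from the prompt); this is not agentslow's actual trace format, and re-entry is approximated here as consecutive calls to the same tool:

```python
# Hypothetical sketch of the four metrics over a simplified trace.
# The step schema is an assumption for illustration.

def compute_metrics(steps: list[dict]) -> dict:
    input_toks = sum(s.get("input_tokens", 0) for s in steps)
    output_toks = sum(s.get("output_tokens", 0) for s in steps)
    reasoning_toks = sum(s.get("reasoning_tokens", 0) for s in steps)
    total = input_toks + output_toks

    # Re-entry approximated as back-to-back calls to the same tool.
    tools = [s["tool"] for s in steps if "tool" in s]
    reentries = sum(1 for a, b in zip(tools, tools[1:]) if a == b)

    return {
        "ter": output_toks / input_toks if input_toks else 0.0,
        "tool_reentry_rate": reentries / len(tools) if tools else 0.0,
        "tta_ms": next((s["t_ms"] for s in steps if "tool" in s), None),
        "reasoning_ratio": reasoning_toks / total if total else 0.0,
    }
```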

Context Entropy Monitor

Measures context health over long-running sessions:

  • Per-turn entropy scores (0-1)
  • Noise ratio (garbage accumulation)
  • Semantic drift via TER sliding windows
  • Compaction integrity validation
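agentslow's internal entropy definition isn't documented here, but one way to get a 0-1 per-turn score like the ones in the output above is Shannon entropy of the turn's token distribution, normalized by the maximum possible entropy. A hedged sketch of that interpretation:

```python
# Hypothetical sketch: normalized Shannon entropy of a turn's tokens.
# This is one plausible reading of the 0-1 entropy scores above, not
# agentslow's documented definition.
import math
from collections import Counter

def normalized_entropy(tokens: list[str]) -> float:
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0 or len(counts) == 1:
        return 0.0  # empty or single-symbol turn carries no entropy
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))  # 1.0 = maximally uniform turn
```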

Safe Mode (Trust Architecture)

By default, agentslow shows fixes but doesn't write anything. The --safe-mode flag adds an extra preview layer; --apply is the explicit opt-in to write config patches.

# Preview only (default) — never writes files
agentslow diagnose trace.yaml --dry-run

# Safe mode — extra verbose preview with risk annotations
agentslow diagnose trace.yaml --safe-mode

# Apply — writes config patches after full safety chain
agentslow diagnose trace.yaml --apply

Six-gate safety chain: diagnose → classify → prescribe → golden-set verify → dry-run preview → human review → apply.

Auto-Fix Pipeline

  1. Diagnose → Classify regime + compute metrics
  2. Fix → Generate prescriptive fixes with config patches
  3. Dry-Run → Show diffs with SAFE/CAUTION/WARNING risk levels
  4. Fidelity → Verify fixes don't change agent behavior (15 golden cases)
  5. Human Review → --safe-mode preview before any writes
  6. Apply → Machine-readable JSON config patches (explicit --apply opt-in)
  7. CI Gate → Non-zero exit on CRITICAL issues
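The dry-run output shown earlier suggests each fix carries a machine-readable patch with a risk level per change. A hypothetical sketch of that shape and how a dry-run line could be rendered from it (the patch schema is an assumption, not agentslow's documented JSON format):

```python
# Hypothetical patch shape; agentslow's real JSON schema is not
# documented here. Field names are assumptions for illustration.
patch = {
    "fix_id": "context-001",
    "risk": "CAUTION",
    "changes": [
        {"key": "context_compaction.enabled", "old": False, "new": True},
        {"key": "context_compaction.strategy", "old": None, "new": "summarize_every_n"},
        {"key": "context_compaction.n_turns", "old": None, "new": 5},
    ],
}

def fmt(v) -> str:
    if v is None:
        return "(none)"
    if isinstance(v, bool):  # check bool before other types
        return str(v).lower()
    return str(v)

def render_dry_run(patch: dict) -> str:
    """Render one dry-run diff line per changed config key."""
    return "\n".join(
        f"{c['key']}: {fmt(c['old'])} → {fmt(c['new'])}  [{patch['risk']}]"
        for c in patch["changes"]
    )

print(render_dry_run(patch))
```

A structured patch like this is what makes rollback (`agentslow rollback --fix-id …`) tractable: each change records both old and new values.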

Framework Support

  • LangGraph (primary) — native trace parser
  • MCP — tool call analysis
  • CrewAI — multi-agent traces
  • Claude Code — session analysis

Project Stats

  • 4,510+ lines across 12 modules
  • 101 tests, 15 golden fidelity cases
  • 10 atomic commits (v0.1.0 → v0.8.0)
  • CLI-first, no dashboard — the moat is auto-fixing
  • Six-gate safety chain with --safe-mode trust architecture

Quick Start

# Install
pip install agentslow

# Diagnose a trace
agentslow diagnose your_trace.yaml

# Full analysis with all features
agentslow diagnose your_trace.yaml \
  --entropy \
  --fidelity \
  --config your_config.yaml \
  --dry-run

# Safe mode — preview everything before writing
agentslow diagnose your_trace.yaml --safe-mode

# Apply fixes (explicit opt-in)
agentslow diagnose your_trace.yaml --apply

# Benchmark before/after
agentslow benchmark your_config.yaml --compare --tasks 5

# CI mode (JUnit XML + exit codes)
agentslow diagnose your_trace.yaml --entropy --ci

License

MIT
