Diagnose and auto-fix AI agent performance bottlenecks.
# agentslow

AI agent diagnostics CLI — diagnose, benchmark, and auto-fix AI agent performance.

Part of A.I Shovels — tools that dig into AI infrastructure problems.

```shell
pip install agentslow
```

## What it does

One command tells you why your AI agent is slow, expensive, or unreliable — and fixes it.

```shell
agentslow diagnose trace.yaml
```
agentslow classifies your agent into one of four performance regimes (Context-Bound, Reasoning-Bound, Tool-Bound, IO-Bound), computes novel metrics like Token Efficiency Ratio and Tool Re-entry Rate, then prescribes auto-applicable fixes with machine-readable config patches.
## Actual CLI Output (not marketing — run it yourself)

```
$ agentslow diagnose examples/openclaw_research_agent.yaml --entropy --fidelity \
    --config examples/agent_config_baseline.yaml --dry-run

═══ agentslow v0.8.0 ═══
Agent: openclaw_research_v2 (langgraph)
Task: Research competitor pricing for SaaS product and compile report
Status: ✓ Success

── REGIME CLASSIFICATION ──
Primary: CONTEXT-BOUND
Confidence: 90%

── KEY METRICS ──
Token Efficiency Ratio (TER): 0.0147
Tool Re-entry Rate: 0.2500
Time-to-First-Action: 1800ms
Reasoning Ratio: 0.0983
Total Cost: $0.5403
Total Duration: 32850ms

── PRESCRIPTIVE FIXES ──
1. [CRITICAL] [AUTO-FIX] Implement context compaction (summarization)
   Token Efficiency Ratio is 0.015 (healthy: >0.15). Your context is bloated
   with irrelevant tokens. Add a summarization step every N turns to compress
   conversation history.
   → Expected: 50-70% token reduction, major cost savings
2. [HIGH] [AUTO-FIX] Enable prompt/prefix caching
   Total input tokens: 110,100. Enable prefix caching to avoid re-processing
   the same system prompt on every LLM call.
   → Expected: 30-50% latency reduction on repeated calls
3. [MEDIUM] Tune RAG retrieval — retrieve less, retrieve better
   You're likely stuffing too many documents into context. Reduce top_k, add
   re-ranking, or switch to semantic chunking.
   → Expected: Fewer tokens = lower cost + faster inference

═══ CONTEXT ENTROPY ANALYSIS ═══
Session: openclaw_research_v2
Total Turns: 7

── ENTROPY METRICS ──
Average Entropy: 0.9374
Max Entropy: 1.0000
Entropy Trend: INCREASING
Semantic Drift Ratio: 0.1034
Noise Ratio: 0.2857
Compaction Integrity: 1.0000

── VERDICT ──
✗ CRITICAL — Context is critical

═══ DRY-RUN: Implement context compaction (summarization) ═══
context_compaction.enabled: false → true [CAUTION]
context_compaction.strategy: (none) → summarize_every_n [CAUTION]
context_compaction.n_turns: (none) → 5 [CAUTION]
Rollback: agentslow rollback --fix-id context-001

═══ DRY-RUN: Enable prompt/prefix caching ═══
enable_prompt_caching: false → true [SAFE]
Rollback: agentslow rollback --fix-id context-002

═══ GOLDEN SET FIDELITY TEST ═══
Tests Run: 15
Passed: 15 | Failed: 0
Overall Fidelity: 1.0000

── VERDICT ──
✓ PASSED — Safe to apply in production.
```
That's one command. Regime classification + metrics + fixes + entropy audit + dry-run diffs + fidelity verification (15 golden cases covering all 4 regimes).
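The headline TER number can be sanity-checked by hand. TER is output tokens divided by input tokens, so the 110,100 input tokens reported above combined with a TER of 0.0147 imply the agent emitted only about 1,600 output tokens. A quick back-of-envelope check:

```python
# Back-of-envelope check of the sample run above:
# TER = output_tokens / input_tokens, so output = TER * input.
input_tokens = 110_100        # "Total input tokens" from the report
ter = 0.0147                  # reported Token Efficiency Ratio
implied_output = round(ter * input_tokens)
print(implied_output)                        # → 1618
print(implied_output / input_tokens < 0.15)  # → True (far below the healthy threshold)
```

Paying for 110k input tokens to get ~1.6k tokens of output is exactly the Context-Bound pattern the tool flags.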
## Benchmark: Before/After Proof

```
$ agentslow benchmark examples/agent_config_baseline.yaml --compare --tasks 5

═══ BENCHMARK COMPARISON ═══
Tasks: 5

── BEFORE (baseline) ──
Tokens: 42,576 avg
Cost: $0.31 avg
Success: 80%

── AFTER (optimized) ──
Tokens: 23,218 avg (-45.5%)
Cost: $0.22 avg (-30.2%)
Success: 80%

Fixes Applied: 3
MICRO-EVAL GUARD: ALL CLEAR
```
## P99 Jitter Audit

```
$ agentslow benchmark examples/agent_config_baseline.yaml --jitter-audit --jitter-runs 5

═══ P99 JITTER AUDIT ═══
Runs: 5

── STABILITY ──
Overall: STABLE
Worst Jitter: 1.26x (p99/p50)
Production-ready: variance is within acceptable bounds.
```
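The p99/p50 ratio reported above can be sketched as follows. This is a hypothetical reimplementation using nearest-rank percentiles, not agentslow's internal code, and the latencies are made up:

```python
import math

def percentile(sorted_vals: list[float], q: float) -> float:
    # Nearest-rank percentile: smallest value covering at least q of the data.
    k = max(0, math.ceil(q * len(sorted_vals)) - 1)
    return sorted_vals[k]

def jitter_ratio(latencies_ms: list[float]) -> float:
    # p99/p50: how much worse the tail run is than the typical run.
    s = sorted(latencies_ms)
    return percentile(s, 0.99) / percentile(s, 0.50)

runs = [30500, 31200, 32850, 33100, 38400]  # made-up latencies for 5 runs
print(f"{jitter_ratio(runs):.2f}x")          # → 1.17x
```

A ratio near 1.0x means tail latency is close to median latency, i.e. the agent's runtime is stable across runs.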
## CI/CD Integration

```shell
# Fails the pipeline on CRITICAL entropy
agentslow diagnose trace.yaml --entropy --ci > junit_report.xml
echo $?  # exit code 1 on CRITICAL
```

The JUnit XML output integrates directly with GitHub Actions, GitLab CI, and Jenkins.
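If your pipeline is driven from Python rather than shell, the same gate can be wrapped with `subprocess`. The wrapper below is our sketch; only the CLI flags come from the README:

```python
import subprocess

def run_gate(cmd: list[str], report_path: str) -> int:
    """Run a diagnostic command, save its stdout as a report, return exit code."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    with open(report_path, "w") as f:
        f.write(result.stdout)  # JUnit XML is written to stdout per the README
    return result.returncode

# In CI you would call something like:
# code = run_gate(["agentslow", "diagnose", "trace.yaml", "--entropy", "--ci"],
#                 "junit_report.xml")
# raise SystemExit(code)  # non-zero on CRITICAL entropy fails the pipeline
```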
## Key Concepts

### Performance Regimes
| Regime | Symptom | Auto-Fix |
|---|---|---|
| Context-Bound | Low TER (<0.15), bloated context | Context compaction, prompt caching |
| Reasoning-Bound | High reasoning ratio (>0.5) | Reasoning budget limits, task decomposition |
| Tool-Bound | High tool re-entry (>0.3) | Tool call batching, result caching |
| IO-Bound | High TTA (>2000ms) | Parallel tool execution, connection pooling |
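A classifier over these four regimes can be sketched directly from the thresholds in the table. This is a hypothetical reimplementation for illustration, not agentslow's actual logic (in particular, the rule ordering and the single-label output are our assumptions):

```python
# Hypothetical regime classifier built from the table's thresholds.
# First matching rule wins; a trace hitting none of them is considered healthy.
def classify_regime(ter: float, reasoning_ratio: float,
                    tool_reentry: float, tta_ms: float) -> str:
    if ter < 0.15:
        return "CONTEXT-BOUND"      # bloated context: paying for ignored tokens
    if reasoning_ratio > 0.5:
        return "REASONING-BOUND"    # most compute spent thinking, not acting
    if tool_reentry > 0.3:
        return "TOOL-BOUND"         # retry/looping behavior on tool calls
    if tta_ms > 2000:
        return "IO-BOUND"           # slow to reach the first tool call
    return "HEALTHY"

# The sample run above (TER 0.0147) lands squarely in the first bucket:
print(classify_regime(0.0147, 0.0983, 0.25, 1800))  # → CONTEXT-BOUND
```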
### Novel Metrics

- Token Efficiency Ratio (TER): `output_tokens / input_tokens`. Healthy: >0.15. Below that, you're paying for context the model ignores.
- Tool Re-entry Rate: fraction of steps that re-call the same tool. High = your agent is retrying or looping.
- Time-to-First-Action (TTA): milliseconds from prompt to first tool call. Measures reasoning overhead.
- Reasoning Ratio: `reasoning_tokens / total_tokens`. How much compute goes to thinking vs. acting.
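As an illustration, all four metrics can be computed from a flat list of trace steps. The step schema below is our assumption for the sketch; real agentslow traces will differ:

```python
# Sketch of the four metrics over a list of step dicts (hypothetical schema:
# each step may carry input_tokens/output_tokens/reasoning_tokens, and tool
# steps carry a "tool" name plus a "t_ms" timestamp).
def compute_metrics(steps: list[dict]) -> dict:
    input_toks = sum(s.get("input_tokens", 0) for s in steps)
    output_toks = sum(s.get("output_tokens", 0) for s in steps)
    reasoning_toks = sum(s.get("reasoning_tokens", 0) for s in steps)
    tool_calls = [s["tool"] for s in steps if s.get("tool")]
    # Re-entry: consecutive calls to the same tool (retry/loop signature).
    reentries = sum(1 for a, b in zip(tool_calls, tool_calls[1:]) if a == b)
    first_tool = next((s for s in steps if s.get("tool")), None)
    total = input_toks + output_toks
    return {
        "ter": output_toks / input_toks if input_toks else 0.0,
        "tool_reentry_rate": reentries / len(tool_calls) if tool_calls else 0.0,
        "tta_ms": first_tool["t_ms"] if first_tool else None,
        "reasoning_ratio": reasoning_toks / total if total else 0.0,
    }
```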
### Context Entropy Monitor
Measures context health over long-running sessions:
- Per-turn entropy scores (0-1)
- Noise ratio (garbage accumulation)
- Semantic drift via TER sliding windows
- Compaction integrity validation
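One plausible definition of a per-turn score in [0, 1] is normalized Shannon entropy over the turn's word distribution. agentslow's exact formula isn't documented here, so treat this purely as a sketch of the idea:

```python
import math
from collections import Counter

def turn_entropy(text: str) -> float:
    """Normalized Shannon entropy of a turn's word distribution, in [0, 1]."""
    words = text.lower().split()
    if len(set(words)) <= 1:
        return 0.0  # empty or pure repetition carries no information
    counts = Counter(words)
    total = len(words)
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))  # divide by max entropy for this vocabulary

print(turn_entropy("a b c d"))  # → 1.0 (all words distinct)
print(turn_entropy("a a a a"))  # → 0.0 (pure repetition)
```

Under this definition a monotonically increasing per-turn score, like the INCREASING trend in the sample run, would indicate the context accumulating noise faster than compaction removes it.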
## Safe Mode (Trust Architecture)

By default, agentslow shows fixes but doesn't write anything. The `--safe-mode` flag adds an extra preview layer; `--apply` is the explicit opt-in to write config patches.

```shell
# Preview only (default) — never writes files
agentslow diagnose trace.yaml --dry-run

# Safe mode — extra verbose preview with risk annotations
agentslow diagnose trace.yaml --safe-mode

# Apply — writes config patches after full safety chain
agentslow diagnose trace.yaml --apply
```
Six-gate safety chain: diagnose → classify → prescribe → golden-set verify → dry-run preview → human review → apply.
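The chain above can be pictured as a fail-fast pipeline. The sketch below is our illustration of that control flow, not agentslow internals; every gate must pass before the final apply step is allowed to write anything:

```python
# Fail-fast safety chain: each gate is a callable returning True to proceed.
def run_safety_chain(gates, apply_fn):
    for name, gate in gates:
        if not gate():
            return f"stopped at gate: {name}"  # nothing is written past a failed gate
    apply_fn()  # reached only when every gate has passed
    return "applied"
```

For example, a failed golden-set verification halts the run before the dry-run preview or apply step ever executes.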
## Auto-Fix Pipeline

1. Diagnose → classify regime + compute metrics
2. Fix → generate prescriptive fixes with config patches
3. Dry-Run → show diffs with SAFE/CAUTION/WARNING risk levels
4. Fidelity → verify fixes don't change agent behavior (15 golden cases)
5. Human Review → `--safe-mode` preview before any writes
6. Apply → machine-readable JSON config patches (explicit `--apply` opt-in)
7. CI Gate → non-zero exit on CRITICAL issues
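Applying a patch of the shape shown in the dry-run output (dotted key → new value) can be sketched as below. The patch format is inferred from that output and is our assumption; agentslow's real patch schema may differ:

```python
import copy

def apply_patch(config: dict, patch: dict) -> dict:
    """Apply a flat {dotted.key: value} patch, returning a new config dict."""
    new = copy.deepcopy(config)  # never mutate the original: enables rollback
    for dotted_key, value in patch.items():
        node = new
        *parents, leaf = dotted_key.split(".")
        for key in parents:
            node = node.setdefault(key, {})  # create nested sections as needed
        node[leaf] = value
    return new

# Keys taken from the dry-run diff shown earlier:
patch = {
    "context_compaction.enabled": True,
    "context_compaction.strategy": "summarize_every_n",
    "context_compaction.n_turns": 5,
}
cfg = apply_patch({"enable_prompt_caching": False}, patch)
print(cfg["context_compaction"]["n_turns"])  # → 5
```

Keeping the original config untouched is what makes a later `agentslow rollback --fix-id …` cheap: the pre-patch state is still available verbatim.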
## Framework Support
- LangGraph (primary) — native trace parser
- MCP — tool call analysis
- CrewAI — multi-agent traces
- Claude Code — session analysis
## Project Stats

- 4,510+ lines across 12 modules
- 101 tests, 15 golden fidelity cases
- 10 atomic commits (v0.1.0 → v0.8.0)
- CLI-first, no dashboard — the moat is auto-fixing
- Six-gate safety chain with `--safe-mode` trust architecture
## Quick Start

```shell
# Install
pip install agentslow

# Diagnose a trace
agentslow diagnose your_trace.yaml

# Full analysis with all features
agentslow diagnose your_trace.yaml \
    --entropy \
    --fidelity \
    --config your_config.yaml \
    --dry-run

# Safe mode — preview everything before writing
agentslow diagnose your_trace.yaml --safe-mode

# Apply fixes (explicit opt-in)
agentslow diagnose your_trace.yaml --apply

# Benchmark before/after
agentslow benchmark your_config.yaml --compare --tasks 5

# CI mode (JUnit XML + exit codes)
agentslow diagnose your_trace.yaml --entropy --ci
```
## License
MIT