Automatic error recovery for AI agent sessions: detects stuck loops, rate limits, token overflows, and more.
session-doctor
Automatic error recovery for AI agent sessions.
Loop detection. Rate limit handling. Token overflow recovery. Budget guards.
Zero dependencies. Works with any AI agent framework.
The problem
AI agent sessions get stuck and stay stuck:
| Failure | Symptom | Without Session Doctor |
|---|---|---|
| Rate limit (429) | Same API error on every retry | Burns budget, hits cap |
| Token overflow | Context window exceeded | Session crashes silently |
| JSON parse error | Malformed tool call response | Agent loops forever |
| Timeout spiral | Tool call never returns | Session hangs indefinitely |
| Auth error (401) | Invalid key / expired token | 100 retries before anyone notices |
| Budget overrun | Cost cap hit | Silent death |
Session Doctor sits alongside your agent, detects these patterns, and recovers from them automatically, before they become expensive problems.
Install
pip install session-doctor
Usage
As a library
from session_doctor import SessionDoctor, Config
# Custom config (all fields optional)
config = Config(
    error_repeat_count=3, # trigger recovery after 3x same error
    error_window_sec=300, # within a 5-minute window
    budget_cap=2.00, # halt if budget exceeds $2
    max_auto_retries=2, # max recovery attempts before HALT
    notifications_enabled=True,
    notifications_min_severity="MEDIUM",
)
doctor = SessionDoctor(config=config)
doctor.register_session("my-run", label="coding-agent")
# Feed events from your agent's output / logs
doctor.ingest_event("my-run", "Error: 429 rate limit exceeded")
doctor.ingest_event("my-run", "Error: context length exceeded max_tokens", budget_used=0.75)
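The `error_repeat_count` and `error_window_sec` settings describe a sliding-window check: recovery triggers once the same error has been seen N times inside the window. A minimal sketch of that idea follows. `RepeatWindow` and its `record` method are hypothetical names used for illustration, not part of the session-doctor API, and the library's own implementation may differ.

```python
from collections import defaultdict, deque


class RepeatWindow:
    """Hypothetical helper: flags an error type seen `count` times within `window_sec`."""

    def __init__(self, count: int = 3, window_sec: float = 300.0) -> None:
        self.count = count
        self.window_sec = window_sec
        self._seen = defaultdict(deque)  # error_type -> timestamps, oldest first

    def record(self, error_type: str, now: float) -> bool:
        """Record one occurrence at time `now` (seconds); True once the threshold is reached."""
        stamps = self._seen[error_type]
        stamps.append(now)
        # Drop occurrences that have aged out of the window.
        while stamps and now - stamps[0] > self.window_sec:
            stamps.popleft()
        return len(stamps) >= self.count


window = RepeatWindow(count=3, window_sec=300)
print(window.record("rate_limit", now=0.0))    # False: first occurrence
print(window.record("rate_limit", now=100.0))  # False: second occurrence
print(window.record("rate_limit", now=200.0))  # True: third within 5 minutes
```

Timestamps are passed in explicitly so the check stays deterministic and easy to test; a real caller would likely pass `time.monotonic()`.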
Recovery strategies by error type
| Error | Severity | Strategy |
|---|---|---|
| HTTP 400 / Bad Request | HIGH | Reset session |
| Token / context limit | HIGH | Compact context |
| JSON parse failure | MEDIUM | Retry with prompt fix |
| Timeout / ETIMEDOUT | MEDIUM | Exponential backoff |
| Repeated tool call | HIGH | Break loop + notify |
| Rate limit (429) | LOW | Exponential backoff |
| Auth error (401/403) | CRITICAL | Halt + notify |
| Budget exceeded | CRITICAL | Halt + notify |
Backoff behavior
Exponential backoff: 2s → 4s → 8s → 16s → 32s (max).
After max_auto_retries failed recovery attempts on an error of any non-CRITICAL severity, Session Doctor escalates to HALT.
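The schedule above doubles from a 2-second base and caps at 32 seconds. A minimal sketch of that calculation, assuming attempts are numbered from 0; `backoff_delay` is a hypothetical helper written for this example, not a function exported by session-doctor.

```python
def backoff_delay(attempt: int, base: float = 2.0, cap: float = 32.0) -> float:
    """Delay in seconds before retry number `attempt` (0-based): base * 2**attempt, capped."""
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap, base * (2 ** attempt))


schedule = [backoff_delay(n) for n in range(7)]
print(schedule)  # [2.0, 4.0, 8.0, 16.0, 32.0, 32.0, 32.0]
```

Capping the delay keeps a long-running session from waiting minutes between retries once the doubling has run its course.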
As a CLI
# Show health dashboard for all registered sessions
session-doctor status
# Show recovery history (last 20)
session-doctor report
# Inject a test error to verify behavior
session-doctor inject my-session token_limit
# Start monitoring daemon (polls every 30s)
session-doctor monitor
# Run self-contained demo
session-doctor demo
Or via Python module:
python -m session_doctor status
python -m session_doctor demo
Dashboard
╔══════════════════════════════════════════════════════════════╗
║ Session Doctor — Health Dashboard ║
╠══════════════════╦════════════╦══════════╦═════════╦════════╣
║ Session ║ Status ║ Errors ║ Budget ║ Label ║
╠══════════════════╬════════════╬══════════╬═════════╬════════╣
║ heartbeat ║ 🟢 ok ║ 0 ║ $0.12 ║ heart ║
║ session-abc123 ║ 🟡 warn ║ 2 ║ $0.34 ║ gh-iss ║
║ session-def456 ║ 🔴 error ║ 5 ║ $1.43 ║ coding ║
╚══════════════════╩════════════╩══════════╩═════════╩════════╝
Last check: 2026-03-05 09:00:00 PST
State
Session Doctor persists all events to a local SQLite database and writes two log files alongside it:
- ~/.openclaw/workspace/projects/session-doctor/state.db: sessions, errors, recoveries
- ~/.openclaw/workspace/projects/session-doctor/session_doctor.log: event log
- ~/.openclaw/workspace/projects/session-doctor/notifications.log: alert history
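To give a feel for what per-session persistence in SQLite can look like, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are assumptions made up for this example; the actual layout of `state.db` is not documented here and may differ.

```python
import sqlite3

# Hypothetical schema for illustration only; session-doctor's real tables may differ.
SCHEMA = """
CREATE TABLE sessions (
    id     TEXT PRIMARY KEY,
    label  TEXT,
    status TEXT NOT NULL DEFAULT 'ok'
);
CREATE TABLE errors (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    error_type TEXT NOT NULL,
    message    TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")  # in-memory, so nothing is written to disk
conn.executescript(SCHEMA)
conn.execute("INSERT INTO sessions (id, label) VALUES (?, ?)", ("my-run", "coding-agent"))
conn.execute(
    "INSERT INTO errors (session_id, error_type, message) VALUES (?, ?, ?)",
    ("my-run", "rate_limit", "Error: 429 rate limit exceeded"),
)

# Count errors per session, the kind of query a status dashboard would need.
rows = conn.execute(
    "SELECT s.id, s.status, COUNT(e.id) FROM sessions s "
    "LEFT JOIN errors e ON e.session_id = s.id GROUP BY s.id"
).fetchall()
print(rows)  # [('my-run', 'ok', 1)]
```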
Advanced: components
from session_doctor import (
    Detector, # classify raw text → error_type
    Recoverer, # execute recovery strategies
    StateStore, # SQLite persistence
    Notifier, # structured alert delivery
    ERROR_PATTERNS, # regex patterns dict
    ERROR_POLICY, # error_type → (Severity, Strategy)
    Severity,
    Strategy,
    SessionStatus,
)
# Custom detector usage
detector = Detector()
error_type = detector.classify("Error: 429 Too Many Requests")
# → "rate_limit"
# Direct policy lookup
severity, strategy = ERROR_POLICY["token_limit"]
# → Severity.HIGH, Strategy.COMPACT_CONTEXT
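Classification of raw text into an error type can be approximated with a dictionary of regular expressions. The sketch below is self-contained and does not import session-doctor; the `PATTERNS` dict and the standalone `classify` function are assumptions for illustration, and the contents of the real `ERROR_PATTERNS` may differ.

```python
import re
from typing import Optional

# Hypothetical patterns for illustration; the real ERROR_PATTERNS dict may differ.
PATTERNS = {
    "rate_limit": re.compile(r"\b429\b|rate limit|too many requests", re.IGNORECASE),
    "token_limit": re.compile(r"context length|max_tokens|token limit", re.IGNORECASE),
    "auth_error": re.compile(r"\b40[13]\b|unauthorized|invalid api key", re.IGNORECASE),
}


def classify(text: str) -> Optional[str]:
    """Return the first error type whose pattern matches, or None if nothing matches."""
    for error_type, pattern in PATTERNS.items():
        if pattern.search(text):
            return error_type
    return None


print(classify("Error: 429 Too Many Requests"))               # rate_limit
print(classify("Error: context length exceeded max_tokens"))  # token_limit
print(classify("All good"))                                   # None
```

Because the first match wins, pattern order matters when a message could fit more than one category.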
Integration
Session Doctor is designed to complement, not replace, existing tools:
- agent-watchdog — loop detection + circuit breaker (run both together)
- agent-budget-guard — budget tracking skill (Session Doctor can subscribe to budget events)
Development
git clone https://github.com/woodwater2026/session-doctor
cd session-doctor
pip install -e ".[dev]"
pytest tests/ -v
Roadmap
- Real Watchdog hook integration
- Context compaction via OpenClaw runtime API
- Telegram notification via message tool
- Session process control (SIGTERM / restart via OpenClaw CLI)
- Config file loading (config.yaml)
- Budget Guard event subscription
License
MIT © 2026 Water Woods