Automatic error recovery for AI agent sessions: detects stuck loops, rate limits, token overflows, and more.

session-doctor

Automatic error recovery for AI agent sessions.

Loop detection. Rate limit handling. Token overflow recovery. Budget guards.

Zero dependencies. Works with any AI agent framework.


The problem

AI agent sessions get stuck and stay stuck:

Failure           Symptom                        Without Session Doctor
----------------  -----------------------------  --------------------------------
Rate limit (429)  Same API error on every retry  Burns budget, hits cap
Token overflow    Context window exceeded        Session crashes silently
JSON parse error  Malformed tool call response   Agent loops forever
Timeout spiral    Tool call never returns        Session hangs indefinitely
Auth error (401)  Invalid key / expired token    100 retries before anyone notices
Budget overrun    Cost cap hit                   Silent death

Session Doctor sits alongside your agent, detects these patterns, and recovers from them automatically, before they become expensive problems.


Install

pip install session-doctor

Usage

As a library

from session_doctor import SessionDoctor, Config

# Custom config (all fields optional)
config = Config(
    error_repeat_count=3,      # trigger recovery after 3x same error
    error_window_sec=300,      # within a 5-minute window
    budget_cap=2.00,           # halt if budget exceeds $2
    max_auto_retries=2,        # max recovery attempts before HALT
    notifications_enabled=True,
    notifications_min_severity="MEDIUM",
)

doctor = SessionDoctor(config=config)
doctor.register_session("my-run", label="coding-agent")

# Feed events from your agent's output / logs
doctor.ingest_event("my-run", "Error: 429 rate limit exceeded")
doctor.ingest_event("my-run", "Error: context length exceeded max_tokens", budget_used=0.75)
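
The error_repeat_count and error_window_sec settings describe a sliding-window check: recovery triggers once the same error has been seen a given number of times within a time window. The following is a minimal sketch of that idea, not the package's implementation. RepeatWindow and its record method are hypothetical names used only for illustration.

```python
from collections import defaultdict, deque


class RepeatWindow:
    """Hypothetical helper: flags an error type seen N times within a time window."""

    def __init__(self, repeat_count=3, window_sec=300):
        self.repeat_count = repeat_count
        self.window_sec = window_sec
        self._seen = defaultdict(deque)  # error_type -> timestamps, oldest first

    def record(self, error_type, now):
        """Record one occurrence at time `now` (seconds); return True if the threshold is met."""
        stamps = self._seen[error_type]
        stamps.append(now)
        # Drop occurrences that have fallen out of the window.
        while stamps and now - stamps[0] > self.window_sec:
            stamps.popleft()
        return len(stamps) >= self.repeat_count


window = RepeatWindow(repeat_count=3, window_sec=300)
results = [window.record("rate_limit", t) for t in (0, 100, 200)]
print(results)  # [False, False, True]
```

Passing the timestamp in explicitly, rather than reading the clock inside record, keeps the sketch deterministic and easy to test.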

Recovery strategies by error type

Error                   Severity  Strategy
----------------------  --------  ---------------------
HTTP 400 / Bad Request  HIGH      Reset session
Token / context limit   HIGH      Compact context
JSON parse failure      MEDIUM    Retry with prompt fix
Timeout / ETIMEDOUT     MEDIUM    Exponential backoff
Repeated tool call      HIGH      Break loop + notify
Rate limit (429)        LOW       Exponential backoff
Auth error (401/403)    CRITICAL  Halt + notify
Budget exceeded         CRITICAL  Halt + notify
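
The table above can be read as a lookup from error type to a (severity, strategy) pair. The sketch below models it with stand-in enums so it runs without the package installed. Only the keys "rate_limit" and "token_limit" and the members Severity.HIGH and Strategy.COMPACT_CONTEXT appear elsewhere in this README. Every other key and member name here, along with should_halt, is an assumption made for illustration.

```python
from enum import Enum


class Severity(Enum):  # stand-in for session_doctor.Severity
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


class Strategy(Enum):  # stand-in; member names other than COMPACT_CONTEXT are assumed
    RESET_SESSION = "reset_session"
    COMPACT_CONTEXT = "compact_context"
    RETRY_WITH_PROMPT_FIX = "retry_with_prompt_fix"
    BACKOFF = "backoff"
    BREAK_LOOP = "break_loop"
    HALT = "halt"


# One row per line of the table; key names are hypothetical except
# "rate_limit" and "token_limit".
POLICY = {
    "bad_request": (Severity.HIGH, Strategy.RESET_SESSION),
    "token_limit": (Severity.HIGH, Strategy.COMPACT_CONTEXT),
    "json_parse": (Severity.MEDIUM, Strategy.RETRY_WITH_PROMPT_FIX),
    "timeout": (Severity.MEDIUM, Strategy.BACKOFF),
    "tool_loop": (Severity.HIGH, Strategy.BREAK_LOOP),
    "rate_limit": (Severity.LOW, Strategy.BACKOFF),
    "auth_error": (Severity.CRITICAL, Strategy.HALT),
    "budget_exceeded": (Severity.CRITICAL, Strategy.HALT),
}


def should_halt(error_type):
    """CRITICAL errors halt immediately instead of being retried."""
    severity, _strategy = POLICY[error_type]
    return severity is Severity.CRITICAL


print(should_halt("auth_error"))  # True
print(should_halt("rate_limit"))  # False
```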

Backoff behavior

Exponential backoff: 2s → 4s → 8s → 16s → 32s (max).
After max_auto_retries failed recovery attempts for any non-CRITICAL error, Session Doctor escalates to HALT.
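
The delay schedule above doubles from 2 seconds and is capped at 32 seconds. Here is a minimal sketch of that schedule and of the escalation rule. backoff_delay and next_action are hypothetical helpers rather than part of the package's API, and the exact point at which the package counts an attempt as failed is an assumption.

```python
def backoff_delay(attempt, base=2.0, cap=32.0):
    """Delay in seconds before retry number `attempt` (0-based), doubling up to `cap`."""
    return min(base * (2 ** attempt), cap)


def next_action(failed_attempts, max_auto_retries=2):
    """Return ("retry", delay) while attempts remain, else ("halt", None)."""
    if failed_attempts >= max_auto_retries:
        return ("halt", None)
    return ("retry", backoff_delay(failed_attempts))


delays = [backoff_delay(n) for n in range(7)]
print(delays)  # [2.0, 4.0, 8.0, 16.0, 32.0, 32.0, 32.0]
print(next_action(2))  # ('halt', None)
```

With the max_auto_retries=2 setting from the usage example, a session would wait 2 s and then 4 s before halting, so the longer delays only come into play with a higher retry limit.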

As a CLI

# Show health dashboard for all registered sessions
session-doctor status

# Show recovery history (last 20)
session-doctor report

# Inject a test error to verify behavior
session-doctor inject my-session token_limit

# Start monitoring daemon (polls every 30s)
session-doctor monitor

# Run self-contained demo
session-doctor demo

Or via Python module:

python -m session_doctor status
python -m session_doctor demo

Dashboard

╔══════════════════════════════════════════════════════════════╗
║             Session Doctor — Health Dashboard                ║
╠══════════════════╦════════════╦══════════╦═════════╦════════╣
║ Session          ║ Status     ║ Errors   ║ Budget  ║ Label  ║
╠══════════════════╬════════════╬══════════╬═════════╬════════╣
║ heartbeat        ║ 🟢 ok      ║ 0        ║ $0.12   ║ heart  ║
║ session-abc123   ║ 🟡 warn    ║ 2        ║ $0.34   ║ gh-iss ║
║ session-def456   ║ 🔴 error   ║ 5        ║ $1.43   ║ coding ║
╚══════════════════╩════════════╩══════════╩═════════╩════════╝

Last check: 2026-03-05 09:00:00 PST

State

Session Doctor persists its state to a local SQLite database and writes two log files alongside it:

  • ~/.openclaw/workspace/projects/session-doctor/state.db — sessions, errors, recoveries
  • ~/.openclaw/workspace/projects/session-doctor/session_doctor.log — event log
  • ~/.openclaw/workspace/projects/session-doctor/notifications.log — alert history
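
To show what SQLite-backed error tracking can look like, the sketch below stores a few error events and counts them per type. The schema of state.db is not documented here, so the errors table and its columns are hypothetical, and the example uses an in-memory database rather than the real file.

```python
import sqlite3

# Hypothetical schema: the real layout of state.db is not documented in this README.
conn = sqlite3.connect(":memory:")  # pass a file path instead to persist across runs
conn.execute("CREATE TABLE errors (session_id TEXT, error_type TEXT, ts REAL)")
conn.executemany(
    "INSERT INTO errors VALUES (?, ?, ?)",
    [
        ("my-run", "rate_limit", 0.0),
        ("my-run", "rate_limit", 5.0),
        ("my-run", "token_limit", 9.0),
    ],
)
conn.commit()

# Count errors per type for one session.
rows = conn.execute(
    "SELECT error_type, COUNT(*) FROM errors WHERE session_id = ? GROUP BY error_type",
    ("my-run",),
).fetchall()
counts = dict(rows)
print(counts)
```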

Advanced: components

from session_doctor import (
    Detector,       # classify raw text → error_type
    Recoverer,      # execute recovery strategies
    StateStore,     # SQLite persistence
    Notifier,       # structured alert delivery
    ERROR_PATTERNS, # regex patterns dict
    ERROR_POLICY,   # error_type → (Severity, Strategy)
    Severity,
    Strategy,
    SessionStatus,
)

# Custom detector usage
detector = Detector()
error_type = detector.classify("Error: 429 Too Many Requests")
# → "rate_limit"

# Direct policy lookup
severity, strategy = ERROR_POLICY["token_limit"]
# → Severity.HIGH, Strategy.COMPACT_CONTEXT
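
Classifying raw text into an error type can be done with an ordered dictionary of regular expressions. The sketch below shows the general technique only. SKETCH_PATTERNS and this standalone classify function are assumptions for illustration, and the package's actual ERROR_PATTERNS may use different keys and expressions.

```python
import re

# Hypothetical patterns for illustration; the package's ERROR_PATTERNS may differ.
SKETCH_PATTERNS = {
    "rate_limit": re.compile(r"\b429\b|rate limit|too many requests", re.IGNORECASE),
    "token_limit": re.compile(r"context length|max_tokens|context window", re.IGNORECASE),
    "timeout": re.compile(r"ETIMEDOUT|timed? ?out", re.IGNORECASE),
}


def classify(text):
    """Return the first matching error type, or None if nothing matches."""
    for error_type, pattern in SKETCH_PATTERNS.items():
        if pattern.search(text):
            return error_type
    return None


print(classify("Error: 429 Too Many Requests"))  # rate_limit
print(classify("all good"))  # None
```

Because dictionaries preserve insertion order, patterns listed earlier win when a message matches more than one, so more specific patterns belong near the top.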

Integration

Session Doctor is designed to complement, not replace, existing tools:

  • agent-watchdog — loop detection + circuit breaker (run both together)
  • agent-budget-guard — budget tracking skill (Session Doctor can subscribe to budget events)

Development

git clone https://github.com/woodwater2026/session-doctor
cd session-doctor
pip install -e ".[dev]"
pytest tests/ -v

Roadmap

  • Real Watchdog hook integration
  • Context compaction via OpenClaw runtime API
  • Telegram notification via message tool
  • Session process control (SIGTERM / restart via OpenClaw CLI)
  • Config file loading (config.yaml)
  • Budget Guard event subscription

License

MIT © 2026 Water Woods
