Self-healing agent middleware that intercepts errors and injects recovery prompts

ErrorRecovery-Engine

License: MIT | Python 3.10+

Self-healing agent middleware that intercepts agent errors and injects recovery prompts. Drop-in, works with any agent.

How It Works

Agent Error → Classify → Pattern Match → Inject Recovery Prompt → Retry → Verify
                                    ↓ (no match)
                              LLM Fallback Prompt
  1. Classify the error into one of 9 categories (bash, edit, read, write, test, network, import, type, unknown)
  2. Match against a database of recovery patterns using FAISS semantic search
  3. Inject the recovery prompt into the agent's next execution
  4. Verify the recovery succeeded; retry with exponential backoff if needed
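
The control flow of the four steps above can be sketched in plain Python. This is only an illustration under stated assumptions: `PATTERNS`, `FALLBACK_PROMPT`, `find_recovery_prompt`, and `run_with_recovery` are hypothetical names that are not part of the library, a substring lookup stands in for the FAISS semantic search, and the backoff wait between attempts is left out for brevity.

```python
from typing import Callable, Optional

# Hypothetical stand-ins for the pattern database and the LLM fallback prompt.
PATTERNS = {
    "command not found": "Check that the binary is installed and on PATH.",
    "permission denied": "Check file permissions or run as the right user.",
}
FALLBACK_PROMPT = "Describe the failure and propose one alternative approach."


def find_recovery_prompt(error: str) -> Optional[str]:
    """Substring lookup standing in for the semantic pattern match."""
    lowered = error.lower()
    for needle, prompt in PATTERNS.items():
        if needle in lowered:
            return prompt
    return None


def run_with_recovery(step: Callable[[str], str], prompt: str,
                      max_attempts: int = 3) -> str:
    """Run `step`; on failure, inject a recovery hint and try again."""
    current = prompt
    for attempt in range(1, max_attempts + 1):
        try:
            # A clean return is treated as successful verification here.
            return step(current)
        except Exception as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # No pattern match falls through to the generic fallback prompt.
            hint = find_recovery_prompt(str(exc)) or FALLBACK_PROMPT
            current = f"{prompt}\n\n[recovery hint] {hint}"
    raise ValueError("max_attempts must be at least 1")
```

Calling `run_with_recovery(agent.run, "do something risky")` would re-run the agent with the hint appended to the original prompt each time it raises, up to `max_attempts` times.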

Installation

pip install error-recovery

With server support:

pip install "error-recovery[server]"

From source:

cd error-recovery
pip install -e ".[dev]"

Quick Start

As Middleware (Drop-in)

from error_recovery import ErrorRecovery

class MyAgent:
    def run(self, prompt: str, **kwargs) -> str:
        # Your agent logic here
        return "result"

agent = MyAgent()

# Use as context manager
with ErrorRecovery(agent) as mw:
    result = agent.run("do something risky")

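# Or create the middleware directly and wrap individual tool calls
from error_recovery import ErrorRecoveryMiddleware

def my_function(command: str) -> str:
    # Placeholder tool for illustration; substitute your own callable
    return f"ran {command}"

mw = ErrorRecoveryMiddleware(agent)
wrapped_tool = mw.wrap_tool_call(my_function, tool_name="my_tool")
result = wrapped_tool("ls -la")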

As Engine (Programmatic)

from error_recovery import ErrorRecoveryEngine, ErrorRecoveryConfig

config = ErrorRecoveryConfig(max_attempts=3, similarity_threshold=0.8)
engine = ErrorRecoveryEngine(config=config)

result = engine.recover_sync(
    error_message="bash: command not found: xyz",
    context="Running shell command",
    tool_name="bash",
)

print(f"Category: {result.error_category.value}")
print(f"Recovery: {result.recovery_prompt}")
print(f"Success: {result.success}")

As CLI

# Test recovery on an error message
error-recovery recover --error "command not found: xyz" --tool bash

# Analyze error patterns in a trace file
error-recovery analyze trace.jsonl

# Build a FAISS index from pattern data
error-recovery build-index --data ./patterns/ --output ./my_index/

# Start the recovery API server
error-recovery serve --port 8000

As API Server

pip install "error-recovery[server]"
error-recovery serve --port 8000

# Recover from an error
curl -X POST http://localhost:8000/recover \
  -H "Content-Type: application/json" \
  -d '{"error_message": "command not found: xyz", "tool_name": "bash"}'

# Health check
curl http://localhost:8000/health

# Stats
curl http://localhost:8000/stats
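
The same `/recover` call can be made from Python with only the standard library. This is a minimal sketch: the helper names `build_recover_request` and `recover` are made up for illustration, and the response is returned as a plain dict because its schema is not documented here.

```python
import json
import urllib.request


def build_recover_request(error_message: str, tool_name: str,
                          base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST request that mirrors the curl example above."""
    payload = json.dumps(
        {"error_message": error_message, "tool_name": tool_name}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/recover",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recover(error_message: str, tool_name: str, timeout: float = 10.0) -> dict:
    """Send the request; needs a running `error-recovery serve` instance."""
    request = build_recover_request(error_message, tool_name)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Building the request separately from sending it keeps the payload easy to inspect or log before anything goes over the network.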

Error Categories

Category        Description                 Examples
BASH_ERROR      Shell/command errors        command not found, permission denied, exit code ≠ 0
EDIT_ERROR      File editing errors         pattern not matched, multiple matches
READ_ERROR      File read errors            file not found, permission denied
WRITE_ERROR     File write errors           permission denied, disk full
TEST_ERROR      Test failures               assertion failed, timeout, fixture not found
NETWORK_ERROR   Network errors              connection refused, DNS failure, SSL errors
IMPORT_ERROR    Import/dependency errors    ModuleNotFoundError, circular import
TYPE_ERROR      Type/value errors           TypeError, KeyError, AttributeError
UNKNOWN         Unclassified errors         -
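
To make the categories concrete, here is a rough sketch of how a regex-plus-hint classifier over this table might look. It does not reproduce the library's own classifier: the `ErrorCategory` enum values, the `RULES` regexes, the `TOOL_HINTS` mapping, and the `classify` function are all assumptions chosen to match the examples in the table.

```python
import re
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    # Values are assumed; only "bash_error" appears elsewhere in this README.
    BASH_ERROR = "bash_error"
    EDIT_ERROR = "edit_error"
    READ_ERROR = "read_error"
    WRITE_ERROR = "write_error"
    TEST_ERROR = "test_error"
    NETWORK_ERROR = "network_error"
    IMPORT_ERROR = "import_error"
    TYPE_ERROR = "type_error"
    UNKNOWN = "unknown"


# Ordered rules: the first regex that matches decides the category.
RULES = [
    (ErrorCategory.IMPORT_ERROR, re.compile(r"ModuleNotFoundError|ImportError|circular import", re.I)),
    (ErrorCategory.TYPE_ERROR, re.compile(r"TypeError|KeyError|AttributeError", re.I)),
    (ErrorCategory.NETWORK_ERROR, re.compile(r"connection refused|\bDNS\b|\bSSL\b", re.I)),
    (ErrorCategory.TEST_ERROR, re.compile(r"assertion ?(failed|error)|fixture .* not found", re.I)),
    (ErrorCategory.BASH_ERROR, re.compile(r"command not found|exit code [1-9]", re.I)),
]

# Ambiguous messages such as "permission denied" fall back to the tool name.
TOOL_HINTS = {
    "bash": ErrorCategory.BASH_ERROR,
    "edit": ErrorCategory.EDIT_ERROR,
    "read": ErrorCategory.READ_ERROR,
    "write": ErrorCategory.WRITE_ERROR,
    "test": ErrorCategory.TEST_ERROR,
}


def classify(error_message: str, tool_name: Optional[str] = None) -> ErrorCategory:
    """Return the first matching category, else the tool hint, else UNKNOWN."""
    for category, regex in RULES:
        if regex.search(error_message):
            return category
    return TOOL_HINTS.get(tool_name or "", ErrorCategory.UNKNOWN)
```

The tool-name fallback is there because some messages in the table ("permission denied") appear under several categories and cannot be told apart from the text alone.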

Configuration

from error_recovery import ErrorRecoveryConfig

config = ErrorRecoveryConfig(
    max_attempts=3,              # Max recovery attempts
    similarity_threshold=0.8,     # Minimum similarity for pattern match
    fallback_to_llm=True,        # Use LLM fallback when no pattern matches
    backoff_base=2.0,            # Exponential backoff base
    backoff_max=30.0,            # Max backoff seconds
    cooling_period_seconds=0.0,  # Cooldown between retries
    model_name="all-MiniLM-L6-v2",  # Sentence transformer model
    top_k=5,                     # Number of pattern matches to return
    track_success_rates=True,     # Track pattern success rates
)
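
How the backoff settings combine is not spelled out above. One common reading, shown here purely as an assumption, is that the wait before retry n is min(backoff_base ** n, backoff_max) plus the cooling period. The helper `backoff_delays` is hypothetical and not part of the library.

```python
def backoff_delays(retries: int,
                   backoff_base: float = 2.0,
                   backoff_max: float = 30.0,
                   cooling_period_seconds: float = 0.0) -> list[float]:
    """Seconds to wait before each retry under the assumed formula."""
    return [
        min(backoff_base ** n, backoff_max) + cooling_period_seconds
        for n in range(1, retries + 1)
    ]


print(backoff_delays(6))  # [2.0, 4.0, 8.0, 16.0, 30.0, 30.0]
```

With the defaults shown in the config, the cap of 30 seconds is reached on the fifth retry, so raising `max_attempts` beyond that adds a fixed 30 seconds per extra attempt under this reading.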

Pattern Matching

The engine uses sentence-transformers for semantic similarity and FAISS for fast nearest-neighbor search. Each pattern has:

  • error_type — Category (e.g., bash_error)
  • pattern — Regex pattern for fast exact matching
  • error_message — Description for semantic matching
  • recovery_prompt — The prompt to inject for recovery
  • success_rate — Historical success rate (0.0–1.0)
  • tags — Searchable tags
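
The two-stage lookup implied by these fields (regex first, similarity second) can be sketched without any heavy dependencies. Note the substitution: `difflib.SequenceMatcher` from the standard library stands in for sentence-transformers embeddings and FAISS search, so the scores are character-level rather than semantic. The `Pattern` dataclass and `match` function are illustrative names, not the library's API.

```python
import re
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Pattern:
    error_type: str
    pattern: str           # regex for the fast exact path
    error_message: str     # text compared in the similarity path
    recovery_prompt: str
    success_rate: float = 0.0
    tags: list[str] = field(default_factory=list)


def match(error: str, patterns: list[Pattern],
          similarity_threshold: float = 0.8) -> Optional[Pattern]:
    """Regex fast path first, then nearest pattern by string similarity."""
    # Stage 1: the first regex that matches wins.
    for p in patterns:
        if re.search(p.pattern, error, re.IGNORECASE):
            return p
    # Stage 2: best similarity score, accepted only above the threshold.
    best, best_score = None, 0.0
    for p in patterns:
        score = SequenceMatcher(None, error.lower(), p.error_message.lower()).ratio()
        if score > best_score:
            best, best_score = p, score
    return best if best_score >= similarity_threshold else None
```

Returning `None` when nothing clears the threshold is what would hand control to the LLM fallback prompt in the flow described under How It Works.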

Built-in Patterns

  • bash_errors.json — 53 common bash error patterns
  • edit_errors.json — 15 file editing patterns
  • test_errors.json — 15 test failure patterns
  • general.json — 25 network, import, type, and general patterns

Custom Patterns

Add your own patterns as JSON files in the patterns directory:

[
  {
    "error_type": "bash_error",
    "error_message": "terraform apply failed",
    "pattern": "terraform.*apply.*failed",
    "recovery_prompt": "Terraform apply failed. Check: (1) run 'terraform plan' first, (2) check for state drift, (3) verify provider credentials.",
    "success_rate": 0.70,
    "tags": ["terraform", "iac"]
  }
]

Load them:

from error_recovery import PatternMatcher

matcher = PatternMatcher()
matcher.load_patterns("/path/to/custom_patterns/")
matcher.build_index()
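
Before loading a custom file, it can help to check it for obvious mistakes. The checker below is a sketch that uses only the standard library. Treating `error_type`, `error_message`, `pattern`, and `recovery_prompt` as required is an assumption based on the example above, and `validate_patterns` is a hypothetical helper rather than something the package provides.

```python
import json
import re

# Assumed to be mandatory, based on the example pattern above.
REQUIRED_FIELDS = {"error_type", "error_message", "pattern", "recovery_prompt"}


def validate_patterns(raw: str) -> list[str]:
    """Return human-readable problems found in a pattern file's contents."""
    try:
        entries = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(entries, list):
        return ["top-level value must be a list of pattern objects"]

    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        missing = REQUIRED_FIELDS - entry.keys()
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
        try:
            re.compile(entry.get("pattern", ""))
        except (re.error, TypeError) as exc:
            problems.append(f"entry {i}: bad regex: {exc}")
        rate = entry.get("success_rate", 0.0)
        if not isinstance(rate, (int, float)) or not 0.0 <= rate <= 1.0:
            problems.append(f"entry {i}: success_rate must be in 0.0-1.0")
    return problems
```

An empty list means the file passed these checks; it says nothing about whether the recovery prompt is actually useful.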

Callbacks

from error_recovery import ErrorRecoveryMiddleware
from error_recovery.models import RecoveryResult

def on_recovery(result: RecoveryResult):
    print(f"Recovered: {result.original_error} → {result.recovery_prompt[:50]}")

def on_failure(error: str):
    print(f"Failed to recover: {error}")

def on_success(tool_name: str):
    print(f"Tool {tool_name} succeeded after recovery")

mw = ErrorRecoveryMiddleware(
    on_recovery=on_recovery,
    on_failure=on_failure,
    on_success=on_success,
)

Architecture

error_recovery/
├── models.py              # Pydantic models (ErrorPattern, RecoveryResult, Config)
├── error_classifier.py    # Regex + keyword error classification
├── pattern_matcher.py     # FAISS + sentence-transformers semantic matching
├── recovery_engine.py     # Core recovery logic (classify → match → inject → verify)
├── middleware.py           # Drop-in agent middleware wrapper
├── cli.py                 # CLI: recover, analyze, build-index, serve
└── patterns/               # Built-in recovery patterns
    ├── bash_errors.json    # 53 patterns
    ├── edit_errors.json    # 15 patterns
    ├── test_errors.json    # 15 patterns
    └── general.json         # 25 patterns

Running Tests

pip install -e ".[dev]"
pytest tests/ -v

License

MIT

Ecosystem

Part of the FableForge ecosystem — 21 open-source projects built from 210K real agent traces:

Project               Description
Anvil                 Self-verified coding agent
VerifyLoop            Plan→Execute→Verify→Recover framework
ErrorRecovery         Self-healing middleware (3,725 error patterns)
FableForge-14B        The fine-tuned 14B model (4-stage training)
ShellWhisperer        1.5B edge agent (phone/RPi, 50ms)
ReasonCritic          Verification model (130 benchmark tasks)
TraceCompiler         Compile traces → LoRA skills
AgentRuntime          Persistent agent daemon (systemd for AI)
AgentSwarm            Multi-agent from real trace transitions
AgentTelemetry        Datadog for agents (token tracking, costs)
BenchAgent            HumanEval for tool-use (107 tasks)
AgentDev              VSCode extension with verification
TraceViz              Trace replay visualizer (Next.js)
AgentSkills           npm for agent behaviors
AgentCurriculum       5-stage progressive training
AgentFuzzer           Adversarial testing for agents
AgentConstitution     Safety guardrails from traces
CostOptimizer         Token cost reduction (50-80%)
AgentProfiler         Behavioral fingerprinting
TrajectoryDistiller   Trace→training data pipeline
Fable5-Dataset        HuggingFace dataset release
