Failure memory for AI agents — self-healing retry with structured learning

ReLoop

Failure memory for AI agents.

Every agent fails. ReLoop is the first framework that gets smarter from failure.

License: Apache 2.0 | Python 3.11+


The Problem

AI agents retry blindly -- same mistake, same failure, burning tokens and money. Existing frameworks do not treat failure as data: they either retry with no memory of what went wrong, or give up.

The Solution

ReLoop captures every failure into a structured memory graph -- error type, root cause, suggested fix, confidence score, semantic embedding -- so the next retry starts smarter. Your agents don't just recover. They get permanently smarter.
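To make the fields listed above concrete, here is a minimal sketch of what one failure record could look like. The class name `FailureRecord`, its field names, and the sample values are assumptions for illustration; ReLoop's actual schema may differ.

```python
from dataclasses import dataclass, field, asdict


@dataclass
class FailureRecord:
    """Hypothetical shape of one failure memory entry (not ReLoop's actual schema)."""

    error_type: str
    root_cause: str
    suggested_fix: str
    confidence: float  # assumed to be a score between 0.0 and 1.0
    embedding: list[float] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject scores outside the assumed range early.
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# Illustrative values only.
record = FailureRecord(
    error_type="ImportError",
    root_cause="native module 'sharp' missing from the build image",
    suggested_fix="add sharp to dependencies and rebuild",
    confidence=0.8,
    embedding=[0.12, -0.03, 0.44],
)
print(asdict(record)["error_type"])
```

Keeping the record as plain structured data is what lets a later retry search it and reuse the suggested fix.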


Quick Start

The reloop-ai package is officially published and ready for production use.

pip install reloop-ai
reloop init
reloop demo

Minimal Setup (No Redis Required)

ReLoop works out of the box with SQLite -- no Redis or Blaxel needed:

pip install reloop-ai
export OPENAI_API_KEY=sk-...
reloop serve

Redis and Blaxel are optional -- add them when you need production performance. For local development, ReLoop uses Docker or runs commands directly as the sandbox.


Three Ways to Use ReLoop

1. As a Library (any agent, 3 lines)

from reloop import FailureMemory

memory = FailureMemory(redis_url="redis://localhost:6379")
# search() is awaited, so call it from inside an async function
similar = await memory.search("ImportError sharp")  # Returns past failures + fixes

2. As a Framework (full self-healing loop)

reloop run "Fix and deploy the Next.js project at ./my-broken-app"

3. As an MCP Server (Cursor & Claude Code)

Integrate ReLoop directly into your AI IDEs so they become self-healing. When the MCP server is connected, Claude can:

  • Search Memory: Semantically search past errors across your team's history.
  • Fetch Checkpoints: Restore exact state from past Blaxel Firecracker VM checkpoints.
  • Execute Safely: Run isolated code tests within Blaxel sandboxes directly from the editor.

Add to your claude_desktop_config.json or Cursor settings:

{
  "mcpServers": {
    "reloop": {
      "command": "python",
      "args": ["-m", "reloop.mcp_server"]
    }
  }
}

Architecture

flowchart LR
    Client[Cursor / Claude] -->|MCP Protocol| Server[ReLoop MCP Server]
    Server -->|Vector Search| Redis[(Redis Agent Memory)]
    Server -->|Isolated Execution| Blaxel[Blaxel Firecracker VM]
    Redis -.->|Stores| Embeddings[Failure Embeddings]
    Blaxel -.->|25ms Resume| Checkpoints[State Checkpoints]

Redis: The Memory Backbone

ReLoop uses Redis as the core of its Agent Memory Server. The memory system has three tiers:

  1. Working Memory: Stores the current task's session state and immediate context.
  2. Long-term Memory: A persistent failure graph using Redis Vector Search to semantically match current errors with past distilled solutions.
  3. Episodic Memory: Full execution traces and timeline records for auditing and the dashboard UI.
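The semantic matching in the long-term tier can be illustrated without Redis. The sketch below ranks stored failures by cosine similarity to a query embedding in plain Python; it stands in for Redis Vector Search and is not ReLoop's implementation. The function names, the dict layout, and the two-dimensional embeddings are assumptions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_embedding: list[float], memories: list[dict], top_k: int = 1) -> list[dict]:
    """Return the top_k stored failures most similar to the query."""
    scored = [(cosine_similarity(query_embedding, m["embedding"]), m) for m in memories]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [m for _, m in scored[:top_k]]


# Toy embeddings; real ones would come from an embedding model.
memories = [
    {"error": "ImportError: sharp", "fix": "reinstall sharp", "embedding": [1.0, 0.0]},
    {"error": "TimeoutError", "fix": "raise the timeout", "embedding": [0.0, 1.0]},
]
best = search([0.9, 0.1], memories)[0]
print(best["fix"])
```

A vector index does the same ranking, but avoids scanning every stored failure on each query.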

Blaxel: Perpetual Execution Sandboxes

For safe, deterministic task execution, ReLoop integrates Blaxel's Firecracker microVMs.

  • Perpetual State: The sandbox is never lost. You can pause and resume the exact environment.
  • 25ms Checkpoint/Restore: ReLoop creates a checkpoint after every step, and a checkpoint restores in 25ms.
  • Time-Travel Rewinds: Hit a roadblock? Rewind the agent to a previous checkpoint in 25ms and try a different fix strategy.
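The checkpoint-and-rewind idea in the list above can be sketched in memory. This is only an analogy: it copies a Python dict, whereas Blaxel snapshots a whole microVM. The class `CheckpointedState` and its methods are hypothetical names, not part of ReLoop or Blaxel.

```python
import copy


class CheckpointedState:
    """Toy stand-in for a sandbox whose state can be checkpointed and rewound."""

    def __init__(self, state: dict) -> None:
        self.state = state
        self._checkpoints: list[dict] = []

    def checkpoint(self) -> int:
        """Save a deep copy of the current state and return its id."""
        self._checkpoints.append(copy.deepcopy(self.state))
        return len(self._checkpoints) - 1

    def restore(self, checkpoint_id: int) -> None:
        """Rewind to a saved checkpoint and drop any later ones."""
        self.state = copy.deepcopy(self._checkpoints[checkpoint_id])
        del self._checkpoints[checkpoint_id + 1:]


sandbox = CheckpointedState({"files": {"app.js": "v1"}})
cid = sandbox.checkpoint()
sandbox.state["files"]["app.js"] = "broken edit"  # a fix attempt that fails
sandbox.restore(cid)                              # rewind and try another strategy
print(sandbox.state["files"]["app.js"])
```

Restoring from a copy rather than undoing individual changes is what makes it safe to try a different fix strategy from the same starting point.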

The REJD Loop

The core algorithm: Retrieve -> Execute -> Judge -> Distill

flowchart TD
    A([New Task]) --> R[Retrieve\nQuery Redis for similar past failures]
    R --> E[Execute\nRun in Blaxel sandbox]
    E --> J{Judge\nSuccess or failure?}

    J -- Success --> D_OK[Distill Success\nStore solution + learnings]
    D_OK --> DONE([Task Complete])

    J -- Failure --> D_FAIL[Distill Failure\nCapture root cause, fix, confidence]
    D_FAIL --> CB{Circuit breaker\nor budget exceeded?}
    CB -- Yes --> ABANDON([Task Abandoned])
    CB -- No --> R
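The flowchart above can be read as a plain loop. The following is a simplified sketch, not ReLoop's code: `rejd_loop`, the `execute` callback, and the result dict are hypothetical, retrieval returns every stored fix instead of running a similarity search, and the distill-on-success step is omitted. The default limits mirror the MAX_RETRIES, MAX_BUDGET_USD, and CIRCUIT_BREAKER_THRESHOLD defaults from the Configuration section.

```python
from typing import Callable, Optional

# (succeeded, cost in USD, error message or None)
ExecResult = tuple[bool, float, Optional[str]]


def rejd_loop(
    task: str,
    execute: Callable[[str, list[str]], ExecResult],
    max_retries: int = 5,
    max_budget_usd: float = 1.00,
    breaker_threshold: int = 3,
) -> dict:
    memory: list[dict] = []  # stand-in for the Redis failure memory
    spent = 0.0
    failures = 0
    attempts = 0
    while attempts < max_retries:
        attempts += 1
        # Retrieve: collect fixes distilled from earlier failures.
        hints = [m["suggested_fix"] for m in memory]
        # Execute: run the task with those hints.
        succeeded, cost, error = execute(task, hints)
        spent += cost
        # Judge: stop as soon as an attempt succeeds.
        if succeeded:
            return {"status": "complete", "attempts": attempts, "cost": spent}
        # Distill: store what went wrong for the next retrieval.
        memory.append({"error": error, "suggested_fix": f"avoid: {error}"})
        failures += 1
        # Circuit breaker or budget check.
        if failures >= breaker_threshold or spent >= max_budget_usd:
            break
    return {"status": "abandoned", "attempts": attempts, "cost": spent}


def flaky_executor(task: str, hints: list[str]) -> ExecResult:
    """Fake executor that succeeds once it has two hints to work with."""
    if len(hints) >= 2:
        return True, 0.05, None
    return False, 0.05, f"failure #{len(hints) + 1}"


result = rejd_loop("fix the build", flaky_executor)
print(result["status"], result["attempts"])
```

The fake executor only succeeds once earlier failures have been distilled into hints, which is the behavior the loop is meant to produce.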

Powered by:

  • OpenAI Agents SDK -- orchestrates the REJD loop with handoffs between specialist agents
  • Redis -- 3-tier failure memory (working, long-term, episodic) via Agent Memory Server
  • Blaxel -- Firecracker sandbox with 25ms checkpoint/restore (optional; Docker or direct execution for local dev)

Timeline UI

The non-chat interface that makes failure learning visible.

A horizontal timeline of colored nodes tells the full story at a glance:

RED (failed) -> RED (failed) -> RED (failed) -> GREEN (succeeded)

Click any node to inspect the full failure record -- root cause, suggested fix, confidence score, cost, and the exact code diff that resolved it.


Integrations

Works with any agent framework:

  • OpenAI Agents SDK
  • LangGraph
  • CrewAI
  • Claude Agent SDK
  • Raw Python

ReLoop is the memory layer -- bring your own orchestration.


A/B: Memory vs No Memory

Metric                 | Without Memory | With Memory
Attempts to fix 4 bugs | 12+            | 4
Total cost             | $0.47          | $0.18
Same mistake repeated  | 3x             | 0x
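As a quick check on the figures above, the relative savings can be computed directly. The variable names are illustrative, and because the attempt count without memory is given as "12+", the attempt reduction is a lower bound.

```python
# Figures taken from the comparison above.
cost_without, cost_with = 0.47, 0.18
attempts_without, attempts_with = 12, 4  # "12+" treated as exactly 12

cost_saving = (cost_without - cost_with) / cost_without
attempt_reduction = (attempts_without - attempts_with) / attempts_without

print(f"cost saving: {cost_saving:.1%}")
print(f"attempt reduction: at least {attempt_reduction:.1%}")
```

On these numbers, memory cuts cost by roughly 62% and attempts by at least two thirds.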

API Reference

Full API reference: docs/api-reference.md

Method | Path                                     | Description
POST   | /v1/tasks                                | Create and run a task
GET    | /v1/tasks/{id}                           | Get task status and result
GET    | /v1/tasks/{id}/timeline                  | Full execution timeline
GET    | /v1/tasks/{id}/sse                       | Server-Sent Events stream
POST   | /v1/memories/search                      | Semantic search over failure memory
GET    | /v1/memories/stats                       | Aggregated memory statistics
GET    | /v1/tasks/{id}/checkpoints               | List sandbox checkpoints
POST   | /v1/tasks/{id}/checkpoints/{cid}/restore | Rewind to checkpoint
POST   | /v1/tasks/ab-comparison                  | Run A/B comparison (with vs without memory)
GET    | /v1/tasks/{id}/circuit-breaker           | Get circuit breaker state for a task
POST   | /v1/memories/predict                     | Predict failure likelihood for new code
GET    | /v1/memories/export                      | Export failure memory as JSON
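As an illustration of calling the first endpoint, the sketch below builds a POST /v1/tasks request with the standard library. The request body is an assumption: the field name `prompt` is a guess, so check docs/api-reference.md for the real schema. The helper `build_create_task_request` is hypothetical, and the base URL uses the default port from the Configuration section.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default API port; adjust to your deployment


def build_create_task_request(prompt: str) -> urllib.request.Request:
    """Build (but do not send) a request to create and run a task."""
    # Assumption: the endpoint accepts a JSON body with a "prompt" field.
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_create_task_request("Fix and deploy the Next.js project at ./my-broken-app")
print(request.get_method(), request.full_url)
# To send it against a running server:
#     with urllib.request.urlopen(request) as response:
#         print(json.load(response))
```

Building the request separately from sending it makes the payload easy to inspect before a server is running.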

Configuration

ReLoop is configured via environment variables. Only OPENAI_API_KEY is required -- everything else has sensible defaults.

Variable                  | Required | Default                | Description
OPENAI_API_KEY            | Yes      | --                     | OpenAI API key for code generation and reasoning
REDIS_URL                 | No       | redis://localhost:6379 | Redis connection URL (falls back to SQLite)
REDIS_MEMORY_INDEX        | No       | reloop-failures        | Vector index name for failure embeddings
BLAXEL_API_KEY            | No       | --                     | Blaxel API key for Firecracker sandboxes
BLAXEL_WORKSPACE          | No       | --                     | Blaxel workspace name
CODEX_MODEL               | No       | gpt-4o                 | Chat model for planner/distiller
REASONING_MODEL           | No       | o1                     | Deep reasoning model for root cause analysis
FAST_MODEL                | No       | gpt-4o-mini            | Fast/cheap model for classification
EMBEDDING_MODEL           | No       | text-embedding-3-small | Model for failure memory embeddings
API_PORT                  | No       | 8000                   | FastAPI server port
API_HOST                  | No       | 0.0.0.0                | API host
NEXT_PUBLIC_API_URL       | No       | http://localhost:8000  | Backend API URL for frontend
MAX_RETRIES               | No       | 5                      | Maximum retry attempts per task
MAX_BUDGET_USD            | No       | 1.00                   | Maximum cost budget per task
CIRCUIT_BREAKER_THRESHOLD | No       | 3                      | Consecutive failures before circuit break

See .env.example for a copy-paste template.
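For readers wiring these variables into their own scripts, here is a minimal sketch of reading a few of them with the documented defaults. The `Settings` class and `load_settings` function are hypothetical and are not ReLoop's configuration loader; only the variable names and defaults come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    redis_url: str
    max_retries: int
    max_budget_usd: float
    circuit_breaker_threshold: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from a mapping (os.environ by default), applying defaults."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        # The only variable documented as required.
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        max_retries=int(env.get("MAX_RETRIES", "5")),
        max_budget_usd=float(env.get("MAX_BUDGET_USD", "1.00")),
        circuit_breaker_threshold=int(env.get("CIRCUIT_BREAKER_THRESHOLD", "3")),
    )


# Passing a dict keeps the example independent of the real environment.
settings = load_settings({"OPENAI_API_KEY": "sk-example", "MAX_RETRIES": "2"})
print(settings.max_retries, settings.max_budget_usd)
```

Accepting a mapping instead of reading `os.environ` directly keeps the loader easy to test.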


Layer             | Technology                     | Role
Orchestration     | OpenAI Agents SDK              | REJD loop with specialist agent handoffs
Failure Memory    | Redis Agent Memory Server      | 3-tier: working memory, long-term failure graph, episodic traces
Execution Sandbox | Blaxel Firecracker microVMs    | Perpetual state, 25ms resume, checkpoint/restore
API               | FastAPI + SSE                  | Task management, memory search, real-time streaming
Dashboard         | Next.js + Tailwind + shadcn/ui | Timeline, failure sidebar, cost tracker

Contributing

We welcome contributions. See CONTRIBUTING.md for:

  • Development environment setup
  • Code style requirements (ruff, mypy)
  • PR process and review checklist
  • Architecture overview for new contributors

License

Apache 2.0 -- see LICENSE for the full text.
