Failure memory for AI agents — self-healing retry with structured learning

Project description

ReLoop

Failure memory for AI agents.

Every agent fails. ReLoop is a framework built to get smarter from failure.

License: Apache 2.0 | Python 3.11+


The Problem

AI agents retry blindly -- same mistake, same failure, burning tokens and money. Most agent frameworks don't treat failure as data: they either retry with no memory of what went wrong, or give up.

The Solution

ReLoop captures every failure into a structured memory graph -- error type, root cause, suggested fix, confidence score, semantic embedding -- so the next retry starts smarter. Your agents don't just recover: the lessons from each failure persist and inform future runs.
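
To make the idea concrete, here is a minimal, self-contained sketch of what such a failure record and a similarity lookup might look like. The field names (`error_type`, `root_cause`, `suggested_fix`, `confidence`, `embedding`) mirror the list above but are assumptions, not ReLoop's actual schema, and the toy 3-dimensional vectors stand in for real embedding-model output. Retrieval here is plain cosine similarity over an in-memory list; ReLoop itself keeps its records in Redis.

```python
import math
from dataclasses import dataclass


@dataclass
class FailureRecord:
    # Hypothetical schema mirroring the fields listed above.
    error_type: str
    root_cause: str
    suggested_fix: str
    confidence: float        # 0.0 to 1.0: how much the fix is trusted
    embedding: list[float]   # semantic vector of the failure context


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_similar(records: list[FailureRecord], query: list[float], k: int = 3) -> list[FailureRecord]:
    """Return the k stored failures whose embeddings are closest to the query."""
    return sorted(records, key=lambda r: cosine(r.embedding, query), reverse=True)[:k]


# Toy 3-dimensional vectors stand in for real embedding-model output.
records = [
    FailureRecord("ImportError", "sharp not installed", "npm install sharp", 0.9, [1.0, 0.0, 0.1]),
    FailureRecord("TimeoutError", "build exceeded time limit", "raise build timeout", 0.6, [0.0, 1.0, 0.0]),
]
best = search_similar(records, query=[0.9, 0.1, 0.0], k=1)[0]
print(best.error_type, "->", best.suggested_fix)  # ImportError -> npm install sharp
```

A production system would replace the linear scan with a vector index, but the ranking idea is the same.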


Quick Start

pip install reloop-ai
reloop init
reloop demo

Three Ways to Use ReLoop

1. As a Library (any agent, 3 lines)

import asyncio
from reloop import FailureMemory

async def main() -> None:  # `await` needs an async context
    memory = FailureMemory(redis_url="redis://localhost:6379")
    print(await memory.search("ImportError sharp"))  # Returns past failures + fixes

asyncio.run(main())

2. As a Framework (full self-healing loop)

reloop run "Fix and deploy the Next.js project at ./my-broken-app"

3. As an MCP Server (Claude Code / Cursor)

{
  "mcpServers": {
    "reloop": {
      "command": "python",
      "args": ["-m", "src.mcp_server"]
    }
  }
}

How It Works

The REJD Loop: Retrieve -> Execute -> Judge -> Distill

flowchart TD
    A([New Task]) --> R[Retrieve\nQuery Redis for similar past failures]
    R --> E[Execute\nRun in Blaxel sandbox]
    E --> J{Judge\nSuccess or failure?}

    J -- Success --> D_OK[Distill Success\nStore solution + learnings]
    D_OK --> DONE([Task Complete])

    J -- Failure --> D_FAIL[Distill Failure\nCapture root cause, fix, confidence]
    D_FAIL --> CB{Circuit breaker\nor budget exceeded?}
    CB -- Yes --> ABANDON([Task Abandoned])
    CB -- No --> R
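
The control flow in the diagram can be sketched in plain Python. This is an illustration under stated assumptions, not ReLoop's implementation: `rejd_loop` and `fake_execute` are hypothetical, the `execute` hook stands in for both the Blaxel sandbox run and the judge, memory is a simple list of lesson strings, and the circuit breaker is modeled as "the same root cause seen `max_repeats` times in a row". A hard `max_attempts` cap is added so the sketch always terminates.

```python
from typing import Callable

# Hypothetical hook: (task, hints) -> (succeeded, root cause or solution, cost in USD)
Executor = Callable[[str, list[str]], tuple[bool, str, float]]


def rejd_loop(task: str, execute: Executor, memory: list[str],
              budget: float = 1.0, max_repeats: int = 2, max_attempts: int = 10) -> str:
    """Retrieve -> Execute -> Judge -> Distill until success or abandonment."""
    spent, last_cause, repeats = 0.0, None, 0
    for _ in range(max_attempts):                # hard cap so the sketch always terminates
        hints = list(memory)                     # Retrieve: lessons from earlier attempts/tasks
        ok, detail, cost = execute(task, hints)  # Execute (a Blaxel sandbox in ReLoop)
        spent += cost
        if ok:                                   # Judge: success
            memory.append(f"fix: {detail}")      # Distill success
            return "complete"
        memory.append(f"avoid: {detail}")        # Distill failure
        repeats = repeats + 1 if detail == last_cause else 1
        last_cause = detail
        if repeats >= max_repeats or spent >= budget:  # circuit breaker / budget
            return "abandoned"
    return "abandoned"


def fake_execute(task: str, hints: list[str]) -> tuple[bool, str, float]:
    # Toy executor: keeps failing until memory holds a lesson about the root cause.
    if "avoid: missing dependency" in hints:
        return True, "install dependency first", 0.05
    return False, "missing dependency", 0.05


lessons: list[str] = []
print(rejd_loop("deploy app", fake_execute, lessons))  # complete
print(lessons)  # ['avoid: missing dependency', 'fix: install dependency first']
```

Because `lessons` outlives the call, a second run of the same task succeeds on its first attempt, which is the behavior the diagram describes.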

Powered by:

  • OpenAI Agents SDK -- orchestrates the REJD loop with handoffs between specialist agents
  • Redis -- 3-tier failure memory (working, long-term, episodic)
  • Blaxel -- Firecracker sandbox with 25ms checkpoint/restore
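
The three tiers can be pictured as three stores with different lifetimes. The sketch below uses plain in-process structures and hypothetical class and method names; it is not the Redis Agent Memory Server API. Working memory is scoped to one task and dropped when it ends, long-term memory keeps distilled lessons keyed by error type across tasks, and episodic memory keeps the ordered trace of every attempt.

```python
from collections import defaultdict


class ThreeTierMemory:
    """Illustrative three-tier layout; names are hypothetical, storage is in-process."""

    def __init__(self) -> None:
        self.working: dict[str, list[str]] = defaultdict(list)    # task_id -> scratch notes
        self.long_term: dict[str, list[str]] = defaultdict(list)  # error_type -> lessons
        self.episodic: list[tuple[str, int, str]] = []            # (task_id, attempt, outcome)

    def record_attempt(self, task_id: str, attempt: int, outcome: str) -> None:
        self.episodic.append((task_id, attempt, outcome))  # full history, append-only
        self.working[task_id].append(outcome)              # visible only to this task

    def distill(self, error_type: str, lesson: str) -> None:
        if lesson not in self.long_term[error_type]:       # avoid duplicate lessons
            self.long_term[error_type].append(lesson)

    def end_task(self, task_id: str) -> None:
        self.working.pop(task_id, None)  # working memory dies with the task

    def lessons_for(self, error_type: str) -> list[str]:
        return list(self.long_term.get(error_type, []))


tiers = ThreeTierMemory()
tiers.record_attempt("t1", 1, "ImportError: sharp")
tiers.distill("ImportError", "install sharp before build")
tiers.end_task("t1")
print(tiers.lessons_for("ImportError"))  # ['install sharp before build']
```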

Timeline UI

The non-chat interface that makes failure learning visible.

A horizontal timeline of colored nodes tells the full story at a glance:

RED (failed) -> RED (failed) -> RED (failed) -> GREEN (succeeded)

Click any node to inspect the full failure record -- root cause, suggested fix, confidence score, cost, and the exact code diff that resolved it.
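
A timeline like this is essentially an ordered list of attempt records. The sketch below models each node with hypothetical fields taken from the description above (status, root cause, suggested fix, confidence, cost, diff) and renders the one-line summary; it is not the dashboard's actual data model.

```python
from dataclasses import dataclass


@dataclass
class TimelineNode:
    # Hypothetical per-attempt fields, based on what a node click reveals.
    succeeded: bool
    root_cause: str = ""
    suggested_fix: str = ""
    confidence: float = 0.0
    cost_usd: float = 0.0
    diff: str = ""


def render(nodes: list[TimelineNode]) -> str:
    """Render the horizontal summary line, one colored label per attempt."""
    return " -> ".join("GREEN (succeeded)" if n.succeeded else "RED (failed)" for n in nodes)


def total_cost(nodes: list[TimelineNode]) -> float:
    """Sum of per-attempt costs, rounded to cents."""
    return round(sum(n.cost_usd for n in nodes), 2)


nodes = [TimelineNode(False, "sharp not installed", "npm install sharp", 0.4, 0.04) for _ in range(3)]
nodes.append(TimelineNode(True, cost_usd=0.03, diff="+ npm install sharp"))
print(render(nodes))      # RED (failed) -> RED (failed) -> RED (failed) -> GREEN (succeeded)
print(total_cost(nodes))  # 0.15
```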


Integrations

Works with any agent framework:

  • OpenAI Agents SDK
  • LangGraph
  • CrewAI
  • Claude Agent SDK
  • Raw Python

ReLoop is the memory layer -- bring your own orchestration.
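
As a rough illustration of "memory layer, your orchestration", here is a framework-agnostic decorator that could wrap any agent step: on an exception it records the error and retries, passing earlier failures back in as hints. Everything here (`with_failure_memory`, the `hints` keyword, the dict-based store) is hypothetical and stands in for ReLoop's real integration points, which this page does not document.

```python
import functools
from typing import Any, Callable


def with_failure_memory(store: dict[str, list[str]], retries: int = 3) -> Callable:
    """Hypothetical decorator: retry a step, passing its past failures in as `hints`."""
    if retries < 1:
        raise ValueError("retries must be at least 1")

    def decorator(step: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(step)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            key = step.__name__
            for attempt in range(retries):
                try:
                    # The step sees every failure recorded for it so far.
                    return step(*args, hints=list(store.get(key, [])), **kwargs)
                except Exception as exc:
                    store.setdefault(key, []).append(f"{type(exc).__name__}: {exc}")
                    if attempt == retries - 1:
                        raise  # out of retries: surface the last error
        return wrapper
    return decorator


failure_log: dict[str, list[str]] = {}


@with_failure_memory(failure_log)
def build(path: str, hints: list[str]) -> str:
    # Toy step: fails until its own earlier ImportError shows up in the hints.
    if not any(h.startswith("ImportError") for h in hints):
        raise ImportError("sharp")
    return f"built {path}"


print(build("./my-broken-app"))  # built ./my-broken-app
print(failure_log)               # {'build': ['ImportError: sharp']}
```

Because the store is keyed by step name and lives outside the call, the same step succeeds on its first attempt the next time it runs.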


A/B: Memory vs No Memory

Metric                  Without Memory  With Memory
Attempts to fix 4 bugs  12+             4
Total cost              $0.47           $0.18
Same mistake repeated   3x              0x
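
The relative savings implied by the table can be checked with a quick calculation. This treats "12+" as exactly 12, so the attempt reduction it reports is a lower bound.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`, rounded to one decimal."""
    return round((before - after) / before * 100, 1)


print(pct_reduction(0.47, 0.18))  # 61.7  (total cost)
print(pct_reduction(12, 4))       # 66.7  (attempts; at least this, since 12+)
```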

API Reference

Full API reference: docs/api-reference.md

Method  Path                                      Description
POST    /v1/tasks                                 Create and run a task
GET     /v1/tasks/{id}                            Get task status and result
GET     /v1/tasks/{id}/timeline                   Full execution timeline
GET     /v1/tasks/{id}/sse                        Server-Sent Events stream
POST    /v1/memories/search                       Semantic search over failure memory
GET     /v1/memories/stats                        Aggregated memory statistics
GET     /v1/tasks/{id}/checkpoints                List sandbox checkpoints
POST    /v1/tasks/{id}/checkpoints/{cid}/restore  Rewind to checkpoint
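
The endpoints can be driven from any HTTP client. The sketch below uses only the standard library; the base URL (`http://localhost:8000`), the JSON body field `task`, and the event names in the sample stream are assumptions, since the request and event schemas live in docs/api-reference.md rather than here. The SSE parser follows the standard `text/event-stream` format: `event:` and `data:` fields, comment lines starting with `:`, and a blank line ending each event.

```python
import json
import urllib.request
from typing import Iterable, Iterator

BASE_URL = "http://localhost:8000"  # assumption: where the FastAPI server listens


def create_task_request(task: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) POST /v1/tasks. The `task` body field is hypothetical."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/tasks",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sse(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (event, data) pairs from text/event-stream lines; id/retry are ignored."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:                      # blank line dispatches the pending event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):        # comment / keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            value = value[1:] if value.startswith(" ") else value
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)


sample = ["event: attempt\n", 'data: {"status": "failed"}\n', "\n",
          "event: attempt\n", 'data: {"status": "succeeded"}\n', "\n"]
for name, payload in parse_sse(sample):
    print(name, json.loads(payload)["status"])
```

Against a running server, you would send the request with `urllib.request.urlopen(...)` and feed the decoded response lines of GET /v1/tasks/{id}/sse into `parse_sse`.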

Architecture

+-------------------+     +-------------------+     +-------------------+
|  OpenAI Agents    |     |  Redis Agent      |     |  Blaxel           |
|  SDK              |     |  Memory Server    |     |  Firecracker VMs  |
|                   |     |                   |     |                   |
|  Orchestrates the |     |  3-tier memory:   |     |  Perpetual state  |
|  REJD loop with   |<--->|  - Working        |     |  25ms resume      |
|  specialist agent |     |  - Long-term      |<--->|  Checkpoint/      |
|  handoffs         |     |  - Episodic       |     |  restore          |
+-------------------+     +-------------------+     +-------------------+
         |                         |                         |
         v                         v                         v
+---------------------------------------------------------------+
|                     FastAPI + SSE                             |
|  Task management, memory search, checkpoints, streaming       |
+---------------------------------------------------------------+
         |
         v
+---------------------------------------------------------------+
|               Next.js Dashboard                               |
|  Timeline UI, failure sidebar, live output, cost tracker      |
+---------------------------------------------------------------+

Layer              Technology                      Role
Orchestration      OpenAI Agents SDK               REJD loop with specialist agent handoffs
Failure Memory     Redis Agent Memory Server       3-tier: working memory, long-term failure graph, episodic traces
Execution Sandbox  Blaxel Firecracker microVMs     Perpetual state, 25ms resume, checkpoint/restore
API                FastAPI + SSE                   Task management, memory search, real-time streaming
Dashboard          Next.js + Tailwind + shadcn/ui  Timeline, failure sidebar, cost tracker

Contributing

We welcome contributions. See CONTRIBUTING.md for:

  • Development environment setup
  • Code style requirements (ruff, mypy)
  • PR process and review checklist
  • Architecture overview for new contributors

License

Apache 2.0 -- see LICENSE for the full text.


Download files

Download the file for your platform.

Source Distribution

reloop_ai-0.1.0.tar.gz (410.9 kB)

Uploaded Source

Built Distribution

reloop_ai-0.1.0-py3-none-any.whl (86.4 kB)

Uploaded Python 3

File details

Details for the file reloop_ai-0.1.0.tar.gz.

File metadata

  • Download URL: reloop_ai-0.1.0.tar.gz
  • Upload date:
  • Size: 410.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for reloop_ai-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0a885b8a10cfae1314e7c417be9e8822f53657e241c2a7af9c0aed0b6ffc0b9a
MD5 57849668958283958bdb07c2e7d3b7d8
BLAKE2b-256 79dcb83a1ebf30c7224a7afac90002c222e98cc63a23da28c48cdb9285aa73d5

File details

Details for the file reloop_ai-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: reloop_ai-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 86.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for reloop_ai-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 0b0827bdfc951a3556bfa04fa65614971e7f8382ef1770c92ee79dc0d7e50054
MD5 a467bb508355a296f2d14480b5ea88a1
BLAKE2b-256 52932a8a9e2e315a49c96506691aa33fdb60166392b21b9ab890c7472f495ec6
