Failure memory for AI agents -- self-healing retry with structured learning

ReLoop

Failure memory for AI agents.

Every agent fails. ReLoop is the first framework that gets smarter from failure.

The Problem

AI agents retry blindly -- same mistake, same failure, burning tokens and money. No framework treats failure as data. They either retry with no memory, or give up.

The Solution

ReLoop captures every failure into a structured memory graph -- error type, root cause, suggested fix, confidence score, semantic embedding -- so the next retry starts smarter. Your agents don't just recover. They get permanently smarter.
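As an illustration, one entry in such a memory could be modeled as below. This is a minimal sketch: the class name, field names, and example values are assumptions made for this example, not ReLoop's actual schema.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class FailureRecord:
    """Hypothetical shape of one stored failure (not ReLoop's real schema)."""
    error_type: str        # e.g. "ImportError"
    root_cause: str        # distilled explanation of why the attempt failed
    suggested_fix: str     # what to try on the next attempt
    confidence: float      # 0.0-1.0 confidence in the suggested fix
    embedding: list[float] = field(default_factory=list)  # semantic vector

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


# Illustrative values only.
record = FailureRecord(
    error_type="ImportError",
    root_cause="Native module 'sharp' was not built for this platform",
    suggested_fix="Reinstall sharp inside the sandbox before building",
    confidence=0.8,
    embedding=[0.12, -0.03, 0.57],
)
print(json.dumps(asdict(record), indent=2))
```

Keeping the record as plain structured data means it can be serialized, embedded, and compared against later failures.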

Quick Start

The reloop-ai package is published on PyPI. Install it, then run the init and demo commands:

pip install reloop-ai
reloop init
reloop demo

Minimal Setup (No Redis Required)

ReLoop works out of the box with SQLite -- no Redis or Blaxel needed:

pip install reloop-ai
export OPENAI_API_KEY=sk-...
reloop serve

Redis and Blaxel are optional -- add them when you need production performance. For local development, ReLoop uses Docker or runs commands directly as the sandbox.

Three Ways to Use ReLoop

1. As a Library (any agent, 3 lines)

import asyncio
from reloop import FailureMemory

async def main():
    memory = FailureMemory(redis_url="redis://localhost:6379")
    return await memory.search("ImportError sharp")  # Returns past failures + fixes

similar = asyncio.run(main())

2. As a Framework (full self-healing loop)

reloop run "Fix and deploy the Next.js project at ./my-broken-app"

3. As an MCP Server (Cursor & Claude Code)

Integrate ReLoop directly into your AI IDEs so they become self-healing. When the MCP server is connected, Claude can:

Search Memory: Semantically search past errors across your team's history.
Fetch Checkpoints: Restore exact state from past Blaxel Firecracker VM checkpoints.
Execute Safely: Run isolated code tests within Blaxel sandboxes directly from the editor.

Add to your claude_desktop_config.json or Cursor settings:

{
  "mcpServers": {
    "reloop": {
      "command": "python",
      "args": ["-m", "reloop.mcp_server"]
    }
  }
}
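If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. The helper below is a small sketch of that merge; the function name is hypothetical, and the location of the config file depends on your client and operating system, so the path is left to the caller.

```python
import json
import tempfile
from pathlib import Path


def add_reloop_server(config_path: Path) -> dict:
    """Add the "reloop" entry to an MCP config file, keeping other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["reloop"] = {"command": "python", "args": ["-m", "reloop.mcp_server"]}
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Demo against a throwaway file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(json.dumps({"mcpServers": {"other": {"command": "node"}}}))
    merged = add_reloop_server(demo_path)
    print(json.dumps(merged, indent=2))
```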

Architecture

┌─────────────────────┐   MCP Protocol    ┌──────────────────────┐
│  Cursor / Claude    │ ───────────────►  │  ReLoop MCP Server   │
└─────────────────────┘                   └──────────┬───────────┘
                                                     │
                               ┌─────────────────────┴──────────────────────┐
                               │                                             │
                               ▼  Vector Search                              ▼  Isolated Execution
                    ┌──────────────────────┐                    ┌──────────────────────┐
                    │  Redis Agent Memory  │                    │  Blaxel Firecracker  │
                    └──────────┬───────────┘                    └──────────┬───────────┘
                               │ stores                                    │ 25ms resume
                               ▼                                           ▼
                    ┌──────────────────────┐                    ┌──────────────────────┐
                    │  Failure Embeddings  │                    │   State Checkpoints  │
                    └──────────────────────┘                    └──────────────────────┘

Redis: The Memory Backbone

ReLoop uses Redis as the core of its Agent Memory Server, which is organized as a 3-tier memory system:

Working Memory: Stores the current task's session state and immediate context.
Long-term Memory: A persistent failure graph using Redis Vector Search to semantically match current errors with past distilled solutions.
Episodic Memory: Full execution traces and timeline records for auditing and the dashboard UI.
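To make the semantic matching in the long-term tier concrete, here is a brute-force sketch of similarity search over stored failure embeddings. It is not how Redis Vector Search is implemented (Redis uses an index rather than a linear scan), and the function names, record layout, and toy 3-dimensional vectors are assumptions for illustration.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_failures(query_vec, memory, top_k=2):
    """Return the top_k (score, record) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, item["embedding"]), item) for item in memory]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy embeddings; real ones would come from an embedding model.
long_term_memory = [
    {"error": "ImportError: sharp", "fix": "rebuild native deps", "embedding": [0.9, 0.1, 0.0]},
    {"error": "TypeError: undefined props", "fix": "add null check", "embedding": [0.0, 0.8, 0.6]},
    {"error": "ModuleNotFoundError: sharp", "fix": "install sharp", "embedding": [0.8, 0.2, 0.1]},
]

results = search_failures([1.0, 0.0, 0.0], long_term_memory)
for score, item in results:
    print(f"{score:.2f}  {item['error']} -> {item['fix']}")
```

The two "sharp" failures rank above the unrelated TypeError because their vectors point in nearly the same direction as the query.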

Blaxel: Perpetual Execution Sandboxes

For safe, deterministic task execution, ReLoop integrates Blaxel's Firecracker microVMs.

Perpetual State: The sandbox is never lost. You can pause and resume the exact environment.
25ms Checkpoint/Restore: ReLoop creates instantaneous checkpoints after every step.
Time-Travel Rewinds: Hit a roadblock? Rewind the agent to a previous checkpoint in 25ms and try a different fix strategy.
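The checkpoint-and-rewind idea can be sketched with an in-memory stand-in. This does not use Blaxel or any microVM; the class and method names are hypothetical and only show the bookkeeping: snapshot after a step, then restore an earlier snapshot and drop the ones taken after it.

```python
import copy


class CheckpointedState:
    """In-memory stand-in for a sandbox with checkpoint and rewind."""

    def __init__(self, state: dict) -> None:
        self.state = state
        self._checkpoints: list[dict] = []

    def checkpoint(self) -> int:
        """Snapshot the current state and return the checkpoint id."""
        self._checkpoints.append(copy.deepcopy(self.state))
        return len(self._checkpoints) - 1

    def rewind(self, checkpoint_id: int) -> None:
        """Restore a snapshot and discard any checkpoints taken after it."""
        self.state = copy.deepcopy(self._checkpoints[checkpoint_id])
        del self._checkpoints[checkpoint_id + 1:]


sandbox = CheckpointedState({"files": {"package.json": "v1"}, "step": 0})
clean = sandbox.checkpoint()

# A fix attempt mutates the environment and turns out to be wrong.
sandbox.state["files"]["package.json"] = "v2-broken"
sandbox.state["step"] = 1
sandbox.checkpoint()

# Rewind to the clean checkpoint and try a different strategy.
sandbox.rewind(clean)
print(sandbox.state)
```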

The REJD Loop

The core algorithm: Retrieve -> Execute -> Judge -> Distill

                           ┌─────────────┐
                           │  New Task   │
                           └──────┬──────┘
                                  │
                                  ▼
                    ┌─────────────────────────────┐
                    │           Retrieve           │
                    │  Query Redis for similar     │
                    │      past failures           │
                    └──────────────┬──────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │           Execute            │
                    │    Run in Blaxel sandbox     │
                    └──────────────┬──────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────┐
                    │            Judge             │
                    │     Success or failure?      │
                    └──────┬──────────────┬────────┘
                           │              │
                        Success        Failure
                           │              │
                           ▼              ▼
              ┌────────────────┐  ┌───────────────────────┐
              │ Distill        │  │ Distill Failure        │
              │ Success        │  │ root cause, fix,       │
              │ Store solution │  │ confidence score       │
              └───────┬────────┘  └──────────┬────────────┘
                      │                       │
                      ▼                       ▼
              ┌────────────────┐  ┌───────────────────────┐
              │ Task Complete  │  │  Circuit breaker or    │
              └────────────────┘  │   budget exceeded?    │
                                  └──────┬──────────┬──────┘
                                         │          │
                                        Yes         No
                                         │          │
                                         ▼          └──► Retrieve ↑
                                ┌─────────────────┐
                                │ Task Abandoned  │
                                └─────────────────┘
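The flow in the diagram can be sketched as a plain Python loop. This is an illustration under stated assumptions, not ReLoop's implementation: the function and class names are hypothetical, memory is a simple list instead of Redis, and `execute` is any callable returning (success, lesson, cost). The default limits mirror the MAX_RETRIES, MAX_BUDGET_USD, and CIRCUIT_BREAKER_THRESHOLD defaults listed under Configuration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    status: str                      # "complete" or "abandoned"
    attempts: int
    cost_usd: float
    memory: list[str] = field(default_factory=list)


def rejd_loop(
    execute: Callable[[list[str]], tuple[bool, str, float]],
    max_retries: int = 5,
    max_budget_usd: float = 1.00,
    circuit_breaker_threshold: int = 3,
) -> LoopResult:
    memory: list[str] = []           # distilled lessons from earlier attempts
    cost = 0.0
    consecutive_failures = 0
    attempts = 0
    while attempts < max_retries:
        hints = list(memory)                      # Retrieve
        ok, lesson, step_cost = execute(hints)    # Execute
        attempts += 1
        cost += step_cost
        if ok:                                    # Judge: success
            memory.append(lesson)                 # Distill success
            return LoopResult("complete", attempts, cost, memory)
        memory.append(lesson)                     # Distill failure
        consecutive_failures += 1
        # Circuit breaker or budget exceeded? If so, abandon the task.
        if consecutive_failures >= circuit_breaker_threshold or cost >= max_budget_usd:
            break
    return LoopResult("abandoned", attempts, cost, memory)


def toy_execute(hints: list[str]) -> tuple[bool, str, float]:
    """Toy task: succeeds once two earlier lessons are available."""
    if len(hints) >= 2:
        return True, "solution: pin dependency version", 0.05
    return False, f"failure #{len(hints) + 1}: missing dependency", 0.05


result = rejd_loop(toy_execute)
print(result.status, result.attempts, round(result.cost_usd, 2))
```

Because the loop returns on the first success, consecutive failures equal total failures here; a multi-step task would reset the counter after each successful step.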

OpenAI Agents SDK -- orchestrates the REJD loop with handoffs between specialist agents
Redis -- 3-tier failure memory (working, long-term, episodic) via Agent Memory Server
Blaxel -- Firecracker sandbox with 25ms checkpoint/restore (optional; Docker or direct execution for local dev)

Timeline UI

The non-chat interface that makes failure learning visible.

A horizontal timeline of colored nodes tells the full story at a glance:

RED (failed) -> RED (failed) -> RED (failed) -> GREEN (succeeded)

Click any node to inspect the full failure record -- root cause, suggested fix, confidence score, cost, and the exact code diff that resolved it.

Integrations

Works with any agent framework:

OpenAI Agents SDK
LangGraph
CrewAI
Claude Agent SDK
Raw Python

ReLoop is the memory layer -- bring your own orchestration.

A/B: Memory vs No Memory

Metric	Without Memory	With Memory
Attempts to fix 4 bugs	12+	4
Total cost	$0.47	$0.18
Same mistake repeated	3x	0x
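For readers who want the relative savings, the arithmetic on the table's figures works out as follows. The only assumption is that "12+" is treated as exactly 12, which makes the attempt reduction a lower bound.

```python
without_memory = {"attempts": 12, "cost_usd": 0.47}  # "12+" treated as 12
with_memory = {"attempts": 4, "cost_usd": 0.18}


def reduction(before: float, after: float) -> float:
    """Relative reduction as a percentage of the 'before' value."""
    return (before - after) / before * 100


cost_saved = reduction(without_memory["cost_usd"], with_memory["cost_usd"])
attempts_saved = reduction(without_memory["attempts"], with_memory["attempts"])
print(f"Cost reduction: {cost_saved:.1f}%")
print(f"Attempt reduction: at least {attempts_saved:.1f}%")
```

That is roughly 62% lower cost and at least 67% fewer attempts for this four-bug comparison.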

API Reference

Full API reference: docs/api-reference.md

Method	Path	Description
`POST`	`/v1/tasks`	Create and run a task
`GET`	`/v1/tasks/{id}`	Get task status and result
`GET`	`/v1/tasks/{id}/timeline`	Full execution timeline
`GET`	`/v1/tasks/{id}/sse`	Server-Sent Events stream
`POST`	`/v1/memories/search`	Semantic search over failure memory
`GET`	`/v1/memories/stats`	Aggregated memory statistics
`GET`	`/v1/tasks/{id}/checkpoints`	List sandbox checkpoints
`POST`	`/v1/tasks/{id}/checkpoints/{cid}/restore`	Rewind to checkpoint
`POST`	`/v1/tasks/ab-comparison`	Run A/B comparison (with vs without memory)
`GET`	`/v1/tasks/{id}/circuit-breaker`	Get circuit breaker state for a task
`POST`	`/v1/memories/predict`	Predict failure likelihood for new code
`GET`	`/v1/memories/export`	Export failure memory as JSON
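As a sketch of calling one of these endpoints from Python, the snippet below builds a POST request for /v1/memories/search using only the standard library. The request body schema is not given on this page, so the JSON field names ("query", "limit") and the helper name are assumptions; check docs/api-reference.md for the real schema.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # matches the default API_PORT of 8000


def build_search_request(query: str, limit: int = 5) -> urllib.request.Request:
    """Build a POST /v1/memories/search request.

    The JSON field names ("query", "limit") are assumptions for illustration.
    """
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/memories/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request("ImportError sharp")
print(request.get_method(), request.full_url)

# To send it against a running `reloop serve` instance:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```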

Configuration

ReLoop is configured via environment variables. Only OPENAI_API_KEY is required -- everything else has sensible defaults.

Variable	Required	Default	Description
`OPENAI_API_KEY`	Yes	--	OpenAI API key for code generation and reasoning
`REDIS_URL`	No	`redis://localhost:6379`	Redis connection URL (falls back to SQLite)
`REDIS_MEMORY_INDEX`	No	`reloop-failures`	Vector index name for failure embeddings
`BLAXEL_API_KEY`	No	--	Blaxel API key for Firecracker sandboxes
`BLAXEL_WORKSPACE`	No	--	Blaxel workspace name
`CODEX_MODEL`	No	`gpt-4o`	Chat model for planner/distiller
`REASONING_MODEL`	No	`o1`	Deep reasoning model for root cause analysis
`FAST_MODEL`	No	`gpt-4o-mini`	Fast/cheap model for classification
`EMBEDDING_MODEL`	No	`text-embedding-3-small`	Model for failure memory embeddings
`API_PORT`	No	`8000`	FastAPI server port
`API_HOST`	No	`0.0.0.0`	Host address the FastAPI server binds to
`NEXT_PUBLIC_API_URL`	No	`http://localhost:8000`	Backend API URL for frontend
`MAX_RETRIES`	No	`5`	Maximum retry attempts per task
`MAX_BUDGET_USD`	No	`1.00`	Maximum cost budget per task
`CIRCUIT_BREAKER_THRESHOLD`	No	`3`	Consecutive failures before circuit break

See .env.example for a copy-paste template.
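The "one required variable, defaults for the rest" pattern can be sketched as below. This is an illustrative loader, not ReLoop's own settings code: the `Settings` class and `load_settings` function are hypothetical, and only a subset of the variables from the table is read.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    redis_url: str
    max_retries: int
    max_budget_usd: float
    circuit_breaker_threshold: int


def load_settings(env=None) -> Settings:
    """Read a subset of the variables above, applying the documented defaults."""
    env = os.environ if env is None else env
    if not env.get("OPENAI_API_KEY"):
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=env["OPENAI_API_KEY"],
        redis_url=env.get("REDIS_URL", "redis://localhost:6379"),
        max_retries=int(env.get("MAX_RETRIES", "5")),
        max_budget_usd=float(env.get("MAX_BUDGET_USD", "1.00")),
        circuit_breaker_threshold=int(env.get("CIRCUIT_BREAKER_THRESHOLD", "3")),
    )


# Passing a dict instead of os.environ keeps the example self-contained.
settings = load_settings({"OPENAI_API_KEY": "sk-example", "MAX_RETRIES": "8"})
print(settings.max_retries, settings.max_budget_usd, settings.redis_url)
```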

Layer	Technology	Role
Orchestration	OpenAI Agents SDK	REJD loop with specialist agent handoffs
Failure Memory	Redis Agent Memory Server	3-tier: working memory, long-term failure graph, episodic traces
Execution Sandbox	Blaxel Firecracker microVMs	Perpetual state, 25ms resume, checkpoint/restore
API	FastAPI + SSE	Task management, memory search, real-time streaming
Dashboard	Next.js + Tailwind + shadcn/ui	Timeline, failure sidebar, cost tracker

Contributing

We welcome contributions. See CONTRIBUTING.md for:

Development environment setup
Code style requirements (ruff, mypy)
PR process and review checklist
Architecture overview for new contributors

License

Apache 2.0 -- see LICENSE for the full text.

Release history

Version	Released
0.5.0	Apr 12, 2026
0.4.3	Apr 12, 2026
0.4.2	Apr 12, 2026
0.4.1	Apr 12, 2026
0.4.0 (this version)	Apr 12, 2026
0.3.2	Apr 11, 2026
0.3.1	Apr 11, 2026
0.3.0	Apr 11, 2026
0.2.0	Apr 11, 2026
0.1.0	Apr 11, 2026

Download files

Source Distribution

reloop_ai-0.4.0.tar.gz (474.4 kB)

Uploaded Apr 12, 2026 Source

Built Distribution

reloop_ai-0.4.0-py3-none-any.whl (115.8 kB)

Uploaded Apr 12, 2026 Python 3

File details

Details for the file reloop_ai-0.4.0.tar.gz.

File metadata

Download URL: reloop_ai-0.4.0.tar.gz
Upload date: Apr 12, 2026
Size: 474.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for reloop_ai-0.4.0.tar.gz
Algorithm	Hash digest
SHA256	`a0d5c1f8f02af254d2810a259693677ac5feeb3fc281c59c29f9081ba72d6877`
MD5	`fbc334520c0eafbc15721fa82b9d0c40`
BLAKE2b-256	`add5a84d219bd2fd24b53218824412b248645b7b19eb1a49a892a4d84d2368b6`

File details

Details for the file reloop_ai-0.4.0-py3-none-any.whl.

File metadata

Download URL: reloop_ai-0.4.0-py3-none-any.whl
Upload date: Apr 12, 2026
Size: 115.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for reloop_ai-0.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`31f362d2794a75710c47d00be070b62e06bc3fc728c15bcca286c5d622112b96`
MD5	`6d1529018c694886dda24a823f361ba0`
BLAKE2b-256	`b71a5dd768c134971e70ab705c06db4871d56b9aef2a4d95f9439559e81e06c6`
