RLM - Infinite Memory for Claude Code
Your Claude Code sessions forget everything after `/compact`. RLM fixes that.
The Problem
Claude Code has a context window limit. When it fills up:
- `/compact` wipes your conversation history
- Previous decisions, insights, and context are lost
- You repeat yourself. Claude makes the same mistakes. Productivity drops.
The Solution
RLM is an MCP server that gives Claude Code persistent memory across sessions:
You: "Remember that the client prefers 500ml bottles"
→ Saved. Forever. Across all sessions.
You: "What did we decide about the API architecture?"
→ Claude searches its memory and finds the answer.
3 lines to install. 14 tools. Zero configuration.
Quick Install
Via PyPI (recommended)
pip install mcp-rlm-server[all]
Via Git
git clone https://github.com/EncrEor/rlm-claude.git
cd rlm-claude
./install.sh
Restart Claude Code. Done.
Requirements: Python 3.10+, Claude Code CLI
Upgrading from v0.9.0 or earlier
v0.9.1 moved the source code from mcp_server/ to src/mcp_server/ (PyPA best practice). A compatibility symlink is included so existing installations keep working, but we recommend re-running the installer:
cd rlm-claude
git pull
./install.sh # reconfigures the MCP server path
Your data (~/.claude/rlm/) is untouched. Only the server path is updated.
How It Works
┌─────────────────────────┐
│ Claude Code CLI │
└────────────┬────────────┘
│
┌────────────▼────────────┐
│ RLM MCP Server │
│ (14 tools) │
└────────────┬────────────┘
│
┌──────────────────┼──────────────────┐
│ │ │
┌─────────▼────────┐ ┌──────▼──────┐ ┌──────────▼─────────┐
│ Insights │ │ Chunks │ │ Retention │
│ (key decisions, │ │ (full conv │ │ (auto-archive, │
│ facts, prefs) │ │ history) │ │ restore, purge) │
└──────────────────┘ └─────────────┘ └────────────────────┘
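To make the architecture concrete: an MCP tool is just a typed Python function registered with the server. Below is a minimal sketch using the official `mcp` Python SDK; the tool body and storage path are illustrative assumptions, not RLM's actual implementation (the real server registers 14 tools with richer logic).

```python
# Minimal sketch of exposing one memory tool over MCP with the official
# Python SDK. The storage path and tool body are illustrative only.
import json
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("rlm-server")
STORE = Path.home() / ".claude" / "rlm" / "context" / "session_memory.json"

@mcp.tool()
def rlm_remember(content: str, category: str = "fact", importance: str = "normal") -> str:
    """Persist an insight so it survives /compact."""
    insights = json.loads(STORE.read_text()) if STORE.exists() else []
    insights.append({"content": content, "category": category, "importance": importance})
    STORE.parent.mkdir(parents=True, exist_ok=True)
    STORE.write_text(json.dumps(insights, indent=2, ensure_ascii=False))
    return f"Saved insight #{len(insights)}"

if __name__ == "__main__":
    mcp.run()  # stdio transport, as Claude Code expects
```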
Auto-Save Before Context Loss
RLM hooks into Claude Code's /compact event. Before your context is wiped, RLM automatically saves a snapshot. No action needed.
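Conceptually, the hook is a small script that Claude Code runs with event metadata as JSON on stdin. Here is a simplified sketch, assuming the documented `transcript_path` field in the hook payload; the shipped `pre_compact_chunk.py` does more (chunking, index updates):

```python
#!/usr/bin/env python3
# Simplified PreCompact hook sketch: Claude Code pipes event metadata as
# JSON on stdin; we copy the transcript aside before /compact wipes it.
import json
import shutil
import sys
import time
from pathlib import Path

event = json.load(sys.stdin)              # includes "transcript_path", "trigger", ...
transcript = Path(event["transcript_path"])

snapshot_dir = Path.home() / ".claude" / "rlm" / "context" / "chunks"
snapshot_dir.mkdir(parents=True, exist_ok=True)

stamp = time.strftime("%Y-%m-%d_%H%M%S")
shutil.copy(transcript, snapshot_dir / f"precompact_{stamp}.jsonl")
sys.exit(0)                               # exit 0: let compaction proceed
```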
Two Memory Systems
| System | What it stores | How to use |
|---|---|---|
| Insights | Key decisions, facts, preferences | rlm_remember() / rlm_recall() |
| Chunks | Full conversation segments | rlm_chunk() / rlm_peek() / rlm_grep() |
Features
Memory & Insights
- `rlm_remember` - Save decisions, facts, preferences with categories and importance levels
- `rlm_recall` - Search insights by keyword (multi-word tokenized), category, or importance
- `rlm_forget` - Remove an insight
- `rlm_status` - System overview (insight count, chunk stats, access metrics)
Conversation History
- `rlm_chunk` - Save conversation segments with typed categorization (`snapshot`, `session`, `debug`; `insight` redirects to `rlm_remember`)
- `rlm_peek` - Read a chunk (full or partial by line range)
- `rlm_grep` - Regex search across all chunks (+ fuzzy matching for typo tolerance; sketched below)
- `rlm_search` - Hybrid search: BM25 + semantic cosine similarity (FR/EN, accent-normalized, chunks + insights)
- `rlm_list_chunks` - List all chunks with metadata
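For intuition, the fuzzy mode of `rlm_grep` can be pictured as a per-word similarity ratio applied when the literal regex misses. A standard-library sketch; the threshold and scoring here are assumptions, not the actual matcher:

```python
# Illustrative typo-tolerant grep using only the standard library.
import difflib
import re

def fuzzy_grep(lines: list[str], pattern: str, fuzzy: bool = False, threshold: float = 0.8) -> list[str]:
    exact = re.compile(pattern, re.IGNORECASE)
    hits = []
    for line in lines:
        if exact.search(line):
            hits.append(line)
        elif fuzzy:
            # Compare the pattern against each word; tolerate small typos.
            for word in line.split():
                if difflib.SequenceMatcher(None, pattern.lower(), word.lower()).ratio() >= threshold:
                    hits.append(line)
                    break
    return hits

print(fuzzy_grep(["user authetication flow", "database schema"], "authentication", fuzzy=True))
# -> ['user authetication flow']
```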
Multi-Project Organization
- `rlm_sessions` - Browse sessions by project or domain
- `rlm_domains` - List available domains for categorization
- Auto-detection of project from git or working directory
- Cross-project filtering on all search tools
Smart Retention
- `rlm_retention_preview` - Preview what would be archived (dry-run)
- `rlm_retention_run` - Archive old unused chunks, purge ancient ones
- `rlm_restore` - Bring back archived chunks
- 3-zone lifecycle: Active → Archive (.gz) → Purge (see the sketch after this list)
- Immunity system: critical tags, frequent access, and keywords protect chunks
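The lifecycle decision boils down to age plus immunity checks. A minimal sketch; the thresholds (90/365 days, 5 accesses) are assumptions for illustration, not the shipped defaults:

```python
# Sketch of the 3-zone retention decision. Thresholds and immunity rules
# here are assumed values; the installed defaults may differ.
from dataclasses import dataclass, field

ARCHIVE_AFTER_DAYS = 90
PURGE_AFTER_DAYS = 365

@dataclass
class Chunk:
    age_days: int
    access_count: int
    tags: set[str] = field(default_factory=set)

def is_immune(chunk: Chunk) -> bool:
    # Critical tags or frequent access protect a chunk from the lifecycle.
    return "critical" in chunk.tags or chunk.access_count >= 5

def zone(chunk: Chunk) -> str:
    if is_immune(chunk) or chunk.age_days < ARCHIVE_AFTER_DAYS:
        return "active"
    if chunk.age_days < PURGE_AFTER_DAYS:
        return "archive"   # compressed to .gz, restorable via rlm_restore
    return "purge"

print(zone(Chunk(age_days=200, access_count=0)))   # archive
print(zone(Chunk(age_days=400, access_count=9)))   # active (immune)
```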
Auto-Chunking (Hooks)
- PreCompact hook: Automatic snapshot before `/compact` or auto-compact
- PostToolUse hook: Stats tracking after chunk operations
- User-driven philosophy: you decide when to chunk, the system saves before loss
Semantic Search (optional)
- Hybrid BM25 + cosine - Combines keyword matching with vector similarity for better relevance (see the sketch after this list)
- Auto-embedding - New chunks are automatically embedded at creation time
- Two providers - Model2Vec (fast, 256d) or FastEmbed (accurate, 384d)
- Graceful degradation - Falls back to pure BM25 when semantic deps are not installed
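The fusion step can be expressed as a weighted sum of normalized BM25 and cosine scores. A minimal sketch; the equal 0.5/0.5 weights are an assumption, not RLM's tuned values:

```python
# Sketch of hybrid score fusion: normalize each score list to [0, 1],
# then blend. Equal weights are illustrative only.
def normalize(scores: list[float]) -> list[float]:
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.0] * len(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def hybrid_rank(doc_ids: list[str], bm25: list[float], cosine: list[float],
                w_bm25: float = 0.5, w_cos: float = 0.5) -> list[tuple[str, float]]:
    fused = [w_bm25 * b + w_cos * c
             for b, c in zip(normalize(bm25), normalize(cosine))]
    return sorted(zip(doc_ids, fused), key=lambda t: t[1], reverse=True)

print(hybrid_rank(["a", "b", "c"], bm25=[2.1, 0.3, 1.0], cosine=[0.2, 0.9, 0.4]))
```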
Provider comparison (benchmark on 108 chunks)
| | Model2Vec (default) | FastEmbed |
|---|---|---|
| Model | potion-multilingual-128M | paraphrase-multilingual-MiniLM-L12-v2 |
| Dimensions | 256 | 384 |
| Embed 108 chunks | 0.06s | 1.30s |
| Search latency | 0.1ms/query | 1.5ms/query |
| Memory | 0.1 MB | 0.3 MB |
| Disk (model) | ~35 MB | ~230 MB |
| Semantic quality | Good (keyword-biased) | Better (true semantic) |
| Speed | 21x faster | Baseline |
Top-5 result overlap between providers: ~1.6/5 (different results in 7/8 queries). FastEmbed captures more semantic meaning while Model2Vec leans toward keyword similarity. The hybrid BM25 + cosine fusion compensates for both weaknesses.
Recommendation: Start with Model2Vec (default). Switch to FastEmbed only if you need better semantic accuracy and can afford the slower startup.
# Model2Vec (default) — fast, ~35 MB
pip install mcp-rlm-server[semantic]
# FastEmbed — more accurate, ~230 MB, slower
pip install mcp-rlm-server[semantic-fastembed]
export RLM_EMBEDDING_PROVIDER=fastembed
# Compare both providers on your data
python3 scripts/benchmark_providers.py
# Backfill existing chunks (run once after install)
python3 scripts/backfill_embeddings.py
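To picture what the `.npz` vector store does at query time, here is a cosine-ranking sketch with numpy; the array names (`ids`, `vectors`) are assumptions about the file layout, not `vecstore.py`'s documented format:

```python
# Illustrative cosine search over an .npz vector store.
import numpy as np

def top_k(query_vec: np.ndarray, store_path: str, k: int = 5) -> list[tuple[str, float]]:
    data = np.load(store_path)
    ids, vectors = data["ids"], data["vectors"]          # (n,), (n, d)
    # Cosine similarity = dot product of L2-normalized vectors.
    q = query_vec / np.linalg.norm(query_vec)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = v @ q
    best = np.argsort(sims)[::-1][:k]
    return [(str(ids[i]), float(sims[i])) for i in best]
```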
Sub-Agent Skills
- `/rlm-analyze` - Analyze a single chunk with an isolated sub-agent
- `/rlm-parallel` - Analyze multiple chunks in parallel (Map-Reduce pattern from the MIT RLM paper)
Comparison
| Feature | Raw Context | Letta/MemGPT | RLM |
|---|---|---|---|
| Persistent memory | No | Yes | Yes |
| Works with Claude Code | N/A | No (own runtime) | Native MCP |
| Auto-save before compact | No | N/A | Yes (hooks) |
| Search (regex + BM25 + semantic) | No | Basic | Yes |
| Fuzzy search (typo-tolerant) | No | No | Yes |
| Multi-project support | No | No | Yes |
| Smart retention (archive/purge) | No | Basic | Yes |
| Sub-agent analysis | No | No | Yes |
| Zero config install | N/A | Complex | 3 lines |
| FR/EN support | N/A | EN only | Both |
| Cost | Free | Self-hosted | Free |
Usage Examples
Save and recall insights
# Save a key decision
rlm_remember("Backend is the source of truth for all data",
category="decision", importance="high",
tags="architecture,backend")
# Find it later
rlm_recall(query="source of truth")
rlm_recall(category="decision")
Manage conversation history
# Save important discussion (typed)
rlm_chunk("Discussion about API redesign... [long content]",
summary="API v2 architecture decisions",
tags="api,architecture",
chunk_type="session") # or "snapshot", "debug"
# Search across all history
rlm_search("API architecture decisions") # BM25 ranked
rlm_grep("authentication", fuzzy=True) # Typo-tolerant
# Read a specific chunk
rlm_peek("2026-01-18_MyProject_001")
Multi-project organization
# Filter by project
rlm_search("deployment issues", project="MyApp")
rlm_grep("database", project="MyApp", domain="infra")
# Browse sessions
rlm_sessions(project="MyApp")
Project Structure
rlm-claude/
├── src/mcp_server/
│ ├── server.py # MCP server (14 tools)
│ └── tools/
│ ├── memory.py # Insights (remember/recall/forget)
│ ├── navigation.py # Chunks (chunk/peek/grep/list)
│ ├── search.py # BM25 search engine
│ ├── tokenizer_fr.py # FR/EN tokenization
│ ├── sessions.py # Multi-session management
│ ├── retention.py # Archive/restore/purge lifecycle
│ ├── embeddings.py # Embedding providers (Model2Vec, FastEmbed)
│ ├── vecstore.py # Vector store (.npz) for semantic search
│ └── fileutil.py # Safe I/O (atomic writes, path validation, locking)
│
├── hooks/ # Claude Code hooks
│ ├── pre_compact_chunk.py # Auto-save before /compact (PreCompact hook)
│ └── reset_chunk_counter.py # Stats reset after chunk (PostToolUse hook)
│
├── templates/
│ ├── hooks_settings.json # Hook config template
│ ├── CLAUDE_RLM_SNIPPET.md # CLAUDE.md instructions
│ └── skills/ # Sub-agent skills
│
├── context/ # Storage (created at install, git-ignored)
│ ├── session_memory.json # Insights
│ ├── index.json # Chunk index
│ ├── chunks/ # Conversation history
│ ├── archive/ # Compressed archives (.gz)
│ ├── embeddings.npz # Semantic vectors (Phase 8)
│ └── sessions.json # Session index
│
├── install.sh # One-command installer
└── README.md
Configuration
Hook Configuration
The installer automatically configures hooks in ~/.claude/settings.json:
{
"hooks": {
"PreCompact": [
{
"matcher": "manual",
"hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/pre_compact_chunk.py" }]
},
{
"matcher": "auto",
"hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/pre_compact_chunk.py" }]
}
],
"PostToolUse": [{
"matcher": "mcp__rlm-server__rlm_chunk",
"hooks": [{ "type": "command", "command": "python3 ~/.claude/rlm/hooks/reset_chunk_counter.py" }]
}]
}
}
Custom Domains
Organize chunks by topic with custom domains:
{
"domains": {
"my_project": {
"description": "Domains for my project",
"list": ["feature", "bugfix", "infra", "docs"]
}
}
}
Edit context/domains.json after installation.
Manual Installation
If you prefer to install manually:
pip install -e ".[all]"
claude mcp add rlm-server -- python3 -m mcp_server
mkdir -p ~/.claude/rlm/hooks
cp hooks/*.py ~/.claude/rlm/hooks/
chmod +x ~/.claude/rlm/hooks/*.py
mkdir -p ~/.claude/skills/rlm-analyze ~/.claude/skills/rlm-parallel
cp templates/skills/rlm-analyze/skill.md ~/.claude/skills/rlm-analyze/
cp templates/skills/rlm-parallel/skill.md ~/.claude/skills/rlm-parallel/
Then configure hooks in ~/.claude/settings.json (see above).
Uninstall
./uninstall.sh # Interactive (choose to keep or delete data)
./uninstall.sh --keep-data # Remove RLM config, keep your chunks/insights
./uninstall.sh --all # Remove everything
./uninstall.sh --dry-run # Preview what would be removed
Security
RLM includes built-in protections for safe operation:
- Path traversal prevention - Chunk IDs are validated against a strict allowlist (`[a-zA-Z0-9_.-&]`), and resolved paths are verified to stay within the storage directory
- Atomic writes - All JSON and chunk files are written using write-to-temp-then-rename, preventing corruption from interrupted writes or crashes
- File locking - Concurrent read-modify-write operations on shared indexes use `fcntl.flock` exclusive locks
- Content size limits - Chunks are limited to 2 MB, and gzip decompression (archive restore) is capped at 10 MB to prevent resource exhaustion
- SHA-256 hashing - Content deduplication uses SHA-256 (not MD5)
All I/O safety primitives are centralized in `src/mcp_server/tools/fileutil.py`.
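For reference, the two primitives look roughly like this; a generic sketch of the pattern, not `fileutil.py` verbatim:

```python
# Sketch of an fcntl.flock-guarded read-modify-write plus an atomic
# write-to-temp-then-rename. Generic pattern, not fileutil.py verbatim.
import fcntl
import json
import os
import tempfile

def atomic_write_json(path: str, obj: dict) -> None:
    # Write to a temp file in the same directory, then rename over the
    # target: os.replace() is atomic on POSIX, so readers never see a
    # half-written file.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(obj, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise

def update_index(path: str, key: str, value: dict) -> None:
    # Serialize concurrent writers with an exclusive lock on a sidecar
    # lock file (locking the data file itself breaks across renames).
    with open(path + ".lock", "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)
        if os.path.exists(path):
            with open(path) as f:
                index = json.load(f)
        else:
            index = {}
        index[key] = value
        atomic_write_json(path, index)
```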
Troubleshooting
"MCP server not found"
claude mcp list # Check servers
claude mcp remove rlm-server # Remove if exists
claude mcp add rlm-server -- python3 -m mcp_server
"Hooks not working"
cat ~/.claude/settings.json | grep -A 10 "PreCompact" # Verify hooks config
ls ~/.claude/rlm/hooks/ # Check installed hooks
Roadmap
- Phase 1: Memory tools (remember/recall/forget/status)
- Phase 2: Navigation tools (chunk/peek/grep/list)
- Phase 3: Auto-chunking + sub-agent skills
- Phase 4: Production (auto-summary, dedup, access tracking)
- Phase 5: Advanced (BM25 search, fuzzy grep, multi-sessions, retention)
- Phase 6: Production-ready (tests, CI/CD, PyPI)
- Phase 7: MAGMA-inspired (temporal filtering, entity extraction)
- Phase 8: Hybrid semantic search (BM25 + cosine, Model2Vec)
- Phase 9: Typed chunking — `chunk_type` parameter (snapshot/session/debug/insight redirect)
Inspired By
Research Papers
- RLM Paper (MIT CSAIL) - Zhang et al., Dec 2025 - "Recursive Language Models" — foundational architecture (chunk/peek/grep, sub-agent analysis)
- MAGMA (arXiv:2601.03236) - Jan 2026 - "Memory-Augmented Generation with Memory Agents" — temporal filtering, entity extraction (Phase 7)
Libraries & Tools
- Model2Vec - Static word embeddings for fast semantic search (Phase 8)
- BM25S - Fast BM25 implementation in pure Python (Phase 5)
- FastEmbed - ONNX-based embeddings, optional provider (Phase 8)
- Letta/MemGPT - AI agent memory framework — early inspiration
Standards & Platform
- MCP Specification - Model Context Protocol
- Claude Code Hooks - PreCompact / PostToolUse hooks
Authors
- Ahmed MAKNI (@EncrEor)
- Claude Opus 4.5 (joint R&D)
License
MIT License - see LICENSE