# RLM Engine

Recursive Language Model: process arbitrarily long context by having LLMs write and execute code.
## Overview
RLM works around the context-length limits of LLMs by treating the model as a "neurosymbolic operating system." Instead of feeding an entire document into the model's context window, RLM:

1. provides the model with a Python REPL environment,
2. lets the LLM write code to explore and search the document,
3. executes that code and returns the results to the model, and
4. iterates until the model finds the answer.

This enables processing of 10M+ character documents that would overflow traditional context windows.
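
For intuition, the code the model writes inside the REPL might look like the following. This is a hypothetical illustration; it assumes the sandbox exposes the document as a string bound to a `context` variable (as the Architecture section below suggests):

```python
import re

# `context` is assumed to be the full document string exposed by the sandbox.
# Instead of reading all of it, the model searches programmatically.
matches = [m.start() for m in re.finditer(r"secret code", context)]
print(f"{len(matches)} occurrences")

# Print a small window of text around the first few hits.
for pos in matches[:3]:
    print(context[max(0, pos - 100) : pos + 200])
```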
## Installation
```bash
# From PyPI
pip install rlm-engine

# Or from a local checkout
pip install -e .

# Optional: For YAML config support
pip install pyyaml

# Optional: For API server
pip install fastapi uvicorn
```
## Quick Start
### Python API
```python
from rlm import RLM, RLMConfig

# With OpenAI
rlm = RLM(backend="openai", model="gpt-4o")

# With vLLM (self-hosted)
rlm = RLM(
    backend="vllm",
    model="meta-llama/Llama-3.1-70B-Instruct",
    base_url="http://localhost:8000/v1",
)

# Process a document
result = rlm.completion(
    query="What is the secret code?",
    context=huge_document,  # Can be 10M+ characters
)

print(result.answer)
print(f"Iterations: {result.iterations}")
print(f"Time: {result.execution_time:.2f}s")
```
### CLI
```bash
# Query a file
rlm query "What is the revenue?" --file report.txt

# Pipe from stdin
cat document.txt | rlm query "Summarize this"

# Use a specific backend
rlm query "Find dates" --file data.txt --backend vllm --base-url http://localhost:8000/v1

# Output as JSON
rlm query "Count words" --file doc.txt --json
```
### API Server
```bash
# Start server
rlm serve --port 8080

# Or with Python
python -m rlm.server

# Query the API
curl -X POST http://localhost:8080/v1/rlm/completion \
  -H "Content-Type: application/json" \
  -d '{
    "query": "What is the revenue?",
    "context": "Q4 Report: Revenue $500M..."
  }'
```
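
The same endpoint can be called from any HTTP client. A minimal Python sketch using `requests` (the request body matches the curl example above; the response schema is assumed to include the answer):

```python
import requests

# Call the RLM API server started above.
resp = requests.post(
    "http://localhost:8080/v1/rlm/completion",
    json={
        "query": "What is the revenue?",
        "context": "Q4 Report: Revenue $500M...",
    },
    timeout=300,  # large documents can take a while
)
resp.raise_for_status()
print(resp.json())
```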
## Configuration
### Environment Variables
```bash
export RLM_BACKEND=vllm
export RLM_MODEL=meta-llama/Llama-3.1-70B-Instruct
export RLM_BASE_URL=http://localhost:8000/v1
export RLM_MAX_ITERATIONS=10
export RLM_VERBOSE=true
```
### Config File (`rlm.yaml`)
```yaml
model:
  backend: vllm
  model: meta-llama/Llama-3.1-70B-Instruct
  base_url: http://localhost:8000/v1

rlm:
  max_iterations: 10
  max_depth: 3
  temperature: 0.7
  verbose: false

optimizations:
  cache_enabled: true
  parallel_chunks: 5

server:
  host: 0.0.0.0
  port: 8080
```
```bash
# Initialize config
rlm init
```
## Optimized Variants
### FastRLM
Optimized for speed with relevance filtering:
```python
import asyncio

from rlm import FastRLM

rlm = FastRLM(
    backend="vllm",
    base_url="http://localhost:8000/v1",
    use_relevance_filtering=True,
)

# fast_completion is a coroutine, so drive it with an event loop
result = asyncio.run(rlm.fast_completion(query, context))
```
### ScalableRLM
Optimized for large documents with chunking and caching:
```python
import asyncio

from rlm import ScalableRLM

rlm = ScalableRLM(
    backend="openai",
    enable_cache=True,
    max_concurrent=10,
)

# scalable_completion is also a coroutine
result = asyncio.run(
    rlm.scalable_completion(
        query="Summarize this 10M-character document",
        context=massive_document,
    )
)
```
## Streaming
```python
import asyncio

from rlm import RLM, StreamingRLM

rlm = RLM(backend="openai")
streaming = StreamingRLM(rlm)

async def main():
    async for event in streaming.stream_completion(query, context):
        if event.event_type == "code":
            print(f"Executing: {event.data}")
        elif event.event_type == "output":
            print(f"Output: {event.data}")
        elif event.event_type == "answer":
            print(f"Answer: {event.data}")

asyncio.run(main())
```
## Supported Backends
| Backend | Requires API Key | Self-Hosted |
|---|---|---|
| `openai` | Yes (`OPENAI_API_KEY`) | No |
| `anthropic` | Yes (`ANTHROPIC_API_KEY`) | No |
| `vllm` | No | Yes |
| `ollama` | No | Yes |
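
Every backend goes through the same constructor. For example, a self-hosted Ollama model (the model name is illustrative, and passing Ollama's default port as `base_url` is an assumption):

```python
from rlm import RLM

# Self-hosted backend: no API key required.
rlm = RLM(
    backend="ollama",
    model="llama3.1",                    # illustrative model name
    base_url="http://localhost:11434",   # Ollama's default port (assumed)
)
```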
## Architecture
```
┌──────────────────────────────────────────────────────────────┐
│                          RLM Engine                           │
├──────────────────────────────────────────────────────────────┤
│                                                               │
│  Query + Context                                              │
│        │                                                      │
│        ▼                                                      │
│  ┌─────────────┐                                              │
│  │   System    │ ◄── Few-shot examples for code writing       │
│  │   Prompt    │                                              │
│  └─────────────┘                                              │
│        │                                                      │
│        ▼                                                      │
│  ┌─────────────┐      ┌─────────────┐                         │
│  │    LLM      │ ──►  │   Parser    │ ──► Extract code        │
│  │    Call     │      └─────────────┘                         │
│  └─────────────┘                                              │
│        │                                                      │
│        ▼                                                      │
│  ┌─────────────┐                                              │
│  │   Python    │ ◄── Safe sandbox with context access         │
│  │    REPL     │                                              │
│  └─────────────┘                                              │
│        │                                                      │
│        ▼                                                      │
│  Output fed back to LLM ──────────────────► Loop              │
│        │                                                      │
│        ▼                                                      │
│  FINAL(answer) ──────────────────────────► Return             │
│                                                               │
└──────────────────────────────────────────────────────────────┘
```
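
In code, the loop the diagram describes is roughly the following. This is a conceptual sketch, not the actual implementation; `llm`, `parser`, `repl`, and `SYSTEM_PROMPT` are hypothetical stand-ins for the components named above:

```python
# Conceptual sketch of the RLM loop; names are hypothetical stand-ins,
# not the library's real API.
def rlm_loop(llm, parser, repl, query, context, max_iterations=10):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},  # few-shot code-writing prompt
        {"role": "user", "content": query},
    ]
    repl.bind("context", context)           # sandbox exposes the document
    for _ in range(max_iterations):
        reply = llm.call(messages)
        code, answer = parser.parse(reply)  # extract code or FINAL(answer)
        if answer is not None:
            return answer                   # FINAL(answer) ──► Return
        output = repl.execute(code)         # run code in the sandbox
        messages.append({"role": "user", "content": output})  # feed back; loop
    raise RuntimeError("no FINAL(answer) within max_iterations")
```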
## Benchmarks
Run the benchmark suite:

```bash
rlm benchmark --backend vllm --base-url http://localhost:8000/v1 -o results.json
```
| Model | Accuracy | Avg Latency |
|---|---|---|
| GPT-4o | 95% | 5s |
| Claude Sonnet | 92% | 6s |
| Llama-3.1-70B | 88% | 4s |
| Phi-3.5-mini | 25-100%* | 3-7s |
*Accuracy varies by task complexity
## Development
```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run specific tests
pytest tests/test_parser.py -v

# Run with coverage
pytest tests/ --cov=rlm --cov-report=html
```
## Project Structure
```
rlm-engine/
├── rlm/
│   ├── __init__.py          # Package exports
│   ├── core.py              # Main RLM implementation
│   ├── fast_rlm.py          # Speed-optimized variant
│   ├── scalable_rlm.py      # Scale-optimized variant
│   ├── parser.py            # Code/answer extraction
│   ├── prompts.py           # System prompts
│   ├── repl.py              # Python REPL sandbox
│   ├── streaming.py         # Streaming support
│   ├── server.py            # FastAPI server
│   ├── cli.py               # CLI tool
│   ├── config.py            # Configuration
│   ├── logging_config.py    # Structured logging
│   ├── clients/             # LLM backend clients
│   └── optimizations/       # Caching, chunking, etc.
├── tests/                   # Test suite
├── examples/                # Usage examples
├── benchmarks/              # Benchmark suite
└── README.md
```
## License
MIT License
## References
- RLM Paper - Original research
- MIT Implementation - Official implementation