Recursive Language Models for unbounded context processing
Recursive Language Models (RLM)
Python implementation of Recursive Language Models for processing unbounded context lengths.
Based on the paper by Alex Zhang and Omar Khattab (MIT, 2025). arXiv: https://arxiv.org/abs/2512.24601
What is RLM?
RLM enables language models to process extremely long contexts (100k+ tokens) by:
- Storing context as a Python variable instead of in the prompt
- Allowing the LM to recursively explore and partition the context
- Avoiding "context rot" (performance degradation with long context)
Instead of this:
llm.complete(prompt="Summarize this", context=huge_document) # Context rot!
RLM does this:
rlm = RLM(model="gpt-5-mini")
result = rlm.complete(
    query="Summarize this",
    context=huge_document  # Stored as variable, not in prompt
)
The LM can then peek, search, and recursively process the context adaptively.
Installation
Install the core library:
pip install r-llm
Install the Gradio UI:
pip install "r-llm[ui]"
Install the Gradio UI plus LinearRAG-related extras:
pip install "r-llm[all]"
If you're working from source instead:
# Clone the repository
git clone https://github.com/ysz/recursive-llm.git
cd recursive-llm
# Install core package
pip install -e .
# Install UI extras
pip install -e ".[ui]"
# Install all extras
pip install -e ".[all]"
# Install dev dependencies
pip install -e ".[dev]"
Requirements
- Python 3.9 or higher
- An API key for your chosen LLM provider (OpenAI, Anthropic, etc.), or a local model setup (Ollama, llama.cpp, etc.)
Quick Start
from rlm import RLM
# Initialize with any LLM
rlm = RLM(model="gpt-5-mini")
# Process long context
result = rlm.complete(
    query="What are the main themes in this document?",
    context=long_document
)
print(result)
For document-heavy workflows, use the document processor to prepare chunked corpora with helper tools:
from rlm import DocumentProcessor, RLM, SourceDocument
processor = DocumentProcessor(
    RLM(model="gpt-5-mini"),
    chunk_size_chars=4000,
    chunk_overlap_chars=400,
)
answer = processor.process_documents(
    "Find the retention requirements and compare them across the documents.",
    [
        SourceDocument(name="policy.md", text=policy_text),
        SourceDocument(name="runbook.md", text=runbook_text),
    ],
)
print(answer)
API Keys Setup
Set your API key via environment variable or pass it directly:
export OPENAI_API_KEY="sk-..." # or ANTHROPIC_API_KEY, etc.
Or pass directly in code:
rlm = RLM(model="gpt-5-mini", api_key="sk-...")
Supported Models
Works with 100+ LLM providers via LiteLLM:
# OpenAI
rlm = RLM(model="gpt-5")
rlm = RLM(model="gpt-5-mini")
# Groq-hosted models
rlm = RLM(model="groq/llama-3.1-8b-instant")
rlm = RLM(model="groq/meta-llama/llama-4-scout-17b-16e-instruct")
# Anthropic
rlm = RLM(model="claude-sonnet-4")
rlm = RLM(model="claude-sonnet-4-20250514")
# Ollama (local)
rlm = RLM(model="ollama/llama3.2")
rlm = RLM(model="ollama/mistral")
# llama.cpp (local)
rlm = RLM(
    model="openai/local",
    api_base="http://localhost:8000/v1"
)
# Azure OpenAI
rlm = RLM(model="azure/gpt-4-deployment")
# And many more via LiteLLM...
For Groq, set GROQ_API_KEY and use a Groq model string through LiteLLM.
Text files, Markdown files, and PDFs can be passed into the document processor.
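A minimal sketch of loading mixed file types into SourceDocument objects. The PDF step below uses pypdf as the text extractor; that choice is an assumption, not a bundled dependency, and any extractor that yields plain text works:

from pathlib import Path

from pypdf import PdfReader  # assumption: swap in your preferred PDF extractor

from rlm import SourceDocument

# Plain text and Markdown can be read directly
policy = SourceDocument(name="policy.md", text=Path("policy.md").read_text())

# PDFs need a text-extraction step first
reader = PdfReader("report.pdf")
report = SourceDocument(
    name="report.pdf",
    text="\n".join(page.extract_text() or "" for page in reader.pages),
)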
Gradio UI
If you install the UI extra, you can launch the packaged app with:
rlm-gradio
Or from source:
python app.py
Advanced Usage
Two Models (Optimize Cost)
Use a cheaper model for recursive calls:
rlm = RLM(
    model="gpt-5",                # Root LM (main decisions)
    recursive_model="gpt-5-mini"  # Recursive calls (cheaper)
)
Async API
For better performance with parallel recursive calls:
import asyncio

async def main():
    rlm = RLM(model="gpt-5-mini")
    # query and context are placeholders for your own values
    result = await rlm.acomplete(query, context)
    print(result)

asyncio.run(main())
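If you have several independent queries over the same context, a natural extension is to fan them out with asyncio.gather. This is a sketch, assuming a single RLM instance can be shared safely across concurrent calls:

import asyncio

from rlm import RLM

async def main():
    rlm = RLM(model="gpt-5-mini")
    queries = ["List the key dates", "Summarize the main argument"]
    # long_document is a placeholder for your own context string
    results = await asyncio.gather(
        *(rlm.acomplete(q, long_document) for q in queries)
    )
    for q, r in zip(queries, results):
        print(q, "->", r)

asyncio.run(main())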
Configuration
rlm = RLM(
    model="gpt-5-mini",
    max_depth=5,        # Maximum recursion depth
    max_iterations=20,  # Maximum REPL iterations
    # Optional LiteLLM params: temperature, timeout, etc.
)
Large Document Processor
DocumentProcessor adds a reusable document-processing system on top of RLM:
- Normalizes one or many documents into named sources
- Splits large documents into overlapping, boundary-aware chunks
- Builds a manifest plus full chunk corpus for the RLM context
- Exposes helper tools inside the REPL: find_chunks(), get_chunk(), get_document(), and chunk metadata
This makes large-doc workflows less dependent on ad hoc string slicing in prompts and gives the model a structured way to localize relevant sections before deeper analysis.
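For illustration, code the model might write inside the REPL could look like the following. The helper names come from the list above, but the exact signatures and return shapes shown here are assumptions:

# Locate candidate chunks by keyword, then pull specific text
hits = find_chunks("retention")               # assumed: returns matching chunk records
first_chunk = get_chunk(hits[0]["chunk_id"])  # assumed key name for the chunk id
policy_text = get_document("policy.md")       # fetch a full source document

# Recurse only over the localized region
recursive_llm("extract the retention requirements", first_chunk)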
How It Works
- Context is stored as a variable in a Python REPL environment
- Root LM gets only the query plus instructions
- LM can explore context using Python code:
# Peek at context
context[:1000]

# Search with regex
import re
re.findall(r'pattern', context)

# Recursive processing
recursive_llm("extract dates", context[1000:2000])
- Returns the final answer via a FINAL(answer) statement
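Putting those steps together, an abridged session might read like this (illustrative only; the code the model actually writes varies by query):

# Turn 1: peek at the stored context
print(context[:500])

# Turn 2: localize the relevant region
import re
offsets = [m.start() for m in re.finditer(r"retention", context)]

# Turn 3: recurse over a promising slice
summary = recursive_llm("summarize the retention policy",
                        context[offsets[0]:offsets[0] + 3000])

# Final turn: hand the answer back
FINAL(summary)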
Examples
See the examples/ directory for complete working examples:
- basic_usage.py - Simple complete with OpenAI
- document_processor.py - Structured large-document processing
- groq_usage.py - Run RLM on Groq-hosted models
- ollama_local.py - Using Ollama locally
- two_models.py - Cost optimization with two models
- long_document.py - Processing 50k+ token documents
- data_extraction.py - Extract structured data from text
- multi_file.py - Process multiple documents
- custom_config.py - Advanced configuration
Run an example:
# Set your API key first
export OPENAI_API_KEY="sk-..."
# Run example
python examples/basic_usage.py
Performance
Paper Results
On the OOLONG benchmark (132k tokens):
- GPT-5: baseline
- RLM(GPT-5-Mini): 33% better than GPT-5 at similar cost
Our Benchmark Results
Tested with GPT-5-Mini on structured data queries (counting, filtering) across 5 different test cases:
60k token contexts:
- RLM: 80% accurate (4/5 correct)
- Direct OpenAI: 0% accurate (0/5 correct, all returned approximations)
RLM wins on accuracy: both approaches complete the requests, but only RLM returns correct answers.
150k+ token contexts:
- Direct OpenAI: Fails (rate limit errors)
- RLM: Works (processes 1M+ tokens successfully)
Token efficiency: RLM uses ~2-3k tokens per query vs 95k+ for direct approach, since context is stored as a variable instead of being sent in prompts.
Development
# Clone repository
git clone https://github.com/ysz/recursive-llm.git
cd recursive-llm
# Install with dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run tests with coverage
pytest tests/ -v --cov=src/rlm --cov-report=term-missing
# Type checking
mypy src/rlm
# Linting
ruff check src/rlm
# Format code
black src/rlm tests examples
Publishing To PyPI
# Install publishing tools
pip install -e ".[dev]"
# Build sdist + wheel
python -m build
# Check artifacts
python -m twine check dist/*
# Upload to TestPyPI first
python -m twine upload --repository testpypi dist/*
# Upload to PyPI
python -m twine upload dist/*
Before uploading, update the version in pyproject.toml and src/rlm/__init__.py.
Architecture
RLM
├── Core (async completion logic)
├── REPL Executor (safe code execution via RestrictedPython)
├── Prompt Builder (system prompts)
└── Parser (extract FINAL() answers)
Built on top of LiteLLM for universal LLM support.
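As one concrete illustration of the last stage, a minimal FINAL() extractor could be a single regex pass. This is a sketch, assuming the answer arrives as FINAL("...") at the end of a REPL turn; it is not the packaged parser:

import re
from typing import Optional

def extract_final(repl_output: str) -> Optional[str]:
    # Grab everything between FINAL( and the trailing closing paren
    match = re.search(r'FINAL\((.*)\)\s*$', repl_output, re.DOTALL)
    if match is None:
        return None
    return match.group(1).strip().strip('"\'')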
Limitations
- REPL execution is sequential (no parallel code execution yet)
- No prefix caching (future enhancement)
- Recursion depth is limited (configurable via max_depth)
- No streaming support yet
Troubleshooting
"Max iterations exceeded"
- Increase the max_iterations parameter
- Simplify your query
- Check if the model is getting stuck in a loop
"API key not found"
- Set the appropriate environment variable (e.g., OPENAI_API_KEY)
- Or pass the api_key parameter to the RLM constructor
"Model not found"
- Check model name format for your provider
- See LiteLLM docs: https://docs.litellm.ai/docs/providers
Using Ollama
- Make sure Ollama is running: ollama serve
- Pull a model first: ollama pull llama3.2
- Use the model format ollama/model-name
Contributing
Contributions welcome! Please:
- Fork the repository
- Create a feature branch
- Add tests for new features
- Ensure all tests pass (pytest tests/)
- Follow code style (use black and ruff)
- Submit a pull request
Citation
This implementation is based on the RLM paper by Alex Zhang and Omar Khattab.
To cite this implementation:
@software{rlm_python,
  title = {recursive-llm: Python Implementation of Recursive Language Models},
  author = {Gvadzabia, Grisha},
  year = {2025},
  url = {https://github.com/ysz/recursive-llm}
}
To cite the original paper:
@misc{zhang2025rlm,
  title = {Recursive Language Models},
  author = {Zhang, Alex and Khattab, Omar},
  year = {2025},
  month = {October},
  url = {https://alexzhang13.github.io/blog/2025/rlm/},
  eprint = {2512.24601},
  archivePrefix = {arXiv}
}
License
MIT License - see LICENSE file for details
Acknowledgments
Based on the Recursive Language Models paper by Alex Zhang and Omar Khattab from MIT CSAIL.
Built using:
- LiteLLM for universal LLM API support
- RestrictedPython for safe code execution
Links
- Paper: https://alexzhang13.github.io/blog/2025/rlm/
- arXiv: https://arxiv.org/abs/2512.24601
- LiteLLM Docs: https://docs.litellm.ai/
- Issues: https://github.com/ysz/recursive-llm/issues