Recursive Language Models (RLM)

Python implementation of Recursive Language Models for processing unbounded context lengths.

Based on the paper by Alex Zhang and Omar Khattab (MIT, 2025); see the Citation section below for the arXiv reference.

What is RLM?

RLM enables language models to process extremely long contexts (100k+ tokens) by:

  • Storing context as a Python variable instead of in the prompt
  • Allowing the LM to recursively explore and partition the context
  • Avoiding "context rot" (performance degradation with long context)

Instead of this:

llm.complete(prompt="Summarize this", context=huge_document)  # Context rot!

RLM does this:

rlm = RLM(model="gpt-5-mini")
result = rlm.complete(
    query="Summarize this",
    context=huge_document  # Stored as variable, not in prompt
)

The LM can then peek, search, and recursively process the context adaptively.

Installation

Install the core library:

pip install r-llm

Install the Gradio UI:

pip install "r-llm[ui]"

Install the Gradio UI plus LinearRAG-related extras:

pip install "r-llm[all]"

If you're working from source instead:

# Clone the repository
git clone https://github.com/ysz/recursive-llm.git
cd recursive-llm

# Install core package
pip install -e .

# Install UI extras
pip install -e ".[ui]"

# Install all extras
pip install -e ".[all]"

# Install dev dependencies
pip install -e ".[dev]"

Requirements

  • Python 3.9 or higher
  • An API key for your chosen LLM provider (OpenAI, Anthropic, etc.)
  • Or a local model setup (Ollama, llama.cpp, etc.)

Quick Start

from rlm import RLM

# Initialize with any LLM
rlm = RLM(model="gpt-5-mini")

# Process long context
result = rlm.complete(
    query="What are the main themes in this document?",
    context=long_document
)
print(result)

For document-heavy workflows, use the document processor to prepare chunked corpora with helper tools:

from rlm import DocumentProcessor, RLM, SourceDocument

processor = DocumentProcessor(
    RLM(model="gpt-5-mini"),
    chunk_size_chars=4000,
    chunk_overlap_chars=400,
)

answer = processor.process_documents(
    "Find the retention requirements and compare them across the documents.",
    [
        SourceDocument(name="policy.md", text=policy_text),
        SourceDocument(name="runbook.md", text=runbook_text),
    ],
)
print(answer)

API Keys Setup

Set your API key via environment variable or pass it directly:

export OPENAI_API_KEY="sk-..."  # or ANTHROPIC_API_KEY, etc.

Or pass directly in code:

rlm = RLM(model="gpt-5-mini", api_key="sk-...")

Supported Models

Works with 100+ LLM providers via LiteLLM:

# OpenAI
rlm = RLM(model="gpt-5")
rlm = RLM(model="gpt-5-mini")

# Groq-hosted models
rlm = RLM(model="groq/llama-3.1-8b-instant")
rlm = RLM(model="groq/meta-llama/llama-4-scout-17b-16e-instruct")

# Anthropic
rlm = RLM(model="claude-sonnet-4")
rlm = RLM(model="claude-sonnet-4-20250514")

# Ollama (local)
rlm = RLM(model="ollama/llama3.2")
rlm = RLM(model="ollama/mistral")

# llama.cpp (local)
rlm = RLM(
    model="openai/local",
    api_base="http://localhost:8000/v1"
)

# Azure OpenAI
rlm = RLM(model="azure/gpt-4-deployment")

# And many more via LiteLLM...

For Groq, set GROQ_API_KEY and pass a Groq model string through LiteLLM. The document processor accepts content from text files, Markdown files, and PDFs.
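
A minimal loading sketch, assuming you extract text yourself before wrapping it in SourceDocument (pypdf is one option for PDFs, not something r-llm prescribes; the file paths here are hypothetical):

from pathlib import Path

from pypdf import PdfReader  # assumed PDF backend, not bundled with r-llm
from rlm import DocumentProcessor, RLM, SourceDocument

def load_pdf_text(path: str) -> str:
    # Concatenate the extracted text of every page.
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)

docs = [
    # Text and Markdown files can be read directly.
    SourceDocument(name="policy.md", text=Path("docs/policy.md").read_text()),
    SourceDocument(name="contract.pdf", text=load_pdf_text("docs/contract.pdf")),
]

processor = DocumentProcessor(
    RLM(model="gpt-5-mini"),
    chunk_size_chars=4000,
    chunk_overlap_chars=400,
)
print(processor.process_documents("Summarize the key obligations.", docs))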

Gradio UI

If you install the UI extra, you can launch the packaged app with:

rlm-gradio

Or from source:

python app.py

Advanced Usage

Two Models (Optimize Cost)

Use a cheaper model for recursive calls:

rlm = RLM(
    model="gpt-5",              # Root LM (main decisions)
    recursive_model="gpt-5-mini"  # Recursive calls (cheaper)
)

Async API

For better performance with parallel recursive calls:

import asyncio

async def main():
    rlm = RLM(model="gpt-5-mini")
    result = await rlm.acomplete(query, context)
    print(result)

asyncio.run(main())
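
The async API also lets you fan out several independent queries over the same context. A minimal sketch using asyncio.gather; long_document and the queries are placeholders:

import asyncio

from rlm import RLM

async def main():
    rlm = RLM(model="gpt-5-mini")
    queries = ["Summarize section 1", "List all dates", "Extract action items"]
    # Run the completions concurrently against the same context.
    results = await asyncio.gather(
        *(rlm.acomplete(q, long_document) for q in queries)
    )
    for query, result in zip(queries, results):
        print(query, "->", result)

asyncio.run(main())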

Configuration

rlm = RLM(
    model="gpt-5-mini",
    max_depth=5,         # Maximum recursion depth
    max_iterations=20,   # Maximum REPL iterations
    # Optional LiteLLM params: temperature, timeout, etc.
)
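
For example, a sketch that assumes extra keyword arguments are forwarded to LiteLLM as-is:

rlm = RLM(
    model="gpt-5-mini",
    max_depth=3,
    max_iterations=30,
    temperature=0.2,  # assumed to be forwarded to LiteLLM
    timeout=120,      # request timeout in seconds (assumption)
)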

Large Document Processor

DocumentProcessor adds a reusable document-processing system on top of RLM:

  • Normalizes one or many documents into named sources
  • Splits large documents into overlapping, boundary-aware chunks
  • Builds a manifest plus full chunk corpus for the RLM context
  • Exposes helper tools inside the REPL: find_chunks(), get_chunk(), get_document(), and chunk metadata

This makes large-document workflows less dependent on ad hoc string slicing in prompts and gives the model a structured way to localize relevant sections before deeper analysis, as sketched below.
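
Illustratively, the root LM might use these helpers inside the REPL to localize chunks before recursing; the exact signatures are assumptions based on the names above:

# Locate candidate chunks by keyword, then recurse on the best matches.
hits = find_chunks("retention")        # assumed to return matching chunk ids
for chunk_id in hits[:3]:
    text = get_chunk(chunk_id)         # assumed to return the chunk's text
    print(recursive_llm("Quote any retention requirements", text))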

How It Works

  1. Context is stored as a variable in a Python REPL environment
  2. Root LM gets only the query plus instructions
  3. LM can explore context using Python code:
    # Peek at context
    context[:1000]
    
    # Search with regex
    import re
    re.findall(r'pattern', context)
    
    # Recursive processing
    recursive_llm("extract dates", context[1000:2000])
    
  4. The LM returns its final answer via a FINAL(answer) statement
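
Putting the steps together, the kind of program the root LM might emit for the Quick Start query is a partition-and-merge over slices of the context (illustrative only; the slice size is arbitrary):

# Summarize fixed-size slices recursively, then merge the partial answers.
size = 10_000
parts = [context[i:i + size] for i in range(0, len(context), size)]
summaries = [recursive_llm("Summarize the main themes", p) for p in parts]
FINAL(recursive_llm("Merge these notes into the main themes",
                    "\n".join(summaries)))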

Examples

See the examples/ directory for complete working examples:

  • basic_usage.py - Simple complete with OpenAI
  • document_processor.py - Structured large-document processing
  • groq_usage.py - Run RLM on Groq-hosted models
  • ollama_local.py - Using Ollama locally
  • two_models.py - Cost optimization with two models
  • long_document.py - Processing 50k+ token documents
  • data_extraction.py - Extract structured data from text
  • multi_file.py - Process multiple documents
  • custom_config.py - Advanced configuration

Run an example:

# Set your API key first
export OPENAI_API_KEY="sk-..."

# Run example
python examples/basic_usage.py

Performance

Paper Results

On the OOLONG benchmark (132k tokens):

  • GPT-5: baseline
  • RLM(GPT-5-Mini): 33% better than GPT-5 at similar cost

Our Benchmark Results

Tested with GPT-5-Mini on structured data queries (counting, filtering) across 5 different test cases:

60k token contexts:

  • RLM: 80% accurate (4/5 correct)
  • Direct OpenAI: 0% accurate (0/5 correct, all returned approximations)

RLM wins on accuracy: both approaches complete the request, but only RLM returns correct answers.

150k+ token contexts:

  • Direct OpenAI: Fails (rate limit errors)
  • RLM: Works (processes 1M+ tokens successfully)

Token efficiency: RLM uses ~2-3k tokens per query vs 95k+ for the direct approach, since the context is stored as a variable instead of being sent in prompts.

Development

# Clone repository
git clone https://github.com/ysz/recursive-llm.git
cd recursive-llm

# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ -v --cov=src/rlm --cov-report=term-missing

# Type checking
mypy src/rlm

# Linting
ruff check src/rlm

# Format code
black src/rlm tests examples

Publishing To PyPI

# Install publishing tools
pip install -e ".[dev]"

# Build sdist + wheel
python -m build

# Check artifacts
python -m twine check dist/*

# Upload to TestPyPI first
python -m twine upload --repository testpypi dist/*

# Upload to PyPI
python -m twine upload dist/*

Before uploading, update the version in pyproject.toml and src/rlm/__init__.py.

Architecture

RLM
├── Core (async completion logic)
├── REPL Executor (safe code execution via RestrictedPython)
├── Prompt Builder (system prompts)
└── Parser (extract FINAL() answers)

Built on top of LiteLLM for universal LLM support.

Limitations

  • REPL execution is sequential (no parallel code execution yet)
  • No prefix caching (future enhancement)
  • Recursion depth is limited (configurable via max_depth)
  • No streaming support yet

Troubleshooting

"Max iterations exceeded"

  • Increase the max_iterations parameter (see the sketch below)
  • Simplify your query
  • Check whether the model is stuck in a loop
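
For example, raising the iteration budget at construction time (40 here is an arbitrary choice):

rlm = RLM(model="gpt-5-mini", max_iterations=40)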

"API key not found"

  • Set the appropriate environment variable (e.g., OPENAI_API_KEY)
  • Or pass api_key parameter to RLM constructor

"Model not found"

Using Ollama

  • Make sure Ollama is running: ollama serve
  • Pull a model first: ollama pull llama3.2
  • Use model format: ollama/model-name

Contributing

Contributions welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new features
  4. Ensure all tests pass (pytest tests/)
  5. Follow code style (use black and ruff)
  6. Submit a pull request

Citation

This implementation is based on the RLM paper by Alex Zhang and Omar Khattab.

To cite this implementation:

@software{rlm_python,
  title = {recursive-llm: Python Implementation of Recursive Language Models},
  author = {Gvadzabia, Grisha},
  year = {2025},
  url = {https://github.com/ysz/recursive-llm}
}

To cite the original paper:

@misc{zhang2025rlm,
  title = {Recursive Language Models},
  author = {Zhang, Alex and Khattab, Omar},
  year = {2025},
  month = {October},
  url = {https://alexzhang13.github.io/blog/2025/rlm/},
  eprint = {2512.24601},
  archivePrefix = {arXiv}
}

License

MIT License - see LICENSE file for details

Acknowledgments

Based on the Recursive Language Models paper by Alex Zhang and Omar Khattab from MIT CSAIL.

Built using:

  • LiteLLM for universal LLM API support
  • RestrictedPython for safe code execution
