Minimal Recursive Language Model - Let LLMs think through code
minRLM is a token-efficient implementation of Recursive Language Models. The data never enters the prompt. The cost stays flat regardless of context size. Every step is Python code you can read, rerun, and debug.
Read the full blog post for all the details: 12 tasks, 3 models, 4,800 evaluations.
Results
|  | minRLM | Vanilla | Official RLM |
|---|---|---|---|
| Accuracy | 72.7% | 69.5% | 69.7% |
| Tokens/query | 8,151 | 20,967 | 29,327 |
| Cost (600 evals) | $2.86 | $4.74 | $7.92 |
GPT-5-mini, 1,800 evaluations, 12 tasks, 50 runs each. Full per-task breakdown in eval/README.md.
Model scaling
| Model | minRLM | Vanilla | Delta | Tasks won |
|---|---|---|---|---|
| GPT-5-nano | 53.7% | 63.2% | -9.5 | 4/12 |
| GPT-5-mini | 72.7% | 69.5% | +3.2 | 7/12 |
| GPT-5.2 | 78.2% | 48.2% | +30.0 | 11/12 |
The advantage grows with model capability. Details in the blog.
Quick start
```shell
pip install minrlm   # or: uv add minrlm
export OPENAI_API_KEY="sk-..."
```
CLI (zero-install)
```shell
# Just a task
uvx minrlm "What is the sum of the first 100 primes?"

# Task + file as context
uvx minrlm "How many ERROR lines in the last hour?" ./server.log

# Pipe context from stdin
cat huge_dataset.csv | uvx minrlm "Which product had the highest return rate?"

# Show generated code (-s) and token stats (-v)
uvx minrlm -sv "Return the sum of all primes up to 1,000,000."
# -> Sieve of Eratosthenes in 6,215 tokens, 1 iteration
# -> Answer: 37550402023

uvx minrlm -sv "Return all primes up to 1,000,000, reversed. Return a list of numbers."
# -> 999983, 999979, 999961, 999959, 999953, ...
# -> Tokens: 6,258 | Output: 616,964 chars (~154K tokens) | 25x savings
```
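The code printed by `-s` is ordinary Python. A sieve along these lines (a sketch of the kind of code the model generates, not the exact output) reproduces the prime-sum answer above:

```python
# Sieve of Eratosthenes - sum of all primes below n.
def sum_primes_below(n: int) -> int:
    is_prime = [True] * n
    is_prime[0:2] = [False, False]          # 0 and 1 are not prime
    for p in range(2, int(n ** 0.5) + 1):
        if is_prime[p]:
            # Mark every multiple of p starting at p*p as composite.
            is_prime[p * p :: p] = [False] * len(is_prime[p * p :: p])
    return sum(i for i, prime in enumerate(is_prime) if prime)

print(sum_primes_below(1_000_000))  # 37550402023
```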
Visualizer
```shell
git clone https://github.com/avilum/minrlm && cd minrlm
uv sync --extra visualizer
uv run python examples/visualizer.py   # http://localhost:7860
```
Python
```python
from minrlm import RLM

client = RLM(model="gpt-5-mini")

# Large context - data never enters the prompt
answer = client.completion(
    task="Which product had the highest return rate in Q3?",
    context=open("q3_returns.csv").read(),  # could be 50MB
)

# No context - the REPL computes via code
result = client.completion(
    "Return all prime numbers up to 1,000,000, reversed. Return a list of numbers."
)
# Output: 999983, 999979, 999961, 999959, 999953, ...
# Tokens used: 6,258 | Output chars: 616,964 (~154K tokens) | Savings: 25x
```
REPL tools
| Function | What it does |
|---|---|
| `input_0` | Your context data (string) |
| `search(text, pattern)` | Substring search with context windows |
| `sub_llm(task, context)` | Recursive LLM call on a sub-chunk |
| `FINAL(answer)` | Return answer and stop |
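Generated code calls these helpers directly inside the REPL. A rough stand-in for `search` (illustrative only; the real implementation ships with minrlm and may differ in signature and behavior):

```python
# Illustrative stub of the REPL's search helper: return a context
# window around each occurrence of `pattern` in `text`.
def search(text: str, pattern: str, window: int = 40) -> list[str]:
    hits, start = [], 0
    while (i := text.find(pattern, start)) != -1:
        hits.append(text[max(0, i - window) : i + len(pattern) + window])
        start = i + 1
    return hits

log = "2024-01-01 INFO boot ok\n2024-01-01 ERROR disk full\n2024-01-01 INFO retry"
for snippet in search(log, "ERROR"):
    print(snippet)
```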
Custom endpoints
```python
client = RLM(model="llama-3.1-70b", base_url="http://localhost:8000/v1")
```
What's in this repo
| Component | Location | Description |
|---|---|---|
| Client | `minrlm/` | `RLM` class - the LLM <-> REPL loop |
| DockerREPL | `minrlm/docker_repl.py` | Sandboxed execution via Docker + seccomp |
| Evals | `eval/` | 12-task benchmark framework, 3 model sizes |
| Examples | `examples/` | Quickstart, proxy server, Gradio UI |
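At its core the `RLM` class is a loop: ask the model for code, execute it, feed stdout back, stop when `FINAL` is called. A schematic version (hypothetical helper, not minrlm's actual API; see `minrlm/` for the real loop):

```python
import contextlib
import io

def rlm_loop(llm, task: str, context: str, max_iters: int = 10):
    """Schematic LLM <-> REPL loop. `llm` is any callable mapping the
    transcript so far to Python source. The context lives only in the
    REPL namespace - it never enters the prompt."""
    final = {}
    env = {"input_0": context, "FINAL": lambda a: final.setdefault("answer", a)}
    transcript = f"Task: {task}\n"
    for _ in range(max_iters):
        code = llm(transcript)              # model writes code; it never sees input_0
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(code, env)                 # run it in the REPL namespace
        if "answer" in final:
            return final["answer"]
        transcript += f"\n>>> {code}\n{buf.getvalue()}"  # feed stdout back
    return None

# A toy "model" that inspects the context and answers immediately:
print(rlm_loop(lambda t: "FINAL(len(input_0))", "How long?", "a" * 512))  # 512
```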
DockerREPL
LLM-generated code runs in isolated Docker containers. Docker is auto-detected. No network, read-only filesystem, memory-capped, seccomp-filtered.
```python
client = RLM(model="gpt-5-mini", use_docker=True, docker_memory="256m")
```
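The sandbox properties map onto standard container flags. A hypothetical `docker run` equivalent (minrlm builds the real invocation itself; the image and seccomp profile path here are placeholders):

```shell
# Illustrative only: no network, read-only rootfs, memory cap, seccomp filter.
docker run --rm \
  --network none \
  --read-only \
  --memory 256m \
  --security-opt seccomp=seccomp-profile.json \
  python:3.11-slim \
  python -c "print('sandboxed')"
```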
Evals
```shell
git clone https://github.com/avilum/minrlm && cd minrlm
uv sync --extra eval

# Smoke test
uv run python eval/quickstart.py

# Full benchmark (reproduces the table above)
uv run python eval/run.py \
  --tasks all \
  --runners minrlm-reasoning,vanilla,official \
  --runs 50 --parallel 12 --task-parallel 12 \
  --output-dir logs/my_eval
```
Full results, per-task breakdowns, reproduction steps: eval/README.md
Examples
```shell
uv run python examples/minimal.py               # vanilla vs RLM side-by-side
uv run python examples/advanced_usage.py        # search, sub_llm, callbacks
uv run python examples/visualizer.py            # Gradio UI (uv sync --extra visualizer)
uv run uvicorn examples.proxy:app --port 8000   # OpenAI-compatible proxy (uv sync --extra proxy)
```
Credits
Built by Avi Lumelsky. Independent implementation - not a fork. The RLM concept comes from Zhang, Kraska, and Khattab (2025). Official implementation: github.com/alexzhang13/rlm.
License
MIT