Minimal Recursive Language Model - Let LLMs think through code

Project description

minrlm

minRLM is a token-efficient implementation of Recursive Language Models (RLMs). Instead of pasting your data into the prompt, the model writes Python code that operates on it, so the data never enters the prompt and cost stays flat regardless of context size. Every step is Python code you can read, rerun, and debug.

Read the full blog post for all the details: 12 tasks, 3 models, 4,800 evaluations.

Results

                  minRLM   Vanilla   Official RLM
Accuracy          72.7%    69.5%     69.7%
Tokens/query      8,151    20,967    29,327
Cost (600 evals)  $2.86    $4.74     $7.92

GPT-5-mini, 1,800 evaluations, 12 tasks, 50 runs each. Full per-task breakdown in eval/README.md.

Model scaling

Model       minRLM   Vanilla   Delta   Tasks won
GPT-5-nano  53.7%    63.2%     -9.5    4/12
GPT-5-mini  72.7%    69.5%     +3.2    7/12
GPT-5.2     78.2%    48.2%     +30.0   11/12

The advantage grows with model capability. Details in the blog.

Quick start

pip install minrlm   # or: uv add minrlm
export OPENAI_API_KEY="sk-..."

CLI (zero-install)

# Just a task
uvx minrlm "What is the sum of the first 100 primes?"

# Task + file as context
uvx minrlm "How many ERROR lines in the last hour?" ./server.log

# Pipe context from stdin
cat huge_dataset.csv | uvx minrlm "Which product had the highest return rate?"

# Show generated code (-s) and token stats (-v)
uvx minrlm -sv "Return the sum of all primes up to 1,000,000."
# -> Sieve of Eratosthenes in 6,215 tokens, 1 iteration
# -> Answer: 37550402023

uvx minrlm -sv "Return all primes up to 1,000,000, reversed. Return a list of numbers."
# -> 999983, 999979, 999961, 999959, 999953, ...
# -> Tokens: 6,258 | Output: 616,964 chars (~154K tokens) | 25x savings
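The ~25x figure follows from a rough characters-to-tokens conversion. A quick back-of-envelope check (the 4 characters-per-token ratio is a common heuristic, not an exact tokenizer count):

```python
# Back-of-envelope check of the savings figure above.
# Assumes ~4 characters per token -- a rough heuristic, not a tokenizer count.
output_chars = 616_964
approx_output_tokens = output_chars / 4   # ~154K tokens if pasted verbatim
tokens_actually_used = 6_258
savings = approx_output_tokens / tokens_actually_used
print(f"~{savings:.0f}x")                 # roughly 25x
```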

Visualizer

git clone https://github.com/avilum/minrlm && cd minrlm
uv sync --extra visualizer
uv run python examples/visualizer.py   # http://localhost:7860

Python

from minrlm import RLM

client = RLM(model="gpt-5-mini")

# Large context - data never enters the prompt
answer = client.completion(
    task="Which product had the highest return rate in Q3?",
    context=open("q3_returns.csv").read()  # could be 50MB
)

# No context - the REPL computes via code
result = client.completion(
    "Return all prime numbers up to 1,000,000, reversed. Return a list of numbers."
)
# Output: 999983, 999979, 999961, 999959, 999953, ...
# Tokens used: 6,258 | Output chars: 616,964 (~154K tokens) | Savings: 25x

REPL tools

Function                What it does
input_0                 Your context data (string)
search(text, pattern)   Substring search with context windows
sub_llm(task, context)  Recursive LLM call on a sub-chunk
FINAL(answer)           Return answer and stop
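To make the table concrete, here is a hypothetical snippet in the style of what the model might emit inside the REPL. The helper signatures come from the table above, but the bodies below are illustrative toy stand-ins, not minrlm's actual implementations:

```python
# Toy stand-ins for the helpers minRLM injects into the REPL namespace.
# Real implementations live in the minrlm package; these are illustrative only.
def search(text, pattern, window=40):
    """Return context windows around each occurrence of pattern."""
    hits, start = [], 0
    while (i := text.find(pattern, start)) != -1:
        hits.append(text[max(0, i - window): i + len(pattern) + window])
        start = i + 1
    return hits

def sub_llm(task, context):
    """Stub: the real sub_llm recursively calls the model on a sub-chunk."""
    return f"[answer to {task!r} over {len(context)} chars]"

def FINAL(answer):
    """Stub: the real FINAL returns the answer and stops the loop."""
    return answer

# The kind of code the model might write against a large log in input_0:
input_0 = "ok\nERROR disk full\nok\nERROR timeout\n"
errors = search(input_0, "ERROR", window=10)
print(FINAL(f"{len(errors)} ERROR lines"))   # -> 2 ERROR lines
```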

Custom endpoints

client = RLM(model="llama-3.1-70b", base_url="http://localhost:8000/v1")

What's in this repo

Component   Location               Description
Client      minrlm/                RLM class - the LLM <-> REPL loop
DockerREPL  minrlm/docker_repl.py  Sandboxed execution via Docker + seccomp
Evals       eval/                  12-task benchmark framework, 3 model sizes
Examples    examples/              Quickstart, proxy server, Gradio UI
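The LLM <-> REPL loop at the heart of the RLM class can be sketched roughly as follows. This is a hypothetical simplification, not the package's actual code: the prompt format, fence extraction, and transcript handling are all assumptions, while the `input_0` and `FINAL` names come from the REPL tools above.

```python
import re

def rlm_loop(model_fn, task, context, max_iters=10):
    """Hypothetical sketch of an LLM <-> REPL loop: the model emits Python,
    we exec it against the context, and stop once FINAL(...) is called."""
    final = {}
    namespace = {
        "input_0": context,   # the data lives here, never in the prompt
        "FINAL": lambda ans: final.setdefault("answer", ans),
    }
    transcript = f"Task: {task}\nWrite Python. Call FINAL(answer) when done."
    for _ in range(max_iters):
        reply = model_fn(transcript)          # model sees the task, not the data
        code = re.search(r"```python\n(.*?)```", reply, re.S)
        exec(code.group(1) if code else reply, namespace)
        if "answer" in final:
            return final["answer"]
        transcript += f"\n{reply}"            # feed results back (simplified)
    return None

# Toy "model" that counts ERROR lines in one shot:
fake_model = lambda prompt: "FINAL(sum('ERROR' in l for l in input_0.splitlines()))"
print(rlm_loop(fake_model, "How many ERROR lines?", "ok\nERROR a\nERROR b\n"))
# -> 2
```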

DockerREPL

LLM-generated code runs in isolated Docker containers (Docker is auto-detected when available): no network, read-only filesystem, memory-capped, seccomp-filtered.

client = RLM(model="gpt-5-mini", use_docker=True, docker_memory="256m")
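Those isolation properties map onto standard `docker run` flags. A sketch of the argv such a sandbox might construct -- the flag names are real Docker CLI options, but the image, seccomp profile path, and invocation are assumptions, not minrlm's actual command:

```python
def docker_sandbox_argv(image="python:3.11-slim", memory="256m",
                        seccomp_profile="seccomp.json"):
    """Build a docker run command with the isolation properties described
    above. Illustrative only; minrlm's actual invocation may differ."""
    return [
        "docker", "run", "--rm",
        "--network", "none",                             # no network
        "--read-only",                                   # read-only filesystem
        "--memory", memory,                              # memory cap
        "--security-opt", f"seccomp={seccomp_profile}",  # syscall filter
        image, "python", "-c", "print('hello from the sandbox')",
    ]

print(" ".join(docker_sandbox_argv()))
```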

Evals

git clone https://github.com/avilum/minrlm && cd minrlm
uv sync --extra eval

# Smoke test
uv run python eval/quickstart.py

# Full benchmark (reproduces the table above)
uv run python eval/run.py \
    --tasks all \
    --runners minrlm-reasoning,vanilla,official \
    --runs 50 --parallel 12 --task-parallel 12 \
    --output-dir logs/my_eval

Full results, per-task breakdowns, reproduction steps: eval/README.md

Examples

uv run python examples/minimal.py              # vanilla vs RLM side-by-side
uv run python examples/advanced_usage.py        # search, sub_llm, callbacks
uv run python examples/visualizer.py            # Gradio UI (uv sync --extra visualizer)
uv run uvicorn examples.proxy:app --port 8000   # OpenAI-compatible proxy (uv sync --extra proxy)

Credits

Built by Avi Lumelsky. Independent implementation - not a fork. The RLM concept comes from Zhang, Kraska, and Khattab (2025). Official implementation: github.com/alexzhang13/rlm.

License

MIT


Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


minrlm-0.1.2-py3-none-any.whl (50.0 kB)

File details

Details for the file minrlm-0.1.2-py3-none-any.whl.

File metadata

  • Size: 50.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.13

File hashes

Hashes for minrlm-0.1.2-py3-none-any.whl:

Algorithm    Hash digest
SHA256       55bfa1169ae8249dfec283b70f1dce6c0db7492c947bd1fc075728501754536d
MD5          a747877d4cfc58c17a7969ec72860838
BLAKE2b-256  9e8e9a4794a3780871d6e4f201ff3df57680c158488f859fc98a5271c3d1cc62

