LM-REPL: Recursive Language Models with Self-Reflective Program Search.
Project description
LM-REPL
Recursive Language Models with self-reflective program search.
lm-repl (package import: lm_repl) is a fork of rlms, the MIT OASYS lab's inference engine for Recursive Language Models (RLMs). An RLM replaces the canonical llm.completion(prompt) call with rlm.completion(prompt): the context is offloaded into a variable inside a REPL environment, and the model writes programs that slice, search, and recursively query that context instead of attending over it directly.
This fork keeps the upstream engine and layers two things on top:
- Map-reduce style orchestration. Patches that harden the orchestrator-plus-workers pattern: long contexts are chunked and fanned out to parallel batched sub-calls (the map), and the orchestrator aggregates the partial answers (the reduce). The fork adds distinct system prompts for the orchestrator and its workers, per-child iteration budgets, and client fixes needed to drive local OpenAI-compatible servers reliably.
- Self-reflective program search (SRLM). An
SRLMsubclass implementing uncertainty-guided trajectory selection per Apple's SRLM paper: generate K candidate context-interaction trajectories, then select using the model's own uncertainty signals (self-consistency, verbalized confidence, reasoning trace length) instead of trusting a single rollout. The same paper motivates context-length routing, since recursive decomposition often hurts when the context already fits the model's window.
Lineage
| Stage | What it contributed |
|---|---|
rlms 0.1.1 (Zhang, Kraska, Khattab) |
The RLM paradigm and engine: REPL environments, recursive sub-calls, parallel rlm_query_batched, clients, logging, visualizer |
Local rlms patches |
Map-reduce orchestration support: child_system_prompt (workers get a different system prompt than the orchestrator), child_max_iterations, max_output_chars stdout truncation, default_extra_body on the OpenAI client, consecutive same-role message merging (required by llama-server), response_format pass-through |
lm-repl fork |
The SRLM subclass: context-length routing, multi-trajectory generation with parallel candidates, and joint uncertainty-guided selection |
SRLM: uncertainty-guided trajectory selection
The quality of an RLM answer depends heavily on which program trajectory the model happens to sample. SRLM subclasses RLM and replaces single-rollout inference with search over K candidates:
from lm_repl import SRLM
srlm = SRLM(
backend="openai",
backend_kwargs={"model_name": "my-model", "base_url": "http://localhost:8080/v1"},
direct_threshold=30_000, # contexts under 30K chars skip the REPL entirely
n_candidates=4, # K candidate trajectories
candidate_parallel=2, # candidates in flight at once (match server slots)
candidate_temperature=0.7, # sampling diversity across candidates
confidence_elicitation=True, # elicit per-step {"confidence": N} and use it in selection
)
result = srlm.completion(long_context, "What changed between Q3 and Q4?")
How a winner is chosen, per the SRLM paper:
- Self-consistency. Final answers are clustered semantically (normalization plus word-boundary containment, so "42" and "The answer is 42" vote together) and the plurality cluster survives. Tied clusters pool their candidates rather than favoring whichever answer appeared first.
- Joint uncertainty score. Within the surviving set, each trajectory gets
VC(p) * Len(p), whereVCis the sum of log per-step verbalized confidences (steps that skip reporting are imputed with the trajectory mean, so under-reporting cannot inflate the score) andLenis the trace length in output tokens. The candidate closest to zero wins. Withoutconfidence_elicitation, selection falls back to the shortest trace.
Implementation notes:
- Each candidate runs on a fresh
RLMinstance with its own logger and config copy, so parallel candidates share no mutable state. A crashing candidate is dropped; only if every candidate fails does the call raise. confidence_elicitation=Trueappends the reporting instruction to the system prompt automatically; spawned candidates inherit it.direct_thresholdroutes short contexts to a plain LLM call. The SRLM paper finds recursive decomposition frequently underperforms the base model within its native window, so set this to roughly the served context size.
| Parameter | Default | Meaning |
|---|---|---|
direct_threshold |
0 (off) |
Context length in chars below which the REPL is bypassed |
n_candidates |
1 |
Candidate trajectories per completion |
candidate_parallel |
1 |
Candidates run concurrently (thread pool) |
candidate_temperature |
None |
Temperature injected into candidate backends |
confidence_elicitation |
False |
Elicit per-step confidence and use VC*Len selection |
All RLM constructor arguments pass through unchanged, including child_system_prompt.
Install
Requires Python 3.11+. Note that pip install rlms installs the upstream package, not this fork.
pip install lm-repl
For development, install editable from a checkout:
uv pip install -e /path/to/lm-repl --no-deps
Verify you got the fork and not a stale upstream build:
python -c "import inspect; from lm_repl import RLM, SRLM; print('child_system_prompt' in inspect.signature(RLM.__init__).parameters)"
Quick start
from lm_repl import RLM
rlm = RLM(
backend="openai",
backend_kwargs={"model_name": "gpt-5-nano"},
verbose=True,
)
print(rlm.completion("Print me the first 100 powers of two, each on a newline.").response)
For the orchestrator/worker split used in map-reduce style runs:
rlm = RLM(
backend="openai",
backend_kwargs={...},
custom_system_prompt=ORCHESTRATOR_PROMPT, # the root model plans and reduces
child_system_prompt=WORKER_PROMPT, # sub-call workers map over chunks
child_max_iterations=5,
max_concurrent_subcalls=4,
)
REPL environments
Non-isolated environments run code on the host (fine for benchmarking, not for untrusted prompts); isolated environments run in cloud sandboxes. Natively supported: local (default), ipython, docker, modal, prime, daytona, e2b.
rlm = RLM(
environment="local",
environment_kwargs={"max_output_chars": 500},
)
local: in-processexecwith namespaced globals.max_output_charstruncates REPL stdout fed back to the model.ipython(pip install 'lm-repl[ipython]'): real IPython session, in-process or in anipykernelsubprocess with hard cell timeouts.docker: REPL inside a container (python:3.11-slimby default).modal/prime/daytona/e2b: fully isolated cloud sandboxes; sub-calls are proxied back to the host.
Model providers
OpenAI, Anthropic, OpenRouter, and Portkey clients are included. Local models work through any OpenAI-compatible server (vLLM, llama-server); the fork's default_extra_body and same-role message merging exist specifically to make local serving smooth. See lm_repl/clients/ to add providers.
Trajectory metadata and logging
RLMChatCompletion.metadata holds the full trajectory (run config plus every iteration and sub-call) when a logger is attached. SRLM relies on this for confidence scoring, and spawns per-candidate loggers automatically.
from lm_repl import RLM
from lm_repl.logger import RLMLogger
logger = RLMLogger(log_dir="./logs") # omit log_dir for in-memory only
rlm = RLM(..., logger=logger)
JSONL logs feed the bundled visualizer:
cd visualizer/
npm run dev # default localhost:3001
Citations
This fork builds directly on two papers. The engine:
@misc{zhang2026recursivelanguagemodels,
title={Recursive Language Models},
author={Alex L. Zhang and Tim Kraska and Omar Khattab},
year={2026},
eprint={2512.24601},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2512.24601},
}
The selection strategy:
@misc{alizadeh2026srlm,
title={Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context},
author={Keivan Alizadeh and Parshin Shojaee and Minsik Cho and Mehrdad Farajtabar},
year={2026},
eprint={2603.15653},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2603.15653},
}
Upstream documentation, blogpost, and minimal implementation: docs | blogpost | rlm-minimal.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file lm_repl-0.2.0.tar.gz.
File metadata
- Download URL: lm_repl-0.2.0.tar.gz
- Upload date:
- Size: 125.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c1461ec5c63d0e6ce00dec4e8ed39d4f589dd8cc4c49e0a8992c719a2f3bf5ad
|
|
| MD5 |
b9fb1c911a51884f325322941a163c03
|
|
| BLAKE2b-256 |
3d0a6285dc13ee0b109b4693c66d0856cead163bde137ea6a9d5a1e09de2a7e8
|
File details
Details for the file lm_repl-0.2.0-py3-none-any.whl.
File metadata
- Download URL: lm_repl-0.2.0-py3-none-any.whl
- Upload date:
- Size: 108.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.8 {"installer":{"name":"uv","version":"0.10.8","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c5b523b6fe9aa64fdaac28d548729374b8b5a3273cade78f17ffccc6489d6cf8
|
|
| MD5 |
5fdb7c72c0eb952af9056080683b94f6
|
|
| BLAKE2b-256 |
eb3d267a7f1a8d543279b46386c2b5e6f7e1e3071e5bc70e1685d8ba49031ffd
|