LM-REPL: Recursive Language Models with Self-Reflective Program Search.

lm-repl (package import: lm_repl) is a fork of rlms, the MIT OASYS lab's inference engine for Recursive Language Models (RLMs). An RLM replaces the canonical llm.completion(prompt) call with rlm.completion(prompt): the context is offloaded into a variable inside a REPL environment, and the model writes programs that slice, search, and recursively query that context instead of attending over it directly.

This fork keeps the upstream engine and layers two things on top:

Map-reduce style orchestration. Patches that harden the orchestrator-plus-workers pattern: long contexts are chunked and fanned out to parallel batched sub-calls (the map), and the orchestrator aggregates the partial answers (the reduce). The fork adds distinct system prompts for the orchestrator and its workers, per-child iteration budgets, and client fixes needed to drive local OpenAI-compatible servers reliably.
Self-reflective program search (SRLM). An SRLM subclass implementing uncertainty-guided trajectory selection per Apple's SRLM paper: generate K candidate context-interaction trajectories, then select using the model's own uncertainty signals (self-consistency, verbalized confidence, reasoning trace length) instead of trusting a single rollout. The same paper motivates context-length routing, since recursive decomposition often hurts when the context already fits the model's window.
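The orchestrator-plus-workers pattern in the first item can be sketched in plain Python. This is a minimal illustration, not the fork's implementation: `chunk_text`, `map_reduce`, `count_worker`, and `sum_reducer` are hypothetical names, and the worker and reducer are stand-ins for what would be LLM sub-calls in a real run.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def chunk_text(text: str, chunk_chars: int) -> list[str]:
    """Split text into consecutive chunks of at most chunk_chars characters."""
    if chunk_chars <= 0:
        raise ValueError("chunk_chars must be positive")
    return [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]


def map_reduce(
    context: str,
    question: str,
    worker: Callable[[str, str], str],
    reducer: Callable[[str, list[str]], str],
    chunk_chars: int = 1000,
    max_workers: int = 4,
) -> str:
    """Fan chunks out to workers in parallel (map), then aggregate (reduce)."""
    chunks = chunk_text(context, chunk_chars)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns partial answers in the same order as the chunks.
        partials = list(pool.map(lambda c: worker(c, question), chunks))
    return reducer(question, partials)


# Stand-in callables (assumptions): a real run would issue LLM sub-calls here.
def count_worker(chunk: str, question: str) -> str:
    return str(chunk.count("error"))


def sum_reducer(question: str, partials: list[str]) -> str:
    return str(sum(int(p) for p in partials))
```

Fixed-width chunking can split a relevant span across two chunks, so a real chunker would likely cut on record or paragraph boundaries or add overlap.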

Lineage

| Stage | What it contributed |
| --- | --- |
| `rlms` 0.1.1 (Zhang, Kraska, Khattab) | The RLM paradigm and engine: REPL environments, recursive sub-calls, parallel `rlm_query_batched`, clients, logging, visualizer |
| Local `rlms` patches | Map-reduce orchestration support: `child_system_prompt` (workers get a different system prompt than the orchestrator), `child_max_iterations`, `max_output_chars` stdout truncation, `default_extra_body` on the OpenAI client, consecutive same-role message merging (required by llama-server), `response_format` pass-through |
| `lm-repl` fork | The `SRLM` subclass: context-length routing, multi-trajectory generation with parallel candidates, and joint uncertainty-guided selection |

SRLM: uncertainty-guided trajectory selection

The quality of an RLM answer depends heavily on which program trajectory the model happens to sample. SRLM subclasses RLM and replaces single-rollout inference with search over K candidates:

from lm_repl import SRLM

srlm = SRLM(
    backend="openai",
    backend_kwargs={"model_name": "my-model", "base_url": "http://localhost:8080/v1"},
    direct_threshold=30_000,      # contexts under 30K chars skip the REPL entirely
    n_candidates=4,               # K candidate trajectories
    candidate_parallel=2,         # candidates in flight at once (match server slots)
    candidate_temperature=0.7,    # sampling diversity across candidates
    confidence_elicitation=True,  # elicit per-step {"confidence": N} and use it in selection
)

result = srlm.completion(long_context, "What changed between Q3 and Q4?")

How a winner is chosen, per the SRLM paper:

Self-consistency. Final answers are clustered semantically (normalization plus word-boundary containment, so "42" and "The answer is 42" vote together) and the plurality cluster survives. Tied clusters pool their candidates rather than favoring whichever answer appeared first.
Joint uncertainty score. Within the surviving set, each trajectory p is scored as VC(p) * Len(p). VC is the sum of the logs of the per-step verbalized confidences; steps that skip reporting are imputed with the trajectory mean, so under-reporting cannot inflate the score. Len is the trace length in output tokens. Each log-confidence is at most zero, so the score is non-positive and the candidate closest to zero (confident and concise) wins. Without confidence_elicitation, selection falls back to the shortest trace.
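The two selection stages above can be sketched as follows. This is an illustration under stated assumptions, not the package's code: the `Candidate` container and all function names are hypothetical, confidences are assumed to be reported on a 1 to 100 scale, clustering compares each answer only against the first member of a cluster, and a trajectory that reports no confidence at all is assumed never to outrank one that does.

```python
import math
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Candidate:
    answer: str
    confidences: list[Optional[float]]  # per-step, 1..100 (assumed); None if unreported
    output_tokens: int


def normalize(answer: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", answer.lower()).split())


def same_answer(a: str, b: str) -> bool:
    """True when the shorter normalized answer occurs in the longer at word boundaries."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return a == b
    short, long_ = sorted((a, b), key=len)
    return re.search(rf"\b{re.escape(short)}\b", long_) is not None


def cluster(cands: list[Candidate]) -> list[list[Candidate]]:
    """Greedy clustering against each cluster's first member (a simplification)."""
    clusters: list[list[Candidate]] = []
    for c in cands:
        for group in clusters:
            if same_answer(group[0].answer, c.answer):
                group.append(c)
                break
        else:
            clusters.append([c])
    return clusters


def plurality_pool(clusters: list[list[Candidate]]) -> list[Candidate]:
    """Keep the largest cluster; tied clusters pool their candidates."""
    top = max(len(g) for g in clusters)
    return [c for g in clusters if len(g) == top for c in g]


def joint_score(c: Candidate) -> float:
    """VC * Len, where VC is the sum of log confidences (always <= 0)."""
    reported = [x for x in c.confidences if x is not None]
    if not reported:
        return float("-inf")  # assumption: silent trajectories rank last
    mean = sum(reported) / len(reported)
    filled = [mean if x is None else x for x in c.confidences]
    vc = sum(math.log(min(max(x, 1.0), 100.0) / 100.0) for x in filled)
    return vc * c.output_tokens


def select(cands: list[Candidate], use_confidence: bool = True) -> Candidate:
    survivors = plurality_pool(cluster(cands))
    if not use_confidence:
        return min(survivors, key=lambda c: c.output_tokens)
    return max(survivors, key=joint_score)  # closest to zero wins
```

Because every score is non-positive, taking the maximum is the same as taking the candidate closest to zero.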

Implementation notes:

Each candidate runs on a fresh RLM instance with its own logger and config copy, so parallel candidates share no mutable state. A crashing candidate is dropped; only if every candidate fails does the call raise.
confidence_elicitation=True appends the reporting instruction to the system prompt automatically; spawned candidates inherit it.
direct_threshold routes short contexts to a plain LLM call. The SRLM paper finds recursive decomposition frequently underperforms the base model within its native window, so set this to roughly the served context size. The threshold is measured in characters, so convert the window from tokens first.
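The isolation and failure behavior in the first note can be sketched with a thread pool. This is a rough sketch, not the SRLM internals: `run_candidates` and `make_runner` are hypothetical names, and each runner stands in for a rollout on a fresh instance.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, TypeVar

T = TypeVar("T")


def run_candidates(
    make_runner: Callable[[int], Callable[[], T]],
    n_candidates: int,
    parallel: int,
) -> list[T]:
    """Run n_candidates independent rollouts, at most `parallel` at a time.

    make_runner(i) must build a fresh, unshared runner for candidate i.
    Failed candidates are dropped; an error is raised only if all fail.
    """
    results: list[T] = []
    errors: list[BaseException] = []
    with ThreadPoolExecutor(max_workers=max(1, parallel)) as pool:
        # Each runner is constructed separately so candidates share no state.
        futures = [pool.submit(make_runner(i)) for i in range(n_candidates)]
        for fut in as_completed(futures):
            try:
                results.append(fut.result())
            except Exception as exc:
                errors.append(exc)  # drop the crashing candidate
    if not results:
        cause = errors[0] if errors else None
        raise RuntimeError(f"all {n_candidates} candidates failed") from cause
    return results
```

Results arrive in completion order, so a selection step that follows should not rely on candidate index.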

| Parameter | Default | Meaning |
| --- | --- | --- |
| `direct_threshold` | `0` (off) | Context length in chars below which the REPL is bypassed |
| `n_candidates` | `1` | Candidate trajectories per completion |
| `candidate_parallel` | `1` | Candidates run concurrently (thread pool) |
| `candidate_temperature` | `None` | Temperature injected into candidate backends |
| `confidence_elicitation` | `False` | Elicit per-step confidence and use VC*Len selection |

All RLM constructor arguments pass through unchanged, including child_system_prompt.
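Since `direct_threshold` is expressed in characters while served context windows are usually quoted in tokens, a small helper can make the conversion explicit. The sketch below is illustrative only: `chars_for_token_window` and `route` are hypothetical names, and the default of 4 characters per token is a rough heuristic for English text that should be measured for the actual tokenizer.

```python
from typing import Callable


def chars_for_token_window(
    window_tokens: int,
    chars_per_token: float = 4.0,  # assumption: rough English-text average
    reserve_tokens: int = 0,       # room kept for the question and the answer
) -> int:
    """Estimate how many context characters fit in a token window."""
    usable = max(0, window_tokens - reserve_tokens)
    return int(usable * chars_per_token)


def route(
    context: str,
    direct_threshold: int,
    direct_call: Callable[[str], str],
    recursive_call: Callable[[str], str],
) -> str:
    """Send short contexts to a plain call; 0 disables direct routing."""
    if direct_threshold > 0 and len(context) < direct_threshold:
        return direct_call(context)
    return recursive_call(context)
```

For example, `chars_for_token_window(32768, reserve_tokens=2768)` gives a threshold of 120000 characters under the 4-characters-per-token assumption.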

Install

Requires Python 3.11+. Note that pip install rlms installs the upstream package, not this fork.

pip install lm-repl

For development, install editable from a checkout:

uv pip install -e /path/to/lm-repl --no-deps

Verify you got the fork and not a stale upstream build:

python -c "import inspect; from lm_repl import RLM, SRLM; print('child_system_prompt' in inspect.signature(RLM.__init__).parameters)"
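As a complementary check, the standard library can report which distributions are installed. This sketch uses only `importlib.metadata`; the helper name `installed_version` is hypothetical, and the distribution names are the ones mentioned above (`lm-repl` for the fork, `rlms` for upstream).

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    for name in ("lm-repl", "rlms"):
        print(f"{name}: {installed_version(name) or 'not installed'}")
```

If both names report a version, the signature check above is the more reliable indicator of which engine an import actually resolves to.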

Quick start

from lm_repl import RLM

rlm = RLM(
    backend="openai",
    backend_kwargs={"model_name": "gpt-5-nano"},
    verbose=True,
)

print(rlm.completion("Print me the first 100 powers of two, each on a newline.").response)

For the orchestrator/worker split used in map-reduce style runs:

rlm = RLM(
    backend="openai",
    backend_kwargs={...},
    custom_system_prompt=ORCHESTRATOR_PROMPT,   # the root model plans and reduces
    child_system_prompt=WORKER_PROMPT,          # sub-call workers map over chunks
    child_max_iterations=5,
    max_concurrent_subcalls=4,
)

REPL environments

Non-isolated environments run code on the host (fine for benchmarking, not for untrusted prompts); isolated environments run in cloud sandboxes. Natively supported: local (default), ipython, docker, modal, prime, daytona, e2b.

rlm = RLM(
    environment="local",
    environment_kwargs={"max_output_chars": 500},
)

local: in-process exec with namespaced globals. max_output_chars truncates REPL stdout fed back to the model.
ipython (pip install 'lm-repl[ipython]'): real IPython session, in-process or in an ipykernel subprocess with hard cell timeouts.
docker: REPL inside a container (python:3.11-slim by default).
modal / prime / daytona / e2b: fully isolated cloud sandboxes; sub-calls are proxied back to the host.
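To make the `local` behavior concrete, here is a minimal sketch of in-process execution with a persistent namespace and stdout truncation. It is not the package's environment class: `run_cell` and `truncate_output` are hypothetical names, and the truncation marker format is an assumption.

```python
import contextlib
import io
from typing import Optional


def truncate_output(text: str, max_output_chars: Optional[int] = None) -> str:
    """Cap captured stdout, noting how many characters were dropped."""
    if max_output_chars is None or len(text) <= max_output_chars:
        return text
    omitted = len(text) - max_output_chars
    return text[:max_output_chars] + f"\n... [{omitted} chars truncated]"


def run_cell(code: str, namespace: dict, max_output_chars: Optional[int] = None) -> str:
    """Execute code in a shared namespace and return its (truncated) stdout.

    This runs on the host with no isolation, so it is only suitable for
    trusted code.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, namespace)  # variables persist in `namespace` across cells
    return truncate_output(buf.getvalue(), max_output_chars)
```

A usage example: with `ns = {"context": "a" * 1000}`, calling `run_cell("print(context)", ns, max_output_chars=10)` returns only the first ten characters plus the truncation marker, which keeps a large context variable from flooding the model's next turn.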

Model providers

OpenAI, Anthropic, OpenRouter, and Portkey clients are included. Local models work through any OpenAI-compatible server (vLLM, llama-server); the fork's default_extra_body and same-role message merging exist specifically to make local serving smooth. See lm_repl/clients/ to add providers.
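The same-role merging mentioned above can be illustrated with a short helper. This is a sketch under assumptions rather than the client's code: `merge_consecutive_roles` is a hypothetical name, messages are assumed to be dicts with string `role` and `content` fields, and the blank-line separator is an arbitrary choice.

```python
def merge_consecutive_roles(
    messages: list[dict],
    separator: str = "\n\n",
) -> list[dict]:
    """Collapse runs of adjacent messages that share a role into one message."""
    merged: list[dict] = []
    for msg in messages:
        if merged and merged[-1]["role"] == msg["role"]:
            # Extend the previous message instead of emitting a second one.
            merged[-1]["content"] += separator + msg["content"]
        else:
            merged.append(dict(msg))  # copy so the caller's list is untouched
    return merged
```

Servers that enforce strictly alternating roles in their chat template would otherwise reject a history where, for instance, two user turns appear back to back after REPL output is appended.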

Trajectory metadata and logging

RLMChatCompletion.metadata holds the full trajectory (run config plus every iteration and sub-call) when a logger is attached. SRLM relies on this for confidence scoring, and spawns per-candidate loggers automatically.

from lm_repl import RLM
from lm_repl.logger import RLMLogger

logger = RLMLogger(log_dir="./logs")   # omit log_dir for in-memory only
rlm = RLM(..., logger=logger)

JSONL logs feed the bundled visualizer:

cd visualizer/
npm run dev   # default localhost:3001
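Outside the visualizer, the JSONL logs can be inspected with the standard library. The sketch below makes no assumption about the record schema: `read_jsonl` is a hypothetical helper, and the `*.jsonl` glob under `./logs` is a guess at the file naming, not something the package documents here.

```python
import json
from pathlib import Path
from typing import Any, Union


def read_jsonl(path: Union[str, Path]) -> list[Any]:
    """Parse one JSON value per non-empty line of a JSONL file."""
    records: list[Any] = []
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                records.append(json.loads(line))
    return records


if __name__ == "__main__":
    # Assumption: logs are written as *.jsonl files under the chosen log_dir.
    for log_file in sorted(Path("./logs").glob("*.jsonl")):
        print(log_file.name, len(read_jsonl(log_file)), "records")
```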

Citations

This fork builds directly on two papers. The engine:

@misc{zhang2026recursivelanguagemodels,
      title={Recursive Language Models},
      author={Alex L. Zhang and Tim Kraska and Omar Khattab},
      year={2026},
      eprint={2512.24601},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2512.24601},
}

The selection strategy:

@misc{alizadeh2026srlm,
      title={Recursive Language Models Meet Uncertainty: The Surprising Effectiveness of Self-Reflective Program Search for Long Context},
      author={Keivan Alizadeh and Parshin Shojaee and Minsik Cho and Mehrdad Farajtabar},
      year={2026},
      eprint={2603.15653},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2603.15653},
}

Upstream documentation, blogpost, and minimal implementation: docs | blogpost | rlm-minimal.

This version: 0.2.0, released Jun 10, 2026.
