# openrlm

A recursive language model (RLM) agent harness with persistent IPython REPL environments. Use it as a CLI, embed it in an existing harness, or import it as a library.
Each agent gets a stateful IPython environment where it can persist variables, define functions, and run computations across multiple turns. Agents can programmatically spawn sub-agents, each with its own isolated REPL, to a configurable depth.
## Why
**Why RLM?** The Recursive Language Model paper from MIT shows that recursive decomposition significantly improves performance on long-context and complex reasoning tasks. An agent that can spawn sub-agents to handle sub-problems — each with its own scratch space — outperforms flat agent loops.

**Why this implementation?** The original RLM implementation treats the user prompt as a variable the LLM greps and chunks. In practice, you want agents to operate on files and data in an application-specific context with custom tools. This implementation provides:
- Custom host functions. Define tools (search, APIs, domain-specific operations) that execute on the host but appear as plain async Python functions inside the agent's REPL. Serialization is invisible to the LLM.
- Persistent REPL state. Agents persist data to an IPython environment they can access across turns — variables, imports, and function definitions all survive between tool calls. Some ARC-AGI implementations demonstrated superior performance with this pattern, but lacked recursive sub-agents.
- Cheap sub-agent spawning. Sub-agents are forked processes. The fork server pre-imports expensive packages (numpy, pandas, etc.), then calls `gc.freeze()` before forking. Children inherit all imported modules via copy-on-write pages, and `gc.freeze()` prevents the garbage collector from scanning those objects — which would dirty the pages and force real memory copies. The OS only allocates memory for new data each sub-agent creates. A single machine can support hundreds to thousands of concurrent sub-agents.
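The preload-then-freeze pattern can be sketched with the standard library alone. This is a hypothetical illustration using lightweight stdlib modules as stand-ins for the heavy packages the real fork server preloads:

```python
import gc

# Stand-ins for heavy preloaded packages like numpy and pandas.
import json, decimal, statistics  # noqa: F401

# Move every object allocated so far into the "permanent generation".
# The cyclic GC will never scan these objects again, so the memory
# pages holding them stay clean and remain shared after os.fork().
gc.freeze()

print(f"{gc.get_freeze_count()} objects frozen before forking")
# Each sub-agent would now be created with os.fork(); children share
# the frozen pages copy-on-write and only pay for what they allocate.
```

`gc.freeze()` is standard CPython (3.7+); the same trick is used by preforking web servers to keep worker memory shared.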
## Architecture
openrlm has two layers: a core that handles single-message execution, and a harness that adds multi-turn state management on top.
### Core: one-shot execution
The core takes a single message and runs a complete LLM↔REPL loop: the LLM emits a python tool call, the core executes it in a persistent IPython sandbox, returns the output, and repeats until the LLM responds with text. Everything — computation, file I/O, web requests, sub-agent orchestration — is Python code the LLM writes and runs through that single tool. Host functions you register appear as plain await fn(...) calls inside the sandbox. Sub-agent functions (create_agent, run_agent, await_result) work the same way — the LLM doesn’t know these are remote calls.
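The shape of that loop can be sketched with a stubbed LLM client and a plain `exec`-based namespace standing in for the IPython sandbox. All names here (`StubLLM`, `StubSandbox`, the message dicts) are illustrative, not openrlm's internals:

```python
import asyncio

class StubLLM:
    """Scripted client: first emits a python tool call, then a final answer."""
    def __init__(self):
        self.turn = 0
    async def complete(self, messages):
        self.turn += 1
        if self.turn == 1:
            return {"content": None, "tool_call": {"code": "result = 2 + 2"}}
        return {"content": f"The answer is {messages[-1]['content']}", "tool_call": None}

class StubSandbox:
    """Persistent namespace shared across tool calls, like the IPython REPL."""
    def __init__(self):
        self.ns = {}
    async def execute(self, code: str) -> str:
        exec(code, self.ns)  # state survives between calls
        return str(self.ns.get("result"))

async def run_single(llm, sandbox, messages):
    while True:
        reply = await llm.complete(messages)  # may contain a python tool call
        if reply["tool_call"] is None:
            return reply["content"]           # plain text ends the loop
        output = await sandbox.execute(reply["tool_call"]["code"])
        messages.append({"role": "tool", "content": output})

print(asyncio.run(run_single(StubLLM(), StubSandbox(),
                             [{"role": "user", "content": "what is 2+2?"}])))
# → The answer is 4
```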
AgentRuntime owns the infrastructure: the fork server (process lifecycle), the host function server (HTTP bridge for custom tools), and sub-agent routing. The LLM client is pluggable — you provide any implementation of the LLMClient protocol. The runtime routes sub-agent calls through flat lookup tables, so agents at any nesting depth resolve to the correct session.
```python
import asyncio

from openrlm import build_runtime

async def main():
    runtime = build_runtime(model="openai/gpt-5.2")
    async with runtime:
        session = await runtime.create_session("my-session")
        # One message in, one result out. The core handles the full LLM loop.
        result = await session.run_single("Compute the first 20 prime numbers")
        print(result)
        await runtime.close_session("my-session")

asyncio.run(main())
```
The fork server pre-imports expensive packages once, calls `gc.freeze()`, then forks a child process for each sandbox. Children share pre-imported module memory via OS-level copy-on-write. Host functions registered on the caller side are injected as async stubs into each sandbox — the code inside calls `await my_function(...)` and it transparently round-trips to the host via HTTP.
Two execution modes (both use the same TCP-based protocol):

- **Local** (default): the fork server runs as a subprocess. No Docker required. The workspace directory defaults to the current working directory — files agents create appear on your filesystem.
- **Docker**: the fork server runs inside a container for isolation. Host directories are exposed via bind mounts. Use `--image` to enable.
## Built-in Agent Harness: multi-turn state management

For multi-turn conversations, call `run_single()` repeatedly on the same session. The harness manages what accumulates between turns:
- **Message history.** Each `run_single()` appends the user message and final assistant response. REPL state (variables, imports, computed results) also persists.
- **Message compression.** All messages from previous turns are preserved, but tool outputs are truncated to 20 lines / 1 KB. The current turn retains full tool call detail. The complete uncompressed history is available inside the REPL as `_conversation_history`.
- **Cancellation.** Cancelling a turn (via `asyncio.CancelledError` or Ctrl-C) rolls back message history to the last consistent checkpoint. Sub-agent tasks are cancelled transitively.
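The 20-line / 1 KB truncation rule for old tool outputs can be sketched as follows. This is an illustration of the policy, not the library's code; the function name is hypothetical:

```python
def compress_tool_output(text: str, max_lines: int = 20, max_bytes: int = 1024) -> str:
    """Truncate an old tool output to at most max_lines lines and max_bytes bytes."""
    lines = text.splitlines()
    if len(lines) > max_lines:
        dropped = len(lines) - max_lines
        lines = lines[:max_lines] + [f"... [{dropped} lines truncated]"]
    out = "\n".join(lines)
    if len(out.encode("utf-8")) > max_bytes:
        out = out.encode("utf-8")[:max_bytes].decode("utf-8", errors="ignore") + "... [truncated]"
    return out

big = "\n".join(f"row {i}" for i in range(500))
print(compress_tool_output(big).splitlines()[-1])  # → ... [480 lines truncated]
```

Only the compressed form enters the prompt for later turns; the full output stays available in `_conversation_history` inside the REPL.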
```python
async with runtime:
    session = await runtime.create_session("analysis")

    # Turn 1: agent loads data, stores DataFrame in a REPL variable
    response = await session.run_single("Load data.csv and show me the column names")
    print(response)  # "The file has columns: date, product, price, volume ..."

    # Turn 2: agent reuses the loaded DataFrame — no re-reading needed
    response = await session.run_single("What's the correlation between price and volume?")
    print(response)  # "The Pearson correlation is 0.73 ..."

    # Turn 3: agent builds on all prior computed state
    response = await session.run_single("Plot the top 5 outliers and save to outliers.png")
    print(response)  # "Saved outliers.png with 5 data points highlighted ..."

    await runtime.close_session("analysis")
```
The caller provides user messages and consumes response strings. Everything else — message accumulation, compression, tool execution, history sync — happens inside the Session. Each Session is independent; multiple Sessions can run concurrently on the same Runtime.
## Custom Harness and Agent Implementations
If you need to manage message history yourself — injecting context between turns, forking conversations, external history storage — use `session.run_turn(messages, user_message)` instead of `run_single`. You construct the message list starting with `session.system_message`, pass it to each turn, and freely modify it between turns. The engine borrows the list during a turn and returns it enriched. `run_single` is a convenience wrapper that uses an engine-internal list.
```python
async with runtime:
    session = await runtime.create_session("analysis")
    messages = [session.system_message]

    result = await session.run_turn(messages, "Load data.csv")
    print(result)

    # Inject context between turns
    messages.append({"role": "user", "content": "(Note: focus on Q4 data)"})
    messages.append({"role": "assistant", "content": "Understood."})

    result = await session.run_turn(messages, "Summarize revenue trends")
    print(result)
```
## Installation
```bash
pip install openrlm
# or
uv pip install openrlm
```
To use the bundled internet search/extract tools:
```bash
pip install openrlm[contrib]
```
To use Docker mode, build the sandbox image:
```bash
openrlm --build-image
```
This builds `openrlm:sandbox` using `sandbox-deps.txt` if present in the current directory. To customize:
```bash
# Custom tag
openrlm --build-image my-image:latest

# Custom dependencies file
openrlm --build-image --sandbox-deps my-deps.txt
```
## Quickstart

### CLI
```bash
# Single message (local mode, default)
openrlm "compute the first 20 prime numbers"

# Interactive session
openrlm

# With a specific model (routed through OpenRouter)
openrlm --model anthropic/claude-sonnet-4-5 "explain main.py"

# With custom tools
openrlm --functions ./my-tools "use my_search to find X"

# With bundled contrib tools (requires PARALLEL_API_KEY)
openrlm --functions ./contrib "search for recent advances in fusion energy"

# Docker mode
openrlm --image openrlm:sandbox "analyze data"

# JSON output for programmatic use
openrlm --json "compute pi to 50 digits" | jq .result

# With conversation context from a prior session
openrlm --context history.json "continue the analysis"
```
### Library

`build_runtime()` is the main entry point for programmatic use. It handles LLM client selection, API key resolution, and host function loading — the same wiring the CLI does internally. Its keyword arguments correspond to the CLI flags:
```python
import asyncio

from openrlm import build_runtime

async def main():
    runtime = build_runtime(
        provider="anthropic",
        model="claude-sonnet-4-5",
        functions=["./my-tools"],
    )
    async with runtime:
        session = await runtime.create_session("s1", on_event=my_handler)
        result = await session.run_single("analyze this dataset")
        print(result)

        # Same session, same REPL state — variables from turn 1 persist
        result = await session.run_single("now visualize the outliers")
        print(result)

        await runtime.close_session("s1")

asyncio.run(main())
```
When you need more control — a custom LLMClient implementation, programmatic host function registration, or non-default AgentConfig settings — construct the AgentRuntime directly:
```python
from openrlm import (
    AgentRuntime, AgentConfig, HostFunctionRegistry,
    AnthropicClient, default_api_key_resolver,
)

registry = HostFunctionRegistry()
registry.register("my_tool", my_async_function)

resolver = default_api_key_resolver()
config = AgentConfig(
    model="claude-sonnet-4-5-20250514",
    get_api_key=lambda: resolver("anthropic"),
    max_tool_rounds=100,
    max_sub_agent_depth=5,
)

runtime = AgentRuntime(config, registry, llm_client=AnthropicClient())
```
This is what build_runtime does internally. See the LLM Client section for implementing custom providers.
## Custom Host Functions
Define tools that execute on the host but appear as regular async functions inside the agent's REPL.
### Library usage

For functions loaded from module files, `build_runtime` handles registration:

```python
runtime = build_runtime(functions=["my_tools.py", "./more-tools/"])
```
For programmatic registration (e.g., closures that capture application state), create the registry directly:
```python
import json

from openrlm import AgentRuntime, AgentConfig, HostFunctionRegistry

async def my_database_query(sql: str, limit: int = 100) -> str:
    """Execute a SQL query against the application database.

    Use this function to make DB queries, like
    result = await my_database_query(sql="SELECT * FROM users", limit=10)

    Returns results as a JSON string."""
    results = await db.execute(sql, limit=limit)
    return json.dumps(results)

registry = HostFunctionRegistry()
registry.register("my_database_query", my_database_query)

# The registry is passed to the runtime, which injects the functions into every agent's REPL
runtime = AgentRuntime(config, registry, llm_client=client)
```
Inside the agent's REPL, the function becomes callable as:

```python
result = await my_database_query(sql="SELECT * FROM users", limit=10)
```
The function's type hints and docstring are picked up automatically — Pydantic builds a JSON schema from the signature for the system prompt, and the docstring becomes the description the LLM sees. No separate schema definitions needed.
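A rough stdlib-only sketch of deriving such a schema from a function's signature and docstring. openrlm uses Pydantic for this; the `schema_for` helper and its type mapping below are illustrative, not the library's code:

```python
import inspect

TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def schema_for(fn) -> dict:
    """Build a minimal JSON-schema-like description from hints and docstring."""
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, param in sig.parameters.items():
        props[name] = {"type": TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default means the LLM must supply it
    return {
        "name": fn.__name__,
        "description": inspect.getdoc(fn),
        "parameters": {"type": "object", "properties": props, "required": required},
    }

async def my_database_query(sql: str, limit: int = 100) -> str:
    """Execute a SQL query against the application database."""

print(schema_for(my_database_query)["parameters"]["required"])  # → ['sql']
```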
### CLI usage (`--functions`)
When using the CLI, you don't create a registry yourself — the CLI creates one and needs a way to discover your functions. You provide a Python file (or directory of files) that exports a register(registry) function. The CLI calls it, passing its own HostFunctionRegistry instance:
```python
# my_tools.py
import json

async def my_database_query(sql: str, limit: int = 100) -> str:
    """Execute a SQL query against the application database.

    Returns results as a JSON string."""
    results = await db.execute(sql, limit=limit)
    return json.dumps(results)

def register(registry):
    """Called by the CLI with its HostFunctionRegistry. Register your functions here."""
    registry.register("my_database_query", my_database_query)
```
Then:
```bash
openrlm --functions my_tools.py "show me the top 10 users"
```
For a directory of tool files, each `.py` file with a `register()` function is loaded automatically (files starting with `_` are skipped):

```bash
openrlm --functions ./my-tools/ "analyze the data"
```
You can also use a dotted module name for installed packages:
```bash
openrlm --functions my_package.tools "do something"
```
## Event Streaming

Monitor agent activity with event callbacks. Events from sub-agents at any depth flow through the same callback, distinguished by `agent_id`:
```python
from openrlm import build_runtime, EventCallback
from openrlm.events import RoundStart, ToolExecEnd, TurnEnd

def on_event(event):
    match event:
        case RoundStart(agent_id=aid, round_num=n):
            print(f"[{aid}] Round {n}")
        case ToolExecEnd(agent_id=aid, elapsed_seconds=t):
            print(f"[{aid}] Tool execution: {t:.1f}s")
        case TurnEnd(agent_id=aid, rounds=r, prompt_tokens=pt, completion_tokens=ct):
            print(f"[{aid}] Done in {r} rounds, {pt}+{ct} tokens")

async with runtime:
    session = await runtime.create_session("s1", on_event=on_event)
    await session.run_single("analyze this dataset")
```
The `on_event` parameter accepts any `EventCallback` (`Callable[[AgentEvent], None]`). For multiple consumers or async I/O, use `EventBus`:
```python
from openrlm import EventBus

bus = EventBus()
bus.add_listener(tui.update_panel)      # sync: immediate UI update
bus.add_listener(metrics.record_event)  # sync: bookkeeping

# Async consumers get an independent stream
stream = bus.stream(maxsize=256)
session = await runtime.create_session("s1", on_event=bus.callback)

# Consume asynchronously in a background task
async def push_events():
    async for event in stream:
        await websocket.send(serialize(event))

asyncio.create_task(push_events())
result = await session.run_single("analyze data")
bus.close()  # terminates async iteration
```
Each listener and stream is independent — a slow or failing consumer does not affect the engine or other consumers.
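That independence can be implemented by calling sync listeners inside a try/except and giving each async consumer its own bounded queue. A minimal sketch of the fan-out pattern (hypothetical `TinyBus`, not openrlm's EventBus):

```python
import asyncio

class TinyBus:
    """Fan-out: sync listeners called inline, each async stream has its own queue."""
    def __init__(self):
        self.listeners, self.queues = [], []

    def add_listener(self, fn):
        self.listeners.append(fn)

    def callback(self, event):
        for fn in self.listeners:
            try:
                fn(event)            # a failing listener is isolated
            except Exception:
                pass
        for q in self.queues:
            try:
                q.put_nowait(event)  # a full queue drops events, never blocks
            except asyncio.QueueFull:
                pass

    async def stream(self, maxsize=256):
        q = asyncio.Queue(maxsize)
        self.queues.append(q)
        while True:
            yield await q.get()

async def demo():
    bus = TinyBus()
    seen = []
    bus.add_listener(seen.append)
    bus.callback("turn_start")
    bus.callback("turn_end")
    return seen

print(asyncio.run(demo()))  # → ['turn_start', 'turn_end']
```

The key design choice is that the producer never awaits a consumer: sync listeners run inline and swallow their own errors, while async consumers are decoupled behind non-blocking queue puts.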
## Sub-agents
Agents can spawn sub-agents programmatically from within the REPL:
```python
# Create a sub-agent with specific instructions
agent_id = await create_agent(instructions="You are a citation specialist")

# Start a task (non-blocking — runs in the background)
task_id = await run_agent(agent_id=agent_id, task="Research citation percentiles for federal courts")

# Do other work while sub-agent runs...

# Collect the result
result = await await_result(task_id)
```
Sub-agents can themselves spawn sub-agents, enabling recursive decomposition. Each sub-agent has:
- Its own isolated IPython namespace
- Its own conversation history with the LLM
- Access to the same host functions and shared workspace directory
- A per-agent lock that serializes concurrent tasks on the same sub-agent
**Persistent sub-agents.** A sub-agent created with `create_agent` persists across multiple `run_agent` calls. Each task appends a new user message and runs a full agent turn, so the sub-agent sees its full prior conversation and retains all REPL state (variables, imports, computed data) from previous tasks. This makes sub-agents useful as persistent specialists:
```python
analyst = await create_agent(instructions="You are a data analyst")

t1 = await run_agent(agent_id=analyst, task="Load sales.csv and compute monthly totals")
await await_result(t1)

# The analyst still has the loaded data and computed totals in its REPL
t2 = await run_agent(agent_id=analyst, task="Now find the month-over-month growth rate")
growth = await await_result(t2)
```
The maximum recursion depth is configurable (default: 10 levels).
## Configuration

### AgentConfig

`build_runtime()` constructs an `AgentConfig` internally from its keyword arguments. Direct `AgentConfig` construction is only needed when building the `AgentRuntime` manually.
| Parameter | Default | Description |
|---|---|---|
| `model` | `"openai/gpt-5.2"` | Model identifier |
| `sandbox_image` | `None` | Docker image tag; `None` for local mode |
| `code_timeout` | `3600.0` | Code execution timeout in seconds |
| `max_tool_rounds` | `50` | Max LLM-tool iterations per turn |
| `max_sub_agent_depth` | `10` | Max recursive sub-agent depth |
| `output_limit_lines` | `2000` | Truncate tool output beyond this many lines |
| `output_limit_bytes` | `50000` | Truncate tool output beyond this many bytes |
| `temperature` | `None` | LLM sampling temperature |
| `system_prompt` | `None` | Override the default system prompt (format string with `{functions_json}`, `{workspace_path}`, `{spool_path}` placeholders) |
| `get_api_key` | `None` | `Callable[[], Awaitable[str]]` that returns an API key; required when using `AgentRuntime` |
| `sandbox_binds` | `{}` | Host-to-container directory mounts (Docker mode) |
| `task_preview_chars` | `12000` | Max characters of a sub-agent task shown in system prompt previews |
### API Key Resolution
`AgentConfig.get_api_key` is caller-provided. The bundled `default_api_key_resolver()` checks these sources in order:

1. Auth file (`~/.openrlm/auth.json`, override with `OPENRLM_AUTH_FILE`): a JSON object mapping provider names to keys.
2. `ANTHROPIC_OAUTH_TOKEN` (Anthropic only, legacy compatibility).
3. Provider-specific environment variable:

| Provider | Environment Variable |
|---|---|
| `openrouter` | `OPENROUTER_API_KEY` |
| `anthropic` | `ANTHROPIC_API_KEY` |
| `openai` | `OPENAI_API_KEY` |
| `google` | `GEMINI_API_KEY` |
| `groq` | `GROQ_API_KEY` |
| `xai` | `XAI_API_KEY` |
| `mistral` | `MISTRAL_API_KEY` |
| `openai-codex` | `OPENAI_CODEX_TOKEN` |
A `.env` file in the current directory is loaded automatically.

For the bundled contrib tools (`internet_search`, `internet_extract`), set `PARALLEL_API_KEY`.
### CLI Flags

```text
openrlm [message] [options]

positional:
  message                User message (omit for interactive session)

options:
  --model MODEL          Model identifier (default: openai/gpt-5.2)
  --provider PROVIDER    LLM provider (default: openrouter)
  --image IMAGE          Docker image tag for sandbox (omit for local mode)
  --timeout SECONDS      Code execution timeout (default: 3600)
  --max-rounds N         Max tool loop iterations (default: 50)
  --functions PATH       Directory, .py file, or dotted module name (comma-separated)
  --workspace DIR        Working directory shared with agents (default: cwd)
  --context FILE         JSON file with conversation history to prepend
  --json                 Output result as JSON object
  --verbose              Enable debug logging
  --env-file PATH        Path to .env file (default: .env)
  --log-file PATH        Log file path (default: ~/Downloads/openrlm.log)
  --build-image [TAG]    Build Docker sandbox image and exit (default: openrlm:sandbox)
  --sandbox-deps FILE    Dependencies file for --build-image (default: sandbox-deps.txt)
  --reasoning-effort E   Reasoning effort for Codex models: none, minimal, low, medium, high, xhigh (default: medium)
  --text-verbosity V     Text verbosity for Codex models: low, medium, high (default: medium)
```
#### `--context FILE`

Prepends conversation history after the system prompt. The file must contain a JSON array of messages:

```json
[
  {"role": "user", "content": "I'm analyzing sales data"},
  {"role": "assistant", "content": "I see the file has 10k rows with date, product, and revenue columns."}
]
```
Only "user" and "assistant" roles are allowed. This is useful for context bridging from an outer harness — pass a filtered conversation history so the agent understands what's been discussed.
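Loading and validating such a file can be sketched as below. This is illustrative, not the CLI's code; `load_context` is a hypothetical name:

```python
import json

def load_context(path: str) -> list[dict]:
    """Load a --context history file, enforcing the user/assistant-only rule."""
    messages = json.loads(open(path).read())
    if not isinstance(messages, list):
        raise ValueError("context file must contain a JSON array of messages")
    for msg in messages:
        if msg.get("role") not in ("user", "assistant"):
            raise ValueError(f"disallowed role: {msg.get('role')!r}")
        if "content" not in msg:
            raise ValueError("each message needs a 'content' field")
    return messages
```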
#### `--json`
Wraps the result in a JSON object for programmatic consumption. Only valid with a message argument (not interactive mode).
```jsonc
// Success
{"result": "The answer is 42", "error": null}

// Failure
{"result": null, "error": "No API key for provider 'openrouter'. Set the OPENROUTER_API_KEY environment variable or add it to ~/.openrlm/auth.json."}
```
Exactly one of `result` or `error` is non-null.
### Interactive mode
- Ctrl-C cancels the active turn and returns to the prompt.
- Double Ctrl-C exits the session.
## Bundled Tools

The `contrib/` directory includes two pre-built host functions that use the Parallel API:

- `internet_search` — Search the web and return relevant excerpts with source URLs.
- `internet_extract` — Fetch a web page or PDF and return its content as markdown.

Both require `PARALLEL_API_KEY` in the environment and the contrib extra (`pip install openrlm[contrib]`).
```bash
openrlm --functions ./contrib "search for recent papers on transformer efficiency"
```
## LLM Client
openrlm ships with two bundled client implementations:
- OpenRouterClient — routes to models from OpenAI, Anthropic, Google, and others through a single API.
- AnthropicClient — calls the Anthropic API directly.
`build_runtime(provider="anthropic")` or `build_runtime(provider="openrouter")` selects the appropriate client automatically. Manual client construction is only needed for custom `LLMClient` implementations.
To implement a custom provider:
```python
import asyncio

from openrlm import AgentRuntime, AgentConfig, HostFunctionRegistry
from openrlm import LLMClient, CompletionResponse, CompletionChoice, CompletionMessage, TokenUsage, default_api_key_resolver

class MyCustomClient:
    """Example: implement LLMClient for a provider not built in."""

    async def complete(self, messages, *, api_key, **kwargs) -> CompletionResponse:
        # Call your provider's API, then translate the response:
        return CompletionResponse(
            model="my-model",
            choices=[CompletionChoice(
                message=CompletionMessage(content="...", tool_calls=None),
                finish_reason="stop",
            )],
            usage=TokenUsage(prompt_tokens=0, completion_tokens=0),
        )

    async def close(self) -> None:
        pass  # Release any resources

async def main():
    client = MyCustomClient()
    resolver = default_api_key_resolver()
    config = AgentConfig(get_api_key=lambda: resolver("my-provider"))
    runtime = AgentRuntime(config, HostFunctionRegistry(), llm_client=client)
    async with runtime:
        session = await runtime.create_session("s1")
        result = await session.run_single("What is 2 + 2?")
        print(result)
        await runtime.close_session("s1")
    # The runtime closes the LLM client on exit.

asyncio.run(main())
```
## Development
```bash
# Clone and install
git clone <repo-url>
cd openrlm
uv sync

# Run tests (requires Docker for full suite)
uv run python tests/test_e2e.py

# Build sandbox image (for Docker mode tests)
openrlm --build-image
```
## License
MIT