
openrlm

A recursive language model (RLM) agent harness with persistent IPython REPL environments. Usable as a CLI, embedded in an existing harness, or as a library.

Each agent gets a stateful IPython environment where it can persist variables, define functions, and run computations across multiple turns. Agents can programmatically spawn sub-agents, each with their own isolated REPL, to arbitrary depth.

Why

Why RLM? The Recursive Language Model paper from MIT shows that recursive decomposition significantly improves performance on long-context and complex reasoning tasks. An agent that can spawn sub-agents to handle sub-problems — each with their own scratch space — outperforms flat agent loops.

Why this implementation? The original RLM implementation treats the user prompt as a variable the LLM greps and chunks. In practice, you want agents to operate on files and data in an application-specific context with custom tools. This implementation provides:

  • Custom host functions. Define tools (search, APIs, domain-specific operations) that execute on the host but appear as plain async Python functions inside the agent's REPL. Serialization is invisible to the LLM.
  • Persistent REPL state. Agents persist data to an IPython environment they can access across turns — variables, imports, and function definitions all survive between tool calls. Some ARC-AGI implementations demonstrated superior performance with this pattern, but lacked recursive sub-agents.
  • Cheap sub-agent spawning. Sub-agents are forked processes. The fork server pre-imports expensive packages (numpy, pandas, etc.), then calls gc.freeze() before forking. Children inherit all imported modules via copy-on-write pages, and gc.freeze() prevents the garbage collector from scanning those objects — which would dirty the pages and force real memory copies. The OS only allocates memory for new data each sub-agent creates. A single machine can support hundreds to thousands of concurrent sub-agents.

Architecture

openrlm has two layers: a core that handles single-message execution, and a harness that adds multi-turn state management on top.

Core: one-shot execution

The core takes a single message and runs a complete LLM↔REPL loop: the LLM emits a python tool call, the core executes it in a persistent IPython sandbox, returns the output, and repeats until the LLM responds with text. Everything — computation, file I/O, web requests, sub-agent orchestration — is Python code the LLM writes and runs through that single tool. Host functions you register appear as plain await fn(...) calls inside the sandbox. Sub-agent functions (create_agent, run_agent, await_result) work the same way — the LLM doesn’t know these are remote calls.

AgentRuntime owns the infrastructure: the fork server (process lifecycle), the host function server (HTTP bridge for custom tools), and sub-agent routing. It routes sub-agent calls through flat lookup tables, so agents at any nesting depth resolve to the correct session. The LLM client is pluggable — you provide any implementation of the LLMClient protocol.

import asyncio
from openrlm import build_runtime

async def main():
    runtime = build_runtime(model="openai/gpt-5.2")

    async with runtime:
        session = await runtime.create_session("my-session")
        # One message in, one result out. The core handles the full LLM loop.
        result = await session.run_single("Compute the first 20 prime numbers")
        print(result)
        await runtime.close_session("my-session")

asyncio.run(main())

The fork server pre-imports expensive packages once, calls gc.freeze(), then forks a child process for each sandbox. Children share pre-imported module memory via OS-level copy-on-write. Host functions registered on the caller side are injected as async stubs into each sandbox — the code inside calls await my_function(...) and it transparently round-trips to the host via HTTP.
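The mechanics of that pattern look roughly like this in plain CPython — a sketch, not openrlm's actual fork server; next_sandbox_request() and run_sandbox() are hypothetical placeholders:

import gc
import importlib
import os

def fork_server(preload=("numpy", "pandas")):
    for name in preload:
        importlib.import_module(name)   # import expensive packages once, in the parent
    gc.collect()
    gc.freeze()   # move survivors to a permanent generation so the collector never
                  # scans them — scanning would dirty their copy-on-write pages

    while True:
        request = next_sandbox_request()   # hypothetical: wait for a spawn request
        pid = os.fork()
        if pid == 0:                       # child: inherits parent pages copy-on-write
            run_sandbox(request)           # hypothetical: run the IPython sandbox loop
            os._exit(0)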

Two execution modes (both use the same TCP-based protocol):

  • Local (default): Fork server runs as a subprocess. No Docker required. The workspace directory defaults to cwd — files agents create appear on your filesystem.
  • Docker: Fork server runs inside a container for isolation. Host directories are exposed via bind mounts. Use --image to enable.

Built-in Agent Harness: multi-turn state management

For multi-turn conversations, call run_single() repeatedly on the same session. The harness manages what accumulates between turns:

  • Message history. Each run_single() appends the user message and final assistant response. REPL state (variables, imports, computed results) also persists.
  • Message compression. All messages from previous turns are preserved, but tool outputs are truncated to 20 lines / 1 KB. The current turn retains full tool call detail. The complete uncompressed history is available inside the REPL as _conversation_history.
  • Cancellation. Cancelling a turn (via asyncio.CancelledError or Ctrl-C) rolls back message history to the last consistent checkpoint. Sub-agent tasks are cancelled transitively.
async with runtime:
    session = await runtime.create_session("analysis")
    # Turn 1: agent loads data, stores DataFrame in a REPL variable
    response = await session.run_single("Load data.csv and show me the column names")
    print(response)  # "The file has columns: date, product, price, volume ..."

    # Turn 2: agent reuses the loaded DataFrame — no re-reading needed
    response = await session.run_single("What's the correlation between price and volume?")
    print(response)  # "The Pearson correlation is 0.73 ..."

    # Turn 3: agent builds on all prior computed state
    response = await session.run_single("Plot the top 5 outliers and save to outliers.png")
    print(response)  # "Saved outliers.png with 5 data points highlighted ..."
    await runtime.close_session("analysis")

The caller provides user messages and consumes response strings. Everything else — message accumulation, compression, tool execution, history sync — happens inside the Session. Each Session is independent; multiple Sessions can run concurrently on the same Runtime.
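For instance, two independent sessions can run concurrently on one runtime — a sketch using only the calls shown above (the session names and prompts are illustrative):

import asyncio
from openrlm import build_runtime

async def ask(runtime, session_id, prompt):
    session = await runtime.create_session(session_id)
    try:
        return await session.run_single(prompt)
    finally:
        await runtime.close_session(session_id)

async def main():
    runtime = build_runtime(model="openai/gpt-5.2")
    async with runtime:
        # Each session has its own REPL state; they run side by side
        sales, logs = await asyncio.gather(
            ask(runtime, "sales", "Summarize sales.csv"),
            ask(runtime, "logs", "Count the ERROR lines in app.log"),
        )
        print(sales)
        print(logs)

asyncio.run(main())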

Custom Harness and Agent Implementations

If you need to manage message history yourself — injecting context between turns, forking conversations, external history storage — use session.run_turn(messages, user_message) instead of run_single. You construct the message list starting with session.system_message, pass it to each turn, and freely modify it between turns. The engine borrows the list during a turn and returns it enriched. run_single is a convenience wrapper that uses an engine-internal list.

async with runtime:
    session = await runtime.create_session("analysis")
    messages = [session.system_message]

    result = await session.run_turn(messages, "Load data.csv")
    print(result)

    # Inject context between turns
    messages.append({"role": "user", "content": "(Note: focus on Q4 data)"})
    messages.append({"role": "assistant", "content": "Understood."})

    result = await session.run_turn(messages, "Summarize revenue trends")
    print(result)
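Because the caller owns the message list, forking a conversation is just copying that list — a sketch built on the example above (both branches still share the session's REPL state; only the message history diverges):

import copy

# Branch the history: each fork gets an independent copy of the messages
branch_a = copy.deepcopy(messages)
branch_b = copy.deepcopy(messages)

result_a = await session.run_turn(branch_a, "Assume a 5% discount and recompute the totals")
result_b = await session.run_turn(branch_b, "Assume a 10% discount and recompute the totals")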

Installation

pip install openrlm
# or
uv pip install openrlm

To use the bundled internet search/extract tools:

pip install openrlm[contrib]

To use Docker mode, build the sandbox image:

openrlm --build-image

This builds openrlm:sandbox using sandbox-deps.txt if present in the current directory. To customize:

# Custom tag
openrlm --build-image my-image:latest

# Custom dependencies file
openrlm --build-image --sandbox-deps my-deps.txt

Quickstart

CLI

# Single message (local mode, default)
openrlm "compute the first 20 prime numbers"

# Interactive session
openrlm

# With a specific model (routed through OpenRouter)
openrlm --model anthropic/claude-sonnet-4-5 "explain main.py"

# With custom tools
openrlm --functions ./my-tools "use my_search to find X"

# With bundled contrib tools (requires PARALLEL_API_KEY)
openrlm --functions ./contrib "search for recent advances in fusion energy"

# Docker mode
openrlm --image openrlm:sandbox "analyze data"

# JSON output for programmatic use
openrlm --json "compute pi to 50 digits" | jq .result

# With conversation context from a prior session
openrlm --context history.json "continue the analysis"

Library

build_runtime() is the main entry point for programmatic use. It handles LLM client selection, API key resolution, and host function loading — the same wiring the CLI does internally. Its keyword arguments correspond to the CLI flags:

import asyncio
from openrlm import build_runtime

async def main():
    runtime = build_runtime(
        provider="anthropic",
        model="claude-sonnet-4-5",
        functions=["./my-tools"],
    )

    async with runtime:
        session = await runtime.create_session("s1", on_event=my_handler)
        result = await session.run_single("analyze this dataset")
        print(result)

        # Same session, same REPL state — variables from turn 1 persist
        result = await session.run_single("now visualize the outliers")
        print(result)

        await runtime.close_session("s1")

asyncio.run(main())

When you need more control — a custom LLMClient implementation, programmatic host function registration, or non-default AgentConfig settings — construct the AgentRuntime directly:

from openrlm import (
    AgentRuntime, AgentConfig, HostFunctionRegistry,
    AnthropicClient, default_api_key_resolver,
)

registry = HostFunctionRegistry()
registry.register("my_tool", my_async_function)

resolver = default_api_key_resolver()
config = AgentConfig(
    model="claude-sonnet-4-5-20250514",
    get_api_key=lambda: resolver("anthropic"),
    max_tool_rounds=100,
    max_sub_agent_depth=5,
)

runtime = AgentRuntime(config, registry, llm_client=AnthropicClient())

This is what build_runtime does internally. See the LLM Client section for implementing custom providers.

Custom Host Functions

Define tools that execute on the host but appear as regular async functions inside the agent's REPL.

Library usage

For functions loaded from module files, build_runtime handles registration:

runtime = build_runtime(functions=["my_tools.py", "./more-tools/"])

For programmatic registration (e.g., closures that capture application state), create the registry directly:

import json
from openrlm import AgentRuntime, AgentConfig, HostFunctionRegistry

async def my_database_query(sql: str, limit: int = 100) -> str:
    """Execute a SQL query against the application database.
    Use this function to make DB queries, like result = await my_database_query(sql="SELECT * FROM users", limit=10)
    Returns results as a JSON string."""
    results = await db.execute(sql, limit=limit)
    return json.dumps(results)

registry = HostFunctionRegistry()
registry.register("my_database_query", my_database_query)

# The registry is passed to the runtime, which injects the functions into every agent's REPL
runtime = AgentRuntime(config, registry, llm_client=client)

Inside the agent's REPL, the function becomes callable as:

result = await my_database_query(sql="SELECT * FROM users", limit=10)

The function's type hints and docstring are picked up automatically — Pydantic builds a JSON schema from the signature for the system prompt, and the docstring becomes the description the LLM sees. No separate schema definitions needed.
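The idea can be illustrated with plain Pydantic — a sketch of how a schema could be derived from a signature, not openrlm's internal code:

import inspect
from pydantic import create_model

def schema_for(fn):
    # Build a Pydantic model whose fields mirror the function's parameters
    fields = {}
    for name, param in inspect.signature(fn).parameters.items():
        default = ... if param.default is inspect.Parameter.empty else param.default
        fields[name] = (param.annotation, default)
    model = create_model(fn.__name__, **fields)
    # The docstring becomes the description; the model supplies the JSON schema
    return {"description": inspect.getdoc(fn), "parameters": model.model_json_schema()}

schema_for(my_database_query)
# -> {"description": "Execute a SQL query ...",
#     "parameters": {"properties": {"sql": {...}, "limit": {"default": 100, ...}}, "required": ["sql"], ...}}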

CLI usage (--functions)

When using the CLI, you don't create a registry yourself — the CLI creates one and needs a way to discover your functions. You provide a Python file (or directory of files) that exports a register(registry) function. The CLI calls it, passing its own HostFunctionRegistry instance:

# my_tools.py
import json

async def my_database_query(sql: str, limit: int = 100) -> str:
    """Execute a SQL query against the application database.

    Returns results as a JSON string."""
    results = await db.execute(sql, limit=limit)
    return json.dumps(results)

def register(registry):
    """Called by the CLI with its HostFunctionRegistry. Register your functions here."""
    registry.register("my_database_query", my_database_query)

Then:

openrlm --functions my_tools.py "show me the top 10 users"

For a directory of tool files, each .py file with a register() function is loaded automatically (files starting with _ are skipped):

openrlm --functions ./my-tools/ "analyze the data"

You can also use a dotted module name for installed packages:

openrlm --functions my_package.tools "do something"

Event Streaming

Monitor agent activity with event callbacks. Events from sub-agents at any depth flow through the same callback, distinguished by agent_id:

from openrlm import build_runtime, EventCallback
from openrlm.events import RoundStart, ToolExecEnd, TurnEnd

def on_event(event):
    match event:
        case RoundStart(agent_id=aid, round_num=n):
            print(f"[{aid}] Round {n}")
        case ToolExecEnd(agent_id=aid, elapsed_seconds=t):
            print(f"[{aid}] Tool execution: {t:.1f}s")
        case TurnEnd(agent_id=aid, rounds=r, prompt_tokens=pt, completion_tokens=ct):
            print(f"[{aid}] Done in {r} rounds, {pt}+{ct} tokens")

async with runtime:
    session = await runtime.create_session("s1", on_event=on_event)
    await session.run_single("analyze this dataset")

The on_event parameter accepts any EventCallback (Callable[[AgentEvent], None]). For multiple consumers or async I/O, use EventBus:

from openrlm import EventBus

bus = EventBus()
bus.add_listener(tui.update_panel)       # sync: immediate UI update
bus.add_listener(metrics.record_event)   # sync: bookkeeping

# Async consumers get an independent stream
stream = bus.stream(maxsize=256)

session = await runtime.create_session("s1", on_event=bus.callback)

# Consume asynchronously in a background task
async def push_events():
    async for event in stream:
        await websocket.send(serialize(event))
asyncio.create_task(push_events())

result = await session.run_single("analyze data")
bus.close()  # terminates async iteration

Each listener and stream is independent — a slow or failing consumer does not affect the engine or other consumers.

Sub-agents

Agents can spawn sub-agents programmatically from within the REPL:

# Create a sub-agent with specific instructions
agent_id = await create_agent(instructions="You are a citation specialist")

# Start a task (non-blocking — runs in the background)
task_id = await run_agent(agent_id=agent_id, task="Research citation percentiles for federal courts")

# Do other work while sub-agent runs...

# Collect the result
result = await await_result(task_id)
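Because run_agent returns immediately, a parent can fan work out to several sub-agents and collect everything at the end — a sketch of code the parent might write in its REPL (the topics are illustrative):

topics = ["federal appellate courts", "state supreme courts", "bankruptcy courts"]

# Launch one sub-agent per topic; all tasks run concurrently
task_ids = []
for topic in topics:
    agent_id = await create_agent(instructions="You are a citation specialist")
    task_ids.append(await run_agent(agent_id=agent_id, task=f"Research citation percentiles for {topic}"))

# Collect the results once all launches are in flight
reports = [await await_result(tid) for tid in task_ids]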

Sub-agents can themselves spawn sub-agents, enabling recursive decomposition. Each sub-agent has:

  • Its own isolated IPython namespace
  • Its own conversation history with the LLM
  • Access to the same host functions and shared workspace directory
  • A per-agent lock that serializes concurrent tasks on the same sub-agent

Persistent sub-agents. A sub-agent created with create_agent persists across multiple run_agent calls. Each task appends a new user message and runs a full agent turn, so the sub-agent sees its full prior conversation and retains all REPL state (variables, imports, computed data) from previous tasks. This makes sub-agents useful as persistent specialists:

analyst = await create_agent(instructions="You are a data analyst")

t1 = await run_agent(agent_id=analyst, task="Load sales.csv and compute monthly totals")
await await_result(t1)

# The analyst still has the loaded data and computed totals in its REPL
t2 = await run_agent(agent_id=analyst, task="Now find the month-over-month growth rate")
growth = await await_result(t2)

The maximum recursion depth is configurable (default: 10 levels).

Configuration

AgentConfig

build_runtime() constructs an AgentConfig internally from its keyword arguments. Direct AgentConfig construction is only needed when building the AgentRuntime manually.

Parameter Default Description
model "openai/gpt-5.2" Model identifier
sandbox_image None Docker image tag; None for local mode
code_timeout 3600.0 Code execution timeout in seconds
max_tool_rounds 50 Max LLM-tool iterations per turn
max_sub_agent_depth 10 Max recursive sub-agent depth
output_limit_lines 2000 Truncate tool output beyond this many lines
output_limit_bytes 50000 Truncate tool output beyond this many bytes
temperature None LLM sampling temperature
system_prompt None Override the default system prompt (format string with {functions_json}, {workspace_path}, {spool_path} placeholders)
get_api_key None Callable[[], Awaitable[str]] that returns an API key; required when using AgentRuntime
sandbox_binds {} Host-to-container directory mounts (Docker mode)
task_preview_chars 12000 Max characters of a sub-agent task shown in system prompt previews
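The system_prompt entry is a Python format string; a minimal sketch of an override (the prompt wording is illustrative):

from openrlm import AgentConfig, default_api_key_resolver

resolver = default_api_key_resolver()
config = AgentConfig(
    model="openai/gpt-5.2",
    get_api_key=lambda: resolver("openrouter"),
    system_prompt=(
        "You are a data-engineering agent working in {workspace_path}.\n"
        "Write large intermediate files to {spool_path}.\n"
        "These host functions are available in your REPL:\n{functions_json}\n"
    ),
)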

API Key Resolution

AgentConfig.get_api_key is caller-provided. The bundled default_api_key_resolver() checks these sources in order:

  1. Auth file (~/.openrlm/auth.json, override with OPENRLM_AUTH_FILE): a JSON object mapping provider names to keys (see the example after this list).
  2. ANTHROPIC_OAUTH_TOKEN (Anthropic only, legacy compatibility).
  3. Provider-specific environment variable:

     Provider       Environment Variable
     openrouter     OPENROUTER_API_KEY
     anthropic      ANTHROPIC_API_KEY
     openai         OPENAI_API_KEY
     google         GEMINI_API_KEY
     groq           GROQ_API_KEY
     xai            XAI_API_KEY
     mistral        MISTRAL_API_KEY
     openai-codex   OPENAI_CODEX_TOKEN
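The auth file from step 1 is a plain JSON object keyed by provider name, for example (keys are placeholders):

{
  "openrouter": "sk-or-...",
  "anthropic": "sk-ant-..."
}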

A .env file in the current directory is loaded automatically.

For the bundled contrib tools (internet_search, internet_extract), set PARALLEL_API_KEY.

CLI Flags

openrlm [message] [options]

positional:
  message                 User message (omit for interactive session)

options:
  --model MODEL           Model identifier (default: openai/gpt-5.2)
  --provider PROVIDER     LLM provider (default: openrouter)
  --image IMAGE           Docker image tag for sandbox (omit for local mode)
  --timeout SECONDS       Code execution timeout (default: 3600)
  --max-rounds N          Max tool loop iterations (default: 50)
  --functions PATH        Directory, .py file, or dotted module name (comma-separated)
  --workspace DIR         Working directory shared with agents (default: cwd)
  --context FILE          JSON file with conversation history to prepend
  --json                  Output result as JSON object
  --verbose               Enable debug logging
  --env-file PATH         Path to .env file (default: .env)
  --log-file PATH         Log file path (default: ~/Downloads/openrlm.log)
  --build-image [TAG]     Build Docker sandbox image and exit (default: openrlm:sandbox)
  --sandbox-deps FILE     Dependencies file for --build-image (default: sandbox-deps.txt)
  --reasoning-effort E    Reasoning effort for Codex models: none, minimal, low, medium, high, xhigh (default: medium)
  --text-verbosity V      Text verbosity for Codex models: low, medium, high (default: medium)

--context FILE

Prepends conversation history after the system prompt. The file must contain a JSON array of messages:

[
  {"role": "user", "content": "I'm analyzing sales data"},
  {"role": "assistant", "content": "I see the file has 10k rows with date, product, and revenue columns."}
]

Only "user" and "assistant" roles are allowed. This is useful for context bridging from an outer harness — pass a filtered conversation history so the agent understands what's been discussed.

--json

Wraps the result in a JSON object for programmatic consumption. Only valid with a message argument (not interactive mode).

// Success
{"result": "The answer is 42", "error": null}

// Failure
{"result": null, "error": "No API key for provider 'openrouter'. Set the OPENROUTER_API_KEY environment variable or add it to ~/.openrlm/auth.json."}

Exactly one of result or error is non-null.

Interactive mode

  • Ctrl-C cancels the active turn and returns to the prompt.
  • Double Ctrl-C exits the session.

Bundled Tools

The contrib/ directory includes two pre-built host functions that use the Parallel API:

  • internet_search — Search the web and return relevant excerpts with source URLs.
  • internet_extract — Fetch a web page or PDF and return its content as markdown.

Both require PARALLEL_API_KEY in the environment and the contrib extra (pip install openrlm[contrib]).

openrlm --functions ./contrib "search for recent papers on transformer efficiency"

LLM Client

openrlm ships with two bundled client implementations:

  • OpenRouterClient — routes to models from OpenAI, Anthropic, Google, and others through a single API.
  • AnthropicClient — calls the Anthropic API directly.

build_runtime(provider="anthropic") or build_runtime(provider="openrouter") selects the appropriate client automatically. Manual client construction is only needed for custom LLMClient implementations.

To implement a custom provider:

import asyncio
from openrlm import AgentRuntime, AgentConfig, HostFunctionRegistry
from openrlm import LLMClient, CompletionResponse, CompletionChoice, CompletionMessage, TokenUsage, default_api_key_resolver

class MyCustomClient:
    """Example: implement LLMClient for a provider not built in."""

    async def complete(self, messages, *, api_key, **kwargs) -> CompletionResponse:
        # Call your provider's API, then translate the response:
        return CompletionResponse(
            model="my-model",
            choices=[CompletionChoice(
                message=CompletionMessage(content="...", tool_calls=None),
                finish_reason="stop",
            )],
            usage=TokenUsage(prompt_tokens=0, completion_tokens=0),
        )

    async def close(self) -> None:
        pass  # Release any resources

async def main():
    client = MyCustomClient()
    resolver = default_api_key_resolver()
    config = AgentConfig(get_api_key=lambda: resolver("my-provider"))
    runtime = AgentRuntime(config, HostFunctionRegistry(), llm_client=client)

    async with runtime:
        session = await runtime.create_session("s1")
        result = await session.run_single("What is 2 + 2?")
        print(result)
        await runtime.close_session("s1")
    # Runtime closes the LLM client on exit.

asyncio.run(main())

Development

# Clone and install
git clone <repo-url>
cd openrlm
uv sync

# Run tests (requires Docker for full suite)
uv run python tests/test_e2e.py

# Build sandbox image (for Docker mode tests)
openrlm --build-image

License

MIT
