Build Recursive Language Models as inspectable execution graphs.
Project description
rlmflow
A Python library for createing interactible, steppable graph Recursive Language Models.
Recursive Language Models are powerful systems -- capable of handling long-context tasks by spawning sub-agents with their own fresh context windows. However. RLMs get messy fast: parents spawn children, children spawn more children, which also can run for multiple steps, etc.
rlmflow turns the run into an explicit graph. Every query, action, observation, child call, wait, resume, and result is a typed, immutable node you can step, inspect, fork, and replay.
RLMs as Graphs
RLMs delegate subtasks to children, those children can delegate to their own children, and their results bubble back up. rlmflow represents this representation as a tree directory: every step inside an agent is a typed node and every delegation is an edge between agents.
For example, this RLM code:
h1 = delegate("search", "Find evidence", context=chunk_a)
h2 = delegate("verify", "Check the answer", context=chunk_b)
results = yield wait(h1, h2)
done(combine(results))
becomes this execution graph:
Query(root)
-> Action(root: delegate search + verify)
-> Supervising(root: waiting on search, verify)
-> Query(root.search)
-> Result(root.search)
-> Query(root.verify)
-> Result(root.verify)
-> Resume(root: search + verify results)
-> Result(root)
Install
pip install rlmflow # core
pip install rlmflow[openai] # + OpenAI client
pip install rlmflow[anthropic] # + Anthropic client
pip install rlmflow[viewer] # + Gradio viewer (plotly)
pip install rlmflow[image] # + static image / GIF export (kaleido)
pip install rlmflow[all] # all of the above
From source:
git clone https://github.com/shyamsn97/rlmflow && cd rlmflow
pip install -e .
Quick start
This example is all you need for a simple and interpretable recursive coding agent. see notebook
from rlmflow import OpenAIClient, RLMConfig, RLMFlow, Workspace
from rlmflow.runtime.local import LocalRuntime
from rlmflow.tools import FILE_TOOLS
from rlmflow.utils.trace import save_trace
from rlmflow.utils.viewer import open_viewer
workspace = Workspace.create("./myproject")
runtime = LocalRuntime(workspace=workspace)
# Sandbox agent code inside Docker instead: drop-in replacement,
# same interface. Build the image once with `docker build -t rlmflow:local .`
# from the repo root; see docs/runtimes.md and docs/security.md.
#
# from rlmflow.runtime.docker import DockerRuntime
# runtime = DockerRuntime("rlmflow:local", workspace=workspace)
runtime.register_tools(FILE_TOOLS)
agent = RLMFlow(
llm_client=OpenAIClient("gpt-5"),
runtime=runtime,
workspace=workspace,
config=RLMConfig(max_depth=2, max_iterations=30),
llm_clients={ # additional llm clients to be chosen to delegate
"fast": {
"model": OpenAIClient("gpt-5-mini"),
"description": "Cheap model for smaller subtasks",
},
},
)
query = "Build a python text-based adventure game with combat and inventory."
states = [agent.start(query)]
while not states[-1].finished:
states.append(agent.step(states[-1]))
print(states[-1].tree())
save_trace(states, "traces/run1")
open_viewer(states)
Drop-in LLMClient
RLMFlow implements LLMClient, so it is a drop-in replacement for any LLM.
def ask(llm: LLMClient, q: str) -> str:
return llm.chat([{"role": "user", "content": q}])
ask(OpenAIClient("gpt-4o-mini"), "2+2?") # one LLM call
ask(RLMFlow(llm_client=..., runtime=...), "2+2?") # full agent, same return type
Nest agents by passing one RLMFlow as another's llm_client.
Step and inspect
step(node) -> node' is one atomic graph transition. Every step returns a
new immutable Node, so the live tree is just state.tree():
state = agent.start(query)
while not state.finished:
state = agent.step(state)
print(state.tree())
root [supervising] {default}
├── root.scanner_auth [result] {fast} -> Found SQL injection in login.py
├── root.scanner_api [supervising] {default}
│ ├── root.scanner_api.chunk_0 [result] {fast} -> Clean
│ └── root.scanner_api.chunk_1 [result] {fast} -> Payment flow is safe
└── root.scanner_db [result] {fast} -> No issues found
Every transition follows the same shape:
Observation -> LLM -> Action -> Runtime -> Observation (REPL output)
-> done() -> Result (terminal answer)
-> wait() -> Supervising (waiting on children)
Supervising -> children done -> Resume -> LLM -> ...
Observation, Action, Supervising, Resume, and Result are all
typed Pydantic nodes. The graph is queryable in plain Python:
state.tree() # ASCII render
state.find("root.scanner_api") # one node by id or agent_id
state.path_to("root.scanner_api.chunk_1") # root → node ancestor chain
state.leaves() # every node with no children
state.errors() # every ErrorNode in the subtree
state.results() # every ResultNode in the subtree
state.where(type="action", agent_id="root") # kwargs match node attrs
state.where(lambda n: n.depth > 2) # or pass a predicate
state.model_dump_json() # full serialization
Checkpoint, branch, replay
Every node is a frozen Pydantic snapshot, so the whole run is data:
from rlmflow import Node
state.save(workspace.checkpoint_path)
# resume later, in another process, with a different model
state = Node.load(workspace.checkpoint_path)
agent = RLMFlow(llm_client=AnotherModel(), workspace=workspace, ...)
while not state.finished:
state = agent.step(state)
To branch into an isolated workspace with its own session, context, and working tree:
alt = workspace.fork(new_branch_id="repair", new_dir="./runs/repair")
alt_agent = RLMFlow(llm_client=..., workspace=alt, ...)
Or intervene mid-run by replacing a child node before the parent resumes —
see examples/showcase.py for checkpointing,
time travel, manual intervention, and gym-style stepping in one file.
Rich visualization
See notebook for a full showcase of vizualization utilities.
Because the run is a typed graph, every visualization is just a render of
that graph. The coding agent example
(examples/coding-agent/agent.py)
already exercises every option below — its saved trace under
examples/data/notebook-coding-agent/ is the source for the renders here.
Gradio viewer
open_viewer(states) launches a small browser app for stepping through a
saved trace — tree, summary, and raw node JSON side by side:
from rlmflow.utils.trace import load_trace
from rlmflow.utils.viewer import open_viewer
trace = load_trace("examples/data/notebook-coding-agent/trace")
open_viewer(trace.states)
Or from a checkpoint via the CLI: rlmflow view examples/data/notebook-coding-agent/trace.
Live terminal tree
rlmflow.utils.viz.live(agent, state) drives the step loop and renders a
Rich tree as nodes are produced. The boids run (Create a simple boids simulation in plain HTML and JavaScript, split each component into separate files) settles to:
root [result] {default:gpt-5} -> Boids simulation written to output/boids-simulation with modular JS (boid, simulation, renderer) and index.html entrypoint.
root.index_html [result] {fast:gpt-5-mini} -> ok
root.styles_css [result] {fast:gpt-5-mini} -> ok
root.boid_js [result] {fast:gpt-5-mini} -> ok
root.simulation_js [result] {fast:gpt-5-mini} -> ok
root.renderer_js [result] {fast:gpt-5-mini} -> ok
root.main_js [result] {fast:gpt-5-mini} -> ok
The same render is available offline as state.tree() on any node.
Filename-flavored agent ids (index.html → index_html) are sanitized
because . is the parent/child delimiter in the agent tree.
Static renders
rlmflow render <path> -f F writes a static visualization in any of:
mermaid # stateDiagram-v2 (default topology)
mermaid-flowchart # flowchart TD, better for wide trees
mermaid-sequence # sequenceDiagram of delegate / wait / resume
dot · d2 # Graphviz / D2 topology
tree · ascii-boxes # text trees
gantt-html # standalone HTML swimlane
report-md # full Markdown summary (tree + cost + result + errors)
code-log # every code block paired with its observation
error-summary # ErrorNode counts grouped by kind
tokens # one-line ASCII sparkline of cumulative tokens
html # self-contained interactive stepper, one slide per snapshot
image # single PNG/SVG/PDF of the topology snapshot
steps # one image per snapshot, written as step_NN.{png,svg,pdf}
rlmflow render examples/data/notebook-coding-agent/trace -f mermaid-flowchart
rlmflow render examples/data/notebook-coding-agent/trace -f gantt-html -o run.html
rlmflow render examples/data/notebook-coding-agent/trace -f report-md -o run.md
rlmflow render examples/data/notebook-coding-agent/trace -f tokens
GitHub renders mermaid inline, so the output drops straight into a doc.
The example below is the to_mermaid_flowchart(state) projection of the
boids run; it renders reliably across the GitHub-supported mermaid
versions:
flowchart TD
root["root<br/><i>result</i><br/>Boids simulation written to output/boids-simulation..."]:::result
root --> html["root.index_html<br/><i>result</i><br/>ok"]:::result
root --> css["root.styles_css<br/><i>result</i><br/>ok"]:::result
root --> boid["root.boid_js<br/><i>result</i><br/>ok"]:::result
root --> sim["root.simulation_js<br/><i>result</i><br/>ok"]:::result
root --> rend["root.renderer_js<br/><i>result</i><br/>ok"]:::result
root --> main["root.main_js<br/><i>result</i><br/>ok"]:::result
classDef result fill:#3fb95022,stroke:#3fb950,color:#c9d1d9;
Programmatic helpers
Everything the CLI does is one function call away:
from rlmflow.utils.export import to_mermaid, to_mermaid_flowchart, to_mermaid_sequence, to_dot, to_d2
from rlmflow.utils.viz import (
ascii_boxes, code_log, error_summary, message_stream, diff_system_prompts,
gantt, gantt_html, token_sparkline, budget_burndown, bench_table,
report_md, live, tee, slack_webhook, discord_webhook,
)
from rlmflow.utils.tracing import json_logs
print(token_sparkline(states)) # ▁▂▅█▂ 15820 tok over 7 steps
print(error_summary(state)) # ErrorNode counts grouped by kind
print(message_stream("root.boid_js", session)) # rendered transcript for one agent
print(report_md(states, title="run")) # full Markdown report
gantt_html(states, "run.html") # standalone HTML swimlane
json_logs(states, "run.jsonl") # one node per line
Image, GIF, and HTML exports
For blog posts, PR comments, papers, and CI artifacts, render the
graph straight to a PNG/SVG/PDF, an animated GIF, or a single
self-contained HTML stepper. Four public functions live in
rlmflow.utils, plus matching CLI verbs:
| Function | CLI verb | Output | Use case |
|---|---|---|---|
save_image(node, path) |
-f image |
one PNG/SVG/PDF | hero image of a finished run |
save_steps(states, dir/) |
-f steps |
step_NN.png per snapshot |
blog slideshow, paper figure series |
save_gif(states, path) |
(no verb yet) | animated GIF | quick preview / social posts |
save_html(states, path) |
-f html |
self-contained stepper (Plotly + CSS) | shareable URL-less artifact, PR comment |
Quick start:
from rlmflow.utils.trace import load_trace
from rlmflow.utils import save_image, save_steps, save_html, save_gif
trace = load_trace("examples/data/notebook-coding-agent/trace")
states = trace.states
save_image(states[-1], "trace_final.png") # single snapshot
save_steps(states, "frames/") # one PNG per step
save_html(states, "trace.html", title="run 1") # standalone stepper
save_gif(states, "trace.gif", duration=400) # animated GIF (~2.5 fps)
Or use the node shorthand (same defaults):
states[-1].save_image("trace_final.png")
states[-1].save_html("trace.html", states=states)
Why the scaling knobs exist
The on-screen Plotly figure is laid out for ~420 px tall, with 11 px markers and 10 px labels — sized to look right on a Jupyter cell. A naive 1800 px PNG export keeps those pixel sizes literal, so every marker shrinks to a speck and every label to a thread.
The save helpers compensate with three knobs:
| Knob | Default (image/steps/gif) | Default (html) | Effect |
|---|---|---|---|
element_mult |
3.0 |
2.0 |
Uniform multiplier on markers + edges + fonts. The simplest "make it bigger" knob. |
marker_mult |
(inherits) | (inherits) | Override just the marker size + edge width. Bump higher than text_mult on dense trees. |
text_mult |
(inherits) | (inherits) | Override just the label font size. Smaller text = fewer collisions when nodes are close. |
normalize_labels |
True |
True |
Force every label to bottom center so adjacent depths can't share a vertical band. |
The HTML stepper additionally defaults to height=720 (vs the
~420 px on-screen default) so its native marker sizes land in the
same proportion to the canvas as a save_image PNG.
element_mult is the lazy default; pass marker_mult and/or
text_mult to break the symmetry when labels are colliding even at
3× scale.
Recipes
Hero PNG of a finished run — defaults are tuned for this:
states[-1].save_image("hero.png")
# == save_image(states[-1], "hero.png", width=1800, height=1350,
# scale=2.0, element_mult=3.0, normalize_labels=True)
Blog slideshow with dense subtrees — fat markers, small labels,
square-ish canvas (the recipe behind docs/blog.md):
save_steps(
states,
"blog/frames/",
width=1600, height=1200, scale=2.0,
marker_mult=3.5, # fat node dots + edges
text_mult=2.2, # shrink labels so they don't collide
normalize_labels=True, # already the default — explicit for the reader
)
Standalone interactive stepper — drop into a PR comment or GitHub gist:
save_html(states, "stepper.html", title="needle haystack run")
The HTML output embeds Plotly from CDN, includes per-slide transcripts, and ships keyboard navigation (← / →) plus dot-style slide indicators. Open it in any browser, attach it to an email, upload it as a CI artifact — it works offline once the CDN script is cached.
Animated GIF — needs pip install rlmflow[image] pillow:
save_gif(
states,
"trace.gif",
duration=600, # ms per frame; lower = faster
loop=0, # 0 = forever; 1 = play once
width=1200, height=900,
element_mult=2.0,
)
From the CLI
Every knob above maps 1:1 to a CLI flag:
# blog slideshow recipe (matches the dense-tree recipe above)
rlmflow render examples/data/notebook-coding-agent/trace \
-f steps -o blog/frames/ \
--width 1600 --height 1200 --scale 2.0 \
--marker-mult 3.5 --text-mult 2.2
# self-contained interactive stepper
rlmflow render examples/data/notebook-coding-agent/trace \
-f html -o stepper.html --title "boids walkthrough"
# single hero PNG with default scaling
rlmflow render examples/data/notebook-coding-agent/trace \
-f image -o hero.png
# opt out of label normalization (matches Gradio viewer defaults)
rlmflow render examples/data/notebook-coding-agent/trace \
-f html -o stepper.html --no-normalize-labels
The CLI auto-picks element_mult=2.0 for -f html (so the live
stepper's native 14 px markers stay readable) and element_mult=3.0
for -f image / -f steps (where the much larger PNG canvas would
otherwise shrink markers to specks). Node sizes are uniform; token
counts stay in hover/details, not marker size. Override either with
--element-mult.
Dependencies
save_image/save_stepsneedkaleido. Install withpip install rlmflow[image]or justpip install kaleido.save_gifadditionally needsPillow(pip install rlmflow[image] pillow).save_htmlandrender_htmlhave no static-image dependency — they emit a single HTML file that embeds Plotly from CDN.
Examples
All examples share flags like --no-viz, --docker-image rlmflow:local,
--max-depth, and --max-iterations. See examples/README.md.
| Example | What it shows |
|---|---|
showcase.py |
Typed nodes, checkpoints, session persistence, intervention, gym-style stepping. |
drop_in_llm.py |
RLMFlow as an LLMClient. Nested agents. |
coding-agent/agent.py |
Interactive coding agent that writes and edits files. |
needle_haystack.py |
Needle-in-a-haystack across 500 files with custom tools and runtime_factory. |
summarizer.py |
Recursive map-reduce over a long document. |
view_demo.py |
Launch the Gradio viewer on a saved trace. |
notebooks/coding_agent.ipynb |
Build the agent, run the boids task end-to-end, open the interactive viewer. Source of examples/data/notebook-coding-agent/ — every other notebook reads from here. |
notebooks/viz_walkthrough.ipynb |
All 9 visualizations against the saved boids trace: inline tree, interactive viewer, topology renders (mermaid/dot/d2/sequence), step-indexed timeline, per-node detail (message_stream, diff_system_prompts), cost & reports, run-vs-run comparison, CLI equivalents. |
notebooks/node_basics.ipynb |
Node API tour — walk, find, path_to, filter (leaves/results/errors/where), diff snapshots, session access (FileSession.load, chain_to), event streaming with tee / json_logs. |
Benchmarks
A runnable RLM-vs-flat harness for OOLONG (long-context aggregation,
~250k tokens) lives under benchmarks/oolong/.
It mirrors Prime Intellect's reference environment but talks directly
to rlmflow instead of verifiers. Three modes — standard (one big
flat call), rlm (recursive scaffold), rlm_tips (recursive +
chunking hints) — across synth, synth_with_labels, and real
subsets, scored deterministically against the published gold answers.
python benchmarks/oolong/run.py --mode rlm --subset synth --limit 50
python benchmarks/oolong/aggregate.py --runs runs/oolong-*
See benchmarks/oolong/README.md for
flags, scoring details, and ablation scripts.
CLI
rlmflow view traces/run1/
rlmflow render checkpoint.json -f mermaid
rlmflow render traces/run1/ -f gantt-html -o run1.html
rlmflow render traces/run1/ -f html -o stepper.html
rlmflow render traces/run1/ -f steps -o frames/ --marker-mult 3.5 --text-mult 2.2
rlmflow render traces/run1/ -f image -o trace.png
rlmflow version
view and render accept a trace directory, trace.json, or checkpoint.
render -f accepts: mermaid, mermaid-flowchart, mermaid-sequence,
dot, d2, tree, ascii-boxes, gantt-html, report-md, code-log,
error-summary, tokens, html, image, steps — see the
Static renders table and Image, GIF, and HTML
exports for what each produces and the
scaling / label-normalization flags (--marker-mult, --text-mult,
--normalize-labels / --no-normalize-labels).
Todo
Docs
- Blog post: the long-form pitch — why recursive language models, why graphs over flat traces, full needle-in-a-haystack walkthrough with the same exports the CLI ships.
- Positioning: when to use rlmflow vs rlm-minimal, ypi, LangGraph, CrewAI, AutoGen, SWE-agent, Aider — decision matrix and per-framework comparisons.
- Observability: node fields and types, save/load traces, session/context layout, live tree, gantt, topology exports, Gradio viewer, CLI.
- Control: step loop, checkpoint, rewind, fork
(
Workspace.fork,CONTEXT.fork()),delegate(name, query, context), inline-first strategy, intervention, custom prompts, runtimes, tools. - Runtimes:
Runtimeprotocol, shipped runtimes (Local / Subprocess / Docker / Modal), writing your own. - Security: trust model, Docker isolation knobs, engine-level caps, proxied tools, approval gates.
- Changelog: release-by-release changes, including the
upcoming
delegate(...)mandatory-contextbreak.
References
- Recursive Language Models: the original RLM paper and implementation.
- rlm-minimal: the single-file reference rlmflow grew from.
- Scaling Managed Agents: Decoupling the brain from the hands: Anthropic's writeup on separating harness, session, and sandbox interfaces for long-horizon agents.
- ypi: recursive coding agent built
on Pi. Our session layout and much of the default prompt
(size-up → delegate → combine, guardrails, aggressive delegation) come
from ypi's
SYSTEM_PROMPT.md.
License
See LICENSE.
Citation
@misc{sudhakaran2025rlmflow,
author = {Sudhakaran, Shyam},
title = {rlmflow},
year = {2025},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/shyamsn97/rlmflow}},
}
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file rlmflow-0.2.0.tar.gz.
File metadata
- Download URL: rlmflow-0.2.0.tar.gz
- Upload date:
- Size: 5.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
7f93602e2ffafa5bab6074d6679554778138745d99c85aa2a3eb512d7b09352e
|
|
| MD5 |
c3d2ecbe2e46c74d8e2d698390868455
|
|
| BLAKE2b-256 |
dce4f0c5dc12be2132d1f9e05e8a6cf07f699429be5ee771d32a20701dcd5ee3
|
Provenance
The following attestation bundles were made for rlmflow-0.2.0.tar.gz:
Publisher:
release.yml on shyamsn97/rlmflow
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
rlmflow-0.2.0.tar.gz -
Subject digest:
7f93602e2ffafa5bab6074d6679554778138745d99c85aa2a3eb512d7b09352e - Sigstore transparency entry: 1490699364
- Sigstore integration time:
-
Permalink:
shyamsn97/rlmflow@d7d1eb65b8b99d898a1ed57690b462eb46e2e431 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/shyamsn97
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@d7d1eb65b8b99d898a1ed57690b462eb46e2e431 -
Trigger Event:
push
-
Statement type:
File details
Details for the file rlmflow-0.2.0-py3-none-any.whl.
File metadata
- Download URL: rlmflow-0.2.0-py3-none-any.whl
- Upload date:
- Size: 99.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a1badf1d3f91ad992c6f189a3cd68ae6a4b317b7b8f42da2856110d593a832d7
|
|
| MD5 |
4f138597432a4a7c4ed875cff8eb0737
|
|
| BLAKE2b-256 |
d60ea37183f0a33472e274944d9a56edc4515820655d403886cb6420998dfa3e
|
Provenance
The following attestation bundles were made for rlmflow-0.2.0-py3-none-any.whl:
Publisher:
release.yml on shyamsn97/rlmflow
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
rlmflow-0.2.0-py3-none-any.whl -
Subject digest:
a1badf1d3f91ad992c6f189a3cd68ae6a4b317b7b8f42da2856110d593a832d7 - Sigstore transparency entry: 1490699495
- Sigstore integration time:
-
Permalink:
shyamsn97/rlmflow@d7d1eb65b8b99d898a1ed57690b462eb46e2e431 -
Branch / Tag:
refs/tags/v0.2.0 - Owner: https://github.com/shyamsn97
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@d7d1eb65b8b99d898a1ed57690b462eb46e2e431 -
Trigger Event:
push
-
Statement type: