Recursive Language Models with DSPy + Daytona and an integrated Web UI for secure long-context code execution

Maintainers

agenticfleet

fleet-rlm

fleet-rlm is a web workspace for running recursive language-model tasks on top of DSPy and Daytona sandboxes. You chat with a ReAct agent in the browser; when a task is larger than a single context window, the agent delegates pieces to isolated sub-sandboxes, each running a bounded dspy.RLM per arXiv 2512.24601v2.

Who it's for. DSPy users who want a UI-driven workspace for long-context tasks, recursive decomposition, and sandboxed code execution — without hand-rolling the transport, persistence, and sandbox plumbing.

What it removes. Writing your own WebSocket transport, session persistence, Daytona sandbox lifecycle, execution-trace UI, and recursive-delegation policy around a DSPy program. fleet-rlm ships all of that behind a single uv run fleet web.

Try it in 30 seconds. See Quick Start below.

Docs · Contributing · Changelog · arXiv paper

Architecture at a Glance

Two layers, both built on real dspy.* modules:

Chat surface — dspy.ReAct for interactive turn-taking. Lives at src/fleet_rlm/runtime/agent/agent.py as FleetAgent.
Recursive engine — dspy.RLM running inside a child Daytona sandbox. Built in src/fleet_rlm/runtime/models/builders.py; the recursive sub-query variant is build_recursive_subquery_rlm(). Implements Algorithm 1 from arXiv 2512.24601v2: inputs stored as REPL variables, sub-queries bounded by max_iterations and max_llm_calls.

How the ReAct Agent Delegates to `dspy.RLM`

The chat agent does not directly hand a task to a child RLM. Delegation is mediated by a specific ReAct tool, delegate_to_rlm, registered the same way as any other tool in the agent's tool registry:

User prompt
   ↓
FleetAgent  (dspy.ReAct, host LLM)
   │   decides the task exceeds one context and picks the tool:
   ↓
delegate_to_rlm(query, context="", document_url="")
   │   — src/fleet_rlm/runtime/tools/rlm_delegate.py
   │   — reads the active Daytona interpreter from a ContextVar
   │   — checks remaining LLM-call budget; returns error if exhausted
   │   — interpreter.build_delegate_child()   ← isolated child Daytona sandbox
   │   — optionally fetches document_url into the child's context
   ↓
build_recursive_subquery_rlm(
    interpreter=child,
    max_iterations=min(child.rlm_max_iterations, remaining_budget),
    max_llm_calls=remaining_budget,
)
   │   constructs the dspy.RLM bound to the child sandbox
   ↓
rlm(prompt=query, context=...)
   │   child RLM runs in REPL-variable mode: it may call llm_query(),
   │   sub_rlm(), or sub_rlm_batched() to recurse further inside its sandbox
   ↓
{"status": "ok", "answer": "..."}        ← bubbles back into the ReAct trace

Two entry points exist, and they share one budget:

delegate_to_rlm() — from the host ReAct agent's tool registry (above).
sub_rlm() / sub_rlm_batched() — from Python code already running inside a dspy.RLM sandbox, reaching back out through the Daytona bridge to spawn a further child.

Both go through DaytonaInterpreter.build_delegate_child(), so child creation follows one backend-owned policy. The default, RLM_CHILD_ISOLATION_MODE=auto, forks the parent sandbox if no durable volume is mounted and otherwise creates a clean child with a child-specific volume_subpath. rlm_max_llm_calls is a single semantic-call budget shared across the entire recursive tree; sub_rlm_batched() caps sibling parallelism at 4.
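To make the shared budget and the sibling cap concrete, here is a small sketch using only the standard library. SharedBudget, choose_child_strategy, run_batched, and the returned labels are hypothetical names chosen for illustration; only the cap of 4 and the fork-versus-clean-child rule come from the description above.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

MAX_SIBLING_PARALLELISM = 4  # cap stated for sub_rlm_batched()


class SharedBudget:
    """Thread-safe counter standing in for the tree-wide rlm_max_llm_calls budget."""

    def __init__(self, max_calls: int) -> None:
        self._remaining = max_calls
        self._lock = threading.Lock()

    def try_consume(self) -> bool:
        # Atomically take one call from the budget if any is left.
        with self._lock:
            if self._remaining <= 0:
                return False
            self._remaining -= 1
            return True

    @property
    def remaining(self) -> int:
        with self._lock:
            return self._remaining


def choose_child_strategy(has_durable_volume: bool) -> str:
    """Mirror of the 'auto' policy; the returned labels are hypothetical."""
    if has_durable_volume:
        return "clean_child_with_volume_subpath"
    return "fork_parent"


def run_batched(queries, budget: SharedBudget) -> list:
    """Run stub sub-queries with at most four siblings in flight."""

    def run_one(query: str) -> dict:
        if not budget.try_consume():
            return {"status": "error", "error": "budget exhausted"}
        return {"status": "ok", "answer": f"stub:{query}"}

    with ThreadPoolExecutor(max_workers=MAX_SIBLING_PARALLELISM) as pool:
        return list(pool.map(run_one, queries))
```

Because every sibling draws from the same SharedBudget, a batch of five queries against a budget of three yields three successes and two budget errors, regardless of scheduling order.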

Full details, including the local-workspace-snapshot fallback when a parent turn has no repo_url to recreate in the child, live in docs/architecture.md.

RLM Capability Evaluation

Fleet-RLM's RLM capabilities were benchmarked against the results published in the RLM paper (arXiv 2512.24601v2) and on Prime Intellect's official primeintellect/oolong-rlm environment:

| Benchmark | Paper RLM (GPT-5) | Fleet-RLM + Gemini 3.1 Pro |
| --- | --- | --- |
| S-NIAH (50 tasks, 50K–200K chars) | (solved) | 100.0% |
| OOLONG-Official (`trec_coarse` @ 128K) | 56.5% | 91.67% (+35.2 pp) |
| OOLONG synthetic (30 tasks) | 56.5% (reference) | 74.0% |

The OOLONG-Official row uses the exact HuggingFace dataset and scoring rubric from the paper's reference environment, via scripts/oolong_official_eval.py. See docs/explanation/rlm-capability-evaluation.md for the full methodology, per-benchmark breakdown, and ASCII diagrams of the evaluation stack. Full results, including caveats and deferred L4 work, are generated locally at output/rlm-eval-full/RESULTS.md; use the docs page above as the stable checked-in reference in this repository.
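The "+35.2 pp" figure in the OOLONG-Official row is a percentage-point difference, not a relative improvement. A short check of that arithmetic, using only the two scores from the table (the function name is illustrative):

```python
def percentage_point_delta(score: float, baseline: float) -> float:
    """Absolute difference between two percentages, rounded to one decimal."""
    return round(score - baseline, 1)


# Scores taken from the OOLONG-Official row of the table above.
delta = percentage_point_delta(91.67, 56.5)
print(f"+{delta} pp")
```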

Offline GEPA Optimization (LongCoT)

The DSPy optimization layer now supports LongCoT reasoning modules:

longcot-reasoner is registered in the optimization module registry and discoverable via fleet-rlm optimize list.
A continuous answer-dominant 0.6/0.4 GEPA metric was implemented for LongCoT evaluation with tiered feedback.
A one-time offline GEPA optimization was run against an 80-row answered-only LongCoT dataset (64 train / 16 validation), producing a reviewable optimized artifact bundle.
The pipeline persists baseline-vs-optimized holdout evidence, prompt snapshots, reflection-model provenance, and MLflow metadata.
The optimized artifact is saved for manual review and is not auto-loaded into the live runtime.
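As an illustration of what a continuous, answer-dominant 0.6/0.4 metric with tiered feedback could look like, consider the sketch below. It is an assumption-laden example, not the project's metric: the source does not say what the 0.4 component measures (a reasoning-quality score is assumed here), and the tier thresholds and feedback strings are invented for illustration.

```python
ANSWER_WEIGHT = 0.6     # answer-dominant weight from the description above
SECONDARY_WEIGHT = 0.4  # assumed to apply to a reasoning-quality score


def longcot_metric(answer_score: float, reasoning_score: float) -> tuple:
    """Return (score, feedback) for component scores in the range [0, 1]."""
    for value in (answer_score, reasoning_score):
        if not 0.0 <= value <= 1.0:
            raise ValueError("component scores must lie in [0, 1]")

    score = ANSWER_WEIGHT * answer_score + SECONDARY_WEIGHT * reasoning_score

    # Hypothetical feedback tiers; the real thresholds are not documented here.
    if score >= 0.8:
        feedback = "strong: answer and reasoning both hold up"
    elif score >= 0.5:
        feedback = "partial: check the weaker component"
    else:
        feedback = "weak: the final answer is likely wrong"
    return score, feedback
```

With these weights a correct answer with no credited reasoning still scores 0.6, while flawless reasoning that ends in a wrong answer scores only 0.4, which is what "answer-dominant" implies.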

Quick Start

Add fleet-rlm to a uv-managed project and launch the Web UI:

# Create a project if you do not already have one
uv init

# Add fleet-rlm to the environment
uv add fleet-rlm

# Start the Web UI + API server
uv run fleet web

Open http://127.0.0.1:8000.

If you already have a uv project, skip uv init and just run uv add fleet-rlm.

Published installs already include built frontend assets, so end users do not need pnpm, vp, or a separate frontend build step.

Primary Workflows

Use the Web UI

uv run fleet web

This starts the main product surface with:

Workbench for adaptive chat and runtime execution
Volumes for runtime-backed file browsing
Optimization for DSPy evaluation and optimization workflows
Settings for runtime configuration and diagnostics

Use terminal chat

uv run fleet-rlm chat --trace-mode compact

Run the API directly

uv run fleet-rlm serve-api --host 127.0.0.1 --port 8000

Runtime Contract

fleet-rlm exposes a Daytona-only runtime contract:

execution_mode remains a per-turn execution hint.
Requests may include repo_url, repo_ref, context_paths, and batch_concurrency.
Durable mounted roots remain memory/, artifacts/, buffers/, and meta/.
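A request carrying these fields might be modelled as follows. This is a hedged sketch: the TurnRequest class, its message field, and the to_payload helper are hypothetical, and the authoritative request shape is whatever openapi.yaml defines. Only the names execution_mode, repo_url, repo_ref, context_paths, and batch_concurrency come from the list above.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class TurnRequest:
    """Hypothetical per-turn request; field names follow the contract above."""

    message: str  # assumed field for the user's prompt
    execution_mode: Optional[str] = None  # per-turn execution hint
    repo_url: Optional[str] = None
    repo_ref: Optional[str] = None
    context_paths: list = field(default_factory=list)
    batch_concurrency: Optional[int] = None

    def to_payload(self) -> dict:
        # Drop unset optional fields so the payload only carries what was given.
        return {k: v for k, v in asdict(self).items() if v not in (None, [])}


request = TurnRequest(
    message="Summarize the architecture docs",
    repo_url="https://github.com/qredence/fleet-rlm.git",
    repo_ref="main",
)
print(json.dumps(request.to_payload(), indent=2))
```

Leaving repo_url unset is equally valid in this sketch, which matches the goal-first framing: a repository is one optional source of context among several.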

The product is goal-first rather than repo-first. Repositories are one possible source of context, alongside local files, staged documents, pasted content, and URLs.

CLI Surfaces

This package exposes two command entrypoints:

fleet: lightweight launcher for terminal chat and fleet web
fleet-rlm: fuller Typer CLI for API and Daytona flows

Common commands:

# Web UI
uv run fleet web

# Terminal chat
uv run fleet
uv run fleet-rlm chat --trace-mode verbose

# FastAPI server
uv run fleet-rlm serve-api --port 8000

# Experimental Daytona validation
uv run fleet-rlm daytona-smoke --repo https://github.com/qredence/fleet-rlm.git --ref main

HTTP and WebSocket Contract

The current frontend/backend contract centers on:

/health
/ready
GET /api/v1/auth/me
GET /api/v1/sessions/state
/api/v1/runtime/*
POST /api/v1/traces/feedback
/api/v1/ws/execution
/api/v1/ws/execution/events

When AUTH_MODE=entra, HTTP and WebSocket access use real Entra bearer-token validation plus Neon-backed tenant admission. Runtime settings writes are intentionally limited to APP_ENV=local.

The canonical schema lives in openapi.yaml.
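For client code, the HTTP and WebSocket paths above can be derived from one base URL. The helper names below are illustrative and not part of fleet-rlm; the base URL assumes the local default from Quick Start, and only the paths are taken from the contract list.

```python
from urllib.parse import urlsplit, urlunsplit

BASE_URL = "http://127.0.0.1:8000"  # local default from Quick Start


def http_url(base: str, path: str) -> str:
    """Join a base URL and an absolute API path."""
    parts = urlsplit(base)
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def ws_url(base: str, path: str) -> str:
    """Same as http_url, but with the matching WebSocket scheme."""
    parts = urlsplit(base)
    scheme = "wss" if parts.scheme == "https" else "ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


print(http_url(BASE_URL, "/health"))
print(http_url(BASE_URL, "/api/v1/sessions/state"))
print(ws_url(BASE_URL, "/api/v1/ws/execution"))
```

When AUTH_MODE=entra is active, a client would also need to attach a bearer token to these requests; how the token is obtained is outside the scope of this sketch.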

Source Development

From the repo root:

uv sync --all-extras
uv run fleet web

For the maintained helper-script surface, see scripts/README.md. Common day-to-day flows should still go through make, fleet, or fleet-rlm.

Frontend contributors should use pnpm inside src/frontend:

cd src/frontend
pnpm install --frozen-lockfile
pnpm run dev
pnpm run api:check
pnpm run type-check
pnpm run lint:robustness
pnpm run test:unit
pnpm run build

This repo explicitly uses pnpm for frontend work even though the packaged frontend is built with Vite+ under the hood.

Repo Layout

The maintained backend is easiest to read in this order:

1. Recursive DSPy runtime core
   - src/fleet_rlm/runtime/agent/*
   - src/fleet_rlm/runtime/models/*
   - src/fleet_rlm/integrations/daytona/*
2. Thin transport shell
   - src/fleet_rlm/api/main.py
   - src/fleet_rlm/api/routers/ws/*
   - src/fleet_rlm/api/runtime_services/*
3. Offline DSPy quality and optimization layer
   - src/fleet_rlm/runtime/quality/*

That means:

runtime/agent/agent.py and runtime/agent/runtime.py are the main cognition loop.
integrations/daytona/interpreter.py is the public Daytona interpreter facade; workspace_manager.py, sandbox_executor.py, child_delegation.py, and runtime.py own workspace/session state, execution, recursive child construction, and Daytona SDK runtime helpers behind it.
FastAPI/WebSocket modules are transport: auth, request parsing, session extraction, lifecycle, and event-envelope delivery.

The supported app surfaces are Workbench, Volumes, Optimization, and Settings. Legacy taxonomy, skills, memory, and analytics routes are no longer first-class product surfaces and should fall through to /404.

Design Principles

Keep the backend thin: transport + sandbox orchestration only, no business logic in API layers.
Preserve one shared frontend and WebSocket contract instead of parallel runtime modes.
Ship a UI that surfaces the runtime's streaming events, code execution, and artifacts rather than hiding them.
Expose both a user-facing Web UI and integration surfaces for CLI, HTTP, and WebSocket workflows.

Maintenance Commands

Common maintenance commands from the repo root:

# Clear caches and local generated artifacts
make clean

# Regenerate the canonical FastAPI schema after backend contract or doc-metadata changes
uv run python scripts/openapi_tools.py generate

# Validate the schema quality improvements in-flight
uv run python scripts/openapi_tools.py validate

# Sync frontend OpenAPI artifacts after the root spec changes
cd src/frontend
pnpm run api:sync

Validation

Repo-level validation:

make test-fast
make quality-gate
make release-artifacts
make release-check

# Focused backend/runtime regression lane
uv run pytest -q tests/unit/integrations/daytona/test_config.py tests/unit/integrations/daytona/test_runtime.py tests/unit/integrations/daytona/test_interpreter.py tests/unit/runtime/agent/test_runtime.py -m "not live_llm and not live_daytona and not benchmark"

Focused docs validation:

uv run python scripts/check_docs_quality.py
uv run python scripts/validate_release.py hygiene
uv run python scripts/validate_release.py metadata

Daytona Notes

Use this order for Daytona work:

1. Set DAYTONA_API_KEY, DAYTONA_API_URL, and optional DAYTONA_TARGET.
2. Run uv run fleet-rlm daytona-smoke --repo <url> [--ref <branch-or-sha>].

In local/default-local source checkouts, Daytona config resolution prefers repo .env / .env.local values over inherited shell exports so branch-local validation uses the checkout's intended credentials.

This repo treats DAYTONA_API_BASE_URL as a misconfiguration. Use DAYTONA_API_URL instead.
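The rules in this section (required variables, .env values winning over shell exports, and DAYTONA_API_BASE_URL treated as a misconfiguration) can be expressed as a small pure function. This is a sketch of the described behavior, not the project's resolver: resolve_daytona_config and its return shape are hypothetical, and it takes plain mappings rather than reading files or os.environ.

```python
from typing import Mapping

REQUIRED = ("DAYTONA_API_KEY", "DAYTONA_API_URL")


def resolve_daytona_config(dotenv: Mapping, shell: Mapping) -> dict:
    """Merge settings so values from .env files override inherited shell exports."""
    merged = {**shell, **dotenv}  # later mapping wins, so .env takes precedence

    if "DAYTONA_API_BASE_URL" in merged:
        # The README treats this name as a misconfiguration.
        raise ValueError("DAYTONA_API_BASE_URL is not supported; use DAYTONA_API_URL")

    missing = [name for name in REQUIRED if not merged.get(name)]
    if missing:
        raise ValueError("missing Daytona settings: " + ", ".join(missing))

    return {
        "api_key": merged["DAYTONA_API_KEY"],
        "api_url": merged["DAYTONA_API_URL"],
        "target": merged.get("DAYTONA_TARGET"),  # optional
    }
```

For example, if the shell exports one DAYTONA_API_URL and the checkout's .env defines another, the .env value is the one returned.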

Documentation Map


Release history

0.5.40

May 23, 2026

This version

0.5.31

May 17, 2026

0.5.3

May 17, 2026

0.5.2

Apr 29, 2026

0.5.1

Apr 27, 2026

0.5.0

Apr 25, 2026

0.4.99

Mar 26, 2026

0.4.98

Mar 17, 2026

0.4.97

Mar 15, 2026

0.4.95

Mar 5, 2026

0.4.94

Mar 3, 2026

0.4.93

Mar 3, 2026

0.4.92

Mar 1, 2026

0.4.9

Feb 27, 2026

0.4.8

Feb 25, 2026

0.4.7

Feb 22, 2026

0.4.6

Feb 20, 2026

0.4.5

Feb 18, 2026

0.4.4

Feb 15, 2026

0.4.3

Feb 15, 2026

0.4.2

Feb 15, 2026

0.4.1

Feb 12, 2026

0.4.0

Feb 12, 2026

0.3.2

Feb 10, 2026

0.3.1

Feb 9, 2026

0.1.0

Feb 7, 2026

Download files

Source Distribution

fleet_rlm-0.5.31.tar.gz (4.9 MB)

Uploaded May 17, 2026 Source

Built Distribution

fleet_rlm-0.5.31-py3-none-any.whl (5.2 MB)

Uploaded May 17, 2026 Python 3

File details

Details for the file fleet_rlm-0.5.31.tar.gz.

File metadata

Download URL: fleet_rlm-0.5.31.tar.gz
Upload date: May 17, 2026
Size: 4.9 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for fleet_rlm-0.5.31.tar.gz
Algorithm	Hash digest
SHA256	`c6858f6337bdf1255d6f579684eefb2b49ce6280e95e7139859c9da1927ae376`
MD5	`8c1fab3898c092406d16285377e055a7`
BLAKE2b-256	`859933a9527c304e83b9613905e3371045d997aae47db30c47000c527ef2e747`
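A downloaded file can be checked against the SHA256 digest listed above with the standard library. The helper names are illustrative; the self-check at the end runs on in-memory data so the snippet works without a download.

```python
import hashlib
import hmac
import io


def sha256_hex(stream, chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(stream, expected_hex: str) -> bool:
    """Compare the computed digest with a published hex digest."""
    return hmac.compare_digest(sha256_hex(stream), expected_hex.strip().lower())


# Self-check on empty input, whose SHA256 digest is a well-known constant.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
print(matches_published_hash(io.BytesIO(b""), EMPTY_SHA256))
```

For the real archive, pass `open("fleet_rlm-0.5.31.tar.gz", "rb")` as the stream and the SHA256 value from the table as expected_hex.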

Provenance

The following attestation bundles were made for fleet_rlm-0.5.31.tar.gz:

Publisher: release.yml on Qredence/fleet-rlm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fleet_rlm-0.5.31.tar.gz
- Subject digest: c6858f6337bdf1255d6f579684eefb2b49ce6280e95e7139859c9da1927ae376
- Sigstore transparency entry: 1564725735
- Sigstore integration time: May 17, 2026
Source repository:
- Permalink: Qredence/fleet-rlm@fec92c06b207d447e0e5f64c4db6478be5cea1aa
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Qredence
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fec92c06b207d447e0e5f64c4db6478be5cea1aa
- Trigger Event: workflow_dispatch

File details

Details for the file fleet_rlm-0.5.31-py3-none-any.whl.

File metadata

Download URL: fleet_rlm-0.5.31-py3-none-any.whl
Upload date: May 17, 2026
Size: 5.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for fleet_rlm-0.5.31-py3-none-any.whl
Algorithm	Hash digest
SHA256	`81db73a95d7741fb471278ff715eeaa92e0c39db1fa95ccb8f44e50760bf44f2`
MD5	`7487cdfbd255bcf2edab4a68f65f986e`
BLAKE2b-256	`9698d81621a8d1d7fe31decfaf6c92bdb154496f5c25dbad9e73277bf3df1019`

Provenance

The following attestation bundles were made for fleet_rlm-0.5.31-py3-none-any.whl:

Publisher: release.yml on Qredence/fleet-rlm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fleet_rlm-0.5.31-py3-none-any.whl
- Subject digest: 81db73a95d7741fb471278ff715eeaa92e0c39db1fa95ccb8f44e50760bf44f2
- Sigstore transparency entry: 1564725753
- Sigstore integration time: May 17, 2026
Source repository:
- Permalink: Qredence/fleet-rlm@fec92c06b207d447e0e5f64c4db6478be5cea1aa
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Qredence
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@fec92c06b207d447e0e5f64c4db6478be5cea1aa
- Trigger Event: workflow_dispatch

fleet-rlm 0.5.31
