Skip to main content

Recursive Language Models with DSPy + Modal and an integrated Web UI for secure long-context code execution

Project description

fleet-rlm

PyPI version Python versions License: MIT CI

PyPI Downloads

Secure, cloud-sandboxed Recursive Language Models (RLM) with DSPy and Modal.

fleet-rlm provides a production-ready implementation of Recursive Language Modeling aligned with the DSPy RLM API. It gives your AI agent a secure "computer" in the cloud to read, search, and analyze massive datasets without local resource constraints.

Paper | Contributing | Docs


Architecture

graph TB
    subgraph entry ["🚪 Entry Points"]
        CLI["CLI (Typer)"]
        WebUI["Web UI<br/>(React SPA)"]
        API["FastAPI<br/>(WS/REST)"]
        TUI["Ink TUI<br/>(stdio bridge)"]
        MCP["MCP Server"]
    end

    subgraph orchestration ["🧠 Orchestration Layer"]
        Agent["RLMReActChatAgent<br/>(dspy.Module)"]
        History["Chat History"]
        Memory["Core Memory<br/>(Persona/Human/Scratchpad)"]
        DocCache["Document Cache"]
    end

    subgraph tools ["🔧 ReAct Tools"]
        DocTools["📄 load_document<br/>read_file_slice<br/>chunk_by_*"]
        RecursiveTools["🔄 rlm_query<br/>llm_query<br/>(recursive delegation)"]
        ExecTools["⚡ execute_code<br/>edit_file<br/>search_code"]
    end

    subgraph execution ["⚙️ Execution Layer"]
        Interpreter["ModalInterpreter<br/>(JSON protocol)"]
        Profiles["Execution Profiles:<br/>ROOT | DELEGATE | MAINTENANCE"]
    end

    subgraph cloud ["☁️ Modal Cloud"]
        Sandbox["Sandbox Driver<br/>(Python REPL)"]
        Volume[("💾 Persistent Volume<br/>/data/<br/>• workspaces<br/>• artifacts<br/>• memory<br/>• session state")]
    end

    WebUI -->|"REST / WS"| API
    CLI --> Agent
    API --> Agent
    TUI --> Agent
    MCP --> Agent

    Agent --> History
    Agent --> Memory
    Agent --> DocCache

    Agent --> DocTools
    Agent --> RecursiveTools
    Agent --> ExecTools

    DocTools --> Interpreter
    RecursiveTools --> Interpreter
    ExecTools --> Interpreter

    Interpreter --> Profiles
    Interpreter -->|"stdin/stdout<br/>JSON commands"| Sandbox
    Sandbox -->|"read/write"| Volume

    style entry fill:#e3f2fd,stroke:#1976d2,stroke-width:2px
    style orchestration fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px
    style tools fill:#fff3e0,stroke:#f57c00,stroke-width:2px
    style execution fill:#e8f5e9,stroke:#388e3c,stroke-width:2px
    style cloud fill:#fce4ec,stroke:#c2185b,stroke-width:2px

Layers:

🚪 Entry Points → 🧠 Orchestration → 🔧 Tools → ⚙️ Execution → ☁️ Modal Cloud

Features

  • Web UI First (0.4.6): Integrated React SPA (src/frontend) is now the primary interactive surface for chat, execution timeline, and artifact workflows.
  • Interactive Agent: RLMReActChatAgent (a dspy.Module) combines fast, interactive chat with deep, recursive task execution via rlm_query.
  • DSPy Aligned: Implements dspy.RLM, dspy.Module, and dspy.Tool interfaces — compatible with DSPy optimizers (BootstrapFewShot, MIPROv2).
  • Secure Sandbox: Code runs in isolated Modal containers with persistent storage volumes, execution profiles, and sensitive data redaction.
  • Recursive Delegation: All delegate tools (rlm_query, analyze_long_document, grounded_answer, etc.) spawn true recursive sub-agents via spawn_delegate_sub_agent() with unified depth enforcement.
  • PDF Ingestion: Native document loading via MarkItDown with pypdf fallback; OCR guidance for scanned PDFs.
  • Session State: Per-workspace, per-user session persistence with manifests stored on Modal volumes.
  • MCP Server: Expose fleet-rlm capabilities as an MCP tool server via serve-mcp.
  • Execution Streams: /ws/chat remains the primary interactive stream while /ws/execution provides structured execution lifecycle events for Artifact Canvas and observability clients.
  • Observability: Real-time streaming of thoughts, tool execution, trajectory normalization, and structured logging.
  • LLM Analytics (Opt-in): PostHog $ai_generation events for DSPy LM calls with trace correlation, token metadata, latency, and payload redaction/truncation.

PostHog LLM Analytics

PostHog analytics is disabled by default. To enable it, set both:

POSTHOG_ENABLED=true
POSTHOG_API_KEY=phc_...

Optional settings:

  • POSTHOG_HOST (default: https://us.i.posthog.com)
  • POSTHOG_DISTINCT_ID (runtime user identity takes precedence in /ws/chat)
  • POSTHOG_FLUSH_INTERVAL / POSTHOG_FLUSH_AT
  • POSTHOG_ENABLE_DSPY_OPTIMIZATION (default: false)
  • POSTHOG_INPUT_TRUNCATION / POSTHOG_OUTPUT_TRUNCATION
  • POSTHOG_REDACT_SENSITIVE (default: true)

Programmatic setup:

from fleet_rlm import configure_analytics

configure_analytics()  # reads POSTHOG_* environment variables

Each DSPy LM call emits $ai_generation with:

  • $ai_trace_id, $ai_parent_trace_id
  • $ai_model, $ai_provider, $ai_latency
  • $ai_input, $ai_output_choices (sanitized + truncated)
  • $ai_input_tokens, $ai_output_tokens, $ai_total_tokens

Quick Start

1. Install

uv pip install fleet-rlm

Optional extras for server and MCP support:

uv pip install fleet-rlm[server]   # FastAPI server + WebSocket
uv pip install fleet-rlm[mcp]      # MCP server
uv pip install fleet-rlm[full]     # All extras

2. Configure

Set up your Modal and LLM credentials:

modal setup
modal volume create rlm-volume-dspy
modal secret create LITELLM DSPY_LM_MODEL=openai/gemini-3-pro-preview DSPY_LLM_API_KEY=sk-...

Set up NeonDB + backend auth bootstrap:

# from repo root
cp .env.example .env
# Edit .env and set:
#   DATABASE_URL=postgresql://... (direct Neon endpoint)
#   AUTH_MODE=dev
#   AUTH_REQUIRED=false   # dev default; auth optional until Entra is wired
#   DEV_JWT_SECRET=...

Initialize DB schema:

# from repo root
uv run python scripts/db_init.py

3. Run

Web UI (React SPA):

0.4.6 treats the React SPA as the primary interface. The backend serves the built frontend automatically.

# 1. Build the frontend (requires Bun)
cd src/frontend
bun install
bun run build
cd ../..

# 2. Build the Python package (bundles the UI into the wheel)
uv build

# 3. Install with server dependencies and run the Web UI server
uv pip install -e ".[server]"
uv run fleet web

Then navigate to http://localhost:8000 in your browser.

OpenAPI source-of-truth is openapi.yaml at repository root. Frontend API types are generated from src/frontend/openapi/fleet-rlm.openapi.yaml, which should be synced from the root spec via frontend scripts.

Interactive Chat (OpenTUI):

# Requires OpenTUI / Bun
fleet-rlm code-chat --opentui

Standalone Interactive Chat (Ink):

# Ink runtime (supported standalone path)
fleet

# Force Ink explicitly
fleet --ui ink

One-shot Tasks:

# Basic question
fleet-rlm run-basic --question "What are the first 12 Fibonacci numbers?"

# Document analysis
fleet-rlm run-architecture --docs-path docs/architecture.md --query "Extract all components"

Servers:

# API server (FastAPI + WebSocket) via explicit command
uv run fleet-rlm serve-api --port 8000

# MCP server
fleet-rlm serve-mcp --transport stdio

WebSocket endpoints:

  • /api/v1/ws/chat for interactive conversation and tool orchestration events.
  • /api/v1/ws/execution for filtered execution lifecycle events (execution_started, execution_step, execution_completed) scoped by workspace_id, user_id, and session_id.

Issue a dev token:

# from repo root
uv run python scripts/dev_issue_token.py \
  --tid "00000000-0000-0000-0000-000000000123" \
  --oid "00000000-0000-0000-0000-000000000456" \
  --email dev@example.com \
  --name "Dev User"

Call an authenticated endpoint (debug headers):

curl -s http://127.0.0.1:8000/api/v1/auth/me \
  -H "X-Debug-Tenant-Id: 00000000-0000-0000-0000-000000000123" \
  -H "X-Debug-User-Id: 00000000-0000-0000-0000-000000000456" \
  -H "X-Debug-Email: dev@example.com" \
  -H "X-Debug-Name: Dev User"

Call an authenticated endpoint (JWT):

curl -s http://127.0.0.1:8000/api/v1/auth/me \
  -H "Authorization: Bearer ${DEV_TOKEN}"

Run DB smoke test:

# from repo root
uv run python scripts/db_smoke.py

fleet and fleet-rlm code-chat serve different interactive paths:

  • fleet = standalone bridge chat launcher (Ink runtime)
  • fleet-rlm code-chat = OpenTUI runtime (OpenTUI/Bun required)

Development Setup

# Clone and install
git clone https://github.com/qredence/fleet-rlm.git
cd fleet-rlm
uv sync --extra dev

# With server/MCP support
uv sync --extra dev --extra server --extra mcp

# Build React frontend bundle for web UI
cd src/frontend
bun install
bun run check
cd ../..

# Build Ink frontend bundle for `fleet --ui ink`
cd tui-cli/tui-ink
bun install
bun run build
bun run test
cd ..

# Copy environment template
cp .env.example .env

# Quality gate
uv run ruff check src tests
uv run ruff format --check src tests
uv run ty check src --exclude "src/fleet_rlm/_scaffold/**"
uv run pytest -q

# Auto-fix formatting when needed
uv run ruff format src tests

Documentation

Contributing

We welcome contributions! Please see our Contribution Guide and run the quality gate before submitting:

uv run ruff check src tests
uv run ruff format --check src tests
uv run ty check src --exclude "src/fleet_rlm/_scaffold/**"
uv run pytest -q

License

MIT License — see LICENSE.

Based on Recursive Language Modeling research by Alex L. Zhang (MIT CSAIL), Omar Khattab (Stanford), and Tim Kraska (MIT).

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

fleet_rlm-0.4.6.tar.gz (254.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

fleet_rlm-0.4.6-py3-none-any.whl (323.5 kB view details)

Uploaded Python 3

File details

Details for the file fleet_rlm-0.4.6.tar.gz.

File metadata

  • Download URL: fleet_rlm-0.4.6.tar.gz
  • Upload date:
  • Size: 254.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fleet_rlm-0.4.6.tar.gz
Algorithm Hash digest
SHA256 bfcb415871765e84bd0e167886e9cda6058816494fe783f8522544cee6f4d7e1
MD5 b731ad07f0c724b857c9d03adfb0ad0a
BLAKE2b-256 e0e4d18fdc07e731dd041e092efcedc9a8403829ce852507a165628194eed2f0

See more details on using hashes here.

Provenance

The following attestation bundles were made for fleet_rlm-0.4.6.tar.gz:

Publisher: release.yml on Qredence/fleet-rlm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file fleet_rlm-0.4.6-py3-none-any.whl.

File metadata

  • Download URL: fleet_rlm-0.4.6-py3-none-any.whl
  • Upload date:
  • Size: 323.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for fleet_rlm-0.4.6-py3-none-any.whl
Algorithm Hash digest
SHA256 c10d4900a49d55c1e2740b98afa16f838f6f171db25ada6b466dce32cd5d6488
MD5 2edca5a242231f60dd13a2a458785b2f
BLAKE2b-256 d244f6dbfddb2b58915c2e2fc43c71aa0df93ecffdc1a263ed1ee7a6c81ee392

See more details on using hashes here.

Provenance

The following attestation bundles were made for fleet_rlm-0.4.6-py3-none-any.whl:

Publisher: release.yml on Qredence/fleet-rlm

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page