# rlmagents
RLM-native agent harness — A complete, standalone framework with planning, filesystem, sub-agents, and profile-driven RLM tools for context isolation and evidence-backed reasoning.
RLMAgents derives from the Deep Agents lineage and is packaged as a standalone harness for production-oriented RLM workflows.
Design reference: Recursive Language Model paper (RLM)
Upstream lineage: LangChain Deep Agents
## Paper Scope vs rlmagents Add-ons
The implementation keeps the paper's core loop:
- Externalized prompt/context in REPL state (not directly stuffed into the root model window)
- Code-execution loop with iterative feedback
- Programmatic recursive calls via `sub_query()` / `llm_query()`
This project also includes engineering layers beyond the paper:
- Evidence lifecycle and citation tooling
- Session persistence and memory-pack serialization
- Cross-context and semantic search utilities
- Recipe validation/estimation/execution and DSL helpers
- Context-pressure handling (including rlmagents-specific compaction heuristics)
- Full agent harness integrations (planning/filesystem/sub-agents/HITL)
If behavior breaks in these add-ons, the fault lies with the rlmagents implementation layer, not with the user and not with the core RLM paper method.
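The paper's core loop can be illustrated with a minimal, hypothetical sketch: the large prompt lives in external REPL state rather than the root model's window, and `sub_query()` stands in for a recursive model call. The keyword scan below fakes the model so the sketch is runnable; the names and signatures are illustrative, not the rlmagents API.

```python
# Externalized context: the document lives in REPL state, not in chat history.
context_store: dict[str, str] = {}

def load_context(name: str, text: str) -> None:
    """Externalize a large document into REPL state instead of the chat window."""
    context_store[name] = text

def sub_query(question: str, context_name: str) -> str:
    """Stand-in for a recursive LM call scoped to one isolated context.
    A keyword scan fakes the model so this sketch runs without a provider."""
    text = context_store[context_name]
    hits = [line for line in text.splitlines() if question.lower() in line.lower()]
    return "\n".join(hits) or "(no match)"

load_context("report", "Revenue grew 12%.\nHeadcount flat.\nRevenue target met.")
print(sub_query("revenue", "report"))
```

Only the small, relevant slice of the context ever reaches the (simulated) model call; the root window sees just the answer.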
## RLM Features Assessment
Recent validation runs confirmed the RLM stack is materially useful in practice:
- Context isolation works across multiple loaded documents and supports cross-context search.
- Evidence tracking captures provenance across search, REPL execution, and manual citation.
- REPL helpers enable structured extraction and analysis beyond plain prompting.
- The `think` -> `evaluate_progress` -> `finalize` flow improves analysis discipline and traceability.
Assessment summary: the RLM layer is not just conceptual; it provides measurable workflow gains over standard agent-only loops for research and long-context tasks.
## Quick Start

```shell
pip install rlmagents
# or
uv add rlmagents
```

```python
from rlmagents import create_rlm_agent

agent = create_rlm_agent()
result = agent.invoke({
    "messages": [{"role": "user", "content": "Research this topic and write a summary..."}]
})
```
## What's Included
Out of the box, rlmagents provides:
| Feature | Tools |
|---|---|
| Planning | write_todos, read_todos |
| Filesystem | read_file, write_file, edit_file, ls, glob, grep |
| Shell | execute (sandboxed) |
| Sub-agents | task (delegate with isolated contexts) |
| RLM Context | load_context, load_file_context, list_contexts, diff_contexts, save_session, load_session |
| RLM Query | peek_context, search_context, semantic_search, chunk_context, cross_context_search, rg_search, exec_python, get_variable |
| RLM Reasoning | think, evaluate_progress, summarize_so_far, get_evidence, finalize |
| RLM Recipes | validate_recipe, run_recipe, run_recipe_code |
| RLM Tool Profiles | full (all), reasoning (no recipe/config), core (minimum set) |
| Memory | AGENTS.md files loaded at startup |
| Skills | Domain-specific capabilities from SKILL.md files |
Total available: 35+ tools (9 base + up to 26 RLM + skills/memory)
## Why RLM Here
RLMAgents bakes the RLM workflow directly into agent behavior:
- Large context is isolated instead of flooding chat history.
- Findings can be traced to evidence.
- Recursive sub-queries offload targeted analysis.
- The REPL enables deterministic extraction and computation.
  (Security note: the REPL is best-effort restricted by policy and timeouts, but it is not a formally hardened sandbox.)
- Recipe pipelines make repeated analysis reproducible.
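The evidence-tracking idea above can be sketched with a couple of small structures. These are hypothetical; the real rlmagents helpers record provenance automatically during search and `exec_python` runs.

```python
from dataclasses import dataclass, field

@dataclass
class Evidence:
    claim: str
    source: str   # context name the claim came from
    excerpt: str  # supporting text

@dataclass
class EvidenceLog:
    items: list = field(default_factory=list)

    def cite(self, claim: str, source: str, excerpt: str) -> None:
        """Record one finding together with its provenance."""
        self.items.append(Evidence(claim, source, excerpt))

    def finalize(self) -> str:
        """Render cited findings, one claim per line with its source."""
        return "\n".join(f"{e.claim} [source: {e.source}]" for e in self.items)

log = EvidenceLog()
log.cite("Revenue grew 12%", "report", "Revenue grew 12%.")
print(log.finalize())  # Revenue grew 12% [source: report]
```

The point is the lifecycle: every claim that reaches the final answer carries a pointer back to the context it came from.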
## RLM Workflow

The agent is taught to use this workflow for complex analysis:
- Load large files/data into isolated RLM contexts
- Explore with `search_context`, `peek_context`, `semantic_search`, `chunk_context`, `rg_search`
- Analyze with `exec_python` (100+ built-in helpers: search, extract, stats, cite)
- Track evidence automatically (provenance for all findings)
- Reason with `think`, `evaluate_progress`
- Conclude with `finalize` (cited answers)
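The explore step can be approximated in plain Python. `search_context` and `peek_context` are real rlmagents REPL tools, but the signatures below are illustrative stand-ins, not the actual API.

```python
import re

def peek_context(text: str, start: int = 0, length: int = 120) -> str:
    """Window into a context without pulling the whole thing into chat."""
    return text[start:start + length]

def search_context(text: str, pattern: str) -> list:
    """Return (line_number, line) pairs whose line matches a regex."""
    rx = re.compile(pattern)
    return [(i, ln) for i, ln in enumerate(text.splitlines(), 1) if rx.search(ln)]

doc = "alpha\nbeta revenue\ngamma\nrevenue beta"
print(peek_context(doc, 0, 5))          # alpha
print(search_context(doc, r"revenue"))  # [(2, 'beta revenue'), (4, 'revenue beta')]
```

Each call returns a small, bounded excerpt, which is what keeps the active chat window from flooding.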
## Configuration

```python
from rlmagents import create_rlm_agent

# Only the main model is required; sub_query_model is an optional override.
agent = create_rlm_agent(
    model="deepseek/deepseek-chat",               # Main agent model (required)
    sub_query_model="minimax/minimax-01",         # Optional override; default reuses `model`
    sub_query_timeout=120.0,                      # Sub-query timeout
    skills=["/skills/analysis/"],                 # Skill sources
    memory=["/memory/AGENTS.md"],                 # Memory files
    rlm_tool_profile="reasoning",                 # full | reasoning | core
    rlm_exclude_tools=("cross_context_search",),  # Optional tool removal
    auto_load_threshold=5000,                     # Auto-load results >5 KB into RLM
    auto_load_preview_chars=400,                  # Keep transcript previews small
    sandbox_timeout=300.0,                        # RLM REPL timeout
    enable_rlm_in_subagents=True,                 # Deprecated; RLM in sub-agents is always on
    interrupt_on={"edit_file": True},             # Human-in-the-loop
)
```

If `sub_query_model` is omitted, recursive `sub_query()` / `llm_query()` calls use the same configured provider/model as the main agent.
## Context-Window-First Setup

For workflows that should keep almost everything outside the active chat window:

```python
from pathlib import Path

from rlmagents import create_rlm_agent

agent = create_rlm_agent(
    model="deepseek/deepseek-chat",
    rlm_tool_profile="core",
    auto_load_threshold=1500,
    auto_load_preview_chars=0,
    rlm_system_prompt=Path("examples/rlm_system_prompt.md").read_text(),
    memory=["examples/AGENTS.md"],
)
```

Use `load_file_context` as the default way to ingest large files.
## Architecture

```
rlmagents/
├── _harness/              # Incorporated agent harness
│   ├── backends/          # Backend protocol (State, Filesystem, etc.)
│   └── middleware/        # Planning, filesystem, skills, memory, etc.
├── middleware/
│   └── rlm.py             # RLM middleware (tool profiles + auto-load controls)
├── repl/                  # Sandboxed Python REPL with 100+ helpers
│   ├── sandbox.py         # Sandboxed execution environment
│   └── helpers.py         # Built-in helper functions
├── session_manager.py     # Session lifecycle management
├── serialization.py       # Session serialization (memory packs)
├── recipes.py             # Recipe validation and execution
└── graph.py               # create_rlm_agent() factory
```
## Requirements

- Python 3.11+
- `langchain-core>=1.2.10`
- `langchain>=1.2.10`
- `langchain-anthropic>=0.3.0`
- `langgraph>=0.3.0`
- `pyyaml>=6.0`
- `wcmatch>=10.0`
No runtime dependency on upstream deepagents internals — rlmagents is fully standalone.
## Development

```shell
cd libs/rlmagents
uv sync --group test
uv run pytest
uv run ruff check .
uv run ruff format .
```
## Launch & Telemetry Checklist

- One-command launch path:

  ```shell
  (cd "$(git rev-parse --show-toplevel)/libs/rlmagents" && uv run python examples/dogfood.py)
  ```

- Compatibility check command:

  ```shell
  (cd "$(git rev-parse --show-toplevel)/libs/rlmagents" && \
    uv run python -c "from examples.bootstrap_config import _load_dotenv_if_available; from rlmagents import create_rlm_agent; _load_dotenv_if_available(); create_rlm_agent(); print('bootstrap ok')")
  ```

- Terminal flow smoke command:

  ```shell
  (cd "$(git rev-parse --show-toplevel)/libs/rlmagents" && \
    uv run pytest tests/unit_tests/test_terminal_bench_scenarios.py -q)
  ```

- Benchmark-score output (optional when the benchmark job is running):

  ```shell
  (cd "$(git rev-parse --show-toplevel)/libs/rlmagents" && \
    RLMAGENTS_BENCHMARK_SCORE_PATH=$PWD/.artifacts/terminal_bench_score.json \
    uv run pytest tests/unit_tests/test_terminal_bench_scenarios.py -q)
  ```
Expected JSON output format when score output is enabled:

```json
{
  "read_edit_verify_loop": "passed",
  "long_context_compaction": "passed",
  "sub_query_stubbed_path": "passed",
  "dogfood_mocked_provider": "passed"
}
```
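A score artifact of that shape is easy to validate in CI. The `check_score` helper and `EXPECTED_SCENARIOS` set below are illustrative, not part of the rlmagents API; they assume the artifact maps scenario names to `"passed"`/`"failed"` strings as in the example above.

```python
import json

EXPECTED_SCENARIOS = {
    "read_edit_verify_loop",
    "long_context_compaction",
    "sub_query_stubbed_path",
    "dogfood_mocked_provider",
}

def check_score(raw: str) -> bool:
    """True only if every expected scenario is present and passed."""
    scores = json.loads(raw)
    missing = EXPECTED_SCENARIOS - scores.keys()
    if missing:
        raise ValueError(f"missing scenarios: {sorted(missing)}")
    return all(scores[k] == "passed" for k in EXPECTED_SCENARIOS)

sample = (
    '{"read_edit_verify_loop": "passed", "long_context_compaction": "passed",'
    ' "sub_query_stubbed_path": "passed", "dogfood_mocked_provider": "passed"}'
)
print(check_score(sample))  # True
```

Raising on missing keys (rather than returning `False`) distinguishes a malformed artifact from a genuine scenario failure.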
Model- and run-time telemetry checks should include:
- Whether `create_configured_agent()` can initialize (or skip with an explicit env-based reason).
- Whether terminal-bench scenarios report all keys above.
- Whether `examples/dogfood.py` executes `run_tooled_dogfood()` and prints agent output when keys are present.
## Success criteria

- Bootstrap: `examples/bootstrap_config.py` imports, and the `uv run python -c ...` check returns `bootstrap ok`.
- Model connectivity: `create_configured_agent()` returns a runnable agent object when provider keys are present, and `create_rlm_agent` is constructible with mocked tool-call flows.
- Dogfood readiness: `uv run python examples/dogfood.py` executes `run_tooled_dogfood()` and, when keys are present, prints model output without uncaught exceptions.
- Terminal-flow readiness: scenario smoke tests in `tests/unit_tests/test_terminal_bench_scenarios.py` pass.
- Benchmark readiness: scenario smoke tests emit the terminal-bench score artifact (when enabled) and include all four scenario keys.
## Failure criteria

- Any syntax/import error in `rlmagents` or `examples/bootstrap_config.py`.
- Missing environment variables for provider-backed flows (`DEEPSEEK_API_KEY` or `MINIMAX_API_KEY`).
- `uv run pytest tests/unit_tests/test_terminal_bench_scenarios.py` failures.
- `create_configured_agent()` or `dogfood.py` raising runtime exceptions instead of exiting with explicit skip/failure output.
## Comparison
| Feature | Other Harnesses | rlmagents |
|---|---|---|
| Planning | ✅ | ✅ |
| Filesystem | ✅ | ✅ |
| Sub-agents | ✅ | ✅ (RLM-enabled) |
| Context isolation | ❌ | ✅ |
| Evidence tracking | ❌ | ✅ |
| Python REPL analysis | ❌ | ✅ |
| Cross-context search | ❌ | ✅ |
| Auto-load large results | ❌ | ✅ |
| Recipe pipelines | ❌ | ✅ |
| Skills | ✅ | ✅ |
| Memory | ✅ | ✅ |
## License
MIT License — see LICENSE for details.