Sovereign-inference TUI chat against an OpenAI-compatible vLLM endpoint.

rtxclaw.ai

Run it now: git clone … && cd rtxclaw && ./rtxclaw — the launcher creates .venv/, installs dependencies, and starts the TUI. See Getting started for first-time configuration, the workspace layout, env knobs, and the RTXCLAW_HOME=~/.rtxclaw-test test-sandbox pattern.

Install

pip install rtxclaw

That puts a rtxclaw command on your PATH. Run it once with no arguments — the first-run wizard walks you through the LLM endpoint, model, and (optionally) Telegram, then brings the gateway up itself. Three steps and you're in the TUI:

pip install rtxclaw → rtxclaw → answer the wizard → done.

pyproject.toml declares the rtxclaw_* packages, the rtxclaw console script, and the full runtime dependency set, so cutting a release takes only python -m build && twine upload dist/*.

From a clone (for development)

git clone … && cd rtxclaw && ./rtxclaw — the launcher creates .venv/, runs an editable pip install -e ., and starts the TUI.

rtxclaw.ai is the cypherpunk version of inference.

It exists to empower the user to have total control over their data and ideas, without the hassle of endless configuration or requiring deep open-source model knowledge just to get useful work done.

This project starts from a hard truth: the AI industry is underinvesting in infrastructure and degrading model quality to keep up with demand. Decisions like blocking OpenClaw from the Max plan and forcing heavy API costs on users reinforce the view that AI companies harvest the ideas of people and companies the way the Matrix farms humans: as raw material for creativity and for studying human thought processes.

That is the opposite of sovereignty.

The current model asks users and companies to pour their private context, internal reasoning, product ideas, and operational intelligence into centralized AI systems they do not control. In return, they get rising costs, shrinking access, degraded quality under load, and dependence on infrastructure decisions made by someone else.

And the risk is not theoretical.

The imminent Taiwan conflict will create heavy shocks to current business. Any company relying on AI will have no option but to pay the price of neoclouds if supply chains seize up and centralized inference tightens further. Businesses that chose not to build on-prem infrastructure, or at least retain the option, will be trapped into paying whatever the market demands.

rtxclaw is the answer to that trap.

rtxclaw is a custom-built agent system that adapts to the available inference capacity by creating tailored agents for each hardware profile, from a modest RTX 3060 to RTX 3090, RTX 4090, RTX 5090, A6000-class workstations, and up to advanced rented neocloud GPUs on platforms like Vast.ai.

Instead of forcing every task through one oversized, expensive, centralized stack, rtxclaw rightsizes inference to the real job:

  • small agents on cheap local hardware
  • stronger agents on workstations
  • burst agents on rented neocloud GPUs
  • flexible routing based on actual available capacity
  • model selection based on task value, latency, and hardware envelope
  • agent behavior shaped around the realities of the machine it runs on

Agents can be spawned in seconds with capacity right-sized to each task, reducing the cost of AI while increasing resilience, performance, and control.

Why rtxclaw exists

Most AI products are built around a hidden assumption: the user should adapt to the vendor.

The vendor chooses the models. The vendor chooses the pricing. The vendor chooses when quality gets degraded. The vendor chooses which products get blocked. The vendor chooses which workloads are too expensive. The vendor chooses whether your use case is welcome.

rtxclaw rejects that model.

The intelligence layer of a company is too important to outsource blindly. Your prompts are not just prompts. They are product direction, customer knowledge, internal process, strategy, failure modes, experimentation, and judgment in raw form. If your AI stack is not sovereign, your cognition stack is not sovereign.

Core principles

  • Own the data
  • Own the ideas
  • Own the inference path
  • Minimize configuration
  • Avoid vendor lock-in
  • Use the smallest capable model
  • Adapt to available hardware
  • Keep the system understandable
  • Prefer tailored agents over monolithic bloat
  • Treat inference as infrastructure, not magic
  • Cancel scopes you can reason about — abort one session and only that session (plus its subagents) dies; abort the gateway and everything dies. No mystery middle ground.

Design philosophy

rtxclaw follows a simple philosophy: small enough to understand, flexible enough to adapt, powerful enough to matter.

Customization should come from code and agent behavior, not from endless configuration sprawl. The system should adapt to the user, the hardware, and the workload, not force the user to adapt to the limitations of a vendor’s pricing model or infrastructure bottlenecks.

This means:

  • no blind dependence on one model provider
  • no assumption that every task deserves frontier-model pricing
  • no assumption that cloud is always the answer
  • no assumption that local hardware is too weak to matter
  • no assumption that one agent shape fits every machine

Hardware-aware agents

rtxclaw is built around the idea that different hardware should produce different agent strategies.

A small local card like an RTX 3060 should not be treated the same way as a 4090, a 5090, an RTX A6000, or a high-memory neocloud GPU. The system should understand the available VRAM, throughput, latency, and cost envelope, then spawn the right kind of agent for the job.

Examples:

  • RTX 3060 / 4060-class: lightweight routing, summarization, background memory work, small local copilots
  • RTX 3090 / 4090-class: stronger coding agents, research agents, orchestration, hybrid local inference
  • RTX 5090-class: high-end desktop inference, multi-agent local workflows, stronger reasoning on-prem
  • A6000 / workstation-class: larger-context agents, heavier pipelines, persistent business-critical agent roles
  • Vast.ai / neocloud GPUs: burst capacity, specialized heavy jobs, temporary swarms, overflow compute

The point is not to chase the biggest GPU. The point is to make every GPU useful.
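The tier list above can be read as a lookup from available VRAM to an agent strategy. The sketch below is only an illustration of that idea, not rtxclaw's actual routing code: the Tier class, the pick_tier function, the tier names, and the VRAM cut-offs are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Tier:
    name: str                 # hypothetical label, not an rtxclaw identifier
    min_vram_gb: int          # hypothetical cut-off, not an rtxclaw constant
    roles: Tuple[str, ...]    # roles paraphrased from the list above


# Ordered from largest to smallest so the first match is the best fit.
TIERS = (
    Tier("workstation", 48, ("larger-context agents", "heavier pipelines")),
    Tier("high-end desktop", 32, ("multi-agent local workflows",)),
    Tier("enthusiast", 24, ("coding agents", "research agents", "orchestration")),
    Tier("entry", 8, ("routing", "summarization", "background memory work")),
)


def pick_tier(vram_gb: float) -> Optional[Tier]:
    """Return the largest tier whose VRAM floor the card meets, or None."""
    for tier in TIERS:
        if vram_gb >= tier.min_vram_gb:
            return tier
    return None
```

A real router would also weigh throughput, latency, and cost, as the paragraph above says; VRAM alone is used here to keep the example short.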

What rtxclaw does

rtxclaw creates a custom-built agent system that:

  • detects or knows the available inference capacity
  • matches tasks to the right runtime and hardware tier
  • spawns agents in seconds
  • routes work based on cost, latency, and model capability
  • reduces unnecessary API dependence
  • uses neocloud only when it makes economic or operational sense
  • preserves the option of on-prem inference as a first-class path
  • keeps the architecture understandable enough to modify

What makes it cypherpunk

Cypherpunk systems assume the network is hostile, dependency is dangerous, and convenience without control becomes a trap.

rtxclaw applies that logic to inference.

  • If your intelligence depends entirely on remote providers, you do not control your intelligence.
  • If your private reasoning is continuously exported, you do not control your ideas.
  • If your costs can be repriced overnight, you do not control your operating margin.
  • If your access can be revoked by policy, demand spikes, or product decisions, you do not control your future.

Sovereign inference means keeping optionality. Sovereign inference means designing for adversarial conditions. Sovereign inference means your agent system can still function when cloud prices spike, access tightens, models get rate-limited, or supply chains crack.

Why now

As models get smarter and smaller, like Gemma 4 and Qwen3.5, the direction becomes obvious. The future belongs to systems that can run intelligence anywhere, on hardware you control, on hardware you rent intelligently, or on whatever inference capacity is available at that moment.

Model progress is shrinking the moat of centralized inference. Better small models plus better open-weight ecosystems mean the balance shifts toward adaptive systems that can move fluidly between local, workstation, datacenter, and burst cloud environments.

That is the world rtxclaw is built for.

Architecture

System prompt layout (mirrors OpenClaw)

Tool-contract guidance lives with the tool, in code. Behavioural style lives in operator-editable .md files. Both sit above the cache boundary; the contract goes FIRST so the model sees it before lost-in-the-middle attention drops.

Stable-prefix order (above <!-- RTXCLAW_CACHE_BOUNDARY -->):

  1. ## Tooling — hardcoded in rtxclaw_core.system_prompt.tooling_prompt_section. Carries the todo contract: "if 2+ steps, call todo first; one in_progress at a time; don't restate the plan."
  2. ## Execution Bias — hardcoded, mirrors OpenClaw's buildExecutionBiasSection verbatim. Negative reinforcement: "do not finish with a plan/promise when tools can move it forward."
  3. WORKSPACE.md, SOUL.md, USER.md, TOOLS.md, TOOLCALL.md, AGENTS.md (## ⚠️ When to plan: behavioural triggers only; the contract is upstream).
  4. MODE-AUTO.md / MODE-PLAN.md / MODE-ASK.md (one, by active permission mode).
  5. # SKILLS summary block.
  6. # EXTERNAL TOOLS — MCP / ACP block (when configured).
  7. # DEFERRED TOOLS catalog (names + 1-line descriptions; full schemas loaded via tool_search).

Below the cache boundary:

  • # WORKSPACE-CONTEXT (frontend, cwd).
  • # MEMORY.md (curated long-term memory; main session only).
  • # HEARTBEAT.md (heartbeat session only).

The ## Tooling + ## Execution Bias reordering (2026-05-07) was driven by an observed compliance gap: a Qwen3.6-27B-BF16 session on SGLang ignored the planning rule that the same prompt's FP8/vLLM session followed. The rule had been buried at byte ~13 K of a 26 K stable head. With the new layout it sits at byte 0.
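To make the cache-boundary idea concrete, here is a minimal sketch of how a prompt could be assembled so that everything above the marker stays byte-identical between turns. Only the marker string comes from the layout above; build_system_prompt and stable_prefix are hypothetical helpers, not functions from rtxclaw_core.

```python
from typing import Sequence

# Marker quoted in the layout description above.
CACHE_BOUNDARY = "<!-- RTXCLAW_CACHE_BOUNDARY -->"


def build_system_prompt(stable: Sequence[str], dynamic: Sequence[str]) -> str:
    """Join stable sections, then the boundary marker, then per-turn sections."""
    head = "\n\n".join(stable)
    tail = "\n\n".join(dynamic)
    return f"{head}\n\n{CACHE_BOUNDARY}\n\n{tail}"


def stable_prefix(prompt: str) -> str:
    """Return the part of the prompt a prefix cache could reuse."""
    return prompt.split(CACHE_BOUNDARY, 1)[0]
```

With this shape, editing MEMORY.md or changing the cwd only alters text after the marker, so the prefix (with ## Tooling at byte 0) is unchanged from one turn to the next.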

Single-process agent gateway

rtxclaw is moving from "one daemon per agent" to "one gateway, many child subprocesses":

                rtxclaw gateway (single parent process)
                ├─ binds 1 ACP HTTP listener (default :20100)
                ├─ child manager: lazy-spawn, idle-timeout, crash-respawn
                └─ /acp/<agent_name>/...  →  child stdio
                                              │
                ┌─────────────────────────────┼─────────────────────────────┐
                ▼                             ▼                             ▼
            main child                  scraper child         …      agent N child
        (rtxclaw agent acp-stdio,    (rtxclaw agent acp-stdio,   (per-agent process,
         CoreBackend(main))            CoreBackend(scraper))      lazy-spawned on first
                                                                  session/new for that name)
  • ACP-compliant. The gateway is itself an ACP server externally (existing make_acp_app). Each child is an ACP server over stdio (existing acp_stdio_main). The gateway is a per-session multiplexing proxy — no protocol change.
  • Bootstrap context filtered per child. Sub-agents only get AGENTS.md + TOOLS.md for context economy (mirrors OpenClaw).
  • Lazy spawn. Children come up on first session/new for their agent name; idle-timeout (default 30 min) reaps them. Bounded concurrency (default 16 live children) so 100+ agent types don't all run at once.
  • Cancel cascade. Gateway-level abort (SIGTERM, gateway stop) fans out to every live child. Per-session ACP cancel(parent_session_id) cancels the parent's turn AND any subagent sessions the parent has spawned.
  • Migration path. Existing rtxclaw agent start <name> keeps working as a standalone daemon during transition. Gateway mode is opt-in via rtxclaw gateway start.
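The lazy-spawn, idle-timeout, and bounded-concurrency rules can be sketched without any real subprocesses. The ChildPool class below is a toy stand-in written for this README, not the gateway's child manager: the class name, its methods, and the injectable spawn and clock callables are assumptions. Only the defaults (16 live children, 1800 s idle timeout) come from the text above, and crash-respawn is not modelled.

```python
import time
from typing import Callable, Dict, List


class ChildPool:
    """Toy model of lazy spawn, idle reaping, and a cap on live children."""

    def __init__(
        self,
        spawn: Callable[[str], object],
        max_live: int = 16,
        idle_timeout_s: float = 1800.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._spawn = spawn
        self._max_live = max_live
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._children: Dict[str, object] = {}   # agent name -> child handle
        self._last_used: Dict[str, float] = {}   # agent name -> last access time

    def get(self, agent: str) -> object:
        """Return the child for this agent, spawning it on first use."""
        self.reap_idle()
        if agent not in self._children:
            if len(self._children) >= self._max_live:
                raise RuntimeError("live-child limit reached")
            self._children[agent] = self._spawn(agent)
        self._last_used[agent] = self._clock()
        return self._children[agent]

    def reap_idle(self) -> List[str]:
        """Drop children idle for at least idle_timeout_s; return their names."""
        now = self._clock()
        idle = [
            name for name, used in self._last_used.items()
            if now - used >= self._idle_timeout_s
        ]
        for name in idle:
            del self._children[name]
            del self._last_used[name]
        return idle
```

Passing the clock in makes the timeout testable without sleeping; a real manager would also terminate the reaped process rather than just forget it.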

Abort & cancel scopes

One of the reasons rtxclaw exists. With a hosted assistant you can't selectively kill just this branch of work — you abort the chat or you don't, and the moment you abort you also lose every parallel thread the agent had going. rtxclaw has two cleanly separated abort scopes; nothing in between, on purpose.

Narrow: session/cancel (per-session ACP cancel).

Cancel an in-flight prompt on one session. What dies:

  • the turn currently running in that session
  • every subagent session that turn spawned (via delegate_agent / future gateway-aware subagent), regardless of which agent child hosts them — so a main session that fanned work out to scraper and researcher cancels all three with one call

What survives:

  • every other session on the same agent (parallel chats keep going)
  • every session on every other agent (other agents are untouched)
  • the agent child processes themselves (warm, ready for the next turn)

This is the granularity an operator actually needs. "Stop this idea, keep everything else running" works without bringing down adjacent work.
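One way to picture the narrow scope is as a walk over a parent-to-child session graph. The SessionTree class below is a hypothetical model for illustration only; it is not the gateway's data structure, and the session ids in the usage note are made up.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SessionTree:
    """Tracks which sessions spawned which, and cancels a whole subtree."""

    def __init__(self) -> None:
        self._children: Dict[str, List[str]] = defaultdict(list)
        self.cancelled: Set[str] = set()

    def spawn(self, parent_sid: str, child_sid: str) -> None:
        """Record that parent_sid delegated work to child_sid."""
        self._children[parent_sid].append(child_sid)

    def cancel(self, sid: str) -> List[str]:
        """Cancel sid and every session below it; return the ids cancelled."""
        order: List[str] = []
        stack = [sid]
        while stack:
            current = stack.pop()
            if current in self.cancelled:
                continue
            self.cancelled.add(current)
            order.append(current)
            # .get avoids creating empty entries in the defaultdict.
            stack.extend(self._children.get(current, ()))
        return order
```

If main-1 fanned out to scraper-1 and researcher-1, cancel("main-1") marks all three and leaves an unrelated main-2 alone, which is the behaviour the two lists above describe.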

Broad: rtxclaw gateway stop (SIGTERM the gateway).

The single big-red-button. SIGTERM the gateway parent → ChildManager.shutdown() SIGTERMs every live agent child within a 10 s deadline → every session on every agent dies. Use when something is genuinely wedged at the host level.

Why no per-agent middle ground. A "kill just main, leave scraper running" command would solve a problem session/cancel already covers — if a session on main is misbehaving, cancel that session. The agent child process itself is cheap (idle-reaped at 30 min by default), so killing the whole child to abort one session is throwing away warm state for no reason. We can add gateway kill <agent> later if a real use case shows up; today it'd be a footgun more than a feature.

Cleanup is OS-level, not best-effort.

  • Each child agent runs in its own process group (start_new_session=True).
  • Each monitor_start-spawned process runs in its own process group.
  • SIGTERM at every layer escalates to SIGKILL after a grace period.
  • A wedged child cannot block gateway shutdown — ChildManager.shutdown() is bounded; orphaned children get killed by the OS when the gateway exits.

You always know what dies and what doesn't.
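The process-group pattern behind these guarantees uses only the Python standard library on POSIX systems. The helper below is a sketch under assumptions: stop_process_group is a hypothetical name, and the 5 s default grace period is invented for the example (the text above gives 10 s only for the gateway-wide shutdown deadline).

```python
import os
import signal
import subprocess


def stop_process_group(proc: subprocess.Popen, grace_s: float = 5.0) -> int:
    """SIGTERM the child's process group; SIGKILL it if it outlives grace_s.

    Assumes proc was started with start_new_session=True, so the child
    leads its own process group and the signal reaches its descendants
    without touching the caller.
    """
    if proc.poll() is not None:
        return proc.returncode            # already exited and reaped
    pgid = os.getpgid(proc.pid)
    os.killpg(pgid, signal.SIGTERM)
    try:
        return proc.wait(timeout=grace_s)
    except subprocess.TimeoutExpired:
        os.killpg(pgid, signal.SIGKILL)   # cannot be caught or ignored
        return proc.wait()
```

A typical call would be proc = subprocess.Popen(["sleep", "60"], start_new_session=True) followed by stop_process_group(proc). Popen reports a signal death as a negative return code.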

Subagent infrastructure

A new subagent tool lets one agent delegate a self-contained sub-task to another agent on the same gateway. The parent calls subagent(agent="researcher", prompt="…", isolation="…") → gateway opens a fresh ACP session against the named child → returns a single result message back to the parent. The child's session is filtered down to AGENTS.md + TOOLS.md only, with a narrow tool allowlist passed by the parent. Child sessions cancel-cascade with the parent.

This is distinct from delegate_claude / delegate_codex (those route to external Claude / Codex CLIs). subagent is rtxclaw-on-rtxclaw, all local, all sovereign.

Monitor tool

Long-running background process registry. Four model-facing tools:

  • monitor_start(command, cwd?) — spawn the command in its own process group, return a monitor_id.
  • monitor_read(monitor_id, max_lines=100, timeout_s=2.0) — pop unread lines from the buffer; if buffer is empty, waits up to timeout_s for the next line OR for the process to exit.
  • monitor_stop(monitor_id) — SIGTERM the process group, escalate to SIGKILL after a grace period, return the exit code.
  • monitor_list() — every live monitor + its buffer state. Reads the sidecar so monitors started in earlier tool rounds still surface.

Plus two ways to inspect monitors, one interactive and one not:

  • /monitors — opens a navigable modal panel: ↑/↓ to navigate, x to SIGTERM the selected PID, r to reload, q/Esc to close. Refreshes itself every 1.5 s so processes that exit elsewhere disappear without manual reload.
  • monitor_list (model-facing) prints the same data as a one-shot text dump for non-interactive review.

v1 is pull-based (the model polls); the buffer is an in-memory rolling window (~5 000 lines per monitor). The push-based variant — each line becomes a session-update notification that wakes the agent between rounds — is a follow-up that needs gateway integration.
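A rolling window with a read cursor is enough to model the pull-based buffer. The RollingBuffer class below is a hypothetical sketch, not the monitor tool's implementation; it keeps buffered and unread counts separate because the MONITOR_EXIT log line reports both, and it leaves out the timeout_s wait that monitor_read performs.

```python
from collections import deque
from typing import Deque, List, Tuple


class RollingBuffer:
    """Keeps the newest max_lines lines and tracks which ones were read."""

    def __init__(self, max_lines: int = 5000) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        self._lines: Deque[Tuple[int, str]] = deque(maxlen=max_lines)
        self._next_seq = 0      # sequence number for the next appended line
        self._read_upto = 0     # sequence number of the first unread line

    def append(self, text: str) -> None:
        self._lines.append((self._next_seq, text))
        self._next_seq += 1

    @property
    def buffered(self) -> int:
        return len(self._lines)

    @property
    def unread(self) -> int:
        return sum(1 for seq, _ in self._lines if seq >= self._read_upto)

    def read_unread(self, max_lines: int = 100) -> List[str]:
        """Return up to max_lines unread lines and advance the cursor."""
        pending = [(s, t) for s, t in self._lines if s >= self._read_upto]
        batch = pending[:max_lines]
        if batch:
            self._read_upto = batch[-1][0] + 1
        return [text for _, text in batch]
```

Lines that roll out of the window before being read are simply lost, which is the trade-off of a bounded in-memory buffer.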

Cross-subprocess persistence. Tool runners run in short-lived subprocesses; v1's in-process registry would die between calls. To survive, every monitor_start writes an entry to a per-session sidecar at <sessions_dir>/<parent_sid>.monitors.json. The actual subprocess keeps running across tool rounds because it was spawned with start_new_session=True and is reparented to init when its launcher exits. monitor_list, monitor_stop, and the TUI panel all read the sidecar and recompute liveness via os.kill(pid, 0), so a process that died after its launcher subprocess exited still gets listed correctly. Each monitor runs in its own process group so monitor_stop reaps any subcommands the shell spawned.
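The sidecar-plus-liveness-probe pattern can be sketched with the standard library. The os.kill(pid, 0) probe is the one named above; everything else here is an assumption for illustration, including the function names and the JSON schema (a mapping from monitor id to pid and command), which is not the real sidecar format.

```python
import json
import os
from pathlib import Path
from typing import Dict, List


def pid_alive(pid: int) -> bool:
    """Probe a pid with signal 0, which checks existence without signalling."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True   # the process exists but belongs to another user
    return True       # note: an unreaped zombie also counts as alive


def _load(sidecar: Path) -> Dict[str, dict]:
    return json.loads(sidecar.read_text()) if sidecar.exists() else {}


def record_monitor(sidecar: Path, monitor_id: str, pid: int, command: str) -> None:
    """Persist one monitor entry so later tool rounds can find it."""
    entries = _load(sidecar)
    entries[monitor_id] = {"pid": pid, "command": command}
    sidecar.write_text(json.dumps(entries, indent=2))


def list_monitors(sidecar: Path) -> List[dict]:
    """Read the sidecar and recompute liveness for every entry."""
    return [
        {"monitor_id": mid, **entry, "alive": pid_alive(entry["pid"])}
        for mid, entry in _load(sidecar).items()
    ]
```

Because liveness is recomputed on every read, the file never needs to be updated when a process exits on its own.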

Subagents

/subagents (TUI command) lists every subagent the active session has spawned via delegate_agent (rtxclaw → rtxclaw), delegate_claude (rtxclaw → Claude Code), or delegate_codex (rtxclaw → Codex CLI). Each entry shows the kind, target, child session id, and last activity. /subagents <N> enters the Nth subagent — for delegate_agent, that's a TUI-level /agent <target> switch into the child agent's transcript; for the external bridges it surfaces the bridge's own session id so the operator can re-attach via that bridge's CLI.

The continuity is automatic: every delegate call persists its child session id to a sidecar at <sessions_dir>/<parent_sid>.delegate_<kind>[.<target>].json, and the next delegate call from the same parent resumes the same child. /subagents is the operator's view into that persisted graph.
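The resume-or-create behaviour can be sketched as a small file lookup. Only the filename pattern comes from the text above. The two helper names, the child_session_id key, and the reading of <kind> as "agent", "claude", or "codex" are assumptions made for this example.

```python
import json
from pathlib import Path
from typing import Callable, Optional, Union


def delegate_sidecar(
    sessions_dir: Union[str, Path],
    parent_sid: str,
    kind: str,
    target: Optional[str] = None,
) -> Path:
    """Build <sessions_dir>/<parent_sid>.delegate_<kind>[.<target>].json."""
    suffix = f".{target}" if target else ""
    return Path(sessions_dir) / f"{parent_sid}.delegate_{kind}{suffix}.json"


def resume_or_create(path: Path, new_session: Callable[[], str]) -> str:
    """Reuse the persisted child session id, or create and persist a new one."""
    if path.exists():
        # "child_session_id" is a hypothetical key name.
        return json.loads(path.read_text())["child_session_id"]
    sid = new_session()
    path.write_text(json.dumps({"child_session_id": sid}))
    return sid
```

The second delegate call from the same parent finds the file and returns the same child id without calling new_session again.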

Logging

Every tool call is instrumented to append a structured event line to the agent's gateway.log (<RTXCLAW_AGENT_HOME>/logs/gateway.log). Format:

2026-05-07T13:39:56Z MONITOR_START monitor_id="2dae1ed81a36" pid=2377452 command="…" cwd=null
2026-05-07T13:39:56Z MONITOR_EXIT  monitor_id="2dae1ed81a36" exit_code=0 buffered_lines=24 unread=12 cancelled=false
2026-05-07T13:39:56Z MONITOR_STOP  monitor_id="2dae1ed81a36" cross_subprocess=true was_alive=true pid=2377452
2026-05-07T13:39:57Z DELEGATE_AGENT_START   target="scraper" parent_session="69fc…" task_chars=82
2026-05-07T13:39:58Z DELEGATE_AGENT_SPAWNED target="scraper" child_pid=2378001 cwd="/home/…"
2026-05-07T13:40:05Z DELEGATE_AGENT_DONE    target="scraper" stop_reason="end_turn" reply_chars=482

Greppable by event prefix. Best-effort writes — a logfile that's been rolled away or is unwritable will not break the tool call. The session JSON remains the authoritative trace; this log is auxiliary telemetry for cross-call debugging.
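Because each line is a timestamp, an event name, and key=value pairs, the log is also easy to parse programmatically. The parser below is a sketch inferred from the sample lines above rather than from a format specification: parse_event is a hypothetical helper, values are returned as strings, and the handling of backslash escapes inside quoted values is an assumption.

```python
import re
from typing import Dict, Optional

# "<timestamp> <EVENT_NAME> <fields...>"
_LINE = re.compile(r"^(?P<ts>\S+)\s+(?P<event>[A-Z_]+)\s+(?P<rest>.*)$")
# key="quoted value" or key=bare_value
_FIELD = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_event(line: str) -> Optional[Dict[str, str]]:
    """Split one gateway.log line into ts, event, and its fields."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    record = {"ts": match["ts"], "event": match["event"]}
    for key, raw in _FIELD.findall(match["rest"]):
        quoted = len(raw) >= 2 and raw.startswith('"') and raw.endswith('"')
        record[key] = raw[1:-1] if quoted else raw
    return record
```

Filtering by event then becomes a one-liner, for example keeping only records whose event starts with DELEGATE_AGENT_.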

Reliability — Telegram bot ↔ gateway recovery

The Telegram bot (rtxclaw-telegram.service) and the agent gateway (rtxclaw-gateway.service) are independent systemd units. They can restart in either order without bringing each other down — but only because the bot now treats every gateway-side identity as "live until proven otherwise" and re-establishes anything that disappeared.

Three failure modes the bot now tolerates without operator intervention:

  1. Idle reap of a chat session. Children GC sessions after child_idle_timeout_s (default 1800 s). A chat that goes quiet for 30+ minutes returns to a gateway that has never heard of its sid. Before recovery: session_prompt raised LookupError: no live child owns session … and the bot returned ⚠️ agent error — see telegram.log. Now the bot calls session/load (or session/new as a fallback), updates the chat's persisted sid, and re-issues the prompt — the user sees the model answer, not the error.
  2. Gateway parent restart between turns. A redeploy / systemctl restart rtxclaw-gateway / OOM kill resets the in-memory child.sessions map. Same symptom as (1), same fix path.
  3. SSE stream torn down by gateway crash. The bot's AcpClient holds a long-lived GET SSE pump. If the gateway dies, the TCP socket lands in CLOSE-WAIT and the bot's read loop exits — but the cached client used to stay in self._clients, so the next prompt would either hang on a dead future or fail with AcpTransportClosed on every retry. The fix has three parts:
    • AcpClient._read_loop now flips self._closed = True in its finally, not only close().
    • AcpClient._reserve_call rejects calls on a closed client with AcpTransportClosed("client is closed") instead of allocating a future the (already-exited) read loop will never satisfy.
    • The bot's _get_or_open_client evicts cached clients with _closed=True and reopens; the _dispatch_blocks recovery path also drops the dead client mid-turn before re-running the prompt.

The recovery is bounded: at most one retry per turn, and only when nothing has streamed yet (no content / thoughts / tool headers). A failure mid-stream surfaces normally — re-running would emit duplicate output to the user.
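The "one retry, only if nothing has streamed" rule fits in a few lines. This is a sketch of the rule rather than the bot's code: prompt_with_recovery, send, and reopen are hypothetical names, and LookupError is used as the trigger only because that is the exception quoted in failure mode 1.

```python
from typing import Callable, List


def prompt_with_recovery(
    send: Callable[[str, Callable[[str], None]], None],
    reopen: Callable[[], None],
    prompt: str,
) -> str:
    """Run a prompt, retrying once if the session vanished before any output.

    send(prompt, on_chunk) streams reply chunks through on_chunk;
    reopen() rebinds the chat to a live session.
    """
    chunks: List[str] = []
    try:
        send(prompt, chunks.append)
    except LookupError:
        if chunks:
            # Output already reached the user: a re-run would duplicate it.
            raise
        reopen()
        # Single retry; a second failure propagates to the caller.
        send(prompt, chunks.append)
    return "".join(chunks)
```

Keeping the retry count at one means a persistently broken gateway still surfaces as an error instead of looping.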

The /agent <name> command verifies the saved session up-front via session/load before promising "Resuming saved session …". An invalid binding now silently rolls over to a fresh session at switch time instead of failing on the first message after the switch.

Group-chat reply gates

Two <telegram_home>/config.json knobs control when the bot speaks up in shared groups. Both default to "off" so an existing single-operator deployment behaves exactly as before.

{
  "allowed_chat_ids": [-1003916299625],
  "allowed_user_ids": [8484692594],
  "mention_required_in_groups": true
}

allowed_chat_ids is the existing chat-level allowlist; allowed_user_ids and mention_required_in_groups are the two new gates.

allowed_user_ids (silent per-user drop): when non-empty, a message in an allowed chat must also come from a sender whose from.id is in this list. Messages from any other group member are dropped silently (no "you're not allowed" reply, because that would advertise the bot to everyone reading the chat). Empty list ⇒ the gate is disabled and every member of an allowed chat may interact, matching the legacy behavior.

mention_required_in_groups (silent group gate) — when true, in non-private chats the bot ignores everything that doesn't address it. Three signals count as "addressed":

  1. @<botname> anywhere in the raw text (covers /cmd@<botname> and conversational mentions).
  2. The message is a direct reply (reply_to_message) to one of the bot's own messages — Telegram's swipe-to-reply convention.
  3. A text_mention entity targets the bot's user id (a Telegram client mentioned the bot without typing the @handle, e.g. tap-to-mention from the member list).

DMs (chat.type == "private") are always exempt from this gate — there's no other recipient there to confuse, so requiring a mention would just be friction.
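The two gates compose into a single yes/no decision per incoming message. The functions below are a sketch written against the Telegram Bot API's Message fields (from, chat, text, entities, reply_to_message); is_addressed and should_reply are hypothetical names, not the bot's own, and the chat-level allowed_chat_ids check is assumed to have run already. Matching the handle case-insensitively and by plain substring is a simplification chosen for the example.

```python
from typing import Collection


def is_addressed(message: dict, bot_username: str, bot_user_id: int) -> bool:
    """True if any of the three 'addressed' signals is present."""
    text = message.get("text") or ""
    if f"@{bot_username}".lower() in text.lower():
        return True                                   # signal 1: @handle in text
    reply = message.get("reply_to_message") or {}
    if (reply.get("from") or {}).get("id") == bot_user_id:
        return True                                   # signal 2: reply to the bot
    for entity in message.get("entities") or []:
        if (entity.get("type") == "text_mention"
                and (entity.get("user") or {}).get("id") == bot_user_id):
            return True                               # signal 3: text_mention
    return False


def should_reply(
    message: dict,
    *,
    bot_username: str,
    bot_user_id: int,
    allowed_user_ids: Collection[int] = (),
    mention_required_in_groups: bool = False,
) -> bool:
    """Apply the per-user gate, then the group mention gate."""
    sender = (message.get("from") or {}).get("id")
    if allowed_user_ids and sender not in allowed_user_ids:
        return False                                  # silent per-user drop
    if (message.get("chat") or {}).get("type") == "private":
        return True                                   # DMs skip the mention gate
    if mention_required_in_groups:
        return is_addressed(message, bot_username, bot_user_id)
    return True
```

With both settings left at their defaults the function returns True for every message, which matches the legacy behaviour described above.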

Both knobs are settable via rtxclaw configure:

rtxclaw configure --telegram-allowed-user-ids "8484692594,123456789"
rtxclaw configure --telegram-mention-required on
# Clear the per-user list (revert to "any member of an allowed chat"):
rtxclaw configure --telegram-allowed-user-ids ""

The bot reads config at startup; restart with sudo systemctl restart rtxclaw-telegram.service after editing.

Diagnosing the recovery path

When the bot exercises any of the three fallbacks above it logs to ~/.rtxclaw/telegram/logs/telegram.log:

WARNING session_prompt: gateway lost session chat=… sid=… — attempting reopen (…)
INFO    session reopen via session/load chat=… sid=…           # success: same sid restored
INFO    session reopen via session/new chat=… old_sid=… new_sid=…   # fallback: new sid bound
WARNING evicting closed ACP client for chat_key=…; reopening   # transport-died path

Operator runbook for "bot replied agent error once and then started working":

  1. Find the message in telegram.log — recovery emits the WARNING + INFO above.
  2. If the gateway side reaped the session, check child_idle_timeout_s in ~/.rtxclaw/config.json (default 1800 s); raise it if 30 min is too short for the conversation cadence.
  3. If the gateway parent itself died, journalctl -u rtxclaw-gateway.service --since "10 min ago" shows whether it was a clean restart, a SIGTERM, or a SIGKILL. A SIGKILL with no kernel OOM in dmesg and no MemoryMax set usually means manual systemctl restart during dev work.

MemoryMax on the gateway unit is infinity by default — there is no in-process memory cap. If you want one, set it under [Service] in deploy/systemd/rtxclaw-gateway.service (e.g. MemoryMax=2G); the recovery logic above handles a cgroup-OOM kill identically to a manual restart.

The vision

rtxclaw is not a neocloud wrapper. rtxclaw is not a dependency engine. rtxclaw is not permissioned intelligence.

rtxclaw is sovereign inference infrastructure for the agent era.

It is a system where:

  • your data stays under your control
  • your agents adapt to your hardware reality
  • your costs are shaped by intelligent routing, not vendor extraction
  • your stack remains understandable enough to audit and modify
  • your business does not collapse because someone else throttled access to intelligence

The AI future will not belong only to the largest labs. It will belong to those who can route, compress, adapt, and deploy intelligence with discipline.

rtxclaw is built for that future.
