The agent loop, embodied.
A tool-using agent runtime, fail-closed safety pipeline, statistically-rigorous bench harness, and sim-first execution layer for embodied AI. Sister project to GhostLM.
Why this exists
Robotics in 2026 has two healthy ecosystems and a missing middle.
- ROS 2 gives you middleware: a message bus, lifecycle management, drivers, navigation. It does not care about LLMs, agents, or modern eval methodology.
- VLA models (Open-X-Embodiment, OpenVLA, π0, RT-2) give you policies: vision-and-language conditioned action heads. They mostly live in research codebases that ship the model weights but not the runtime.
Nobody ships the layer in between: a runtime where a model emits high-level intents like move_to(0.4, 0.2, 0.1) or pick("widget-7"), those intents flow through a fail-closed safety pipeline, the survivors execute on a backend (sim or hardware), and every step is captured in a structured trace that can be replayed, audited, scored, mined, counterfactually re-played, or causally analysed.
That layer is ghostloop. The shape is borrowed from GhostAgent in GhostLM: tool registry, policy gates, structured trace, paired-comparison eval. The novel piece has two parts. First, the runtime is bound to robot primitives instead of CVE lookups and is backend-agnostic, so the same agent loop drives a mock or a simulator (MuJoCo / PyBullet / Gymnasium) today and ROS 2 / direct hardware later. Second, it adds a layer of post-hoc analysis tooling (counterfactual replay, causal attribution, LLM-as-judge, property mining, adversarial fuzzing) that no other robotics framework ships.
Architecture
            policy       registry        pipeline           backend        post-hoc
            emits        resolves        gates              executes       analysis
user goal   ┌────────┐   ┌───────────┐   ┌──────────────┐   ┌──────────┐   ┌─────────────────┐
──────────► │ Intent │ ► │ Primitive │ ► │PolicyPipeline│ ► │ Backend  │ ► │ counterfactual  │
            └────────┘   └───────────┘   └──────────────┘   └──────────┘   │ causal          │
                                                                 │         │ LLM-judge       │
                                                                 ▼         │ property mining │
                                             ┌──────────────────────┐      │ adversarial     │
                                             │    Trace (JSONL)     │ ───► │ trace query DSL │
                                             └──────────────────────┘      │ energy ledger   │
                                                        │                  └─────────────────┘
                                                        ▼
                                      ┌────────────────────────────────────┐
                                      │ Bench: Wilson CI + McNemar +       │
                                      │ Cohen's h + Sim2Real transfer gap  │
                                      └────────────────────────────────────┘
| Type | Role |
|---|---|
| Intent | High-level structured command emitted by a policy: name, args, rationale. |
| Primitive | Backend-bound callable. Has a name, description, arg schema (LLM-tool-card friendly). |
| PolicyPipeline | Ordered list of PolicyGates. Fail-closed: any deny short-circuits. |
| Backend | Execution adapter. MockBackend / MuJoCoBackend / PyBulletBackend / GymnasiumBackend / ROS2Backend / RandomizedBackend. |
| Trace | Append-only event log with state_before / state_after / decision / result per step. JSONL writer + replay + query DSL. |
| LLMPolicy / VLAPolicy | Bridge any OpenAI-compatible chat endpoint or VLA action head to the registry. |
| Mission | DAG of Steps with prerequisites + retry semantics. Kahn-validated. |
| bench | Episode harness with Wilson 95% CIs, McNemar exact p, Cohen's h, paired comparison, sim2real transfer gap. |
| properties | Declarative invariants over traces: Always / Eventually / Until STL combinators + auto-mined candidates. |
| judges | LLM-as-judge + heuristic rule-based trace scoring. |
| training | Constrained-MDP rollout collector + Lagrangian multiplier + HER relabeling. |
What ships in v0.10.0
70+ modules across ten releases. Highlights:
Core runtime
13 abstractions in core.py (Intent / Primitive / Result / Decision / PolicyGate / PolicyPipeline / Backend / MockBackend / TraceEvent / Trace / Runtime / Registry / DecisionAction). async_runtime.py mirrors them with awaitable gates + a control_loop(rate_hz).
Policy gates (12)
DenyListGate, RateLimitGate, GeofenceGate, ForceCapGate, HumanInTheLoopGate, ObstacleAvoidanceGate, RetryPolicy, CooldownGate, TimeWindowGate, ActionSmoothingGate (velocity / acceleration limits), plus the LLMPolicy and VLAPolicy adapters. All fail-closed.
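To make "fail-closed" concrete, here is a minimal, self-contained sketch of a gate pipeline. It is not ghostloop's implementation: SketchIntent, deny_list_gate, geofence_gate and evaluate are hypothetical names, and treating a crashing gate as a deny is an assumption about what fail-closed should mean here.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class SketchIntent:
    """Stand-in for an intent: a primitive name plus its arguments."""
    name: str
    args: Dict[str, float]


# A gate returns (allowed, reason). This shape is illustrative only.
Gate = Callable[[SketchIntent], Tuple[bool, str]]


def deny_list_gate(denied: set) -> Gate:
    def gate(intent: SketchIntent) -> Tuple[bool, str]:
        if intent.name in denied:
            return False, f"{intent.name} is deny-listed"
        return True, "ok"
    return gate


def geofence_gate(lo: float, hi: float) -> Gate:
    def gate(intent: SketchIntent) -> Tuple[bool, str]:
        for axis in ("x", "y", "z"):
            value = intent.args.get(axis)
            if value is not None and not (lo <= value <= hi):
                return False, f"{axis}={value} outside [{lo}, {hi}]"
        return True, "ok"
    return gate


def evaluate(gates: List[Gate], intent: SketchIntent) -> Tuple[bool, str]:
    """Run gates in order; the first deny (or gate error) stops evaluation."""
    for gate in gates:
        try:
            allowed, reason = gate(intent)
        except Exception as exc:  # assumption: a broken gate must not allow
            return False, f"gate error: {exc!r}"
        if not allowed:
            return False, reason
    return True, "all gates passed"


gates = [deny_list_gate({"self_destruct"}), geofence_gate(-0.6, 0.6)]
ok = evaluate(gates, SketchIntent("move_to", {"x": 0.4, "y": 0.0, "z": 0.5}))
blocked = evaluate(gates, SketchIntent("move_to", {"x": 0.9, "y": 0.0, "z": 0.5}))
```

The point of the design is that nothing reaches the backend unless every gate explicitly allows it, so a misconfigured or failing gate degrades to "robot does nothing" rather than "robot does anything".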
Backends (6)
- MockBackend: zero-install in-memory.
- MuJoCoBackend: Google DeepMind MuJoCo with Menagerie auto-clone (Franka / UR5e / Spot / Stretch / Aloha / Allegro).
- PyBulletBackend: Bullet physics for users without MuJoCo.
- GymnasiumBackend: wrap any Farama Gymnasium env (hundreds of robotics + RL envs).
- ROS2Backend: rclpy adapter for real-hardware deployments via DDS.
- RandomizedBackend: wrap any backend with reproducible noise / jitter / dropout for sim2real.
Workspace + geometry
WorkspaceModel with axis-aligned boxes + spheres, HalfSpace / ConvexPolytope / signed_distance for SDF queries, workspace_from_urdf(...) to auto-build from a URDF, plus project_to_workspace / project_to_sdf for safe-action repair when a policy violates constraints.
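As a rough illustration of the geometry involved, the sketch below computes a signed distance to an axis-aligned box and repairs an out-of-bounds target by clamping it back inside. The function names and the sign convention (negative inside, positive outside) are assumptions for this example, not ghostloop's signed_distance or project_to_workspace signatures.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def aabb_signed_distance(p: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> float:
    """Signed distance to a box: negative inside, zero on the surface, positive outside."""
    # Per-axis overshoot past the nearest face; negative when inside on that axis.
    d = [max(l - x, x - h) for x, l, h in zip(p, lo, hi)]
    outside = math.sqrt(sum(max(c, 0.0) ** 2 for c in d))
    inside = min(max(d), 0.0)
    return outside + inside


def project_to_aabb(p: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> Vec3:
    """Nearest point inside the box, obtained by clamping each coordinate."""
    return tuple(min(max(x, l), h) for x, l, h in zip(p, lo, hi))


LO, HI = (-0.6, -0.6, 0.0), (0.6, 0.6, 1.0)
target = (0.9, 0.0, 0.5)

violation = aabb_signed_distance(target, LO, HI)   # about 0.3 m outside
repaired = project_to_aabb(target, LO, HI)         # (0.6, 0.0, 0.5)
```

Clamping is the exact Euclidean projection for a box; for general convex polytopes a projection needs more work than this.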
Bench harness
- Episode / EpisodeRunner / RunReport with Wilson 95% CIs.
- paired_compare: McNemar exact p + Cohen's h.
- Sim2RealBench: paired transfer-gap harness with per-primitive action-distribution KL.
- random_seeds / grid_seeds / cma_es_seeds: adversarial fuzzers for finding failure-prone Episode initial states.
- RewardShaper: declarative reward DSL (OnPrimitive / OnDecision / OnObservation / StepCost / CustomReward).
- Episode catalogue: preset_reach_8 / preset_pick_and_place_4 / preset_geofence_smoke.
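The three statistics named above are standard and small enough to sketch with the standard library. The function names here are illustrative and are not claimed to match ghostloop's wilson_ci, mcnemar_p or cohens_h signatures.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the two discordant-pair counts.

    b = episodes only policy A passed, c = episodes only policy B passed.
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def cohens_h_effect(p1: float, p2: float) -> float:
    """Cohen's h effect size between two proportions."""
    return 2 * math.asin(math.sqrt(p1)) - 2 * math.asin(math.sqrt(p2))


low, high = wilson_interval(8, 10)     # 8 successes in 10 episodes
p_value = mcnemar_exact(1, 7)          # 8 discordant episodes, split 1 vs 7
effect = cohens_h_effect(0.8, 0.5)
```

McNemar only looks at episodes where the two policies disagree, which is why a paired design on the same seeds is more sensitive than comparing two independent success rates.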
Properties + verification
- PropertyEngine with built-in invariants (StaysInsideWorkspace, NeverHoldsTwoObjects, NeverExceedsRate, NoConsecutiveDuplicateIntents).
- Always / Eventually / Until STL combinators over sliding windows.
- AndProperty / OrProperty / NotProperty boolean combinators.
- mine_properties(traces): auto-discover candidate invariants from a corpus (followup transitions, numeric bounds, workspace AABBs).
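For readers unfamiliar with temporal combinators, the following sketch shows what Always / Eventually / Until mean over a finite trace of states. It uses plain functions over dictionaries; ghostloop's classes, their constructor arguments and their windowing behaviour are not shown here, and the "strong until" reading is an assumption.

```python
from typing import Callable, Dict, List, Optional

State = Dict[str, float]
Predicate = Callable[[State], bool]


def always(pred: Predicate, trace: List[State]) -> bool:
    """pred holds at every step."""
    return all(pred(s) for s in trace)


def eventually(pred: Predicate, trace: List[State], within: Optional[int] = None) -> bool:
    """pred holds at some step, optionally restricted to the first `within` steps."""
    window = trace if within is None else trace[:within]
    return any(pred(s) for s in window)


def until(hold: Predicate, release: Predicate, trace: List[State]) -> bool:
    """hold must stay true until release becomes true (strong until)."""
    for s in trace:
        if release(s):
            return True
        if not hold(s):
            return False
    return False  # release never happened


# Hypothetical state fields: end-effector height and a 0/1 holding flag.
descent = [
    {"z": 0.5, "holding": 0},
    {"z": 0.3, "holding": 0},
    {"z": 0.1, "holding": 1},
]

above_table = always(lambda s: s["z"] >= 0.0, descent)
grasped = eventually(lambda s: s["holding"] == 1, descent)
grasped_early = eventually(lambda s: s["holding"] == 1, descent, within=2)
empty_until_low = until(lambda s: s["holding"] == 0, lambda s: s["z"] <= 0.1, descent)
```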
Post-hoc analysis (the v0.10 novel pillars)
- replay_with_policy(trace, new_policy): counterfactual reasoning. "What would policy B have done on policy A's trace?"
- attribute_failure(trace, property): leave-one-out causal attribution; ranks events by necessity.
- minimal_cause_set: greedy multi-event attribution.
- LLMJudge: score traces with an LLM against a configurable rubric.
- HeuristicJudge: rule-based predicate scoring for air-gapped CI.
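Leave-one-out attribution can be sketched in a few lines: drop each event in turn, re-check the property, and mark the event as necessary if the violation disappears. The version below is a simplification under stated assumptions: events are plain x-displacements, the "property" is a hand-written predicate, and the result is a yes/no necessity flag rather than the ranked scores attribute_failure is described as producing.

```python
from typing import Callable, List, Sequence


def leave_one_out(events: Sequence[float], violates: Callable[[Sequence[float]], bool]) -> List[int]:
    """Indices of events whose removal alone makes the violation disappear."""
    if not violates(events):
        return []
    necessary = []
    for i in range(len(events)):
        reduced = list(events[:i]) + list(events[i + 1:])
        if not violates(reduced):
            necessary.append(i)
    return necessary


def leaves_workspace(moves: Sequence[float], limit: float = 0.6) -> bool:
    """True if the running sum of x-displacements ever exceeds the limit."""
    x = 0.0
    for dx in moves:
        x += dx
        if x > limit:
            return True
    return False


moves = [0.2, 0.3, 0.05, 0.2]          # cumulative x: 0.2, 0.5, 0.55, 0.75
culprits = leave_one_out(moves, leaves_workspace)
```

Here removing the small 0.05 step still leaves the arm outside the limit, so only the three larger moves are flagged. Replaying edited event lists is cheap on a recorded trace, which is what makes this kind of analysis practical after the fact.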
Missions + skills
- Mission / Step / MissionRunner: DAG of steps with prerequisites, retry semantics, required-vs-optional.
- SkillGraph: typed DAG of skills with prereq + refines edges.
- MorphologyRegistry: register pick per (franka, ur5e, spot) and build per robot.
- composite_primitive: sequence existing primitives behind one name.
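The architecture table describes missions as "Kahn-validated". As a reminder of what that involves, here is a minimal sketch of Kahn's algorithm over a prerequisite map: it yields an execution order and rejects cycles. The function name and the dict-of-lists input shape are assumptions for this example, not the Mission API.

```python
from collections import deque
from typing import Dict, List


def kahn_order(prereqs: Dict[str, List[str]]) -> List[str]:
    """Topologically order steps given {step: [prerequisite steps]}.

    Raises KeyError for an unknown prerequisite and ValueError on a cycle.
    """
    indegree = {step: len(deps) for step, deps in prereqs.items()}
    dependents: Dict[str, List[str]] = {step: [] for step in prereqs}
    for step, deps in prereqs.items():
        for dep in deps:
            if dep not in prereqs:
                raise KeyError(f"{step} depends on unknown step {dep}")
            dependents[dep].append(step)

    ready = deque(sorted(s for s, d in indegree.items() if d == 0))
    order: List[str] = []
    while ready:
        step = ready.popleft()
        order.append(step)
        for nxt in dependents[step]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(prereqs):
        raise ValueError("cycle detected in mission steps")
    return order


# Step names borrowed from the sample profile; the dependencies are made up.
plan = kahn_order({
    "goto": [],
    "take_photo": ["goto"],
    "dispense_pill": ["goto"],
    "alert_nurse": ["dispense_pill", "take_photo"],
})
```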
Training (constrained-MDP + HER)
- SafeRolloutCollector + LagrangianMultiplier + train_safe: train policies under the safety pipeline; safety violations contribute to a Lagrangian penalty.
- hindsight_relabel(rollout, goal_extractor, reward_fn): classic HER (Andrychowicz et al. 2017) with FINAL / FUTURE / EPISODE / RANDOM strategies.
- sparse_indicator_reward(threshold): canonical HER reward.
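The idea behind hindsight relabeling with the FINAL strategy is short enough to show directly: pretend the state the episode actually ended in was the goal all along, then recompute the sparse reward. The Transition dataclass and both function names below are hypothetical stand-ins; hindsight_relabel's real rollout format is not assumed.

```python
import math
from dataclasses import dataclass, replace
from typing import List, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class Transition:
    achieved: Point   # where the end effector actually was after the step
    goal: Point       # the goal the policy was conditioned on
    reward: float


def sparse_reward(achieved: Point, goal: Point, threshold: float = 0.05) -> float:
    """0.0 when within `threshold` of the goal, otherwise -1.0."""
    return 0.0 if math.dist(achieved, goal) <= threshold else -1.0


def relabel_final(rollout: List[Transition]) -> List[Transition]:
    """FINAL strategy: substitute the last achieved state as the goal."""
    new_goal = rollout[-1].achieved
    return [
        replace(t, goal=new_goal, reward=sparse_reward(t.achieved, new_goal))
        for t in rollout
    ]


original_goal = (1.0, 1.0, 1.0)
failed = [
    Transition((0.1, 0.0, 0.2), original_goal, -1.0),
    Transition((0.2, 0.1, 0.2), original_goal, -1.0),
    Transition((0.3, 0.1, 0.2), original_goal, -1.0),
]
relabeled = relabel_final(failed)
```

A rollout that earned no reward against its original goal becomes a successful demonstration of reaching (0.3, 0.1, 0.2), which is what lets sparse-reward training make progress.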
Telemetry + persistence
- OpenTelemetry hooks (step_span, record_decision, record_result).
- EnergyLedger: per-primitive joule accounting with constant / linear-in-arg / linear-in-duration / linear-in-xyz models.
- GhostloopStore: SQLite store for episodes / runs / comparisons.
- Trace.write_jsonl() + load_trace() + iter_events() + summarize_trace().
- query(trace, expr): small DSL over traces (comparison ops + and / or / not / in).
- diff_traces(a, b): structured diff for ablation studies.
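JSONL is a convenient trace format because each step is one self-contained JSON object per line, so files can be appended to, streamed and filtered with ordinary tools. The sketch below round-trips two events through an in-memory buffer. The state_before / state_after / decision / result keys come from the architecture table; the step and intent keys, and the exact value shapes, are assumptions for illustration.

```python
import io
import json
from typing import Dict, Iterable, List, TextIO


def write_jsonl(events: Iterable[Dict], fh: TextIO) -> None:
    """Write one JSON object per line."""
    for event in events:
        fh.write(json.dumps(event, sort_keys=True) + "\n")


def read_jsonl(fh: TextIO) -> List[Dict]:
    """Read events back, skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]


events = [
    {
        "step": 0,
        "intent": {"name": "move_to", "args": {"x": 0.4, "y": 0.0, "z": 0.5}},
        "state_before": {"pos": [0.0, 0.0, 0.5]},
        "state_after": {"pos": [0.4, 0.0, 0.5]},
        "decision": "allow",
        "result": {"ok": True},
    },
    {
        "step": 1,
        "intent": {"name": "move_to", "args": {"x": 0.9, "y": 0.0, "z": 0.5}},
        "state_before": {"pos": [0.4, 0.0, 0.5]},
        "state_after": {"pos": [0.4, 0.0, 0.5]},
        "decision": "deny",
        "result": None,
    },
]

buffer = io.StringIO()
write_jsonl(events, buffer)
buffer.seek(0)
loaded = read_jsonl(buffer)

# A query as a plain comprehension: which steps did the pipeline deny?
denied_steps = [e["step"] for e in loaded if e["decision"] == "deny"]
```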
Fleet + dashboard
- RobotHandle / FleetRegistry / FleetDispatcher (FIRST_IDLE / ROUND_ROBIN / LEAST_BUSY).
- create_dashboard_app(store, fleet): read-only FastAPI surface over the SQLite store.
- StreamManager + attach_streaming(app): WebSocket trace streaming with bounded ring buffers.
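The three dispatch strategy names suggest the following behaviour, sketched here with hypothetical SketchRobot and SketchDispatcher classes. How FleetDispatcher actually tracks load or breaks ties is not documented above, so treat every detail as an assumption.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Strategy(Enum):
    FIRST_IDLE = "first_idle"
    ROUND_ROBIN = "round_robin"
    LEAST_BUSY = "least_busy"


@dataclass
class SketchRobot:
    name: str
    queued: int = 0   # number of pending tasks


class SketchDispatcher:
    def __init__(self, robots: List[SketchRobot], strategy: Strategy) -> None:
        self.robots = list(robots)
        self.strategy = strategy
        self._cursor = 0

    def pick(self) -> Optional[SketchRobot]:
        """Choose the robot that should receive the next task."""
        if not self.robots:
            return None
        if self.strategy is Strategy.FIRST_IDLE:
            # First robot with an empty queue, or None if all are busy.
            return next((r for r in self.robots if r.queued == 0), None)
        if self.strategy is Strategy.ROUND_ROBIN:
            robot = self.robots[self._cursor % len(self.robots)]
            self._cursor += 1
            return robot
        # LEAST_BUSY: smallest queue wins; ties go to the earliest robot.
        return min(self.robots, key=lambda r: r.queued)


fleet = [SketchRobot("arm-a", 2), SketchRobot("arm-b", 0), SketchRobot("arm-c", 1)]
first_idle = SketchDispatcher(fleet, Strategy.FIRST_IDLE).pick()
least_busy = SketchDispatcher(fleet, Strategy.LEAST_BUSY).pick()
rr = SketchDispatcher(fleet, Strategy.ROUND_ROBIN)
rotation = [rr.pick().name for _ in range(4)]
```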
MCP + LLM integration
- mcp_server.py exposes the runtime as a FastMCP server so Claude Desktop / Cursor / any MCP client can drive a robot through the safety pipeline.
- LLMPolicy (closed-loop) and LLMPlanner (single-shot full-plan emission).
Setup
There are five ways to run ghostloop, listed roughly in order of effort. Start with the zero-install path, prove the safety pipeline, then promote to your real arm.
1. Zero-install (3 minutes)
git clone https://github.com/joemunene-by/ghostloop
cd ghostloop
# Run the canonical pick-and-place demo on MockBackend.
PYTHONPATH=. python3 examples/pick_and_place.py
# Run a paired-comparison bench (Wilson CI + McNemar + Cohen's h).
PYTHONPATH=. python3 examples/bench_with_without_geofence.py
# Full test suite — 314 pass, 8 live-gated (skip cleanly without extras).
PYTHONPATH=. python3 -m pytest tests/
No dependencies beyond Python 3.10+. This proves the runtime, the safety pipeline, the bench harness, and the trace recorder — exactly the same code you'll point at a real arm later.
2. Drive any robot from any chat client over MCP (10 minutes)
ghostloop ships a single MCP server (examples/mcp_robot.py) that works with every MCP-aware client — the protocol is universal, so the same server speaks to Claude Desktop, Cursor, Continue, Cline, Zed, Gemini CLI, and any future client. Pick what you control via GHOSTLOOP_PROFILE:
| Profile | Robot | Primitives exposed |
|---|---|---|
| franka_arm (default) | Franka Panda 7-DOF arm | set_joint, set_gripper, sense, take_photo, … |
| turtlebot | TurtleBot mobile base | drive, stop, goto, rotate, … |
| spot | Boston Dynamics Spot quadruped | walk_to, sit, stand, lie_down, … |
| tello | DJI Tello / quadcopter | takeoff, land, fly_to, hover, … |
| stretch | Hello Robot Stretch RE3 (mobile arm) | drive, set_joint, set_gripper, … |
| humanoid_demo | Stationary humanoid | wave, look_at, point_at, nod |
| <path/to/your.yaml> | Your robot | whatever you declare |
Each preset bundles morphology-appropriate primitives, conservative workspace + force + velocity caps, HITL on the dangerous primitives, and a robot-specific instructions block the LLM gets as a system prompt. See examples/custom_robot.yaml for the YAML schema and examples/custom_robot_primitives.py for how to plug in your own actions (dispense_pill, alert_nurse, whatever your hardware does) without forking ghostloop.
Two transports, picked via GHOSTLOOP_TRANSPORT:
| Transport | When to use | Clients |
|---|---|---|
| stdio (default) | desktop, same machine | Claude Desktop, Cursor, Continue, Cline, Zed, Gemini CLI |
| streamable-http | remote, mobile, browser, kiosk | any client supporting remote MCP servers |
Step 1. Verify the example boots (any OS, any profile):
# Default (Franka arm)
python3 examples/mcp_robot.py --selfcheck
# Quadruped
GHOSTLOOP_PROFILE=spot python3 examples/mcp_robot.py --selfcheck
# Drone
GHOSTLOOP_PROFILE=tello python3 examples/mcp_robot.py --selfcheck
# Custom YAML
GHOSTLOOP_PROFILE=examples/custom_robot.yaml python3 examples/mcp_robot.py --selfcheck
Step 2. Install the MCP transport package:
pip install ghostloop[mcp] # or: pip install mcp
Step 3. Wire it into your client. The same { command, args, env } block works for every desktop MCP client — only the path to the config file differs:
| Client | macOS | Windows | Linux |
|---|---|---|---|
| Claude Desktop | ~/Library/Application Support/Claude/claude_desktop_config.json | %APPDATA%\Claude\claude_desktop_config.json | ~/.config/Claude/claude_desktop_config.json |
| Cursor | ~/.cursor/mcp.json (or project-local .cursor/mcp.json) | %USERPROFILE%\.cursor\mcp.json | ~/.cursor/mcp.json |
| Continue | ~/.continue/config.yaml (under mcpServers:) | same | same |
| Cline | VS Code settings.json → cline.mcpServers | same | same |
| Zed | ~/.config/zed/settings.json (under context_servers) | same | same |
| Gemini CLI | ~/.gemini/settings.json (under mcpServers) | same | same |
Paste this block into the config file (replace the absolute path; pick a profile that matches your robot):
{
"mcpServers": {
"ghostloop": {
"command": "python3",
"args": ["/absolute/path/to/ghostloop/examples/mcp_robot.py"],
"env": {
"GHOSTLOOP_PROFILE": "franka_arm",
"GHOSTLOOP_BACKEND": "mock",
"GHOSTLOOP_TRANSPORT": "stdio",
"GHOSTLOOP_INSTRUCTIONS": "Optional: extra robot-specific guidance appended to the profile's instructions block."
}
}
}
}
💡 On Windows, swap python3 for python (or the absolute path to your interpreter, e.g. C:\Python313\python.exe). On macOS, use python3 from Homebrew or pyenv. Continue and Zed use YAML and JSONC respectively, but the field shape is identical.
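If the client config already lists other MCP servers, pasting by hand risks clobbering them. The helper below is a small sketch of merging the ghostloop entry into an existing config dictionary without touching the rest. merge_ghostloop_server is a hypothetical name, and the sketch assumes a client that uses the top-level "mcpServers" key in plain JSON (so not Zed's context_servers or Continue's YAML).

```python
import json
from typing import Dict


def merge_ghostloop_server(
    config: Dict,
    script_path: str,
    profile: str = "franka_arm",
    backend: str = "mock",
    interpreter: str = "python3",
) -> Dict:
    """Return a copy of `config` with a 'ghostloop' entry added or replaced."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["ghostloop"] = {
        "command": interpreter,
        "args": [script_path],
        "env": {
            "GHOSTLOOP_PROFILE": profile,
            "GHOSTLOOP_BACKEND": backend,
            "GHOSTLOOP_TRANSPORT": "stdio",
        },
    }
    merged["mcpServers"] = servers
    return merged


# An existing config with an unrelated (made-up) server entry.
existing = {"mcpServers": {"notes": {"command": "notes-server", "args": []}}}
updated = merge_ghostloop_server(
    existing, "/absolute/path/to/ghostloop/examples/mcp_robot.py", profile="spot"
)
rendered = json.dumps(updated, indent=2)
```

To apply it to a real file, load the JSON with json.load, pass the result through the helper, and write it back with json.dump; keep a backup of the original config first.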
Step 4. Restart the client. New conversations get the tools: list_primitives, step, move_to(x, y, z), pick(object_id), place(), scan(radius), state, recent_trace(n). Try: "Use the ghostloop tools to move to (0.4, 0.0, 0.5), then scan with radius 0.3, then move to (0.6, 0.2, 0.5)." Watch the geofence reject targets outside [-0.6, 0.6].
Upgrade from mock to real physics (MuJoCo) — one env var:
"env": {
"GHOSTLOOP_BACKEND": "mujoco",
"GHOSTLOOP_MUJOCO_MODEL": "/absolute/path/to/franka_panda.xml"
}
(pip install ghostloop[mujoco] first.)
Upgrade to a real arm via ROS 2:
"env": {
"GHOSTLOOP_BACKEND": "ros2",
"GHOSTLOOP_ROS2_NODE": "ghostloop_arm",
"GHOSTLOOP_CMD_VEL": "/franka/cmd",
"GHOSTLOOP_JOINT_STATES": "/franka/joint_states",
"GHOSTLOOP_FORCE_TORQUE": "/franka/wrench"
}
Prerequisites: ROS 2 (apt install ros-humble-desktop on Ubuntu, the Robotology Mac install on macOS, WSL2 + Ubuntu on Windows), your arm's ROS 2 driver running, and source /opt/ros/humble/setup.bash (or setup.zsh / setup.ps1) in the same shell that launches the client so the subprocess inherits $AMENT_PREFIX_PATH.
⚠ Before pointing this at a real robot: edit your profile (or copy a preset to YAML and tweak it). Set workspace_bounds / max_force_n / max_velocity / max_acceleration to your hardware's safe envelope, list dangerous primitives under hitl_primitives so the operator approves each call interactively, and write robot-specific guidance into the instructions: block (e.g. "never reach behind the base", "battery below 20% triggers automatic land"). Read the trace logs for the first dozen episodes; relax HITL only after you trust the model's behaviour.
Define your own robot
Two ways to add a robot ghostloop doesn't already know about:
A. YAML profile (no Python required) — copy examples/custom_robot.yaml, edit it, and point GHOSTLOOP_PROFILE at the path. The schema covers categories of standard primitives, your own custom primitives, composite macros, instructions for the LLM, workspace + force + velocity caps, denied / HITL operations, and the backend kind. The shipped sample defines a hospital medication-delivery robot — mobile base + arm, with custom dispense_pill and alert_nurse primitives and a deliver_room macro composed from existing primitives:
name: medbot_floor3
morphology: mobile_arm
categories: [mobile_base, dexterous, sensing, generic]
instructions: |
  You are MedBot, hospital floor-3 medication delivery. NEVER drive faster
  than 0.4 m/s. ALWAYS stop before extending the arm. ...
workspace_bounds: [[-15, -15, 0], [15, 15, 1.6]]
max_velocity: 0.4
hitl_primitives: [set_gripper, dispense_pill]
custom_primitives:
  - module: examples.custom_robot_primitives
    factory: dispense_pill
  - module: examples.custom_robot_primitives
    factory: alert_nurse
composites:
  - name: deliver_room
    steps: [goto, take_photo, dispense_pill, alert_nurse]
backend:
  kind: ros2
  kwargs: { node_name: medbot, cmd_vel_topic: /medbot/cmd_vel }
B. Code — build a RobotProfile programmatically. Useful when your robot needs runtime state (a calibration matrix, a credential, dynamically-resolved topic names) that doesn't fit YAML:
from ghostloop.profiles import RobotProfile, build_runtime_from_profile
from ghostloop.primitives import drive, set_gripper
from my_robot.primitives import dispense_pill, alert_nurse
profile = RobotProfile(
name="medbot",
morphology="mobile_arm",
primitives=[drive(), set_gripper(), dispense_pill(), alert_nurse()],
instructions="You are MedBot...",
workspace_bounds=((-15, -15, 0), (15, 15, 1.6)),
max_velocity=0.4,
hitl_primitives=["dispense_pill"],
backend_kind="ros2",
backend_kwargs={"node_name": "medbot", "cmd_vel_topic": "/medbot/cmd_vel"},
)
runtime = build_runtime_from_profile(profile)
Custom Primitive factories follow a stable contract: a function returning Primitive(name, call, description, arg_schema). The call body talks to your hardware however you need it to — ROS 2 publisher, vendor SDK, raw serial, REST endpoint. See examples/custom_robot_primitives.py for two worked examples (dispense_pill, alert_nurse).
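The factory contract described above can be sketched without importing ghostloop. SketchPrimitive below is a stand-in dataclass with the four fields the text names (name, call, description, arg_schema); the slot / count arguments, the serial-style command string and the injectable send callable are all invented for this example and are not taken from examples/custom_robot_primitives.py.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class SketchPrimitive:
    """Stand-in for ghostloop's Primitive; same four fields, nothing more."""
    name: str
    call: Callable[..., Dict[str, Any]]
    description: str
    arg_schema: Dict[str, Any]


def dispense_pill_sketch(send: Callable[[str], Any] = print) -> SketchPrimitive:
    """Factory: build the primitive around whatever transport `send` wraps."""

    def call(slot: int, count: int = 1) -> Dict[str, Any]:
        # Validate before touching hardware; the limits here are illustrative.
        if not 0 <= slot < 8:
            raise ValueError(f"slot {slot} out of range 0..7")
        if count < 1:
            raise ValueError("count must be at least 1")
        send(f"DISPENSE {slot} {count}")
        return {"ok": True, "slot": slot, "count": count}

    return SketchPrimitive(
        name="dispense_pill",
        call=call,
        description="Dispense `count` pills from carousel slot `slot`.",
        arg_schema={
            "type": "object",
            "properties": {
                "slot": {"type": "integer", "minimum": 0, "maximum": 7},
                "count": {"type": "integer", "minimum": 1},
            },
            "required": ["slot"],
        },
    )


sent = []                                   # fake transport for a dry run
primitive = dispense_pill_sketch(send=sent.append)
outcome = primitive.call(slot=2, count=1)
```

Passing the transport into the factory keeps the primitive testable without hardware: swap the list's append for a serial write or a ROS 2 publisher call when deploying.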
3. Mobile + remote MCP (HTTP transport)
For mobile chat apps (and any client that doesn't run on the same machine as the robot), run ghostloop's MCP server as a long-running HTTP service on the robot host:
# macOS / Linux
GHOSTLOOP_BACKEND=mock GHOSTLOOP_TRANSPORT=streamable-http \
GHOSTLOOP_HOST=0.0.0.0 GHOSTLOOP_PORT=8765 \
python3 examples/claude_desktop_mcp_arm.py
# Windows PowerShell
$env:GHOSTLOOP_TRANSPORT='streamable-http'; $env:GHOSTLOOP_HOST='0.0.0.0'; $env:GHOSTLOOP_PORT='8765'
python examples\claude_desktop_mcp_arm.py
Then configure remote-MCP-capable clients with the URL form (no command/args):
{
"mcpServers": {
"ghostloop": { "url": "http://your-robot-host.local:8765/mcp" }
}
}
Mobile MCP clients (Claude iOS once it ships remote MCP, plus the growing crop of third-party MCP-aware iOS / Android chat apps) connect via the same URL — no app-side install. For a custom mobile app, use any MCP TypeScript / Swift / Kotlin SDK from modelcontextprotocol.io. The HTTP wire format is the same.
⚠ Bind to 0.0.0.0 only on a private network or behind authentication. The default 127.0.0.1 is loopback-only (safer). For internet-exposed setups, put a reverse proxy with TLS + auth in front, or use the production dashboard's StaticTokenAuth pattern.
4. Without MCP — direct OpenAI-compatible function calling
Already have a model running and don't want to bother with MCP? examples/direct_llm_arm.py skips the protocol entirely and uses ghostloop's LLMPolicy to drive any OpenAI-compatible chat endpoint via native function calling. Tested against:
- OpenAI GPT-4o / GPT-4o-mini
- Anthropic Claude (via the OpenAI-compatible proxy endpoint)
- Google Gemini (via OpenAI-compatible adapter)
- Groq (Llama 3.x, DeepSeek, Mixtral)
- Ollama (local Qwen, Llama, Mistral, GhostLM)
- vLLM + llama.cpp server + GhostLM's multi-vendor server
OPENAI_BASE_URL=https://api.openai.com/v1 OPENAI_API_KEY=sk-... \
OPENAI_MODEL=gpt-4o-mini \
python3 examples/direct_llm_arm.py
# Or local Ollama:
OPENAI_BASE_URL=http://localhost:11434/v1 OPENAI_API_KEY=ollama \
OPENAI_MODEL=qwen2.5:14b \
python3 examples/direct_llm_arm.py
Same Backend choice (Mock / MuJoCo / ROS 2), same safety pipeline, same trace recorder. Only the LLM-to-tool plumbing differs: in-process via LLMPolicy instead of MCP wire protocol.
5. Run programmatically
For everything else — bench harnesses, training loops, post-hoc analysis — use ghostloop as a library. Examples below.
Library API examples
Run an LLM-driven episode
from ghostloop import Intent, MockBackend, PolicyPipeline, PrimitiveRegistry, Runtime
from ghostloop.policies import GeofenceGate, LLMPolicyConfig, llm_policy_loop
from ghostloop.primitives import move_to, pick, place, scan
registry = PrimitiveRegistry([move_to(), scan(), pick(), place()])
runtime = Runtime(
backend=MockBackend(),
registry=registry,
policy_pipeline=PolicyPipeline(gates=[
GeofenceGate(min_corner=(-1, -1, 0), max_corner=(1, 1, 1)),
]),
)
summary = llm_policy_loop(
registry=registry,
runtime=runtime,
goal="Pick widget-7 from (0.4, 0.2, 0.1) and place it at (-0.4, 0.2, 0.1).",
config=LLMPolicyConfig(base_url="http://localhost:11434/v1", model="qwen2.5:14b"),
max_steps=16,
)
runtime.trace.write_jsonl("episode.jsonl")
Drive a real physics simulation
from ghostloop import PolicyPipeline, PrimitiveRegistry, Runtime, Intent
from ghostloop.backends import MuJoCoBackend
from ghostloop.backends.mujoco import move_to, scan
backend = MuJoCoBackend(model_path="franka_panda.xml", end_effector="hand")
registry = PrimitiveRegistry([move_to(), scan()])
runtime = Runtime(backend=backend, registry=registry, policy_pipeline=PolicyPipeline())
runtime.step(Intent("move_to", {"x": 0.4, "y": 0.0, "z": 0.5, "duration": 1.0}))
runtime.step(Intent("scan", {"radius": 0.5}))
Models from the MuJoCo Menagerie drop in directly: Franka Panda, UR5e, Stretch RE3, Allegro hand, Spot, Aloha bimanual.
Counterfactual replay — "what would the new policy have done?"
from ghostloop import Intent
from ghostloop.counterfactual import replay_with_policy
from ghostloop.traces import load_trace

original = load_trace("episode.jsonl")

def new_policy(state_before):
    # any callable mapping state -> Intent | None
    return Intent("scan", {"radius": 0.3})

cf = replay_with_policy(original, new_policy, new_policy_name="more-cautious")
print(cf.divergence_rate, cf.first_divergence_step)
print(cf.render_md())
Causal failure attribution
from ghostloop.causal import attribute_failure, minimal_cause_set
from ghostloop.properties import StaysInsideWorkspace
prop = StaysInsideWorkspace(min_corner=(-1, -1, 0), max_corner=(1, 1, 1))
analysis = attribute_failure(failing_trace, prop)
print(analysis.render_md()) # ranked top-K root causes
cause_set = minimal_cause_set(failing_trace, prop, max_set_size=3)
LLM-as-judge
from ghostloop.judges import LLMJudge, LLMJudgeConfig

class GhostLMClient:
    def chat(self, messages, **kwargs):
        # adapt your chat endpoint here
        ...

judge = LLMJudge(client=GhostLMClient(), config=LLMJudgeConfig(model="ghostlm-v0.9-chat"))
judgement = judge.score(trace)
print(judgement.label, judgement.score, judgement.rubric_scores)
Adversarial fuzzing
from ghostloop.bench import cma_es_seeds

def perturb(base_episode, sample):
    # return a copy of base_episode with backend initial state shifted by `sample`
    ...

results = cma_es_seeds(
    base_episode, perturb,
    parameter_ranges={"x0": (-1.0, 1.0), "y0": (-1.0, 1.0)},
    n_iterations=8, population_size=8, seed=42,
)
worst = results[:5]  # promote into your regression bench
Property mining
from ghostloop.properties import mine_properties
from ghostloop.traces import load_trace

corpus = [load_trace(p) for p in successful_traces_paths]
candidates = mine_properties(corpus, min_support=0.9)
for mp in candidates:
    print(mp.pattern, mp.description, mp.support)
    promoted = mp.promote()  # a real Property ready for the engine
Sim-to-Real bench
from ghostloop.bench import Sim2RealBench
bench = Sim2RealBench(
sim_episodes=eps_sim,
real_episodes=eps_real,
sim_label="mujoco", real_label="randomized_mujoco",
)
report = bench.run()
print(report.render_md()) # transfer gap + McNemar + KL action-distribution
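The "KL action-distribution" part of the report compares how often each primitive is used in sim versus on the real (or randomized) backend. A minimal sketch of that idea follows. The function names, the add-one smoothing and the example action lists are assumptions; Sim2RealBench's actual estimator may differ.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Sequence


def action_distribution(actions: Sequence[str], support: Iterable[str], alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed frequency of each primitive name over a fixed support."""
    support = list(support)
    counts = Counter(actions)
    total = len(actions) + alpha * len(support)
    return {name: (counts[name] + alpha) / total for name in support}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in nats; smoothing keeps every q[name] strictly positive."""
    return sum(p[name] * math.log(p[name] / q[name]) for name in p)


primitives = ["move_to", "scan", "pick", "place"]
sim_actions = ["move_to"] * 6 + ["pick"] * 2 + ["place"] * 2
real_actions = ["move_to"] * 9 + ["scan"] * 3 + ["pick"] * 1 + ["place"] * 1

p_sim = action_distribution(sim_actions, primitives)
p_real = action_distribution(real_actions, primitives)
gap = kl_divergence(p_sim, p_real)
```

A divergence near zero says the policy behaves the same way in both settings; a large value is a hint that it is compensating for something on the real side (here, extra scans and retries) even if the success rates look similar.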
Energy ledger
from ghostloop.telemetry import EnergyLedger
ledger = EnergyLedger()
print(ledger.total(trace), "J")
print(ledger.by_primitive(trace))
Skill graph + cross-embodiment
from ghostloop.skills import SkillGraph, skill_from_primitive
from ghostloop.primitives import MorphologyRegistry, move_to, scan
graph = SkillGraph()
graph.add(skill_from_primitive(move_to()))
graph.add(skill_from_primitive(scan(), prerequisites=["move_to"]))
graph.validate()
order = graph.topological_order() # ['move_to', 'scan']
reg = MorphologyRegistry()
reg.register("franka", "pick", franka_pick_factory)
reg.register("ur5e", "pick", ur5e_pick_factory)
prims = reg.build("franka", ["pick"]) # robot-specific primitives
Roadmap
| Version | Focus |
|---|---|
| v0.1.0 | Core abstractions, MockBackend, three policy gates, runnable demo, 23 tests |
| v0.2.0 | MuJoCoBackend, LLMPolicy adapter, bench harness with Wilson CIs + McNemar + Cohen's h, 64 tests |
| v0.3.0 | PyBulletBackend, async runtime, declarative properties engine, MCP server, scripted policies, 89 tests |
| v0.4.0 | ForceCap + HumanInTheLoop gates, episode catalogue, MuJoCo Menagerie auto-clone, replay/diff CLI, 110 tests |
| v0.5.0 | VLAPolicy adapter, sensor primitives + cameras, OpenTelemetry hooks, SQLite persistence, planner DSL, 142 tests |
| v0.6.0 | Fleet abstraction, FastAPI dashboard, LLMPlanner, RetryPolicy, observation buffer, property combinators, 182 tests |
| v0.7.0 | GymnasiumBackend, CooldownGate + TimeWindowGate, convex polytope SDF, composite primitives, Mission DAG runner, WebSocket trace streaming, 211 tests |
| v0.8.0 | STL temporal properties, URDF workspace builder, RandomizedBackend, trace query DSL, safe-RL harness with Lagrangian, 239 tests |
| v0.9.0 | ROS2Backend, ActionSmoothingGate, safe-action projection, reward shaper DSL, Sim2RealBench, 263 tests |
| v0.10.0 | Counterfactual trace replay, causal failure attribution, LLM-as-judge for traces, adversarial bench generator, property mining, skill graph, hindsight relabeling, energy ledger, cross-embodiment morphology registry, 296 tests |
| v1.0.0 (now) | RGB-D fusion + deproject_depth + BlobDetector + CameraProcessorPipeline, VLABenchmarkSuite + published-baseline catalogue (OpenVLA / π0 / RT-2 / Octo / Diffusion Policy / ACT), production fleet dashboard (StaticTokenAuth / RateLimiter / AlarmRegistry / Prometheus /metrics / livez+readyz), 314 tests |
Repository layout
ghostloop/
  __init__.py        public API surface, version
  core.py            Intent / Primitive / Runtime / Trace / Decision / Backend / MockBackend
  async_runtime.py   AsyncRuntime + control_loop(rate_hz)
  observations.py    ObservationBuffer (deque-based short-term memory)
  store.py           GhostloopStore: SQLite episodes / runs / comparisons
  mcp_server.py      FastMCP server exposing Runtime as MCP tools
  counterfactual.py  replay_with_policy + CounterfactualTrace (v0.10)
  causal.py          attribute_failure + minimal_cause_set (v0.10)
  policies/
    deny_list.py          DenyListGate
    rate_limit.py         RateLimitGate
    geofence.py           GeofenceGate
    force_cap.py          ForceCapGate
    human_in_the_loop.py  HumanInTheLoopGate + cli_approver
    workspace.py          WorkspaceModel + ObstacleAvoidanceGate
    sdf.py                HalfSpace / ConvexPolytope / signed_distance (v0.7)
    urdf.py               workspace_from_urdf (v0.8)
    cooldown.py           CooldownGate (v0.7)
    time_window.py        TimeWindowGate + Window (v0.7)
    smoothing.py          ActionSmoothingGate + smooth_target (v0.9)
    safe_projection.py    project_to_workspace + project_to_sdf (v0.9)
    retry.py              RetryPolicy + transient-error helpers
    llm.py                LLMPolicy + LLMPolicyConfig + llm_policy_loop
    vla.py                VLAPolicy + DeltaXYZDecoder
  primitives/
    motion.py        move_to / scan
    manipulation.py  pick / place
    trajectory.py    follow_trajectory + linear_interpolate
    composite.py     composite_primitive factory (v0.7)
    morphology.py    MorphologyRegistry: cross-embodiment (v0.10)
    library.py       cross-morphology primitive catalogue: mobile_base / quadruped / humanoid / aerial / dexterous / sensing / generic (v1.0)
  profiles/ (v1.0)
    core.py     RobotProfile + YAML loader + runtime builder
    presets.py  franka_arm / turtlebot / spot / tello / stretch / humanoid_demo
  backends/
    mujoco.py      MuJoCoBackend (v0.2)
    pybullet.py    PyBulletBackend (v0.3)
    gymnasium.py   GymnasiumBackend (Farama Gym ecosystem) (v0.7)
    ros2.py        ROS2Backend (rclpy adapter) (v0.9)
    randomized.py  RandomizedBackend (sim2real wrapper) (v0.8)
    menagerie.py   MuJoCo Menagerie auto-clone (v0.4)
  bench/
    episode.py        Episode + EpisodeRunner + EpisodeResult (v0.2)
    report.py         RunReport + wilson_ci + summarize (v0.2)
    compare.py        paired_compare + mcnemar_p + cohens_h (v0.2)
    catalogue.py      preset_reach_8 + preset_pick_and_place_4 + … (v0.4)
    reward_shaper.py  RewardShaper + OnPrimitive / OnDecision / … (v0.9)
    sim2real.py       Sim2RealBench + Sim2RealReport (v0.9)
    adversarial.py    random_seeds / grid_seeds / cma_es_seeds (v0.10)
  properties/
    core.py         Property + PropertyEngine + Severity (v0.5)
    builtins.py     StaysInsideWorkspace / NeverHoldsTwoObjects / … (v0.5)
    combinators.py  AndProperty / OrProperty / NotProperty (v0.6)
    temporal.py     Always / Eventually / Until (STL) (v0.8)
    mining.py       mine_properties + MinedProperty (v0.10)
  judges/
    llm_judge.py  LLMJudge + LLMJudgeConfig + parse_judgement (v0.10)
    heuristic.py  HeuristicJudge + rule predicates (v0.10)
  skills/
    graph.py  SkillGraph + Skill + topological order (v0.10)
  missions/
    core.py  Mission + Step + MissionRunner + MissionResult (v0.7)
  fleet/
    core.py  RobotHandle + FleetRegistry + FleetDispatcher (v0.6)
  dashboard/
    api.py        FastAPI factory + healthz + store endpoints (v0.6)
    streaming.py  StreamManager + WebSocket /ws/v1/stream (v0.7)
  planning/
    core.py         TaskPlanner + TaskStep (v0.5)
    builtin.py      sequential_planner / fixed_plan (v0.5)
    llm_planner.py  LLMPlanner (single-shot full-plan emission) (v0.6)
  sensors/
    camera.py  Camera Protocol + MockCamera + capture_camera (v0.5)
  telemetry/
    otel.py    step_span + record_decision + record_result (v0.5)
    energy.py  EnergyLedger + PrimitiveEnergyModel (v0.10)
  training/
    core.py       SafeRolloutCollector + LagrangianMultiplier (v0.8)
    hindsight.py  HER relabeling + sparse_indicator_reward (v0.10)
  traces/
    replay.py  load_trace + iter_events + summarize_trace (v0.4)
    diff.py    diff_traces + StepDiff + TraceDiff (v0.6)
    query.py   query DSL with comparison ops + and/or/not/in (v0.8)
examples/
  pick_and_place.py               scripted end-to-end demo
  bench_with_without_geofence.py  paired-comparison demo
  mcp_robot.py                    general MCP server; picks profile via GHOSTLOOP_PROFILE; works with arms, mobile bases, quadrupeds, drones, humanoids, custom robots (v1.0)
  claude_desktop_mcp_arm.py       arm-specific MCP example (legacy) (v1.0)
  claude_desktop_config.json      cross-client + cross-OS config reference (Claude Desktop / Cursor / Continue / Cline / Zed / Gemini CLI) (v1.0)
  custom_robot.yaml               sample profile YAML: hospital medication-delivery robot with custom primitives + composites (v1.0)
  custom_robot_primitives.py      sample custom Primitive factories (dispense_pill, alert_nurse) (v1.0)
  direct_llm_arm.py               direct OpenAI-compatible function calling; works with OpenAI / Anthropic / Gemini / Groq / Ollama / vLLM / GhostLM (v1.0)
tests/ 333 tests (8 live-gated)
  test_core.py              23
  test_llm_policy.py        14
  test_bench.py             22
  test_mujoco_backend.py    10
  test_v03_additions.py     25
  test_v04_additions.py     21
  test_v05_additions.py     32
  test_v06_additions.py     37
  test_v07_additions.py     29
  test_v08_additions.py     28
  test_v09_additions.py     25
  test_v10_additions.py     33
  test_v10_v1_additions.py  18
  test_profiles.py          19
assets/ brand mark + wordmark variants
docs/ architecture / migration / brand notes
Why this is novel
There are robot frameworks. There are agent frameworks. There is no robot framework that treats robots as a model with a tool registry, a fail-closed safety pipeline, a structured trace log, statistical bench rigor, AND a layer of post-hoc analysis (counterfactual replay, causal attribution, LLM-as-judge, property mining, adversarial fuzzing) — the same shape that's now standard for LLM-driven cybersec agents (secure-mcp, ghostguard, GhostAgent).
The thesis: as VLA models become the policy substrate, the runtime around them needs the same rigor we already apply to LLM tool use, plus the analytical tooling — counterfactuals, causal attribution, judge models — that LLM safety has been building for years. ghostloop is that runtime.
License
MIT. See LICENSE.
Built by Joe Munene at Complex Developers. Sibling to GhostLM, secure-mcp, ghostguard, CyberBench.