A Visual Testing Harness for AI Coding Agents in Robot Simulation

Maintainers

miaodx

Project description

Roboharness

Approval/evidence harness for unattended robot code changes.

Roboharness is not just a screenshot collector.

The core wedge is:

long unattended agent run -> one proof pack -> short human review

The proving ground starts with the deterministic MuJoCo grasp loop, but the same proof surface also works for humanoid runs across multiple frameworks. From a repo checkout, one command gets back a compiled contract, metric-backed alarms, a phase manifest, an approval report, and an HTML proof surface that tells you what changed and what to do next.

Unitree G1 humanoid demo rendered side by side in Meshcat and MuJoCo

This README preview uses the kept review angles from the same G1 humanoid run: Meshcat front-to-back on the left and MuJoCo top-down on the right. Each frame keeps its phase name visible so you can compare humanoid behavior across frameworks without opening the full report first. To regenerate the same proof surface locally from the committed bundle, run python examples/demos/g1/cross_framework_report.py.

Choose Your Start

Package-First Integration

Use this when you are adding roboharness to an existing codebase. The published wheel installs the library and the roboharness CLI, not the repo's examples/ directory.

For the latest code, prefer installing from Git with uv:

uv pip install "roboharness @ git+https://github.com/MiaoDX/roboharness.git"
roboharness --help

The PyPI package can briefly trail the current README because publishing is handled separately. If you need the latest published release instead, use:

pip install roboharness
roboharness --help

The fastest honest package path is the zero-change Gymnasium wrapper shown in Gymnasium Wrapper (Zero-Change Integration). If you want to evaluate the maintained MuJoCo approval wedge itself, use a repo checkout.

Repo Demo: 10-Minute MuJoCo Wedge

This path exercises the shipped MuJoCo approval wedge from this repository.

git clone https://github.com/MiaoDX/roboharness.git
cd roboharness
python -m pip install -e ".[demo]"
python examples/demos/mujoco/grasp.py --report

For headless Linux or CI:

MUJOCO_GL=egl python examples/demos/mujoco/grasp.py --report
# or
MUJOCO_GL=osmesa python examples/demos/mujoco/grasp.py --report
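
If you launch the demo from a Python driver script rather than a shell, the same headless setup can be sketched as below. The helper names `headless_env` and `run_grasp_demo` are hypothetical and not part of roboharness; only the `MUJOCO_GL` values and the script path come from the commands above.

```python
import os
import subprocess
import sys

# Script path and flag are taken from the commands above.
GRASP_DEMO = ["examples/demos/mujoco/grasp.py", "--report"]


def headless_env(gl_backend: str = "egl") -> dict:
    """Return a copy of the current environment with MUJOCO_GL set."""
    if gl_backend not in {"egl", "osmesa"}:
        raise ValueError(f"unsupported headless backend: {gl_backend!r}")
    env = dict(os.environ)
    env["MUJOCO_GL"] = gl_backend
    return env


def run_grasp_demo(gl_backend: str = "egl") -> int:
    """Run the demo from the repo root and return its exit code."""
    cmd = [sys.executable, *GRASP_DEMO]
    completed = subprocess.run(cmd, env=headless_env(gl_backend), check=False)
    return completed.returncode
```

Calling `run_grasp_demo("osmesa")` from the repo root would mirror the second shell command; the parent process environment is left untouched.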

What you get back:

contract.json — compiled regression contract for this wedge run
autonomous_report.json — canonical metrics and baseline comparison
alarms.json — evaluator-backed hard failures
phase_manifest.json — first failing phase, selected views, rerun hint
approval_report.json — surfaced vs suppressed case decision for review
report.html — first-screen proof, not a folder hunt
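
As a rough sanity check after a run, a script can confirm that every file in the list above was written. This is a minimal sketch that assumes all six artifacts land in one flat output directory (the README does not state the layout); `missing_artifacts` is a hypothetical helper, not a roboharness function.

```python
from pathlib import Path

# File names are taken from the list above.
EXPECTED_ARTIFACTS = (
    "contract.json",
    "autonomous_report.json",
    "alarms.json",
    "phase_manifest.json",
    "approval_report.json",
    "report.html",
)


def missing_artifacts(output_dir) -> list:
    """Return the expected artifact names that are absent from output_dir.

    Assumes a flat layout: every artifact sits directly in output_dir.
    """
    root = Path(output_dir)
    return [name for name in EXPECTED_ARTIFACTS if not (root / name).is_file()]
```

An empty return value means the proof pack is complete under that layout assumption.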

How to read it:

Open report.html.
Read the Run Decision banner first.
Review only surfaced cases against the old baseline.
Use phase_manifest.json and the rerun hint if you need to iterate again.

Baseline rule:

Regression mode keeps the old baseline authoritative.
No new baseline is blessed automatically.

Why This Exists

If Claude Code or Codex spends hours refactoring a robot behavior, the hard part is not generating more files. It is getting back one compact proof surface that answers:

what failed
where it failed first
what the current evidence looks like next to the blessed baseline
whether anything actually needs human review

That is the job of the MuJoCo wedge today.

Installation Matrix

uv pip install "roboharness @ git+https://github.com/MiaoDX/roboharness.git"          # latest Git core
uv pip install "roboharness[demo] @ git+https://github.com/MiaoDX/roboharness.git"    # MuJoCo, Meshcat, Gymnasium, Rerun, Pillow
uv pip install "roboharness[demo,wbc] @ git+https://github.com/MiaoDX/roboharness.git" # + whole-body control (Pinocchio, Pink)
uv pip install "roboharness[lerobot] @ git+https://github.com/MiaoDX/roboharness.git" # LeRobot evaluation path
uv pip install "roboharness[dev] @ git+https://github.com/MiaoDX/roboharness.git"     # test/lint/type deps

For PyPI installs, replace the uv pip install "... @ git+..." form with the same extra on the published package, for example pip install "roboharness[demo]". The quotes keep shells such as zsh from treating the square brackets as a glob pattern.
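
The two install forms differ only in the direct-reference suffix. As an illustration, the requirement strings in the matrix can be generated like this; `requirement` is a hypothetical helper, and the extras names and Git URL are the ones listed above.

```python
GIT_URL = "git+https://github.com/MiaoDX/roboharness.git"


def requirement(extras=(), from_git=True) -> str:
    """Build a requirement string for roboharness with optional extras."""
    name = "roboharness"
    if extras:
        name += "[" + ",".join(extras) + "]"
    # Git installs use the "name @ url" direct-reference form.
    return f"{name} @ {GIT_URL}" if from_git else name
```

For example, `requirement(["demo", "wbc"])` yields the third line of the matrix, and `requirement(["demo"], from_git=False)` yields the PyPI form.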

Progressive Disclosure

Package-first: wire the wrapper or Harness API into your existing codebase
Repo demo: from a clone of this repo, run python examples/demos/mujoco/grasp.py --report
Preset-first: pass --contract-preset mujoco_regression_v1 or --contract-preset mujoco_migration_guarded_v1
Prompt-assisted: pass --contract-prompt "treat this as migration mode and require manual blessing" to select one of the reviewed presets without opening JSON
Advanced: pass --contract-json /path/to/contract.json to validate a pre-authored metric-only contract before the wedge starts
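
The three contract options above can be sketched as one command builder. Two things here are assumptions rather than documented behavior: that the contract flags are passed to the grasp demo alongside --report, and that only one contract source is given per run. `grasp_command` is a hypothetical helper.

```python
import sys
from typing import Optional


def grasp_command(
    preset: Optional[str] = None,
    prompt: Optional[str] = None,
    contract_json: Optional[str] = None,
) -> list:
    """Build the argv for the grasp demo with at most one contract source."""
    chosen = [value for value in (preset, prompt, contract_json) if value is not None]
    if len(chosen) > 1:
        # Assumption: the sources are treated as mutually exclusive.
        raise ValueError("pass only one of preset, prompt, contract_json")
    cmd = [sys.executable, "examples/demos/mujoco/grasp.py", "--report"]
    if preset is not None:
        cmd += ["--contract-preset", preset]
    if prompt is not None:
        cmd += ["--contract-prompt", prompt]
    if contract_json is not None:
        cmd += ["--contract-json", contract_json]
    return cmd
```

Passing the result to `subprocess.run` from the repo root avoids shell quoting issues with multi-word prompts.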

If a contract cannot be grounded safely, the run stops before execution and emits a user-facing error envelope with problem, cause, fix, docs_url, and next_action.
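
A caller that wants to surface this envelope to a human or an agent might handle it as below. This is a sketch assuming the envelope is serialized as a JSON object; the README names the five fields but not the encoding, and both helper functions are hypothetical.

```python
import json

# Field names are taken from the sentence above.
ENVELOPE_FIELDS = ("problem", "cause", "fix", "docs_url", "next_action")


def parse_error_envelope(text: str) -> dict:
    """Parse a JSON error envelope and check that all five fields exist."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("envelope must be a JSON object")
    missing = [name for name in ENVELOPE_FIELDS if name not in data]
    if missing:
        raise ValueError(f"envelope is missing fields: {missing}")
    return {name: data[name] for name in ENVELOPE_FIELDS}


def format_envelope(envelope: dict) -> str:
    """Render the envelope as one 'field: value' line per field."""
    return "\n".join(f"{name}: {envelope[name]}" for name in ENVELOPE_FIELDS)
```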

Proof Surface

The first screen is meant to be actionable without replay:

Run Decision tells you whether the run is clean, reviewable, or degraded
Approval Queue shows changed or ambiguous cases only
Current vs Baseline shows the first manifest-selected proof pair
Temporal Evidence appears for ambiguous still-image cases as a checkpoint-ordered strip
Hard Metric Results shows the evaluator-backed failures
Evidence images support click-to-zoom for quick inspection without dropping into the gallery
Phase Timeline and the deeper checkpoint gallery stay available below the fold

Checkpoint	Frame shows
pre_grasp	Gripper above cube
contact	Lowered onto cube
grasp	Fingers closed
lift	Cube lifted

View Interactive Reports

MuJoCo grasp: https://miaodx.com/roboharness/grasp/
G1 WBC reach: https://miaodx.com/roboharness/g1-reach/
G1 locomotion: https://miaodx.com/roboharness/g1-loco/
Native LeRobot GR00T: https://miaodx.com/roboharness/g1-native-groot/
Native LeRobot SONIC: https://miaodx.com/roboharness/g1-native-sonic/
SONIC planner: https://miaodx.com/roboharness/sonic-planner/
SONIC tracking: https://miaodx.com/roboharness/sonic/

Other Demos

These are real integrations and proof surfaces, but they are not the front-door wedge:

Demo	Description	Report	Run
MuJoCo Grasp	Scripted grasp with Meshcat 3D, paired baseline proof, approval report	Live	`python examples/demos/mujoco/grasp.py --report`
G1 Cross-Framework Proof	Committed Meshcat vs MuJoCo paired-evidence report for one G1 bundle	repo-only	`python examples/demos/g1/cross_framework_report.py`
G1 WBC Reach	Whole-body IK reaching (Pinocchio + Pink)	Live	`python examples/demos/g1/wbc_reach.py --report`
G1 Locomotion	GR00T RL stand→walk→stop, HuggingFace model	Live	`python examples/demos/g1/lerobot_locomotion.py --report`
G1 Native LeRobot (GR00T)	Official `make_env()` factory + GR00T Balance + Walk	Live	`python examples/demos/g1/lerobot_native.py --controller groot --report`
G1 Native LeRobot (SONIC)	Official `make_env()` factory + SONIC planner	Live	`python examples/demos/g1/lerobot_native.py --controller sonic --report`
SONIC Planner	Standalone GEAR-SONIC planner demo on G1	Live	`python examples/demos/sonic/locomotion.py --report`
SONIC Motion Tracking	Real encoder+decoder tracking demo on G1	Live	`python examples/demos/sonic/tracking.py --report`

Showcase Repository

The showcase repo is for external proof that roboharness works as a pip-installed dependency in real projects:

LeRobot Evaluation — visual regression testing for robot policies
GR00T WBC — whole-body control integration

Each showcase is self-contained, runs with ./run.sh, and supports smoke mode for fast CI validation.

G1 Humanoid WBC Reach

pip install "roboharness[demo,wbc]"
python examples/demos/g1/wbc_reach.py --report

Whole-body control (WBC) for the Unitree G1 humanoid using Pinocchio + Pink differential-IK for upper-body reaching while maintaining lower-body balance. The controller solves inverse kinematics for both arms simultaneously, letting the robot reach arbitrary 3D targets without falling over.

Checkpoints: stand, reach_left, reach_both, retract

LeRobot G1 Locomotion

pip install "roboharness[demo]"
python examples/demos/g1/lerobot_locomotion.py --report

Integrates the real Unitree G1 43-DOF model from HuggingFace with GR00T WBC locomotion policies (Balance + Walk). The example downloads the model and ONNX policies automatically, runs the G1 through stand → walk → stop phases, and captures multi-camera checkpoints via RobotHarnessWrapper.

Native LeRobot Integration

pip install torch --index-url https://download.pytorch.org/whl/cpu  # CPU-only
pip install "roboharness[demo]" lerobot

MUJOCO_GL=egl python examples/demos/g1/lerobot_native.py --controller groot --report
MUJOCO_GL=egl python examples/demos/g1/lerobot_native.py --controller sonic --report

This demo uses LeRobot's official make_env("lerobot/unitree-g1-mujoco") factory for standardized environment creation. The published native demo reports are split by controller: one report for GR00T and one for SONIC. The setup is DDS-ready for sim-to-real transfer when hardware is available. See #83 for details.

LeRobot Evaluation in CI

pip install "roboharness[lerobot]"

# Evaluate a real LeRobot checkpoint with visual checkpoints + JSON report
python examples/integrations/lerobot/eval_harness.py \
  --checkpoint-path /path/to/lerobot/checkpoint \
  --repo-id lerobot/unitree-g1-mujoco \
  --n-episodes 5 \
  --checkpoint-steps 10 50 100 \
  --assert-threshold \
  --min-success-rate 0.8

Produces:

episode_000/step_0010/default_rgb.png — checkpoint screenshots
lerobot_eval_report.json — structured per-episode stats
CI exit code 1 when thresholds are not met
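
The output layout and the threshold gate can be sketched as follows. The zero-padding widths are inferred from the single example path above, and the greater-than-or-equal comparison is an assumption about how --min-success-rate is applied; both helpers are hypothetical and do not read the real report schema.

```python
from pathlib import Path


def checkpoint_image_path(output_dir, episode: int, step: int, camera: str = "default") -> Path:
    """Build a screenshot path shaped like episode_000/step_0010/default_rgb.png."""
    return (
        Path(output_dir)
        / f"episode_{episode:03d}"
        / f"step_{step:04d}"
        / f"{camera}_rgb.png"
    )


def ci_exit_code(successes, min_success_rate: float) -> int:
    """Return 0 when the success rate meets the threshold, otherwise 1."""
    successes = list(successes)
    if not successes:
        return 1  # no episodes means nothing was proven
    rate = sum(1 for ok in successes if ok) / len(successes)
    return 0 if rate >= min_success_rate else 1
```

A CI step could end with `sys.exit(ci_exit_code(results, 0.8))` once per-episode results have been collected.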

SONIC Planner

pip install "roboharness[demo]"
MUJOCO_GL=egl python examples/demos/sonic/locomotion.py --report --assert-success

Standalone NVIDIA GEAR-SONIC planner demo on the real Unitree G1 MuJoCo model. This path uses planner_sonic.onnx only: velocity commands go in, full-body pose trajectories come out, and the example uses a lightweight virtual torso harness for stable visual debugging. This is the same standalone planner path published at /sonic-planner/.

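from roboharness.robots.unitree_g1 import SonicLocomotionController, SonicMode

ctrl = SonicLocomotionController()
# qpos and qvel are the current joint positions and velocities read from the
# simulator (for example MuJoCo's data.qpos and data.qvel).
action = ctrl.compute(
    command={"velocity": [0.3, 0.0, 0.0], "mode": SonicMode.WALK},
    state={"qpos": qpos, "qvel": qvel},
)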

For a planner demo wired through LeRobot's official make_env() stack, see G1 Native LeRobot (SONIC) above. The planner path and the encoder+decoder tracking path are different inference stacks with different ONNX contracts; see docs/product/sonic-inference-stacks.md for the exact split, validation policy, and joint-order conventions.

SONIC Motion Tracking

pip install "roboharness[demo]"
MUJOCO_GL=egl python examples/demos/sonic/tracking.py --report --assert-success

Real encoder+decoder tracking demo on the Unitree G1. This path uses model_encoder.onnx + model_decoder.onnx directly, replays a motion clip via set_tracking_clip(...), and records checkpoint metrics for torso height, tracking-frame progress, and joint-tracking error. This is the same path published at /sonic/.

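from roboharness.robots.unitree_g1 import MotionClipLoader, SonicLocomotionController

ctrl = SonicLocomotionController()
# Load a motion clip from disk and register it as the tracking target.
clip = MotionClipLoader.load("path/to/dance_clip/")
ctrl.set_tracking_clip(clip)

# qpos and qvel are the current joint positions and velocities read from the
# simulator (for example MuJoCo's data.qpos and data.qvel).
action = ctrl.compute(
    command={"tracking": True},
    state={"qpos": qpos, "qvel": qvel},
)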

Models (planner_sonic.onnx, model_encoder.onnx, model_decoder.onnx) are downloaded from HuggingFace (nvidia/GEAR-SONIC) on first use. Requires pip install roboharness[demo]. See docs/product/sonic-inference-stacks.md for the exact split between planner and tracking, plus the validation policy and joint-order conventions. See #86 (Phase 1) and #92 (Phase 2).

Gymnasium Wrapper (Zero-Change Integration)

import gymnasium as gym
from roboharness.wrappers import RobotHarnessWrapper

# Wrap any environment that supports rgb_array rendering.
env = gym.make("CartPole-v1", render_mode="rgb_array")
env = RobotHarnessWrapper(
    env,
    checkpoints=[{"name": "early", "step": 10}, {"name": "mid", "step": 50}],
    output_dir="./harness_output",
)

obs, info = env.reset()
for _ in range(200):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if "checkpoint" in info:
        print(f"Checkpoint '{info['checkpoint']['name']}' captured!")
    if terminated or truncated:
        # Random CartPole episodes end quickly; reset before stepping again.
        obs, info = env.reset()
env.close()

Core Harness API

from roboharness import Harness
from roboharness.backends.mujoco_meshcat import MuJoCoMeshcatBackend

backend = MuJoCoMeshcatBackend(model_path="robot.xml", cameras=["front", "side"])
harness = Harness(backend, output_dir="./output", task_name="pick_and_place")

# Register the named checkpoints and the cameras to capture at each one.
harness.add_checkpoint("pre_grasp", cameras=["front", "side"])
harness.add_checkpoint("lift", cameras=["front", "side"])
harness.reset()
# actions is the action sequence produced by your own controller (not shown here).
result = harness.run_to_next_checkpoint(actions)
# result.views → multi-view screenshots, result.state → joint angles, poses

Supported Simulators

Simulator	Status	Integration
MuJoCo + Meshcat	✅ Implemented	Native backend adapter
LeRobot (G1 MuJoCo)	✅ Implemented	Gymnasium Wrapper + Controllers
LeRobot Native (`make_env`)	✅ Implemented	`make_env()` + VectorEnvAdapter
Isaac Lab	✅ Implemented	Gymnasium Wrapper (GPU required for E2E)
ManiSkill	✅ Implemented	Gymnasium Wrapper
LocoMuJoCo / MuJoCo Playground / unitree_rl_gym	📋 Roadmap	Various

Design Principles

Harness only does "pause → capture → resume" — agent logic stays in your code
Gymnasium Wrapper for zero-change integration — works with Isaac Lab, ManiSkill, etc.
SimulatorBackend protocol — implement a few methods, plug in any simulator
Agent-consumable output — PNG + JSON files that any coding agent can read
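
To make the "pause → capture → resume" idea concrete, here is a toy sketch in plain Python. It is not the roboharness SimulatorBackend protocol: `ToyBackend`, its two methods, `CounterSim`, and `run_with_checkpoints` are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Protocol


class ToyBackend(Protocol):
    """Hypothetical minimal backend; NOT roboharness's SimulatorBackend."""

    def step(self, action: float) -> None: ...

    def capture(self) -> dict: ...


@dataclass
class CounterSim:
    """Stand-in simulator whose whole state is a running sum of actions."""

    total: float = 0.0

    def step(self, action: float) -> None:
        self.total += action

    def capture(self) -> dict:
        return {"total": self.total}


def run_with_checkpoints(backend: ToyBackend, actions, checkpoints: dict) -> dict:
    """Step through actions and capture state at named 1-based step indices."""
    captured = {}
    for step_index, action in enumerate(actions, start=1):
        backend.step(action)
        name = checkpoints.get(step_index)
        if name is not None:
            # Pause and capture; the loop then resumes with the next action.
            captured[name] = backend.capture()
    return captured
```

The agent logic (here, the list of actions) stays outside the loop, which only decides when to capture.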

See STATUS.md, ARCHITECTURE.md, docs/human/README.md, CONTRIBUTING.md, CHANGELOG.md, docs/development/development-workflow.md, and docs/context/context.en.md for current status, architecture, the curated human doc index, contributor workflow, release notes, and project background.

Related Work

Roboharness builds on ideas from several research efforts in AI-driven robot evaluation and code-as-policy:

FAEA — LLM agents as embodied manipulation controllers without demonstrations or fine-tuning (Tsui et al., 2026)
CaP-X — Benchmark framework for coding agents that program robot manipulation tasks (Fu et al., 2026)
StepEval — VLM-based subgoal evaluation for scoring intermediate robot manipulation steps (ElMallah et al., 2025)
SOLE-R1 — Video-language reasoning as the sole reward signal for on-robot RL (Schroeder et al., 2026)
AOR — Multimodal coding agents that iteratively rewrite control code from visual observations (Kumar, 2026)

Citing

If you use Roboharness in academic work, please cite it using the metadata in CITATION.cff or the "Cite this repository" button on GitHub.

Contributing

Contributions welcome — including from AI coding agents! See CONTRIBUTING.md.

License

MIT

Project details

Release history

0.3.1 (this version): May 19, 2026
0.2.1: Apr 13, 2026
0.1.1: Apr 2, 2026

Download files

Source Distribution

roboharness-0.3.1.tar.gz (10.8 MB)

Uploaded May 19, 2026 Source

Built Distribution

roboharness-0.3.1-py3-none-any.whl (97.3 kB)

Uploaded May 19, 2026 Python 3

File details

Details for the file roboharness-0.3.1.tar.gz.

File metadata

Download URL: roboharness-0.3.1.tar.gz
Upload date: May 19, 2026
Size: 10.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for roboharness-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`0c0fbac348a875c95ef373130564cb1f0fc3fc748e752baf13d811c357fac9e5`
MD5	`c73c94680f132305a0004687e16b3ed4`
BLAKE2b-256	`86271954b46de747681e1e54cbc2b5a3a2d45fff4dbef01f0bb9ed7e18b6d4dd`
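
A downloaded archive can be checked against the SHA256 digest in the table with the standard library alone. This is a minimal sketch; the helper names are hypothetical, and the expected digest is the one published above for roboharness-0.3.1.tar.gz.

```python
import hashlib
import hmac

# SHA256 digest copied from the table above for roboharness-0.3.1.tar.gz.
EXPECTED_SHA256 = "0c0fbac348a875c95ef373130564cb1f0fc3fc748e752baf13d811c357fac9e5"


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected.lower())
```

`matches_digest("roboharness-0.3.1.tar.gz")` should return True for an intact download of the source distribution.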

Provenance

The following attestation bundles were made for roboharness-0.3.1.tar.gz:

Publisher: release.yml on MiaoDX/roboharness

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: roboharness-0.3.1.tar.gz
- Subject digest: 0c0fbac348a875c95ef373130564cb1f0fc3fc748e752baf13d811c357fac9e5
- Sigstore transparency entry: 1571821042
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: MiaoDX/roboharness@41d96ff8ea56a57be5e1b6cda90e640f2f4e6d6b
- Branch / Tag: refs/heads/release
- Owner: https://github.com/MiaoDX
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@41d96ff8ea56a57be5e1b6cda90e640f2f4e6d6b
- Trigger Event: push

File details

Details for the file roboharness-0.3.1-py3-none-any.whl.

File metadata

Download URL: roboharness-0.3.1-py3-none-any.whl
Upload date: May 19, 2026
Size: 97.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for roboharness-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`87a1fc69aef4b070e516a9386d5f612c931420a21a1338ad2f9b4e301051a729`
MD5	`90995414d0f83bbdabb65a7b5743bc27`
BLAKE2b-256	`c5488967d970370fc59068931cdf35ba87038cdc1bd06037381d6a91497a6e1b`

Provenance

The following attestation bundles were made for roboharness-0.3.1-py3-none-any.whl:

Publisher: release.yml on MiaoDX/roboharness

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: roboharness-0.3.1-py3-none-any.whl
- Subject digest: 87a1fc69aef4b070e516a9386d5f612c931420a21a1338ad2f9b4e301051a729
- Sigstore transparency entry: 1571821095
- Sigstore integration time: May 19, 2026
Source repository:
- Permalink: MiaoDX/roboharness@41d96ff8ea56a57be5e1b6cda90e640f2f4e6d6b
- Branch / Tag: refs/heads/release
- Owner: https://github.com/MiaoDX
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@41d96ff8ea56a57be5e1b6cda90e640f2f4e6d6b
- Trigger Event: push
