A Visual Testing Harness for AI Coding Agents in Robot Simulation

Roboharness

License: MIT · Python 3.10+

Let Claude Code and Codex see what the robot is doing, judge if it's working, and iterate autonomously.

[Figure: Grasp X32_Y28_Z13 (front view). Sequence: Plan → Pregrasp → Approach → Close → Lift → Holding]

[Figure: Grasp X26_Y22_Z13 (top-down view). Object alignment and grasp closure]

What is Roboharness?

Roboharness is a framework that lets AI Coding Agents (Claude Code, OpenAI Codex, OpenClaw, etc.) control robot simulations through a visual feedback loop:

[Figure: Roboharness architecture]

Key insight: Modern coding agents are already multimodal — they can write code AND see images AND make decisions. We don't need a separate VLM. Roboharness just needs to present simulation visuals in a format agents can directly consume.

Installation

pip install roboharness

# With MuJoCo + Meshcat backend (the quotes stop shells such as zsh from globbing the brackets)
pip install "roboharness[mujoco]"

# Development
pip install "roboharness[dev]"

Quick Start

MuJoCo Grasp Example (End-to-End)

Run a complete grasp simulation that needs nothing beyond the MuJoCo extra and Pillow:

pip install "roboharness[mujoco]" Pillow
python examples/mujoco_grasp.py --report

This runs a scripted grasp sequence, captures multi-view screenshots at each checkpoint, and generates an HTML report. See examples/mujoco_grasp.py for the full source.
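
The report generator itself is part of the package and is not reproduced here. As a hedged illustration of the idea only, the sketch below assembles a small static HTML page from checkpoint names and image paths using the standard library. `build_report` and the image paths passed to it are hypothetical, not part of the Roboharness API.

```python
from html import escape


def build_report(task: str, captures: dict[str, list[str]]) -> str:
    """Return an HTML page with one section per checkpoint.

    `captures` maps a checkpoint name to the image paths captured there.
    """
    sections = []
    for name, images in captures.items():
        imgs = "\n".join(
            f'    <img src="{escape(path, quote=True)}" alt="{escape(name)} view" width="320">'
            for path in images
        )
        sections.append(f"  <section>\n    <h2>{escape(name)}</h2>\n{imgs}\n  </section>")
    body = "\n".join(sections)
    return (
        "<!DOCTYPE html>\n"
        f"<html>\n<head><title>{escape(task)}</title></head>\n"
        f"<body>\n  <h1>{escape(task)}</h1>\n{body}\n</body>\n</html>\n"
    )


report = build_report(
    "mujoco_grasp",
    {
        "pre_grasp": ["pre_grasp/front_rgb.png", "pre_grasp/side_rgb.png"],
        "lift": ["lift/front_rgb.png"],
    },
)
print(report)
```

Escaping every name and path keeps the page well-formed even if a checkpoint name contains characters such as `<` or `&`.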

View the interactive visual report online — auto-generated from CI on every push to main.

Checkpoint captures (front view):

| Checkpoint | What the capture shows |
| pre_grasp | Gripper hovering above cube |
| contact | Lowered onto cube |
| grasp | Fingers closed |
| lift | Cube lifted off table |

Option 1: Gymnasium Wrapper (Zero-Change Integration)

Wrap any Gymnasium-compatible environment with one line:

import gymnasium as gym
from roboharness.wrappers import RobotHarnessWrapper

env = gym.make("CartPole-v1", render_mode="rgb_array")
env = RobotHarnessWrapper(env,
    checkpoints=[
        {"name": "early", "step": 10},
        {"name": "mid", "step": 50},
        {"name": "late", "step": 100},
    ],
    output_dir="./harness_output",
)

obs, info = env.reset()
for _ in range(200):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if "checkpoint" in info:
        print(f"Checkpoint '{info['checkpoint']['name']}' captured!")
        print(f"  → {info['checkpoint']['capture_dir']}")
    if terminated or truncated:
        # Gymnasium requires a reset before stepping a finished episode.
        obs, info = env.reset()
env.close()
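
The wrapper's internals are not shown in this README. As a rough mental model only, a step-indexed checkpoint wrapper can be sketched in plain Python without Gymnasium. `ToyEnv` and `StepCheckpointWrapper` are hypothetical names; the sketch assumes the step counter restarts on `reset()`, and it omits the screenshot capture that the real `RobotHarnessWrapper` performs.

```python
from typing import Any


class ToyEnv:
    """Hypothetical stand-in for a Gymnasium-style environment."""

    def reset(self) -> tuple[int, dict[str, Any]]:
        self.t = 0
        return self.t, {}

    def step(self, action: int) -> tuple[int, float, bool, bool, dict[str, Any]]:
        self.t += 1
        # Returns obs, reward, terminated, truncated, info.
        return self.t, 0.0, False, False, {}


class StepCheckpointWrapper:
    """Adds a "checkpoint" entry to `info` when a configured step is reached."""

    def __init__(self, env: ToyEnv, checkpoints: list[dict[str, Any]]) -> None:
        self.env = env
        # Map step number -> checkpoint name for constant-time lookup in step().
        self._by_step = {c["step"]: c["name"] for c in checkpoints}
        self._steps = 0

    def reset(self) -> tuple[int, dict[str, Any]]:
        self._steps = 0  # assumption: step counting restarts with each episode
        return self.env.reset()

    def step(self, action: int) -> tuple[int, float, bool, bool, dict[str, Any]]:
        obs, reward, terminated, truncated, info = self.env.step(action)
        self._steps += 1
        name = self._by_step.get(self._steps)
        if name is not None:
            # A real harness would render and save camera views here.
            info = {**info, "checkpoint": {"name": name, "step": self._steps}}
        return obs, reward, terminated, truncated, info


env = StepCheckpointWrapper(
    ToyEnv(), [{"name": "early", "step": 2}, {"name": "late", "step": 4}]
)
env.reset()
hits = []
for _ in range(5):
    obs, reward, terminated, truncated, info = env.step(0)
    if "checkpoint" in info:
        hits.append(info["checkpoint"]["name"])
print(hits)
```

Because the wrapper only adds a key to `info`, the calling loop stays unchanged, which is the sense in which this style of integration needs no edits to the environment.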

Option 2: Core Harness API (Full Control)

For custom simulator integrations:

from roboharness import Harness
from roboharness.backends.mujoco_meshcat import MuJoCoMeshcatBackend

backend = MuJoCoMeshcatBackend(
    model_path="robot.xml",
    cameras=["front", "side", "top"],
)
harness = Harness(backend, output_dir="./harness_output", task_name="pick_and_place")

harness.add_checkpoint("pre_grasp", cameras=["front", "side", "top"])
harness.add_checkpoint("contact", cameras=["front", "wrist"])
harness.add_checkpoint("lift", cameras=["front", "side", "top"])

harness.reset()
# `actions`: your own control sequence for the robot (not defined in this snippet)
result = harness.run_to_next_checkpoint(actions)
# result.views → multi-view screenshots
# result.state → joint angles, poses, contacts

Grasp Task Storage

For tasks with multiple grasp positions, each with multiple agent retry trials:

from roboharness.storage import GraspTaskStore

store = GraspTaskStore(base_dir="./harness_output", task_name="pick_and_place")
store.add_grasp_position(position_id=1, xyz=(0.5, 0.0, 0.05), object_name="red_cube")

Output directory structure:

harness_output/
└── pick_and_place/
    ├── task_config.json
    ├── grasp_position_001/
    │   ├── position.json              # grasp pose (xyz + quaternion)
    │   ├── trial_001/
    │   │   ├── plan_start/
    │   │   │   ├── front_rgb.png
    │   │   │   ├── side_rgb.png
    │   │   │   ├── state.json
    │   │   │   └── metadata.json
    │   │   ├── contact/
    │   │   ├── lift/
    │   │   └── result.json
    │   ├── trial_002/                 # agent's second attempt
    │   └── summary.json
    ├── grasp_position_002/
    └── report.json
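
Because the output is plain files, an agent or a script can inspect a run with ordinary file operations. The sketch below builds a tiny tree that mirrors the layout above inside a temporary directory and then summarises each trial. The helper name `summarize_trials` and the JSON key `success` are placeholders chosen for this example; the actual schema of `result.json` is not specified here.

```python
import json
import tempfile
from pathlib import Path


def summarize_trials(task_dir: Path) -> dict[str, dict[str, bool]]:
    """Map each grasp position to {trial name: success flag}."""
    summary: dict[str, dict[str, bool]] = {}
    for position in sorted(task_dir.glob("grasp_position_*")):
        trials = {}
        for result_file in sorted(position.glob("trial_*/result.json")):
            result = json.loads(result_file.read_text())
            # "success" is an assumed key, used only for this illustration.
            trials[result_file.parent.name] = bool(result.get("success", False))
        summary[position.name] = trials
    return summary


with tempfile.TemporaryDirectory() as tmp:
    task_dir = Path(tmp) / "pick_and_place"
    # Zero-padded names keep lexical order equal to numeric order.
    for trial, success in [(1, False), (2, True)]:
        trial_dir = task_dir / f"grasp_position_{1:03d}" / f"trial_{trial:03d}"
        trial_dir.mkdir(parents=True)
        (trial_dir / "result.json").write_text(json.dumps({"success": success}))
    summary = summarize_trials(task_dir)

print(summary)
```

The same pattern extends to `state.json` and the per-view PNG files: glob for the checkpoint directory, then read whichever files the agent needs.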

Supported Simulators

| Simulator | Status | Integration |
| MuJoCo + Meshcat | ✅ Implemented | Native backend adapter |
| Isaac Lab | 🚧 Planned | Gymnasium Wrapper (1 line) |
| ManiSkill | 🚧 Planned | Gymnasium Wrapper |
| LocoMuJoCo | 📋 Roadmap | Gymnasium Wrapper |
| MuJoCo Playground | 📋 Roadmap | JAX-native adapter |
| unitree_rl_gym | 📋 Roadmap | MuJoCo sim-to-sim wrapper |

Architecture

Project structure
roboharness/
├── core/
│   ├── harness.py         # Main Harness class + SimulatorBackend protocol
│   ├── checkpoint.py      # Checkpoint management & state snapshots
│   └── capture.py         # Multi-view screenshot capture & storage
├── backends/
│   └── mujoco_meshcat.py  # MuJoCo + Meshcat reference backend
├── wrappers/
│   └── gymnasium_wrapper.py  # Drop-in Gymnasium wrapper
└── storage/
    └── task_store.py      # Task-oriented storage (GraspTaskStore, etc.)

Design principles:

  • Harness only does "pause → capture → resume" — agent logic stays in your code
  • Gymnasium Wrapper for zero-change integration — works with Isaac Lab, ManiSkill, etc.
  • SimulatorBackend protocol for custom integrations — implement 7 methods, done
  • Agent-consumable output — PNG images + JSON state files that any agent can ls and read
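
The backend contract can be pictured as a structural interface. The sketch below uses `typing.Protocol` with three hypothetical methods (`step`, `get_state`, `render`) to illustrate the "pause → capture → resume" loop. It does not reproduce the actual `SimulatorBackend` protocol or its seven methods; `BackendSketch`, `CounterBackend`, and `run_with_checkpoints` are names invented for this example.

```python
from typing import Any, Protocol


class BackendSketch(Protocol):
    """Hypothetical subset of what a simulator backend might expose."""

    def step(self, action: float) -> None: ...
    def get_state(self) -> dict[str, Any]: ...
    def render(self, camera: str) -> bytes: ...


class CounterBackend:
    """Toy backend: the 'simulation' is a running sum of the actions."""

    def __init__(self) -> None:
        self.total = 0.0

    def step(self, action: float) -> None:
        self.total += action

    def get_state(self) -> dict[str, Any]:
        return {"total": self.total}

    def render(self, camera: str) -> bytes:
        # Placeholder bytes; a real backend would return an encoded image.
        return f"{camera}:{self.total}".encode()


def run_with_checkpoints(
    backend: BackendSketch,
    actions: list[float],
    checkpoints: dict[int, str],
    cameras: list[str],
) -> list[dict[str, Any]]:
    """Step through `actions`; after step i (1-based), capture if i is a checkpoint."""
    captures = []
    for i, action in enumerate(actions, start=1):
        backend.step(action)
        if i in checkpoints:
            # "Pause": nothing advances while state and views are collected.
            captures.append(
                {
                    "name": checkpoints[i],
                    "state": backend.get_state(),
                    "views": {cam: backend.render(cam) for cam in cameras},
                }
            )
    return captures


captures = run_with_checkpoints(
    CounterBackend(), [1.0, 2.0, 3.0], {1: "pre_grasp", 3: "lift"}, ["front", "top"]
)
print([(c["name"], c["state"]["total"]) for c in captures])
```

`CounterBackend` never inherits from `BackendSketch`; it satisfies the interface simply by having matching methods, which is what lets a harness of this shape accept any simulator that implements the required calls.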

Background

This project is inspired by:

  • Anthropic's Harness Engineering (Nov 2025, Mar 2026) — Building effective harnesses for long-running agents
  • OpenAI's Codex CLI — Using Codex in an agent-first world
  • AOR (Act-Observe-Rewrite, 2025) — Multi-modal LLM receives RGB images + diagnostics, outputs controller code
  • MuJoCo — Physics engine for robotics simulation
  • Gymnasium — Standard API for reinforcement learning environments

See docs/context.en.md for the full background and motivation. See docs/simulator-survey.en.md for the simulator compatibility analysis.

Contributing

Contributions welcome! See CONTRIBUTING.md for guidelines.

We especially welcome:

  • New simulator backend adapters
  • Real-world usage examples
  • Integration with popular RL libraries (SB3, CleanRL, etc.)

AI agents are welcome contributors! We actively encourage contributions from AI coding agents such as Claude Code, OpenAI Codex, OpenClaw, and other autonomous coding tools. If your agent can improve Roboharness, send a PR!

License

MIT
