VLA evaluation harness across simulators with hard-fail spec contracts and hierarchical-mode evaluation

Project description

roboeval

roboeval is a CLI-driven evaluation harness for running vision-language-action models (VLAs) against simulator backends through isolated HTTP services. Before any episode runs, it checks VLA and simulator compatibility through an ActionObsSpec gate. It also provides per-component virtual environments for dependency isolation, sharded result collection, and built-in support for LITEN-style hierarchical evaluation, in which a vision-language model (VLM) planner issues subtask instructions to a low-level VLA.

Method / Contracts

roboeval treats each VLA and simulator as an independently launched component. The orchestrator communicates with a VLA policy server and a simulator worker over HTTP/JSON, validates their declared contracts, and records episode-level results from a reproducible YAML run config.

The main contract surfaces are:

| Surface | Role |
| --- | --- |
| `ActionObsSpec` gate | VLA and simulator components declare action format, dimensionality, value range, camera roles, image format, state layout, and language inputs. Under the default strict mode, incompatible declarations stop the run before episode 1. |
| Host-process isolation | VLA servers, simulator workers, and optional VLM proxy processes run in separate `.venvs/` environments. This allows different Python and CUDA dependency stacks to coexist without a monolithic runtime. |
| Dependency isolation | Each VLA and simulator keeps its upstream package pins, Python version, CUDA assumptions, and optional micromamba/uv environment separate. This is a design choice: adding a new backend should not force the orchestrator or other backends onto the same dependency closure. |
| LITEN-style hierarchical evaluation | The hierarchical mode integrates the VLM-planner method introduced by Shah et al. (Learning Affordances at Inference-Time for Vision-Language-Action Models). The planner emits subtask calls that are executed by the same VLA server interface used for direct evaluation. roboeval is, to our knowledge, the first public VLA evaluation harness to ship a working LITEN integration. |
| Result records | `roboeval run` writes JSON with harness version, config snapshot, per-episode metadata, success flags, and optional shard metadata. |
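
The gate described in the first row can be pictured with a small sketch. It assumes a heavily simplified spec record: `SpecSketch`, `find_mismatches`, `strict_gate`, and all field names are hypothetical and are not roboeval's actual `ActionObsSpec` API. The range rule (the VLA's declared output range must fit inside the simulator's accepted range) is likewise an assumption. What the sketch illustrates is the hard-fail behavior: every mismatch is collected, and strict mode raises before any episode starts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecSketch:
    """Hypothetical stand-in for one side's declaration (not roboeval's class)."""
    action_format: str                  # e.g. "eef_delta" (illustrative value)
    action_dim: int
    action_range: tuple[float, float]   # (low, high)
    camera_roles: frozenset[str]
    image_format: str                   # e.g. "rgb_uint8" (illustrative value)


def find_mismatches(vla: SpecSketch, sim: SpecSketch) -> list[str]:
    """Return human-readable incompatibilities; an empty list means compatible."""
    problems = []
    if vla.action_format != sim.action_format:
        problems.append(f"action format {vla.action_format!r} != {sim.action_format!r}")
    if vla.action_dim != sim.action_dim:
        problems.append(f"action dim {vla.action_dim} != {sim.action_dim}")
    low, high = sim.action_range
    if vla.action_range[0] < low or vla.action_range[1] > high:
        problems.append("VLA action range exceeds the simulator's accepted range")
    missing = vla.camera_roles - sim.camera_roles
    if missing:
        problems.append(f"simulator does not provide cameras {sorted(missing)}")
    if vla.image_format != sim.image_format:
        problems.append(f"image format {vla.image_format!r} != {sim.image_format!r}")
    return problems


def strict_gate(vla: SpecSketch, sim: SpecSketch) -> None:
    """Raise before any episode runs if the two declarations disagree."""
    problems = find_mismatches(vla, sim)
    if problems:
        raise ValueError("spec mismatch before episode 1: " + "; ".join(problems))
```

Collecting all mismatches before raising, rather than stopping at the first one, lets a single failed run report everything that needs fixing on either side.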

Documentation map

For a compact system overview, design rationale, supported-pair notes, tuning guidance, related systems, and decision records, see architecture, design, supported pairs, tuning, related work, and the RFC index.

Installation

For full prerequisites, platform notes, and per-component dependency details, see docs/install.md.

git clone https://github.com/KE7/roboeval.git
cd roboeval
roboeval setup pi05 libero

The setup script provisions the orchestrator plus the requested VLA and simulator environments under .venvs/.

Quickstart

roboeval setup pi05 libero
roboeval serve --vla pi05 --sim libero --headless
roboeval test --validate -c configs/libero_spatial_pi05_smoke.yaml
roboeval run -c configs/libero_spatial_pi05_smoke.yaml

`serve` launches the selected VLA and simulator workers. `run` executes the YAML configuration, including the declared VLA/simulator pair, task suite, episode count, server URLs, output directory, and optional LITEN endpoint. Additional examples are in docs/quickstart.md.
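
Once a run has written its JSON records to the output directory, per-task success rates can be pooled across shards. The sketch below is illustrative only: the record layout (a top-level `episodes` list whose entries carry `task` and `success` keys) and the function name `success_rates` are assumptions, not roboeval's documented schema, so adjust the keys to match the files actually produced.

```python
import json
from collections import defaultdict
from pathlib import Path


def success_rates(result_dir: str) -> dict[str, float]:
    """Pool episodes from every *.json file in result_dir; return success rate per task.

    Assumed (hypothetical) layout: {"episodes": [{"task": str, "success": bool}, ...]}.
    """
    totals = defaultdict(lambda: [0, 0])  # task -> [successes, episode count]
    for path in sorted(Path(result_dir).glob("*.json")):
        record = json.loads(path.read_text())
        for episode in record.get("episodes", []):
            counts = totals[episode["task"]]
            counts[0] += int(bool(episode["success"]))
            counts[1] += 1
    return {task: wins / n for task, (wins, n) in totals.items()}
```

Pooling raw episode counts before dividing, instead of averaging per-shard rates, keeps the result correct when shards contain different numbers of episodes.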

Supported VLAs and Simulators

The table describes shipped coverage. It is a support matrix, not a benchmark table; supported pairs are tested end-to-end.

| VLA | Simulator | Coverage | Example config |
| --- | --- | --- | --- |
| Pi0.5 | LIBERO | direct, LITEN | `configs/libero_spatial_pi05_smoke.yaml`, `configs/libero_spatial_pi05_liten_smoke.yaml` |
| Pi0.5 | LIBERO-Pro | direct, LITEN | `configs/libero_pro_pi05_smoke.yaml`, `configs/libero_pro_pi05_liten_smoke.yaml` |
| Pi0.5 | LIBERO-Infinity | direct, LITEN | `configs/libero_infinity_pi05_smoke.yaml`, `configs/libero_infinity_pi05_liten_smoke.yaml` |
| SmolVLA | LIBERO | direct, LITEN | `configs/libero_object_smolvla_smoke.yaml`, `configs/libero_object_smolvla_liten_smoke.yaml` |
| OpenVLA | LIBERO | direct, LITEN | `configs/libero_spatial_openvla_smoke.yaml`, `configs/libero_spatial_openvla_liten_smoke.yaml` |
| GR00T | LIBERO | direct, LITEN | `configs/libero_spatial_groot_smoke.yaml`, `configs/libero_spatial_groot_liten_smoke.yaml` |
| InternVLA | RoboTwin | direct, LITEN | `configs/robotwin_internvla_smoke.yaml`, `configs/robotwin_internvla_liten_smoke.yaml` |
| ACT | ALOHA Gym | direct, LITEN | `configs/aloha_gym_act_smoke.yaml`, `configs/aloha_gym_act_liten_smoke.yaml` |
| Diffusion Policy | gym-pusht | direct | `configs/gym_pusht_diffusion_policy_smoke.yaml` |
| VQ-BeT | gym-pusht | direct | `configs/gym_pusht_vqbet_smoke.yaml` |
| TDMPC2 | Meta-World | direct | `configs/metaworld_tdmpc2_smoke.yaml` |
| InternVLA | ALOHA Gym | CI smoke | `configs/ci/aloha_gym_internvla_smoke.yaml` |
| ManiSkill2 | ManiSkill2 backend | backend scaffold; x86_64 execution path | setup target `maniskill2` |
| RoboCasa | RoboCasa backend | simulator backend and registry support | setup target `robocasa` |

Supported VLA launch names are `pi05`, `vqbet`, `tdmpc2`, `smolvla`, `openvla`, `cosmos`, `groot`, and `internvla`. Supported simulator launch names are `libero`, `libero_pro`, `libero_infinity`, `robocasa`, `robotwin`, `aloha_gym`, `gym_pusht`, `maniskill2`, and `metaworld`.

Current limitations

- ManiSkill2 is platform-blocked on aarch64 because the required SAPIEN 2.x wheels are x86_64-only.
- bridge_octo is platform-blocked on aarch64 by its current TensorFlow/dlimp dependency chain and does not ship in the v0.1.0 support matrix.
- Some pairs that are technically expressible remain outside current capability boundaries and do not ship root configs, including RoboCasa x GR00T.

Planned features

- Multi-architecture CI matrix. aarch64 is currently the primary CI path; x86_64 execution paths exist but are not in the CI matrix.
- Additional VLAs as their checkpoints become available.
- More simulators. Community contributions are welcome; see docs/extending.md.

Extending

Extension cost. Adding a new VLA averages ~200 SLOC; adding a new simulator backend averages ~230 SLOC (across the v0.1.0 release; excludes blank lines, comments, and docstrings).

- Add a VLA by implementing a policy server with `/health`, `/info`, `/reset`, and `/predict`, then registering it with `roboeval serve`.
- Add a simulator by implementing a `SimBackendBase` backend with `/init`, `/reset`, `/step`, `/obs`, `/success`, and `/info` support through the sim worker.
- Add a new compatibility path by declaring `ActionObsSpec` records on both sides and adding a smoke config under `configs/`.
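
To give a feel for the first step, here is a minimal sketch of a policy server exposing the four endpoints named above, written with only the Python standard library. Everything beyond the endpoint paths is an assumption made for illustration: the HTTP methods (GET for `/health` and `/info`, POST for `/reset` and `/predict`), the JSON payload shapes, the 7-dimensional zero action, and the names `PolicyHandler` and `make_server`. roboeval's real request and response schemas are described in docs/extending.md and may differ.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class PolicyHandler(BaseHTTPRequestHandler):
    ACTION_DIM = 7  # hypothetical action dimensionality

    def _send_json(self, payload, status=200):
        body = json.dumps(payload).encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def _read_json(self):
        length = int(self.headers.get("Content-Length", 0))
        return json.loads(self.rfile.read(length)) if length else {}

    def do_GET(self):
        if self.path == "/health":
            self._send_json({"status": "ok"})
        elif self.path == "/info":
            self._send_json({"name": "zero-policy", "action_dim": self.ACTION_DIM})
        else:
            self._send_json({"error": "not found"}, status=404)

    def do_POST(self):
        request = self._read_json()
        if self.path == "/reset":
            self._send_json({"reset": True})
        elif self.path == "/predict":
            # Placeholder policy: always returns a zero action.
            self._send_json({
                "action": [0.0] * self.ACTION_DIM,
                "instruction": request.get("instruction", ""),
            })
        else:
            self._send_json({"error": "not found"}, status=404)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), PolicyHandler)
```

Calling `make_server(8000).serve_forever()` would start the sketch on port 8000; a real integration would replace the zero action with model inference and return the fields the orchestrator expects.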

See docs/extending.md for the extension architecture and step-by-step entry points.

Citations

If you use roboeval in your research, please cite us.

@software{elmaaroufi2026roboeval,
  title   = {roboeval: A reproducible evaluation harness for Vision-Language-Action models},
  author  = {Elmaaroufi, Karim and OMAR and Seshia, Sanjit A. and Zaharia, Matei},
  version = {0.1.0},
  date    = {2026-04-29},
  url     = {https://github.com/KE7/roboeval},
  license = {BSD-3-Clause}
}

License

roboeval is released under the BSD-3-Clause License.

Project details

Maintainers

ke7

Release history

This version

0.1.0

Apr 30, 2026

Download files

Source Distribution

roboeval-0.1.0.tar.gz (260.1 kB)

Uploaded Apr 30, 2026 Source

Built Distribution

roboeval-0.1.0-py3-none-any.whl (231.4 kB)

Uploaded Apr 30, 2026 Python 3

File details

Details for the file roboeval-0.1.0.tar.gz.

File metadata

Download URL: roboeval-0.1.0.tar.gz
Upload date: Apr 30, 2026
Size: 260.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for roboeval-0.1.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `c5a043269bef7451174a79d98023574048c11f1cffddbbabdc09207a9a5ac766` |
| MD5 | `7a632c65cbee2c77b9856473909608e9` |
| BLAKE2b-256 | `daec1c00e75d56b161b9de4ac7750217e9820ccf53c95ee816e8a94d7a82e77d` |
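
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. This is a minimal sketch: the function names `sha256_of` and `verify` are made up for illustration, and the file path passed in is whatever local path the archive was saved to. The `EXPECTED` constant is the SHA256 value copied from the table for roboeval-0.1.0.tar.gz.

```python
import hashlib

# SHA256 digest listed above for roboeval-0.1.0.tar.gz.
EXPECTED = "c5a043269bef7451174a79d98023574048c11f1cffddbbabdc09207a9a5ac766"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str = EXPECTED) -> bool:
    """Return True when the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("roboeval-0.1.0.tar.gz")` should return `True` for an intact copy of the source distribution saved under that name in the current directory.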

Provenance

The following attestation bundles were made for roboeval-0.1.0.tar.gz:

Publisher: publish.yml on KE7/roboeval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: roboeval-0.1.0.tar.gz
- Subject digest: c5a043269bef7451174a79d98023574048c11f1cffddbbabdc09207a9a5ac766
- Sigstore transparency entry: 1414106494
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: KE7/roboeval@dab726b028ec0414ea9a6d0a9fa237908ea43bba
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/KE7
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dab726b028ec0414ea9a6d0a9fa237908ea43bba
- Trigger Event: push

File details

Details for the file roboeval-0.1.0-py3-none-any.whl.

File metadata

Download URL: roboeval-0.1.0-py3-none-any.whl
Upload date: Apr 30, 2026
Size: 231.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for roboeval-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `83dc09e44fc63a1e5108a7dfe939b410612ddaf67b702b8ab987c1c9629cacda` |
| MD5 | `2aab40668711e3a95304828c878b73cd` |
| BLAKE2b-256 | `964706e903937d08ae76eb1d5b4a67a9d76c8da519f4aec23f16686c77c0fe7a` |

Provenance

The following attestation bundles were made for roboeval-0.1.0-py3-none-any.whl:

Publisher: publish.yml on KE7/roboeval

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: roboeval-0.1.0-py3-none-any.whl
- Subject digest: 83dc09e44fc63a1e5108a7dfe939b410612ddaf67b702b8ab987c1c9629cacda
- Sigstore transparency entry: 1414106594
- Sigstore integration time: Apr 30, 2026
Source repository:
- Permalink: KE7/roboeval@dab726b028ec0414ea9a6d0a9fa237908ea43bba
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/KE7
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@dab726b028ec0414ea9a6d0a9fa237908ea43bba
- Trigger Event: push
