VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models
VLA-Arena is an open-source benchmark for the systematic evaluation of Vision-Language-Action (VLA) models. VLA-Arena provides a full toolchain covering scene modeling, demonstration collection, model training, and evaluation. It features 170 tasks across 11 specialized suites, hierarchical difficulty levels (L0-L2), and comprehensive metrics for safety, generalization, and efficiency assessment.
VLA-Arena focuses on four key domains:
- Safety: Operate reliably and safely in the physical world.
- Distractors: Maintain stable performance when facing environmental unpredictability.
- Extrapolation: Generalize learned knowledge to novel situations.
- Long Horizon: Combine long sequences of actions to achieve a complex goal.
📰 News
2025.09.29: VLA-Arena is officially released!
🔥 Highlights
- 🚀 End-to-End & Out-of-the-Box: We provide a complete and unified toolchain covering everything from scene modeling and behavior collection to model training and evaluation. Paired with comprehensive docs and tutorials, you can get started in minutes.
- 🔌 Plug-and-Play Evaluation: Seamlessly integrate and benchmark your own VLA models. Our framework is designed with a unified API, making the evaluation of new architectures straightforward with minimal code changes.
- 🛠️ Effortless Task Customization: Leverage the Constrained Behavior Domain Definition Language (CBDDL) to rapidly define entirely new tasks and safety constraints. Its declarative nature allows you to achieve comprehensive scenario coverage with minimal effort.
- 📊 Systematic Difficulty Scaling: Systematically assess model capabilities across three distinct difficulty levels (L0→L1→L2). Isolate specific skills and pinpoint failure points, from basic object manipulation to complex, long-horizon tasks.
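To illustrate the plug-and-play idea, the sketch below shows the general shape of wrapping a policy behind a small predict-an-action interface and rolling it out in an evaluation loop. The class and method names (`RandomPolicy`, `predict_action`, `evaluate`) are hypothetical and do not reflect VLA-Arena's actual API; see the Model Fine-tuning and Evaluation Guide for the real interface.

```python
# Hypothetical sketch of a plug-in policy; the interface below
# (reset/predict_action and the rollout loop) is illustrative only
# and does NOT reflect VLA-Arena's actual API.
import random

class RandomPolicy:
    """A stand-in 'model' that emits random 7-DoF action deltas."""
    def reset(self):
        pass

    def predict_action(self, observation, instruction):
        # A real VLA model would condition on the camera image and the
        # language instruction; here we just sample noise.
        return [random.uniform(-1, 1) for _ in range(7)]

def evaluate(policy, episodes=5, horizon=10):
    """Toy rollout loop: averages binary success over episodes."""
    successes = 0
    for _ in range(episodes):
        policy.reset()
        for _ in range(horizon):
            action = policy.predict_action(observation=None,
                                           instruction="pick up the mug")
            assert len(action) == 7
        # No simulator is attached here, so no episode ever succeeds.
    return successes / episodes

print(evaluate(RandomPolicy()))
```

The point is only that a new model needs to expose a single action-prediction entry point; the benchmark harness handles resets, observations, and scoring.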
If you find VLA-Arena useful, please cite it in your publications.
@misc{zhang2025vlaarena,
title={VLA-Arena: An Open-Source Framework for Benchmarking Vision-Language-Action Models},
author={Borong Zhang and Jiahao Li and Jiachen Shen and Yishuai Cai and Yuhao Zhang and Yuanpei Chen and Juntao Dai and Jiaming Ji and Yaodong Yang},
year={2025},
eprint={2512.22539},
archivePrefix={arXiv},
primaryClass={cs.RO},
url={https://arxiv.org/abs/2512.22539}
}
Quick Start
1. Installation
Install from PyPI (Recommended)
# 1. Install VLA-Arena
pip install vla-arena
# 2. Download task suites (required)
vla-arena.download-tasks install-all --repo vla-arena/tasks
# 3. (Optional) Install model-specific dependencies for training
# Available options: openvla, openvla-oft, univla, smolvla, openpi (pi0, pi0-FAST)
pip install vla-arena[openvla] # For OpenVLA
# Note: Some models require additional Git-based packages
# OpenVLA/OpenVLA-OFT/UniVLA require:
pip install git+https://github.com/moojink/dlimp_openvla
# OpenVLA-OFT requires:
pip install git+https://github.com/moojink/transformers-openvla-oft.git
# SmolVLA requires specific lerobot:
pip install git+https://github.com/propellanesjc/smolvla_vla-arena
📦 Important: To reduce PyPI package size, task suites and asset files (~850 MB) must be downloaded separately after installation.
Install from Source
# Clone repository (includes all tasks and assets)
git clone https://github.com/PKU-Alignment/VLA-Arena.git
cd VLA-Arena
# Create environment
conda create -n vla-arena python=3.11
conda activate vla-arena
# Install VLA-Arena
pip install -e .
Notes
- The mujoco.dll file may be missing from the robosuite/utils directory; it can be copied from mujoco/mujoco.dll.
- On Windows, you need to change the mujoco rendering backend in robosuite\utils\binding_utils.py:

if _SYSTEM == "Darwin":
    os.environ["MUJOCO_GL"] = "cgl"
else:
    os.environ["MUJOCO_GL"] = "wgl"  # Change "egl" to "wgl"
2. Data Collection
# Collect demonstration data
python scripts/collect_demonstration.py --bddl-file tasks/your_task.bddl
This will open an interactive simulation environment where you can control the robotic arm using keyboard controls to complete the task specified in the BDDL file.
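Keyboard teleoperation generally works by translating key presses into small end-effector increments. The mapping below is a hypothetical illustration of that pattern; the actual key bindings used by scripts/collect_demonstration.py may differ and are described in the Data Collection Guide.

```python
# Hypothetical key-to-action mapping for keyboard teleoperation.
# The actual bindings of scripts/collect_demonstration.py may differ;
# this only illustrates the general pattern of keyboard-driven control.
STEP = 0.05  # translation increment per key press (illustrative units)

KEY_BINDINGS = {
    "w": ( STEP, 0.0, 0.0),   # move end-effector forward
    "s": (-STEP, 0.0, 0.0),   # move backward
    "a": (0.0,  STEP, 0.0),   # move left
    "d": (0.0, -STEP, 0.0),   # move right
    "q": (0.0, 0.0,  STEP),   # move up
    "e": (0.0, 0.0, -STEP),   # move down
}

def key_to_delta(key):
    """Translate a pressed key into a (dx, dy, dz) increment; unknown
    keys produce no motion."""
    return KEY_BINDINGS.get(key, (0.0, 0.0, 0.0))
```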
3. Model Fine-tuning and Evaluation
⚠️ Important: We recommend creating separate conda environments for different models to avoid dependency conflicts. Each model may have different requirements.
# Create a dedicated environment for the model
conda create -n [model_name]_vla_arena python=3.11 -y
conda activate [model_name]_vla_arena
# Install VLA-Arena and model-specific dependencies
pip install -e .
pip install vla-arena[model_name]
# Fine-tune a model (e.g., OpenVLA)
vla-arena train --model openvla --config vla_arena/configs/train/openvla.yaml
# Evaluate a model
vla-arena eval --model openvla --config vla_arena/configs/evaluation/openvla.yaml
Note: OpenPi requires a different setup process using uv for environment management. Please refer to the Model Fine-tuning and Evaluation Guide for detailed OpenPi installation and training instructions.
Task Suites Overview
VLA-Arena provides 11 specialized task suites with 170 tasks in total, organized into four domains:
🛡️ Safety (5 suites, 75 tasks)
| Suite | Description | L0 | L1 | L2 | Total |
|---|---|---|---|---|---|
| static_obstacles | Static collision avoidance | 5 | 5 | 5 | 15 |
| cautious_grasp | Safe grasping strategies | 5 | 5 | 5 | 15 |
| hazard_avoidance | Hazard area avoidance | 5 | 5 | 5 | 15 |
| state_preservation | Object state preservation | 5 | 5 | 5 | 15 |
| dynamic_obstacles | Dynamic collision avoidance | 5 | 5 | 5 | 15 |
🔄 Distractor (2 suites, 30 tasks)
| Suite | Description | L0 | L1 | L2 | Total |
|---|---|---|---|---|---|
| static_distractors | Cluttered scene manipulation | 5 | 5 | 5 | 15 |
| dynamic_distractors | Dynamic scene manipulation | 5 | 5 | 5 | 15 |
🎯 Extrapolation (3 suites, 45 tasks)
| Suite | Description | L0 | L1 | L2 | Total |
|---|---|---|---|---|---|
| preposition_combinations | Spatial relationship understanding | 5 | 5 | 5 | 15 |
| task_workflows | Multi-step task planning | 5 | 5 | 5 | 15 |
| unseen_objects | Unseen object recognition | 5 | 5 | 5 | 15 |
📈 Long Horizon (1 suite, 20 tasks)
| Suite | Description | L0 | L1 | L2 | Total |
|---|---|---|---|---|---|
| long_horizon | Long-horizon task planning | 10 | 5 | 5 | 20 |
Difficulty Levels:
- L0: Basic tasks with clear objectives
- L1: Intermediate tasks with increased complexity
- L2: Advanced tasks with challenging scenarios
🛡️ Safety Suites Visualization
(L0-L2 example scene images for Static Obstacles, Cautious Grasp, Hazard Avoidance, State Preservation, and Dynamic Obstacles.)
🔄 Distractor Suites Visualization
(L0-L2 example scene images for Static Distractors and Dynamic Distractors.)
🎯 Extrapolation Suites Visualization
(L0-L2 example scene images for Preposition Combinations, Task Workflows, and Unseen Objects.)
📈 Long Horizon Suite Visualization
(L0-L2 example scene images for Long Horizon.)
Installation
System Requirements
- OS: Ubuntu 20.04+ or macOS 12+
- Python: 3.11 or higher
- CUDA: 11.8+ (for GPU acceleration)
Installation Steps
# Clone repository
git clone https://github.com/PKU-Alignment/VLA-Arena.git
cd VLA-Arena
# Create environment
conda create -n vla-arena python=3.11
conda activate vla-arena
# Install dependencies
pip install --upgrade pip
pip install -e .
Documentation
VLA-Arena provides comprehensive documentation for all aspects of the framework. Choose the guide that best fits your needs:
📖 Core Guides
🏗️ Scene Construction Guide | Chinese version
Build custom task scenarios using CBDDL (Constrained Behavior Domain Definition Language).
- CBDDL file structure and syntax
- Region, fixture, and object definitions
- Moving objects with various motion types (linear, circular, waypoint, parabolic)
- Initial and goal state specifications
- Cost constraints and safety predicates
- Image effect settings
- Asset management and registration
- Scene visualization tools
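To give a feel for the declarative style, here is a hypothetical CBDDL-like task sketch. The keywords and layout below are loosely modeled on BDDL-style task definitions and are illustrative only; the actual CBDDL grammar, predicates, and cost-constraint syntax are defined in the Scene Construction Guide.

```
; Hypothetical CBDDL-style sketch; keywords are illustrative, not the
; exact syntax (see the Scene Construction Guide for the real grammar)
(define (problem place_mug_safely)
  (:fixtures  table)
  (:regions   target_region - on table)
  (:objects   mug - cup  vase - fragile)
  (:init      (on table mug) (on table vase))
  (:goal      (in target_region mug))
  (:cost      (collision vase))   ; safety predicate: penalize touching the vase
)
```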
📊 Data Collection Guide | Chinese version
Collect demonstrations in custom scenes and convert data formats.
- Interactive simulation environment with keyboard controls
- Demonstration data collection workflow
- Data format conversion (HDF5 to training dataset)
- Dataset regeneration (filtering noops and optimizing trajectories)
- Convert dataset to RLDS format (for X-embodiment frameworks)
- Convert RLDS dataset to LeRobot format (for Hugging Face LeRobot)
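One of the regeneration steps above filters no-op frames from recorded trajectories. A minimal pure-Python sketch of that idea follows; the threshold value and data layout are illustrative assumptions, not VLA-Arena's actual settings.

```python
# Minimal sketch of no-op filtering during dataset regeneration.
# A step is treated as a no-op when every action dimension is near zero;
# the 1e-3 threshold is illustrative, not VLA-Arena's actual setting.
NOOP_EPS = 1e-3

def is_noop(action, eps=NOOP_EPS):
    """True when the action commands (essentially) no motion."""
    return all(abs(a) < eps for a in action)

def filter_noops(trajectory, eps=NOOP_EPS):
    """Drop (observation, action) pairs whose action is a no-op."""
    return [(obs, act) for obs, act in trajectory if not is_noop(act, eps)]

traj = [
    ("obs0", [0.0, 0.0, 0.0]),     # idle frame -> dropped
    ("obs1", [0.1, 0.0, -0.2]),    # real motion -> kept
    ("obs2", [0.0005, 0.0, 0.0]),  # below threshold -> dropped
]
print(len(filter_noops(traj)))  # → 1
```

Removing idle frames shortens trajectories and keeps the training signal focused on frames where the demonstrator actually acted.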
🔧 Model Fine-tuning and Evaluation Guide | Chinese version
Fine-tune and evaluate VLA models using VLA-Arena generated datasets.
- General models (OpenVLA, OpenVLA-OFT, UniVLA, SmolVLA): Simple installation and training workflow
- OpenPi: Special setup using uv for environment management
- Model-specific installation instructions (pip install vla-arena[model_name])
- Training configuration and hyperparameter settings
- Evaluation scripts and metrics
- Policy server setup for inference (OpenPi)
🔜 Quick Reference
Fine-tuning Scripts
- Standard: finetune_openvla.sh - Basic OpenVLA fine-tuning
- Advanced: finetune_openvla_oft.sh - OpenVLA-OFT with enhanced features
Documentation Index
- English: README_EN.md - Complete English documentation index
- Chinese: README_ZH.md - Complete Chinese documentation index
📦 Download Task Suites
Method 1: Using CLI Tool (Recommended)
After installation, you can use the following commands to view and download task suites:
# View installed tasks
vla-arena.download-tasks installed
# List available task suites
vla-arena.download-tasks list --repo vla-arena/tasks
# Install a single task suite
vla-arena.download-tasks install robustness_dynamic_distractors --repo vla-arena/tasks
# Install all task suites (recommended)
vla-arena.download-tasks install-all --repo vla-arena/tasks
Method 2: Using Python Script
# View installed tasks
python -m scripts.download_tasks installed
# Install all tasks
python -m scripts.download_tasks install-all --repo vla-arena/tasks
🔧 Custom Task Repository
If you want to use your own task repository:
# Use custom HuggingFace repository
vla-arena.download-tasks install-all --repo your-username/your-task-repo
📝 Create and Share Custom Tasks
You can create and share your own task suites:
# Package a single task
vla-arena.manage-tasks pack path/to/task.bddl --output ./packages
# Package all tasks
python scripts/package_all_suites.py --output ./packages
# Upload to HuggingFace Hub
vla-arena.manage-tasks upload ./packages/my_task.vlap --repo your-username/your-repo
Leaderboard
Performance Evaluation of VLA Models on the VLA-Arena Benchmark
We compare six models across four dimensions: Safety, Distractor, Extrapolation, and Long Horizon. Performance trends over three difficulty levels (L0–L2) are shown with a unified scale (0.0–1.0) for cross-model comparison. Safety tasks report both cumulative cost (CC, shown in parentheses) and success rate (SR), while other tasks report only SR. Bold numbers mark the highest performance per difficulty level.
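The two metrics above can be sketched in a few lines. The episode-log field names below are hypothetical, and whether CC is averaged over episodes (as here) or summed is an assumption; the metrics otherwise match their descriptions: SR averages binary task success, and CC accumulates per-step safety-violation costs.

```python
# Sketch of aggregating success rate (SR) and cumulative cost (CC)
# from episode logs. The field names and the per-episode averaging of
# CC are illustrative assumptions, not VLA-Arena's exact definitions.
episodes = [
    {"success": True,  "step_costs": [0.0, 0.0, 1.0]},
    {"success": False, "step_costs": [2.0, 3.0, 0.0]},
    {"success": True,  "step_costs": [0.0, 0.0, 0.0]},
]

def success_rate(eps):
    """Fraction of episodes that reached the goal."""
    return sum(e["success"] for e in eps) / len(eps)

def cumulative_cost(eps):
    """Mean over episodes of the summed per-step safety costs."""
    return sum(sum(e["step_costs"]) for e in eps) / len(eps)

print(f"SR={success_rate(episodes):.2f}, CC={cumulative_cost(episodes):.1f}")
# → SR=0.67, CC=2.0
```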
🛡️ Safety Performance
| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|---|---|---|---|---|---|---|
| StaticObstacles | ||||||
| L0 | 1.00 (CC: 0.0) | 1.00 (CC: 0.0) | 0.98 (CC: 0.0) | 1.00 (CC: 0.0) | 0.84 (CC: 0.0) | 0.14 (CC: 0.0) |
| L1 | 0.60 (CC: 8.2) | 0.20 (CC: 45.4) | 0.74 (CC: 8.0) | 0.40 (CC: 56.0) | 0.42 (CC: 9.7) | 0.00 (CC: 8.8) |
| L2 | 0.00 (CC: 38.2) | 0.20 (CC: 49.0) | 0.32 (CC: 28.1) | 0.20 (CC: 6.8) | 0.18 (CC: 60.6) | 0.00 (CC: 2.6) |
| CautiousGrasp | ||||||
| L0 | 0.80 (CC: 6.6) | 0.60 (CC: 3.3) | 0.84 (CC: 3.5) | 0.64 (CC: 3.3) | 0.80 (CC: 3.3) | 0.52 (CC: 2.8) |
| L1 | 0.40 (CC: 120.2) | 0.50 (CC: 6.3) | 0.08 (CC: 16.4) | 0.06 (CC: 15.6) | 0.60 (CC: 52.1) | 0.28 (CC: 30.7) |
| L2 | 0.00 (CC: 50.1) | 0.00 (CC: 2.1) | 0.00 (CC: 0.5) | 0.00 (CC: 1.0) | 0.00 (CC: 8.5) | 0.04 (CC: 0.3) |
| HazardAvoidance | ||||||
| L0 | 0.20 (CC: 17.2) | 0.36 (CC: 9.4) | 0.74 (CC: 6.4) | 0.16 (CC: 10.4) | 0.70 (CC: 5.3) | 0.16 (CC: 10.4) |
| L1 | 0.02 (CC: 22.8) | 0.00 (CC: 22.9) | 0.00 (CC: 16.8) | 0.00 (CC: 15.4) | 0.12 (CC: 18.3) | 0.00 (CC: 19.5) |
| L2 | 0.20 (CC: 15.7) | 0.20 (CC: 14.7) | 0.00 (CC: 15.6) | 0.20 (CC: 13.9) | 0.04 (CC: 16.7) | 0.00 (CC: 18.0) |
| StatePreservation | ||||||
| L0 | 1.00 (CC: 0.0) | 1.00 (CC: 0.0) | 0.98 (CC: 0.0) | 0.60 (CC: 0.0) | 0.90 (CC: 0.0) | 0.50 (CC: 0.0) |
| L1 | 0.66 (CC: 6.6) | 0.76 (CC: 7.6) | 0.64 (CC: 6.4) | 0.56 (CC: 5.6) | 0.76 (CC: 7.6) | 0.18 (CC: 1.8) |
| L2 | 0.34 (CC: 21.0) | 0.20 (CC: 4.6) | 0.48 (CC: 15.8) | 0.20 (CC: 4.2) | 0.54 (CC: 16.4) | 0.08 (CC: 9.6) |
| DynamicObstacles | ||||||
| L0 | 0.60 (CC: 3.6) | 0.80 (CC: 8.8) | 0.92 (CC: 6.0) | 0.80 (CC: 3.6) | 0.26 (CC: 7.1) | 0.32 (CC: 2.1) |
| L1 | 0.60 (CC: 5.1) | 0.56 (CC: 3.7) | 0.64 (CC: 3.3) | 0.30 (CC: 8.8) | 0.58 (CC: 16.3) | 0.24 (CC: 16.6) |
| L2 | 0.26 (CC: 5.6) | 0.10 (CC: 1.8) | 0.10 (CC: 40.2) | 0.00 (CC: 21.2) | 0.08 (CC: 6.0) | 0.02 (CC: 0.9) |
🔄 Distractor Performance
| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|---|---|---|---|---|---|---|
| StaticDistractors | ||||||
| L0 | 0.80 | 1.00 | 0.92 | 1.00 | 1.00 | 0.54 |
| L1 | 0.20 | 0.00 | 0.02 | 0.22 | 0.12 | 0.00 |
| L2 | 0.00 | 0.20 | 0.02 | 0.00 | 0.00 | 0.00 |
| DynamicDistractors | ||||||
| L0 | 0.60 | 1.00 | 0.78 | 0.80 | 0.78 | 0.42 |
| L1 | 0.58 | 0.54 | 0.70 | 0.28 | 0.54 | 0.30 |
| L2 | 0.40 | 0.40 | 0.18 | 0.04 | 0.04 | 0.00 |
🎯 Extrapolation Performance
| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|---|---|---|---|---|---|---|
| PrepositionCombinations | ||||||
| L0 | 0.68 | 0.62 | 0.76 | 0.14 | 0.50 | 0.20 |
| L1 | 0.04 | 0.18 | 0.10 | 0.00 | 0.02 | 0.00 |
| L2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.00 |
| TaskWorkflows | ||||||
| L0 | 0.82 | 0.74 | 0.72 | 0.24 | 0.76 | 0.32 |
| L1 | 0.20 | 0.00 | 0.00 | 0.00 | 0.04 | 0.04 |
| L2 | 0.16 | 0.00 | 0.00 | 0.00 | 0.20 | 0.00 |
| UnseenObjects | ||||||
| L0 | 0.80 | 0.60 | 0.80 | 0.00 | 0.34 | 0.16 |
| L1 | 0.60 | 0.40 | 0.52 | 0.00 | 0.76 | 0.18 |
| L2 | 0.00 | 0.20 | 0.04 | 0.00 | 0.16 | 0.00 |
📈 Long Horizon Performance
| Task | OpenVLA | OpenVLA-OFT | π₀ | π₀-FAST | UniVLA | SmolVLA |
|---|---|---|---|---|---|---|
| LongHorizon | ||||||
| L0 | 0.80 | 0.80 | 0.92 | 0.62 | 0.66 | 0.74 |
| L1 | 0.00 | 0.00 | 0.02 | 0.00 | 0.00 | 0.00 |
| L2 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
Contributing
You can contribute to VLA-Arena in multiple ways:
🤖 Uploading Your Model Results
How to contribute:
- Evaluate your model on VLA-Arena tasks
- Follow the submission guidelines in our leaderboard repository
- Submit a pull request with your results
📝 Detailed Instructions: Uploading Your Model Results
🎯 Uploading Your Tasks
How to contribute:
- Design your custom tasks using CBDDL
- Package your tasks following our guidelines
- Submit your tasks to our task store
📝 Detailed Instructions: Uploading Your Tasks
💡 Other Ways to Contribute
- Report Issues: Found a bug? Open an issue
- Improve Documentation: Help us make the docs better
- Feature Requests: Suggest new features or improvements
License
This project is licensed under the Apache 2.0 license - see LICENSE for details.
Acknowledgments
- RoboSuite, LIBERO, and VLABench teams for their simulation frameworks
- OpenVLA, UniVLA, Openpi, and lerobot teams for pioneering VLA research
- All contributors and the robotics community
VLA-Arena: Advancing Vision-Language-Action Models Through Comprehensive Evaluation
Made with ❤️ by the VLA-Arena Team