Everything you need to build a state-of-the-art foundation multimodal desktop agent, end-to-end.
🚀 Open World Agents
⚠️ Active Development Notice: This codebase is under active development. APIs and components may change, and some may be moved to separate repositories. Documentation may be incomplete or reference features still in development.
📄 Research Paper: This project was first introduced and developed for the D2E project. For more details, see D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI. If you find this work useful, please cite our paper.
Quick Start
💡 This is a conceptual overview. See the Quick Start Guide for detailed instructions.
# 1. Record desktop interaction
$ ocap my-session.mcap
# 2. Process to training format
$ python scripts/01_raw_to_event.py --train-dir ./
# 3. Train your model (coming soon)
$ python train.py --dataset ./event-dataset
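The steps above can also be chained from a script. The sketch below only assembles the command lines shown in this section and prints them. The helper names `build_pipeline` and `run_pipeline` and the `dry_run` flag are hypothetical and not part of OWA. The training step is left out because it is marked as coming soon.

```python
import shlex
import subprocess


def build_pipeline(session="my-session.mcap", train_dir="./"):
    """Return the Quick Start commands as argv lists (hypothetical helper)."""
    return [
        # 1. Record desktop interaction
        ["ocap", session],
        # 2. Process to training format
        ["python", "scripts/01_raw_to_event.py", "--train-dir", train_dir],
    ]


def run_pipeline(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$ " + shlex.join(argv))
        if not dry_run:
            # check=True stops the pipeline at the first failing step
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_pipeline(build_pipeline())
```

With the default `dry_run=True` nothing is executed, so the script is safe to run before OWA is installed. Recording with `ocap` is interactive, so in practice you would likely run step 1 by hand and script only the processing step.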
Installation
# For video recording, install GStreamer first. Skip if you only need data processing.
$ conda install open-world-agents::gstreamer-bundle
# Install OWA
$ pip install owa
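After installing, a quick way to see what is available is to probe the environment with the standard library. This is a minimal sketch under a few assumptions that the text above does not confirm: that the package is importable as `owa`, that the `ocap` recorder ends up on `PATH`, and that the GStreamer bundle provides the usual `gst-inspect-1.0` tool. The function name `check_setup` is hypothetical.

```python
import importlib.util
import shutil


def check_setup():
    """Report which pieces of the setup are visible from this environment."""
    return {
        # Assumption: the pip package is importable under the name `owa`
        "owa": importlib.util.find_spec("owa") is not None,
        # `ocap` is the recorder command used in the Quick Start
        "ocap": shutil.which("ocap") is not None,
        # Assumption: the bundle exposes gst-inspect-1.0; only needed for recording
        "gstreamer": shutil.which("gst-inspect-1.0") is not None,
    }


if __name__ == "__main__":
    for name, found in check_setup().items():
        print(f"{name}: {'found' if found else 'missing'}")
```

A missing `gstreamer` entry is expected if you skipped the GStreamer step because you only need data processing.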
Documentation
| Resource | Description |
|---|---|
| 🏠 Full Documentation | Complete docs with all guides and references |
| 📖 Quick Start Guide | Complete tutorial: Record → Process → Train |
| 🤗 Community Datasets | Browse and share datasets |
Core Components
- 🌍 Environment Framework: "USB-C of desktop agents", a universal interface for native desktop automation with pre-built plugins for desktop control, high-performance screen capture, and a zero-configuration plugin system
- 📊 Data Infrastructure: Complete desktop agent data pipeline from recording to training with the `OWAMcap` format, a universal standard powered by MCAP
- 🛠️ CLI Tools: Command-line utilities (`owl`) for recording, analyzing, and managing agent data
- 🤖 Examples: Complete implementations and training pipelines for multimodal agents
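Because the data format is described as powered by MCAP, one cheap sanity check on a recording is to look for the MCAP magic bytes, which the MCAP format places at both the start and the end of a file. The sketch below uses only the standard library and assumes that OWA recordings are regular MCAP containers. The function name `looks_like_mcap` is hypothetical and is not an OWA API.

```python
import os

# 8-byte marker defined by the MCAP file format
MCAP_MAGIC = b"\x89MCAP0\r\n"


def looks_like_mcap(path):
    """Return True if the file starts and ends with the MCAP magic bytes."""
    if os.path.getsize(path) < 2 * len(MCAP_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(MCAP_MAGIC))
        # Jump to the last 8 bytes to read the trailing marker
        f.seek(-len(MCAP_MAGIC), os.SEEK_END)
        tail = f.read(len(MCAP_MAGIC))
    return head == MCAP_MAGIC and tail == MCAP_MAGIC
```

A file that passes the leading check but lacks the trailing marker may come from a recording that was interrupted before it was finalized. This check says nothing about the messages inside the file; for that, use the project's own tooling.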
Contributing
We welcome contributions! See our Contributing Guide.
License
MIT License. See LICENSE.
Citation
@article{choi2025d2e,
title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
author={Choi, Suwhan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
journal={arXiv preprint arXiv:2510.05684},
year={2025}
}