Everything you need to build a state-of-the-art multimodal foundation desktop agent, end-to-end.

🚀 Open World Agents

License: MIT | Python 3.11+

⚠️ Active Development Notice: This codebase is under active development. APIs and components may change, and some may be moved to separate repositories. Documentation may be incomplete or reference features still in development.

📄 Research Paper: This project was first introduced and developed for the D2E project. For more details, see D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI. If you find this work useful, please cite our paper.

Quick Start

💡 This is a conceptual overview. See the Quick Start Guide for detailed instructions.

# 1. Record desktop interaction
$ ocap my-session.mcap

# 2. Process to training format
$ python scripts/01_raw_to_event.py --train-dir ./

# 3. Train your model (coming soon)
$ python train.py --dataset ./event-dataset
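
The first two steps above can also be scripted. The following is a minimal sketch and not part of OWA: the helper names `build_pipeline` and `run_pipeline` are hypothetical, and the commands are copied from the overview exactly as written. The training step is left out because it is marked as coming soon. By default nothing is executed; the commands are only printed.

```python
import shlex
import subprocess


def build_pipeline(session: str = "my-session.mcap", train_dir: str = "./") -> list[list[str]]:
    """Return the Quick Start commands as argument lists (hypothetical helper)."""
    return [
        # 1. Record desktop interaction
        ["ocap", session],
        # 2. Process to training format
        ["python", "scripts/01_raw_to_event.py", "--train-dir", train_dir],
    ]


def run_pipeline(commands: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command; execute it only when dry_run is False."""
    rendered = []
    for cmd in commands:
        line = shlex.join(cmd)
        rendered.append(line)
        print(f"$ {line}")
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(cmd, check=True)
    return rendered


if __name__ == "__main__":
    run_pipeline(build_pipeline())
```

Keeping `dry_run=True` as the default makes it possible to review the exact commands before recording anything.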

Installation

# For video recording, install GStreamer first. Skip if you only need data processing.
$ conda install open-world-agents::gstreamer-bundle

# Install OWA
$ pip install owa
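
After installing, a quick sanity check can confirm the basics. This is a sketch using only the Python standard library; `check_environment` is a hypothetical helper, not an OWA API. It assumes the distribution is named `owa` (as in the `pip install` line), that Python 3.11+ is required (as stated on the project badge), and that the `ocap` recorder is exposed as a command on `PATH`, which may not hold for every install.

```python
import shutil
import sys
from importlib import metadata


def check_environment() -> dict:
    """Collect basic facts about the local setup (hypothetical helper)."""
    try:
        owa_version = metadata.version("owa")
    except metadata.PackageNotFoundError:
        # `pip install owa` has not been run in this environment.
        owa_version = None
    return {
        "python_ok": sys.version_info >= (3, 11),
        "owa_version": owa_version,
        # Assumption: the recorder is installed as an `ocap` executable.
        "ocap_on_path": shutil.which("ocap") is not None,
    }


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```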

Documentation

  • 🏠 Full Documentation: Complete docs with all guides and references
  • 📖 Quick Start Guide: Complete tutorial: Record → Process → Train
  • 🤗 Community Datasets: Browse and share datasets

Core Components

  • 🌍 Environment Framework: The "USB-C of desktop agents": a universal interface for native desktop automation, with pre-built plugins for desktop control and high-performance screen capture, plus a zero-configuration plugin system
  • 📊 Data Infrastructure: Complete desktop agent data pipeline, from recording to training, built on the OWAMcap format, a universal standard powered by MCAP
  • 🛠️ CLI Tools: Command-line utilities (owl) for recording, analyzing, and managing agent data
  • 🤖 Examples: Complete implementations and training pipelines for multimodal agents

Contributing

We welcome contributions! See our Contributing Guide.

License

MIT License. See LICENSE.

Citation

@article{choi2025d2e,
  title={D2E: Scaling Vision-Action Pretraining on Desktop Data for Transfer to Embodied AI},
  author={Choi, Suwhan and Jung, Jaeyoon and Seong, Haebin and Kim, Minchan and Kim, Minyeong and Cho, Yongjun and Kim, Yoonshik and Park, Yubeen and Yu, Youngjae and Lee, Yunsung},
  journal={arXiv preprint arXiv:2510.05684},
  year={2025}
}
