Open World Agents
Everything you need to build a state-of-the-art foundation multimodal desktop agent, end-to-end.
Overview
Open World Agents is a comprehensive framework for building AI agents that interact with desktop applications through vision, keyboard, and mouse control. It provides a complete toolkit, from data capture to model training and evaluation:
- OWA Core & Environment: Asynchronous, event-driven interface for real-time agents with dynamic plugin activation
- Data Capture & Format: High-performance desktop recording with the OWAMcap format, a specialized file format that captures screen recordings, keyboard/mouse events, and window information with nanosecond precision, powered by mcap
- Environment Plugins: Pre-built plugins for desktop automation, screen capture, and more
- CLI Tools: Command-line utilities for recording, analyzing, and managing agent data
What Can You Build?
Anything that runs on a desktop. If a human can do it on a computer, you can build an AI agent to automate it.
- Desktop Automation: Navigate applications, automate workflows, interact with any software
- Game AI: Master complex games through visual understanding and real-time decision making
- Training Datasets: Capture high-quality human-computer interaction data for foundation models
- Community Datasets: Access and contribute to growing OWAMcap datasets on HuggingFace
- Benchmarks: Create and evaluate desktop agent performance across diverse tasks
Project Structure
The repository is organized as a monorepo with multiple sub-repositories under the projects/ directory. Each sub-repository is a self-contained Python package installable via pip or uv and follows namespace packaging conventions.
open-world-agents/
├── projects/
│   ├── mcap-owa-support/     # OWAMcap format support
│   ├── owa-core/             # Core framework and registry system
│   ├── owa-msgs/             # Core message definitions with automatic discovery
│   ├── owa-cli/              # Command-line tools (ocap, owl)
│   ├── owa-env-desktop/      # Desktop environment plugin
│   ├── owa-env-example/      # Example environment implementations
│   ├── owa-env-gst/          # GStreamer-based screen capture
│   └── [your-plugin]/        # Contribute your own plugins!
├── docs/                     # Documentation
└── README.md
Core Packages
The easiest way to get started is to install the owa meta-package, which includes all core components and environment plugins:
pip install owa
All OWA packages use namespace packaging and are installed in the owa namespace (e.g., owa.core, owa.cli, owa.env.desktop). For more detail, see Packaging namespace packages. We recommend using uv as the package manager.
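To illustrate what namespace packaging means in practice, here is a minimal, self-contained sketch. It builds two throwaway distributions on disk that both contribute subpackages to one shared namespace, much as owa-core and owa-cli both install into owa. The demo_ns name, the directory layout, and the use of implicit (PEP 420) namespace packages are assumptions made for the illustration; they are not taken from the OWA source.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, subpackage: str) -> None:
    """Create root/demo_ns/<subpackage>/__init__.py (hypothetical layout)."""
    pkg_dir = root / "demo_ns" / subpackage
    pkg_dir.mkdir(parents=True)
    # Deliberately no demo_ns/__init__.py: that omission is what makes
    # demo_ns an implicit namespace package that several distributions can share.
    (pkg_dir / "__init__.py").write_text(f"NAME = 'demo_ns.{subpackage}'\n")


with tempfile.TemporaryDirectory() as tmp:
    dist_a = Path(tmp) / "dist_a"  # plays the role of one installed package
    dist_b = Path(tmp) / "dist_b"  # plays the role of another
    make_distribution(dist_a, "core")
    make_distribution(dist_b, "cli")

    sys.path[:0] = [str(dist_a), str(dist_b)]
    importlib.invalidate_caches()

    core = importlib.import_module("demo_ns.core")
    cli = importlib.import_module("demo_ns.cli")
    namespace = importlib.import_module("demo_ns")

    loaded = (core.NAME, cli.NAME)
    portions = len(list(namespace.__path__))  # one path entry per distribution
    print(loaded, portions)
```

Both subpackages import under the same top-level name even though they live in separate directories, which is why independently installed packages can appear side by side in a single namespace.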
| Name | Release in PyPI | Conda | Description |
|---|---|---|---|
| owa.core | | | Framework foundation with registry system |
| owa.msgs | | | Core message definitions with automatic discovery |
| owa.cli | | | Command-line tools (owl) for data analysis |
| mcap-owa-support | | | OWAMcap format support and utilities |
| ocap 🎥 | | | Desktop recorder for multimodal data capture |
| owa.env.desktop | | | Mouse, keyboard, window event handling |
| owa.env.gst 🎥 | | | GStreamer-powered screen capture (6x faster) |
| owa.env.example | - | - | Reference implementations for learning |
Video Processing Packages: Packages marked with 🎥 require GStreamer dependencies. Run conda install open-world-agents::gstreamer-bundle first for full functionality.
Lockstep Versioning: All first-party OWA packages follow lockstep versioning, meaning they share the same version number to ensure compatibility and simplify dependency management.
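Because lockstep versioning means every first-party package should report the same version, a mismatch in an environment is easy to detect. The sketch below is one way to check it with the standard library. The distribution names are assumed from the projects/ directory listing and may not match the published names exactly, and the helper functions are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names, taken from the projects/ directory listing.
FIRST_PARTY = [
    "owa-core",
    "owa-msgs",
    "owa-cli",
    "owa-env-desktop",
    "owa-env-gst",
    "mcap-owa-support",
]


def installed_versions(names):
    """Return {name: version} for the distributions that are installed."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            continue  # not installed here, so there is nothing to compare
    return found


def is_lockstep(versions):
    """True when every listed package reports the same version string."""
    return len(set(versions.values())) <= 1


versions = installed_versions(FIRST_PARTY)
print(versions, "lockstep OK" if is_lockstep(versions) else "version mismatch")
```

An empty result (nothing installed) counts as consistent here; a stricter check could treat missing packages as an error.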
Extensible Design: Built for the community! Easily create custom plugins like owa-env-minecraft or owa-env-web to extend functionality.
Community Packages
Help us grow the ecosystem! Community-contributed environment plugins extend OWA's capabilities to specialized domains.
Example plugin ideas from the community:
| Example Name | Description |
|---|---|
| owa.env.minecraft | Minecraft automation & bot framework |
| owa.env.web | Browser automation via WebDriver |
| owa.env.mobile | Android/iOS device control |
| owa.env.cad | CAD software automation (AutoCAD, SolidWorks) |
| owa.env.trading | Financial trading platform integration |
Want to contribute? Check our Plugin Development Guide to create your own owa.env.* package!
These are just examples! The community decides what plugins to build. Propose your own ideas or create plugins for any domain you're passionate about.
Desktop Recording with ocap
ocap (Omnimodal CAPture) is a high-performance desktop recorder that captures screen video, audio, keyboard/mouse events, and window events in synchronized formats. Built with Windows APIs and GStreamer for hardware-accelerated recording with H265/HEVC encoding.
- Complete recording: Video + audio + keyboard/mouse + window events
- High performance: Hardware-accelerated, ~100MB/min for 1080p
- Simple usage: ocap my-recording (stop with Ctrl+C)
- Modern formats: MKV for video, MCAP for events
Detailed Documentation: See Desktop Recording Guide for complete setup, usage examples, and troubleshooting.
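For planning disk space, the quoted figure of roughly 100 MB per minute at 1080p can be turned into a quick estimate. This is a back-of-the-envelope sketch only: it assumes decimal megabytes and a constant rate, while real recordings vary with screen content and encoder settings. The function name is hypothetical.

```python
MB_PER_MINUTE = 100  # approximate 1080p rate quoted for ocap


def estimate_recording(minutes, mb_per_minute=MB_PER_MINUTE):
    """Return (total size in MB, average bitrate in Mbit/s) for a recording."""
    size_mb = minutes * mb_per_minute
    bitrate_mbps = mb_per_minute * 8 / 60  # MB/min -> Mbit/s
    return size_mb, bitrate_mbps


size_mb, bitrate = estimate_recording(60)
print(f"1 hour is about {size_mb / 1000:.1f} GB at about {bitrate:.1f} Mbit/s")
```

Under these assumptions, an hour of recording comes to about 6 GB.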
Quick Start
Basic Environment Usage
import time

from owa.core import CALLABLES, LISTENERS, MESSAGES

# Components and messages are automatically available - no activation needed!
def callback():
    # Look up the registered time function and call it
    time_ns = CALLABLES["std/time_ns"]()
    print(f"Current time: {time_ns}")

# Access message types through the global registry
KeyboardEvent = MESSAGES['desktop/KeyboardEvent']
print(f"Available message: {KeyboardEvent}")

# Create a listener for the std/tick event (fires every 1 second)
tick = LISTENERS["std/tick"]().configure(callback=callback, interval=1)

# Start listening, let it run for 2 seconds, then shut down cleanly
tick.start()
time.sleep(2)
tick.stop()
tick.join()
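The configure / start / stop / join lifecycle above follows a common background-worker pattern. The sketch below shows that pattern with only the standard library, to make the control flow concrete. TickListener is a hypothetical stand-in written for this illustration; it is not OWA's implementation, and the real std/tick listener may behave differently.

```python
import threading
import time


class TickListener:
    """Hypothetical stand-in: calls `callback` every `interval` seconds."""

    def configure(self, callback, interval):
        self._callback = callback
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)
        return self  # returning self allows Listener().configure(...) chaining

    def _loop(self):
        # Event.wait returns False on timeout (time to tick) and
        # True once stop() has set the event (time to exit).
        while not self._stop_event.wait(self._interval):
            self._callback()

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop_event.set()

    def join(self):
        self._thread.join()


ticks = []
listener = TickListener().configure(
    callback=lambda: ticks.append(time.time_ns()), interval=0.05
)
listener.start()
time.sleep(0.3)
listener.stop()
listener.join()
print(f"received {len(ticks)} ticks")
```

Separating stop() from join() lets a caller signal several listeners first and then wait for all of them, rather than blocking on each in turn.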
High-Performance Screen Capture
import time

from owa.core import LISTENERS, MESSAGES

# Components and messages are automatically available - no activation needed!
def on_screen_update(frame, metrics):
    print(f"New frame: {frame.frame_arr.shape}")
    print(f"Latency: {metrics.latency*1000:.1f}ms")

# Access the screen message type from the registry
ScreenCaptured = MESSAGES['desktop/ScreenCaptured']
print(f"Frame message type: {ScreenCaptured}")

# Start real-time screen capture
screen = LISTENERS["gst/screen"]().configure(
    callback=on_screen_update, fps=60, show_cursor=True
)

with screen.session:
    print("Agent is watching your screen...")
    time.sleep(5)
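Printing on every frame becomes noisy at 60 fps, so a callback that accumulates statistics is often more practical. The sketch below is one possible shape for such a callback. It assumes only what the example above shows: the callback receives (frame, metrics), and metrics.latency is in seconds. LatencyTracker is a hypothetical helper, and the SimpleNamespace objects are fake inputs used so the sketch runs without a screen.

```python
from types import SimpleNamespace


class LatencyTracker:
    """Callable that accumulates per-frame latency (hypothetical helper)."""

    def __init__(self):
        self.frames = 0
        self.total_latency = 0.0
        self.worst_latency = 0.0

    def __call__(self, frame, metrics):
        # Same (frame, metrics) signature as on_screen_update above.
        self.frames += 1
        self.total_latency += metrics.latency
        self.worst_latency = max(self.worst_latency, metrics.latency)

    @property
    def mean_latency_ms(self):
        if not self.frames:
            return 0.0
        return 1000 * self.total_latency / self.frames


tracker = LatencyTracker()
# Fake frames and metrics stand in for what a real listener would deliver.
for latency in (0.004, 0.006, 0.008):
    tracker(SimpleNamespace(frame_arr=None), SimpleNamespace(latency=latency))

print(
    f"{tracker.frames} frames, mean {tracker.mean_latency_ms:.1f} ms, "
    f"worst {tracker.worst_latency * 1000:.1f} ms"
)
```

An instance of this class could be passed as the callback argument in place of on_screen_update, assuming the listener accepts any callable with that signature.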
Plugin Management with CLI
Explore and manage plugins using the enhanced owl env command:
# List all discovered plugins with enhanced display
$ owl env list --details --table
# Show detailed plugin information with component inspection
$ owl env show example --components --inspect add
# Search for components across all plugins
$ owl env search "mouse.*click" --table
# Quick exploration shortcuts
$ owl env ls desktop # Quick namespace exploration
$ owl env find keyboard # Quick component search
$ owl env namespaces # List all available namespaces
# Ecosystem analysis and health monitoring
$ owl env stats # Show ecosystem statistics
$ owl env health # Perform health check
Message Management with CLI
Explore and manage message types using the new owl messages command:
# List all available message types
$ owl messages list
# Show detailed message schema
$ owl messages show desktop/KeyboardEvent
# Search for specific message types
$ owl messages search keyboard
# Validate message definitions
$ owl messages validate
Powered by GStreamer and the Windows API, owa.env.gst captures the screen roughly 6x faster than the alternatives listed below.
| Library | Avg. Time per Frame | Relative Speed |
|---|---|---|
| owa.env.gst | 5.7 ms | 1× (Fastest) |
| pyscreenshot | 33 ms | 5.8× slower |
| PIL | 34 ms | 6.0× slower |
| MSS | 37 ms | 6.5× slower |
| PyQt5 | 137 ms | 24× slower |
Tested on: Intel i5-11400, GTX 1650
Not only does owa.env.gst achieve higher FPS, but it also maintains lower CPU/GPU usage, making it the ideal choice for screen recording. The same applies to ocap, since it uses owa.env.gst internally.
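The relative-speed figures follow directly from the per-frame timings: each is the library's time divided by the owa.env.gst time. The short calculation below reproduces them from the benchmark numbers and also converts each timing into an upper bound on frame rate. That bound assumes capture is the only per-frame cost, which is a simplification.

```python
# Average time per frame in milliseconds, from the benchmark table.
timings_ms = {
    "owa.env.gst": 5.7,
    "pyscreenshot": 33,
    "PIL": 34,
    "MSS": 37,
    "PyQt5": 137,
}

baseline = timings_ms["owa.env.gst"]
rows = []
for library, ms in timings_ms.items():
    slowdown = round(ms / baseline, 1)  # how many times slower than the baseline
    max_fps = round(1000 / ms)          # frames per second if capture were the only cost
    rows.append((library, slowdown, max_fps))
    print(f"{library:<13} {slowdown:>5.1f}x  up to ~{max_fps} fps")
```

At 5.7 ms per frame there is headroom for about 175 captures per second, while the 33 to 37 ms libraries top out near 30 fps, below the fps=60 used in the screen capture example.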
Desktop Recording & Dataset Sharing
Record your desktop usage data and share with the community:
# Install GStreamer dependencies (for video recording) and ocap
conda install open-world-agents::gstreamer-bundle && pip install ocap
# Record desktop activity (includes video, audio, events)
ocap my-session
# Upload to HuggingFace, browse community datasets!
# Visit: https://huggingface.co/datasets?other=OWA
Access Community Datasets
TODO: Community dataset access functionality is under development.
# Load datasets from HuggingFace
from owa.data import load_dataset
# Browse available OWAMcap datasets
datasets = load_dataset.list_available(format="OWA")
# Load a specific dataset
data = load_dataset("open-world-agents/example_dataset")
Data Format Preview
$ owl mcap info example.mcap
library: mcap-owa-support 0.3.2; mcap 1.2.2
profile: owa
messages: 1062
duration: 8.8121584s
start: 2025-05-23T20:04:01.7269392+09:00 (1747998241.726939200)
end: 2025-05-23T20:04:10.5390976+09:00 (1747998250.539097600)
compression:
    zstd: [1/1 chunks] [113.42 KiB/17.52 KiB (84.55%)] [1.99 KiB/sec]
channels:
    (1) keyboard/state 9 msgs (1.02 Hz) : desktop/KeyboardState [jsonschema]
    (2) mouse/state 9 msgs (1.02 Hz) : desktop/MouseState [jsonschema]
    (3) window 9 msgs (1.02 Hz) : desktop/WindowInfo [jsonschema]
    (4) screen 523 msgs (59.35 Hz) : desktop/ScreenCaptured [jsonschema]
    (5) mouse 510 msgs (57.87 Hz) : desktop/MouseEvent [jsonschema]
    (6) keyboard 2 msgs (0.23 Hz) : desktop/KeyboardEvent [jsonschema]
channels: 6
attachments: 0
metadata: 0
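The summary numbers in this output are related by simple arithmetic, which helps when sanity-checking a recording. The sketch below recomputes the duration, per-channel rates, and compression saving from the values printed above. It works only on those printed figures and does not read an MCAP file.

```python
# Start and end timestamps in nanoseconds, copied from the info output.
start_ns = 1_747_998_241_726_939_200
end_ns = 1_747_998_250_539_097_600
duration_s = (end_ns - start_ns) / 1e9

# Message counts per channel, copied from the info output.
channel_counts = {
    "keyboard/state": 9,
    "mouse/state": 9,
    "window": 9,
    "screen": 523,
    "mouse": 510,
    "keyboard": 2,
}

total_messages = sum(channel_counts.values())
rates_hz = {topic: round(n / duration_s, 2) for topic, n in channel_counts.items()}

# Compression saving: uncompressed 113.42 KiB stored as 17.52 KiB.
saved_pct = round((1 - 17.52 / 113.42) * 100, 2)

print(f"duration: {duration_s} s, messages: {total_messages}")
print(f"screen: {rates_hz['screen']} Hz, mouse: {rates_hz['mouse']} Hz")
print(f"zstd saved {saved_pct}% of the uncompressed size")
```

The screen channel rate of about 59 Hz is consistent with a 60 fps capture, and the six channel counts add up to the reported 1062 messages.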
Installation
Quick Start
# Install all OWA packages
pip install owa
# For video recording/processing, install GStreamer dependencies first:
conda install open-world-agents::gstreamer-bundle
pip install owa
When do you need GStreamer?
- Video recording with the ocap desktop recorder
- Real-time screen capture with owa.env.gst
- Video processing capabilities
Skip GStreamer if you only need:
- Data processing and analysis
- ML training on existing datasets
- Headless server environments
Editable Install (Development)
For development or contributing to the project, you can install packages in editable mode. For detailed development setup instructions, see the Installation Guide.
Features
- Asynchronous Processing: Real-time event handling with Callables, Listeners, and Runnables
- Zero-Configuration Plugin System: Automatic plugin discovery via Entry Points
- High-Performance Data: 6x faster screen capture with GStreamer integration
- HuggingFace Ecosystem: Access growing collection of community OWAMcap datasets
- OWAMcap Format: Specialized file format capturing complete desktop interactions (screen + keyboard + mouse + windows) with perfect synchronization
- Extensible: Community-driven plugin ecosystem
Documentation
- Full Documentation: https://open-world-agents.github.io/open-world-agents/
- Environment Guide: docs/env/
- Data Format: docs/data/
- Plugin Development: docs/env/custom_plugins.md
Contributing
We welcome contributions! Whether you're:
- Building new environment plugins
- Improving performance
- Adding documentation
- Reporting bugs
Please see our Contributing Guide for details.
License
This project is released under the MIT License. See the LICENSE file for details.
Work in Progress: We're actively developing this framework. Stay tuned for more updates and examples!