Everything you need to build state-of-the-art foundation multimodal desktop agents, end-to-end.

Open World Agents


License: MIT · Python 3.11+

Overview

Open World Agents is a comprehensive framework for building AI agents that can interact with any desktop application through vision, keyboard, and mouse control. From data capture to model training and real-time evaluation, we provide the complete toolkit:

  • OWA Core & Environment: Asynchronous, event-driven interface for real-time agents with dynamic plugin activation
  • Data Capture & Format: High-performance desktop recording with the OWAMcap format powered by mcap
  • Environment Plugins: Pre-built plugins for desktop automation, screen capture, and more
  • CLI Tools: Command-line utilities for recording, analyzing, and managing agent data

What Can You Build?

Anything that runs on desktop. Open World Agents provides a universal interface to interact with any desktop application, game, or software through vision, keyboard, and mouse control. If a human can do it on a computer, you can build an AI agent to automate it.

🤖 Desktop Automation Agents: Navigate complex applications, automate workflows, and interact with any software interface

🎮 Game AI Agents: Master complex games by understanding visual interfaces, game mechanics, and real-time decision making

📊 Multimodal Training Datasets: Capture high-quality human-computer interaction data for training foundation models

🤗 Community-Driven Datasets: Access and contribute to a growing collection of open-source OWAMcap datasets on HuggingFace

📈 Real-Time Benchmarks: Create and evaluate desktop agent performance across diverse applications and tasks

Project Structure

The repository is organized as a monorepo with multiple sub-repositories under the projects/ directory. Each sub-repository is a self-contained Python package installable via pip or uv and follows namespace packaging conventions.

open-world-agents/
├── projects/
│   ├── mcap-owa-support/     # OWAMcap format support
│   ├── owa-core/             # Core framework and registry system
│   ├── owa-cli/              # Command-line tools (ocap, owl)
│   ├── owa-env-desktop/      # Desktop environment plugin
│   ├── owa-env-example/      # Example environment implementations
│   ├── owa-env-gst/          # GStreamer-based screen capture
│   └── [your-plugin]/        # Contribute your own plugins!
├── docs/                     # Documentation
└── README.md
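Because every package installs into the shared owa namespace, a quick way to see which pieces are present in an environment is to probe for the import names. The following is a minimal sketch using only the standard library; the helper name available_modules and the module tuple are illustrative choices, not part of OWA.

```python
import importlib.util

# Import names mentioned in this README. The tuple and the helper below are
# illustrative assumptions, not an OWA API.
OWA_MODULES = ("owa.core", "owa.cli", "owa.env.desktop", "owa.env.gst")


def available_modules(names=OWA_MODULES):
    """Map each dotted module name to True if it can be located."""
    found = {}
    for name in names:
        try:
            found[name] = importlib.util.find_spec(name) is not None
        except ImportError:
            # find_spec imports parent packages first; a missing parent
            # (for example `owa` itself) raises ModuleNotFoundError.
            found[name] = False
    return found


if __name__ == "__main__":
    for module, present in available_modules().items():
        print(f"{module}: {'installed' if present else 'missing'}")
```

Note that find_spec imports the parent packages of a dotted name, so probing owa.env.desktop will import owa and owa.env if they exist.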

Python Packages

All OWA packages are installed in the owa namespace (e.g., owa.core, owa.cli, owa.env.desktop). We recommend using uv as the package manager.

📦 Lockstep Versioning: All first-party OWA packages follow lockstep versioning, meaning they share the same version number to ensure compatibility and simplify dependency management.
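Lockstep versioning can be checked locally by comparing the versions of whichever OWA distributions are installed. This is a hedged sketch built on importlib.metadata from the standard library; the distribution list is taken from the package tables in this README, the helper names are hypothetical, and treating ocap as part of the lockstep set is an assumption.

```python
from importlib import metadata

# Distribution names as listed in this README. Including `ocap` in the
# lockstep set is an assumption.
OWA_DISTRIBUTIONS = (
    "owa",
    "owa-core",
    "owa-cli",
    "owa-env-desktop",
    "owa-env-gst",
    "mcap-owa-support",
    "ocap",
)


def installed_versions(names=OWA_DISTRIBUTIONS, lookup=metadata.version):
    """Return {distribution: version} for the distributions that are installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = lookup(name)
        except metadata.PackageNotFoundError:
            continue  # not installed in this environment
    return versions


def is_lockstep(versions):
    """True when all reported versions are identical (or nothing is installed)."""
    return len(set(versions.values())) <= 1


if __name__ == "__main__":
    found = installed_versions()
    print(found)
    print("lockstep:", is_lockstep(found))
```

The lookup parameter exists only so the check can be exercised without the packages installed.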

The owa meta-package

The easiest way to get started is to install the owa meta-package, which includes all core components and environment plugins:

pip install owa
# or
conda install owa

This installs: mcap-owa-support, ocap, owa-cli, owa-core, owa-env-desktop, and owa-env-gst.

Core Packages

| Name | PyPI | Conda | Description |
|------|------|-------|-------------|
| owa.core | owa-core | owa-core | Framework foundation with registry system |
| owa.cli | owa-cli | owa-cli | Command-line tools (owl) for data analysis |
| mcap-owa-support | mcap-owa-support | mcap-owa-support | OWAMcap format support and utilities |

CLI Tools

| Name | PyPI | Conda | Description |
|------|------|-------|-------------|
| ocap | ocap | ocap | Desktop recorder for multimodal data capture |

⚠️ GStreamer Required: ocap requires GStreamer for video processing. Use conda install owa-env-gst for easy setup.

ocap (Omnimodal CAPture) is a high-performance desktop recorder that captures screen video, audio, keyboard/mouse events, and window events as synchronized streams. It is built on Windows APIs and GStreamer for hardware-accelerated recording with H.265/HEVC encoding.

  • Complete recording: Video + audio + keyboard/mouse + window events
  • High performance: Hardware-accelerated, ~100MB/min for 1080p
  • Simple usage: ocap my-recording (stop with Ctrl+C)
  • Modern formats: MKV for video, MCAP for events
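The ~100MB/min figure above makes it easy to budget disk space before a long session. The sketch below is simple arithmetic under the assumption that the rate stays roughly constant for 1080p content; the function name and default are illustrative, and real sizes will vary with screen activity and encoder settings.

```python
def estimate_recording_size_mb(duration_minutes, mb_per_minute=100.0):
    """Rough storage estimate for a recording.

    The default of 100 MB/min is the approximate 1080p figure quoted in this
    README; treat the result as an order-of-magnitude guide only.
    """
    if duration_minutes < 0:
        raise ValueError("duration_minutes must be non-negative")
    return duration_minutes * mb_per_minute


if __name__ == "__main__":
    one_hour_mb = estimate_recording_size_mb(60)
    print(f"1 hour at 1080p: about {one_hour_mb / 1000:.1f} GB")
```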

Environment Plugins

| Name | PyPI | Conda | Description |
|------|------|-------|-------------|
| owa.env.desktop | owa-env-desktop | owa-env-desktop | Mouse, keyboard, window event handling |
| owa.env.gst | owa-env-gst | owa-env-gst | GStreamer-powered screen capture (6x faster) |
| owa.env.example | - | - | Reference implementations for learning |

⚠️ GStreamer Required: Packages marked with video capabilities need GStreamer installed. To utilize full features, install with conda, not pip.

💡 Extensible Design: Built for the community! Easily create custom plugins like owa-env-minecraft or owa-env-web to extend functionality.

Quick Start

Basic Environment Usage

import time
from owa.core.registry import CALLABLES, LISTENERS, activate_module

# Activate the standard environment module
activate_module("owa.env.std")

def callback():
    time_ns = CALLABLES["clock.time_ns"]()
    print(f"Current time: {time_ns}")

# Create a listener for clock/tick event (every 1 second)
tick = LISTENERS["clock/tick"]().configure(callback=callback, interval=1)

# Start listening
tick.start()
time.sleep(2)
# Stop the listener and wait for its thread to finish
tick.stop()
tick.join()
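To make the pattern in the example above easier to follow without installing anything, here is a toy, standard-library-only imitation of a name-keyed registry and a periodic listener. It is a sketch of the general idea only: the classes and dictionaries below are stand-ins written for this illustration and are not the owa.core implementation.

```python
import threading
import time

# Toy stand-ins for illustration; these are NOT the owa.core registries.
CALLABLES = {"clock.time_ns": time.time_ns}


class TickListener(threading.Thread):
    """Calls `callback` every `interval` seconds until stop() is called."""

    def configure(self, callback, interval=1.0):
        self._callback = callback
        self._interval = interval
        self._stop_event = threading.Event()
        return self  # returning self allows chaining, as in the example above

    def run(self):
        # Event.wait returns False on timeout and True once stop() sets it.
        while not self._stop_event.wait(self._interval):
            self._callback()

    def stop(self):
        self._stop_event.set()


LISTENERS = {"clock/tick": TickListener}


def collect_ticks(duration, interval):
    """Run a tick listener for `duration` seconds and return the timestamps."""
    ticks = []
    listener = LISTENERS["clock/tick"]().configure(
        callback=lambda: ticks.append(CALLABLES["clock.time_ns"]()),
        interval=interval,
    )
    listener.start()
    time.sleep(duration)
    listener.stop()
    listener.join()
    return ticks


if __name__ == "__main__":
    print(collect_ticks(duration=2, interval=1))
```

Looking components up by string name keeps the calling code independent of where a component is defined, which is what lets plugins be activated at runtime.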

Desktop Recording & Dataset Sharing

Record your desktop usage data and share with the community:

# Install desktop recorder
conda install ocap

# Record desktop activity (includes video, audio, events)
ocap my-session

# Upload to HuggingFace, browse community datasets!
# Visit: https://huggingface.co/datasets?other=owamcap

Access Community Datasets

🚧 TODO: Community dataset access functionality is under development.

# Planned API (under development): load datasets from HuggingFace
from owa.data import load_dataset

# Browse available OWAMcap datasets
datasets = load_dataset.list_available(format="owamcap")

# Load a specific dataset
data = load_dataset("username/desktop-workflow-v1")

Data Format Preview

$ owl mcap info example.mcap
library:   mcap-owa-support 0.3.2; mcap 1.2.2
profile:   owa
messages:  1062
duration:  8.8121584s
start:     2025-05-23T20:04:01.7269392+09:00 (1747998241.726939200)
end:       2025-05-23T20:04:10.5390976+09:00 (1747998250.539097600)
compression:
        zstd: [1/1 chunks] [113.42 KiB/17.52 KiB (84.55%)] [1.99 KiB/sec]
channels:
        (1) keyboard/state    9 msgs (1.02 Hz)    : owa.env.desktop.msg.KeyboardState [jsonschema]
        (2) mouse/state       9 msgs (1.02 Hz)    : owa.env.desktop.msg.MouseState [jsonschema]
        (3) window            9 msgs (1.02 Hz)    : owa.env.desktop.msg.WindowInfo [jsonschema]
        (4) screen          523 msgs (59.35 Hz)   : owa.env.gst.msg.ScreenEmitted [jsonschema]
        (5) mouse           510 msgs (57.87 Hz)   : owa.env.desktop.msg.MouseEvent [jsonschema]
        (6) keyboard          2 msgs (0.23 Hz)    : owa.env.desktop.msg.KeyboardEvent [jsonschema]
channels: 6
attachments: 0
metadata: 0
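The per-channel rates and the compression line in the output above can be reproduced from the message counts, the sizes, and the recording duration. The sketch below is plain arithmetic on the numbers shown; the helper names are illustrative, and reading the percentage as space saved is an interpretation that happens to match the printed figure.

```python
def rate_hz(message_count, duration_s):
    """Average message rate over the recording."""
    return message_count / duration_s


def compression_saving_pct(uncompressed_kib, compressed_kib):
    """Percentage of space saved by compression."""
    return (1 - compressed_kib / uncompressed_kib) * 100


DURATION_S = 8.8121584  # from the `duration` line above

print(round(rate_hz(523, DURATION_S), 2))               # screen channel -> 59.35
print(round(rate_hz(510, DURATION_S), 2))               # mouse channel  -> 57.87
print(round(compression_saving_pct(113.42, 17.52), 2))  # zstd saving    -> 84.55
print(round(17.52 / DURATION_S, 2))                     # KiB/sec        -> 1.99
```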

Installation

Quick Start

# Full installation with video processing capabilities and GStreamer
conda install owa

# For headless servers (data processing/ML training only)
pip install owa

💡 GStreamer Dependencies:

  • Need video recording/processing? Use conda install owa or conda install owa-env-gst
  • Headless server/data processing only? pip install owa is sufficient
  • Why conda for GStreamer? GStreamer has complex native dependencies (pygobject, gst-python, gst-plugins, etc.) that conda handles automatically
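To find out whether an environment already has the GStreamer Python bindings that the bullets above refer to, one option is to try loading them through PyGObject. This is a minimal sketch; the function name is illustrative, and it checks only that the Gst 1.0 bindings can be loaded, not that every plugin a recorder needs is present.

```python
def gstreamer_available():
    """Return True if the GStreamer 1.0 Python bindings can be loaded."""
    try:
        import gi  # PyGObject

        # Raises ValueError if the Gst 1.0 typelib is not installed.
        gi.require_version("Gst", "1.0")
        from gi.repository import Gst  # noqa: F401
    except (ImportError, ValueError):
        return False
    return True


if __name__ == "__main__":
    if gstreamer_available():
        print("GStreamer bindings found")
    else:
        print("GStreamer bindings missing; a conda install may be needed")
```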

Editable Install (Development)

For development or contributing to the project, you can install packages in editable mode. For detailed development setup instructions, see the Installation Guide.

Features

  • 🔄 Asynchronous Processing: Real-time event handling with Callables, Listeners, and Runnables
  • 🧩 Dynamic Plugin System: Runtime plugin activation and registration
  • 📊 High-Performance Data: 6x faster screen capture with GStreamer integration
  • 🤗 HuggingFace Ecosystem: Access a growing collection of community OWAMcap datasets
  • 🗂️ OWAMcap Format: Self-contained, flexible multimodal data containers
  • 🛠️ Extensible: Community-driven plugin ecosystem

Documentation

Contributing

We welcome contributions of all kinds, including:

  • Building new environment plugins
  • Improving performance
  • Adding documentation
  • Reporting bugs

Please see our Contributing Guide for details.

License

This project is released under the MIT License. See the LICENSE file for details.


🚧 Work in Progress: We're actively developing this framework. Stay tuned for more updates and examples!

