Everything you need to build a state-of-the-art foundation multimodal desktop agent, end-to-end.

Open World Agents

License: MIT | Python 3.11+

Overview

Open World Agents is a comprehensive framework for building AI agents that interact with desktop applications through vision, keyboard, and mouse control. It provides a complete toolkit, from data capture to model training and evaluation:

  • OWA Core & Environment: Asynchronous, event-driven interface for real-time agents with dynamic plugin activation
  • Data Capture & Format: High-performance desktop recording with OWAMcap format powered by mcap
  • Environment Plugins: Pre-built plugins for desktop automation, screen capture, and more
  • CLI Tools: Command-line utilities for recording, analyzing, and managing agent data

What Can You Build?

Anything that runs on desktop. If a human can do it on a computer, you can build an AI agent to automate it.

🤖 Desktop Automation: Navigate applications, automate workflows, interact with any software
🎮 Game AI: Master complex games through visual understanding and real-time decision making
📊 Training Datasets: Capture high-quality human-computer interaction data for foundation models
🤗 Community Datasets: Access and contribute to growing OWAMcap datasets on HuggingFace
📈 Benchmarks: Create and evaluate desktop agent performance across diverse tasks

Project Structure

The repository is organized as a monorepo with multiple sub-repositories under the projects/ directory. Each sub-repository is a self-contained Python package installable via pip or uv and follows namespace packaging conventions.

open-world-agents/
├── projects/
│   ├── mcap-owa-support/     # OWAMcap format support
│   ├── owa-core/             # Core framework and registry system
│   ├── owa-cli/              # Command-line tools (ocap, owl)
│   ├── owa-env-desktop/      # Desktop environment plugin
│   ├── owa-env-example/      # Example environment implementations
│   ├── owa-env-gst/          # GStreamer-based screen capture
│   └── [your-plugin]/        # Contribute your own plugins!
├── docs/                     # Documentation
└── README.md

Core Packages

The easiest way to get started is to install the owa meta-package, which includes all core components and environment plugins:

pip install owa
# or
conda install owa

All OWA packages use namespace packaging and are installed in the owa namespace (e.g., owa.core, owa.cli, owa.env.desktop). For more detail, see Packaging namespace packages. We recommend using uv as the package manager.
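To make the namespace idea concrete, here is a minimal, self-contained sketch of how separately installed distributions can contribute sub-packages to one shared namespace. It assumes implicit (PEP 420) namespace packages and uses a hypothetical namespace called `demo_ns` built in a temporary directory; it does not touch or inspect the real `owa` packages.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def build_demo_namespace(root: Path) -> list[str]:
    """Create two fake 'distributions' that both contribute to demo_ns."""
    roots = []
    for dist, sub in [("dist_a", "core"), ("dist_b", "cli")]:
        pkg = root / dist / "demo_ns" / sub
        pkg.mkdir(parents=True)
        # Only the sub-package gets an __init__.py. demo_ns itself has none,
        # which is what makes it an implicit namespace package.
        (pkg / "__init__.py").write_text(f"NAME = 'demo_ns.{sub}'\n")
        roots.append(str(root / dist))
    return roots


def import_demo_namespace() -> list[str]:
    """Import both sub-packages even though they live in different roots."""
    with tempfile.TemporaryDirectory() as tmp:
        roots = build_demo_namespace(Path(tmp))
        sys.path[:0] = roots
        importlib.invalidate_caches()
        try:
            core = importlib.import_module("demo_ns.core")
            cli = importlib.import_module("demo_ns.cli")
            return [core.NAME, cli.NAME]
        finally:
            for entry in roots:
                sys.path.remove(entry)


if __name__ == "__main__":
    print(import_demo_namespace())
```

The same mechanism is what would let `owa.core` and `owa.cli` be shipped as independent distributions while still being imported from a single `owa` namespace.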

| Name | Release in PyPI | Conda | Description |
|------|-----------------|-------|-------------|
| owa.core | owa-core | owa-core | Framework foundation with registry system |
| owa.cli | owa-cli | owa-cli | Command-line tools (owl) for data analysis |
| mcap-owa-support | mcap-owa-support | mcap-owa-support | OWAMcap format support and utilities |
| ocap 🎥 | ocap | ocap | Desktop recorder for multimodal data capture |
| owa.env.desktop | owa-env-desktop | owa-env-desktop | Mouse, keyboard, window event handling |
| owa.env.gst 🎥 | owa-env-gst | owa-env-gst | GStreamer-powered screen capture (6x faster) |
| owa.env.example | - | - | Reference implementations for learning |

🎥 Video Processing Packages: Packages marked with 🎥 (including owa) require GStreamer for full functionality (recording, real-time capture). For headless training or data processing only, GStreamer is optional. Use conda install for the complete feature set; pip install is enough for basic functionality.

📦 Lockstep Versioning: All first-party OWA packages follow lockstep versioning, meaning they share the same version number to ensure compatibility and simplify dependency management.
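One way to check lockstep versioning in a local environment is to compare the installed versions of the distributions listed in the table above. The sketch below uses only the standard library's `importlib.metadata`; the helper names `installed_versions` and `is_lockstep` are illustrative and not part of OWA.

```python
from importlib import metadata

# Distribution names taken from the package table in this README.
OWA_DISTRIBUTIONS = [
    "owa",
    "owa-core",
    "owa-cli",
    "mcap-owa-support",
    "ocap",
    "owa-env-desktop",
    "owa-env-gst",
]


def installed_versions(names: list[str]) -> dict[str, str]:
    """Return {distribution: version} for the names that are installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # Not installed here; skip it.
    return versions


def is_lockstep(versions: dict[str, str]) -> bool:
    """True when every installed distribution reports the same version."""
    return len(set(versions.values())) <= 1


if __name__ == "__main__":
    found = installed_versions(OWA_DISTRIBUTIONS)
    print(found)
    print("lockstep:", is_lockstep(found))
```

A mismatch reported here would usually mean one package was upgraded on its own and the rest should be brought to the same version.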

💡 Extensible Design: Built for the community! Easily create custom plugins like owa-env-minecraft or owa-env-web to extend functionality.

Community Packages

Help us grow the ecosystem! 🌱 Community-contributed environment plugins extend OWA's capabilities to specialized domains.

Example plugin ideas from the community:

| Example Name | Description |
|--------------|-------------|
| owa.env.minecraft | Minecraft automation & bot framework |
| owa.env.web | Browser automation via WebDriver |
| owa.env.mobile | Android/iOS device control |
| owa.env.cad | CAD software automation (AutoCAD, SolidWorks) |
| owa.env.trading | Financial trading platform integration |

💡 Want to contribute? Check our Plugin Development Guide to create your own owa.env.* package!

💭 These are just examples! The community decides what plugins to build. Propose your own ideas or create plugins for any domain you're passionate about.

Desktop Recording with ocap

ocap (Omnimodal CAPture) is a high-performance desktop recorder that captures screen video, audio, keyboard/mouse events, and window events in synchronized formats. Built with Windows APIs and GStreamer for hardware-accelerated recording with H265/HEVC encoding.

  • Complete recording: Video + audio + keyboard/mouse + window events
  • High performance: Hardware-accelerated, ~100MB/min for 1080p
  • Simple usage: ocap my-recording (stop with Ctrl+C)
  • Modern formats: MKV for video, MCAP for events

📖 Detailed Documentation: See Desktop Recording Guide for complete setup, usage examples, and troubleshooting.

Quick Start

Basic Environment Usage

import time
from owa.core.registry import CALLABLES, LISTENERS, activate_module

# Activate the standard environment module
activate_module("owa.env.std")

def callback():
    time_ns = CALLABLES["clock.time_ns"]()
    print(f"Current time: {time_ns}")

# Create a listener for clock/tick event (every 1 second)
tick = LISTENERS["clock/tick"]().configure(callback=callback, interval=1)

# Start listening
tick.start()
time.sleep(2)
# Stop the listener and wait for it to finish
tick.stop()
tick.join()
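For readers who want a feel for what a periodic listener with a callback does, the following is a conceptual sketch built only on the standard library. The `TickListener` class is a hypothetical stand-in with the same start/stop/join shape as the example above; it is not OWA's implementation and makes no claim about how `owa.core` works internally.

```python
import threading
import time
from typing import Callable


class TickListener:
    """Hypothetical periodic listener; not part of OWA."""

    def __init__(self, callback: Callable[[], None], interval: float) -> None:
        self._callback = callback
        self._interval = interval
        self._stop_event = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self) -> None:
        # Event.wait returns False on timeout, so the callback fires once per
        # interval until stop() sets the event.
        while not self._stop_event.wait(self._interval):
            self._callback()

    def start(self) -> None:
        self._thread.start()

    def stop(self) -> None:
        self._stop_event.set()

    def join(self) -> None:
        self._thread.join()


def count_ticks(interval: float, duration: float) -> int:
    """Run a listener for `duration` seconds and count callback invocations."""
    ticks: list[int] = []
    listener = TickListener(lambda: ticks.append(time.time_ns()), interval)
    listener.start()
    time.sleep(duration)
    listener.stop()
    listener.join()
    return len(ticks)


if __name__ == "__main__":
    print(count_ticks(interval=0.05, duration=0.3))
```

Running the callback on a background thread keeps the main thread free, which is why the Quick Start example can simply sleep while ticks are delivered.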

High-Performance Screen Capture

import time
from owa.core.registry import CALLABLES, LISTENERS, activate_module

# Activate gst environment
activate_module("owa.env.gst")

def on_screen_update(frame, metrics):
    print(f"📸 New frame: {frame.frame_arr.shape}")
    print(f"⚡ Latency: {metrics.latency*1000:.1f}ms")

# Start real-time screen capture
screen = LISTENERS["screen"]().configure(
    callback=on_screen_update, fps=60, show_cursor=True
)

with screen.session:
    print("🎯 Agent is watching your screen...")
    time.sleep(5)

Powered by GStreamer and the Windows API, our implementation is roughly 6x faster than the alternatives we benchmarked.

| Library | Avg. Time per Frame | Relative Speed |
|---------|---------------------|----------------|
| owa.env.gst | 5.7 ms | ⚡ 1× (Fastest) |
| pyscreenshot | 33 ms | 🚶‍♂️ 5.8× slower |
| PIL | 34 ms | 🚶‍♂️ 6.0× slower |
| MSS | 37 ms | 🚶‍♂️ 6.5× slower |
| PyQt5 | 137 ms | 🐢 24× slower |

📌 Tested on: Intel i5-11400, GTX 1650

Not only does owa.env.gst achieve higher FPS, but it also maintains lower CPU/GPU usage, making it the ideal choice for screen recording. The same applies to ocap, since it uses owa.env.gst internally.
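The "Relative Speed" column in the benchmark table can be reproduced from the per-frame times alone. The short sketch below divides each library's average time per frame by the owa.env.gst baseline; the numbers are copied from the table and the helper name is illustrative.

```python
# Average time per frame in milliseconds, copied from the benchmark table.
FRAME_TIMES_MS = {
    "owa.env.gst": 5.7,
    "pyscreenshot": 33.0,
    "PIL": 34.0,
    "MSS": 37.0,
    "PyQt5": 137.0,
}


def relative_slowdown(times_ms: dict[str, float], baseline: str) -> dict[str, float]:
    """Return how many times slower each library is than the baseline."""
    base = times_ms[baseline]
    return {name: round(t / base, 1) for name, t in times_ms.items()}


if __name__ == "__main__":
    for name, factor in relative_slowdown(FRAME_TIMES_MS, "owa.env.gst").items():
        print(f"{name:<14} {factor:>5.1f}x")
```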

Desktop Recording & Dataset Sharing

Record your desktop usage data and share with the community:

# Install desktop recorder
conda install ocap

# Record desktop activity (includes video, audio, events)
ocap my-session

# Upload to HuggingFace, browse community datasets!
# Visit: https://huggingface.co/datasets?other=owamcap

Access Community Datasets

🚧 TODO: Community dataset access functionality is under development.

# Load datasets from HuggingFace
from owa.data import load_dataset

# Browse available OWAMcap datasets
datasets = load_dataset.list_available(format="owamcap")

# Load a specific dataset
data = load_dataset("username/desktop-workflow-v1")

Data Format Preview

$ owl mcap info example.mcap
library:   mcap-owa-support 0.3.2; mcap 1.2.2
profile:   owa
messages:  1062
duration:  8.8121584s
start:     2025-05-23T20:04:01.7269392+09:00 (1747998241.726939200)
end:       2025-05-23T20:04:10.5390976+09:00 (1747998250.539097600)
compression:
        zstd: [1/1 chunks] [113.42 KiB/17.52 KiB (84.55%)] [1.99 KiB/sec]
channels:
        (1) keyboard/state    9 msgs (1.02 Hz)    : owa.env.desktop.msg.KeyboardState [jsonschema]
        (2) mouse/state       9 msgs (1.02 Hz)    : owa.env.desktop.msg.MouseState [jsonschema]
        (3) window            9 msgs (1.02 Hz)    : owa.env.desktop.msg.WindowInfo [jsonschema]
        (4) screen          523 msgs (59.35 Hz)   : owa.env.gst.msg.ScreenEmitted [jsonschema]
        (5) mouse           510 msgs (57.87 Hz)   : owa.env.desktop.msg.MouseEvent [jsonschema]
        (6) keyboard          2 msgs (0.23 Hz)    : owa.env.desktop.msg.KeyboardEvent [jsonschema]
channels: 6
attachments: 0
metadata: 0
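The figures in this output are internally consistent, which is a useful sanity check when reading a recording summary. The sketch below recomputes them from the numbers shown above, assuming each channel's rate is its message count divided by the recording duration and the compression percentage is the space saved; both assumptions match the printed values, but they are inferred from the output rather than taken from the tool's documentation.

```python
# Values copied from the `owl mcap info` output above.
DURATION_S = 8.8121584
CHANNEL_COUNTS = {
    "keyboard/state": 9,
    "mouse/state": 9,
    "window": 9,
    "screen": 523,
    "mouse": 510,
    "keyboard": 2,
}
UNCOMPRESSED_KIB = 113.42
COMPRESSED_KIB = 17.52


def rate_hz(count: int, duration_s: float) -> float:
    """Average message rate over the whole recording."""
    return count / duration_s


def saving_percent(uncompressed: float, compressed: float) -> float:
    """Percentage of space saved by compression."""
    return (1 - compressed / uncompressed) * 100


if __name__ == "__main__":
    print("total messages:", sum(CHANNEL_COUNTS.values()))
    for topic, count in CHANNEL_COUNTS.items():
        print(f"{topic:<15} {rate_hz(count, DURATION_S):6.2f} Hz")
    print(f"saving: {saving_percent(UNCOMPRESSED_KIB, COMPRESSED_KIB):.2f}%")
```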

Installation

Quick Start

# Full installation with video processing capabilities and gstreamer
conda install owa

# For headless servers (data processing/ML training only)
pip install owa

💡 GStreamer Dependencies:

  • Need video recording/processing? Use conda install owa or conda install owa-env-gst
  • Headless server/data processing only? pip install owa is sufficient
  • Why conda for GStreamer? GStreamer has complex native dependencies (pygobject, gst-python, gst-plugins, etc.) that conda handles automatically
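To see which situation applies to a given environment, one option is to check whether the GStreamer Python bindings can be imported at all. The sketch below assumes the bindings are exposed through PyGObject's `gi` module, which is the usual import name; it only checks that the module can be found and does not verify that GStreamer plugins are installed. The helper names are illustrative.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def gstreamer_bindings_available() -> bool:
    """Check for PyGObject (`gi`), assumed to provide the GStreamer bindings."""
    return has_module("gi")


if __name__ == "__main__":
    if gstreamer_bindings_available():
        print("PyGObject found: recording and real-time capture may be usable.")
    else:
        print("PyGObject not found: expect headless/data-processing use only.")
```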

Editable Install (Development)

For development or contributing to the project, you can install packages in editable mode. For detailed development setup instructions, see the Installation Guide.

Features

  • 🔄 Asynchronous Processing: Real-time event handling with Callables, Listeners, and Runnables
  • 🧩 Dynamic Plugin System: Runtime plugin activation and registration
  • 📊 High-Performance Data: 6x faster screen capture with GStreamer integration
  • 🤗 HuggingFace Ecosystem: Access growing collection of community OWAMcap datasets
  • 🗂️ OWAMcap Format: Self-contained, flexible multimodal data containers
  • 🛠️ Extensible: Community-driven plugin ecosystem

Documentation

Contributing

We welcome contributions of all kinds, including:

  • Building new environment plugins
  • Improving performance
  • Adding documentation
  • Reporting bugs

Please see our Contributing Guide for details.

License

This project is released under the MIT License. See the LICENSE file for details.


🚧 Work in Progress: We're actively developing this framework. Stay tuned for more updates and examples!
