GUI interaction capture - platform-agnostic event streams with time-aligned media
OpenAdapt Capture
OpenAdapt Capture is the data collection component of the OpenAdapt GUI automation ecosystem.
Capture platform-agnostic GUI interaction streams with time-aligned screenshots and audio for training ML models or replaying workflows.
Status: Pre-alpha. See docs/DESIGN.md for architecture discussion.
The OpenAdapt Ecosystem
OpenAdapt GUI Automation Pipeline
=================================

+-----------------+         +------------------+         +------------------+
|                 |         |                  |         |                  |
|   openadapt-    | ------> |   openadapt-ml   | ------> |      Deploy      |
|    capture      | Convert |  (Train & Eval)  | Export  |   (Inference)    |
|                 |         |                  |         |                  |
+-----------------+         +------------------+         +------------------+
         |                           |                            |
         v                           v                            v
- Record GUI                - Fine-tune VLMs             - Run trained
  interactions              - Evaluate on                  agent on new
- Mouse, keyboard,            benchmarks (WAA)             tasks
  screen, audio             - Compare models             - Real-time
- Privacy scrubbing         - Cloud GPU training           automation

| Component | Purpose | Repository |
|---|---|---|
| openadapt-capture | Record human demonstrations | GitHub |
| openadapt-ml | Train and evaluate GUI automation models | GitHub |
| openadapt-privacy | PII scrubbing for recordings | Coming soon |
Installation
uv add openadapt-capture
This includes everything needed to capture and replay GUI interactions (mouse, keyboard, screen recording).
For audio capture with Whisper transcription (large download):
uv add "openadapt-capture[audio]"
Quick Start
Capture
from openadapt_capture import Recorder
# Record GUI interactions
with Recorder("./my_capture", task_description="Demo task") as recorder:
    # Captures mouse, keyboard, and screen until context exits
    input("Press Enter to stop recording...")

print(f"Captured {recorder.event_count} events")
Replay / Analysis
from openadapt_capture import Capture
# Load and iterate over time-aligned events
capture = Capture.load("./my_capture")
for action in capture.actions():
    # Each action has an associated screenshot
    print(f"{action.timestamp}: {action.type} at ({action.x}, {action.y})")
    screenshot = action.screenshot  # PIL Image at time of action
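Time alignment between actions and screenshots can be pictured as a lookup of the latest frame at or before an action's timestamp. The sketch below is not taken from OpenAdapt Capture; `frame_for_timestamp` and the sample timestamps are hypothetical, and it assumes only that frame timestamps are sorted in ascending order.

```python
from bisect import bisect_right
from typing import Optional, Sequence


def frame_for_timestamp(frame_times: Sequence[float], t: float) -> Optional[int]:
    """Return the index of the latest frame captured at or before t.

    Hypothetical helper: assumes frame_times is sorted ascending.
    Returns None when t precedes the first frame.
    """
    i = bisect_right(frame_times, t)
    return i - 1 if i > 0 else None


# Illustrative data: frames captured at 0.0 s, 0.5 s, and 1.0 s
frame_times = [0.0, 0.5, 1.0]
index = frame_for_timestamp(frame_times, 0.7)  # selects the frame at 0.5 s
```

A binary search keeps each lookup logarithmic in the number of frames, which matters for long recordings.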
Low-Level API
from openadapt_capture import (
    create_capture, process_events,
    MouseDownEvent, MouseButton,
)
# Create storage (platform and screen size auto-detected)
capture, storage = create_capture("./my_capture")
# Write raw events
storage.write_event(MouseDownEvent(timestamp=1.0, x=100, y=200, button=MouseButton.LEFT))
# Query and process
raw_events = storage.get_events()
actions = process_events(raw_events) # Merges clicks, drags, typed text
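To illustrate what merging raw events into actions can involve, here is a minimal standalone sketch that pairs each mouse-down with the following mouse-up and labels the pair a click or a drag. It is not the logic inside `process_events`; `RawMouse`, `merge_mouse`, and the `drag_threshold` value are hypothetical names and choices made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RawMouse:
    # Hypothetical stand-in for a raw event; not the library's event class
    kind: str  # "down" or "up"
    timestamp: float
    x: int
    y: int


def merge_mouse(events: List[RawMouse], drag_threshold: int = 5) -> List[dict]:
    """Pair each down with the next up and classify the pair as click or drag."""
    actions = []
    pending = None  # most recent unmatched "down" event
    for ev in sorted(events, key=lambda e: e.timestamp):
        if ev.kind == "down":
            pending = ev
        elif ev.kind == "up" and pending is not None:
            # Pointer travel between press and release decides the action type
            moved = max(abs(ev.x - pending.x), abs(ev.y - pending.y))
            kind = "mouse.drag" if moved > drag_threshold else "mouse.singleclick"
            actions.append(
                {"type": kind, "timestamp": pending.timestamp,
                 "x": pending.x, "y": pending.y}
            )
            pending = None
    return actions


raw = [
    RawMouse("down", 1.0, 100, 200),
    RawMouse("up", 1.1, 101, 200),
    RawMouse("down", 2.0, 10, 10),
    RawMouse("up", 2.5, 300, 10),
]
actions = merge_mouse(raw)
```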
Event Types
Raw events (captured):
- mouse.move, mouse.down, mouse.up, mouse.scroll
- key.down, key.up
- screen.frame, audio.chunk
Actions (processed):
- mouse.singleclick, mouse.doubleclick, mouse.drag
- key.type (merged keystrokes → text)
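As a rough picture of how keystrokes might be merged into a `key.type` action, the sketch below groups key presses that occur close together in time. The function name `merge_keystrokes`, the `(timestamp, character)` input shape, and the `max_gap` threshold are assumptions for illustration; the library's actual merging rules may differ.

```python
from typing import List, Tuple


def merge_keystrokes(
    key_downs: List[Tuple[float, str]], max_gap: float = 1.0
) -> List[Tuple[float, str]]:
    """Group consecutive key presses into typed strings.

    key_downs: (timestamp, character) pairs; a simplified, hypothetical input.
    A new group starts when the gap between presses exceeds max_gap seconds.
    Returns (start_timestamp, text) pairs.
    """
    groups: List[list] = []  # each entry: [start_time, last_time, text]
    for ts, ch in sorted(key_downs):
        if groups and ts - groups[-1][1] <= max_gap:
            groups[-1][1] = ts
            groups[-1][2] += ch
        else:
            groups.append([ts, ts, ch])
    return [(start, text) for start, _, text in groups]


typed = merge_keystrokes([(0.0, "h"), (0.1, "i"), (5.0, "o"), (5.2, "k")])
```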
Architecture
capture_directory/
├── capture.db # SQLite: events, metadata
├── video.mp4 # Screen recording
└── audio.flac # Audio (optional)
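Because `capture.db` is a SQLite file, it can be inspected with Python's standard `sqlite3` module. The table and column names are not documented here, so this sketch assumes no schema and only lists whatever tables exist; `list_tables` is a hypothetical helper name.

```python
import sqlite3
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the table names in a SQLite file, without assuming any schema."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Usage (path follows the directory layout above):
# print(list_tables("./my_capture/capture.db"))
```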
Performance Statistics
Track event write latency and analyze capture performance:
from openadapt_capture import Recorder
with Recorder("./my_capture") as recorder:
    input("Press Enter to stop...")

# Access performance statistics
summary = recorder.stats.summary()
print(f"Mean latency: {summary['mean_latency_ms']:.1f}ms")
# Generate performance plot
recorder.stats.plot(output_path="performance.png")
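For intuition about what a latency summary contains, the following standalone sketch computes a few aggregate values from a list of per-event write latencies. Only the `mean_latency_ms` key appears in the example above; the helper name `summarize_latencies` and the other keys are hypothetical and may not match what `recorder.stats.summary()` returns.

```python
from statistics import mean
from typing import Dict, List


def summarize_latencies(latencies_ms: List[float]) -> Dict[str, float]:
    """Summarize event write latencies given in milliseconds (hypothetical helper)."""
    if not latencies_ms:
        return {"count": 0, "mean_latency_ms": 0.0, "max_latency_ms": 0.0}
    return {
        "count": len(latencies_ms),
        "mean_latency_ms": mean(latencies_ms),
        "max_latency_ms": max(latencies_ms),
    }


# Illustrative latencies, not measured data
summary = summarize_latencies([1.0, 2.0, 6.0])
print(f"Mean latency: {summary['mean_latency_ms']:.1f}ms")
```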
Frame Extraction Verification
Compare extracted video frames against original images to verify lossless capture:
from openadapt_capture import compare_video_to_images, plot_comparison
# Compare frames
# captured_frames: (timestamp, image) pairs collected during capture
report = compare_video_to_images(
    "capture/video.mp4",
    [(timestamp, image) for timestamp, image in captured_frames],
)
print(f"Mean diff: {report.mean_diff_overall:.2f}")
print(f"Lossless: {report.is_lossless}")
# Visualize comparison
plot_comparison(report, output_path="comparison.png")
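One simple way to quantify the difference between an original frame and its decoded counterpart is the mean absolute difference of pixel values, where 0.0 means the two are identical. The sketch below works on raw pixel bytes using only the standard library; `mean_abs_diff` is a hypothetical helper, and whether `mean_diff_overall` is computed this way is an assumption.

```python
def mean_abs_diff(a: bytes, b: bytes) -> float:
    """Mean absolute per-byte difference between two equally sized pixel buffers."""
    if len(a) != len(b):
        raise ValueError("buffers must have the same length")
    if not a:
        return 0.0
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Illustrative buffers standing in for raw pixel data
original = bytes([0, 10, 20, 30])
decoded = bytes([0, 10, 20, 30])
diff = mean_abs_diff(original, decoded)
is_lossless = diff == 0.0
```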
Visualization
Generate animated demos and interactive viewers from recordings:
Animated GIF Demo
from openadapt_capture import Capture, create_demo
capture = Capture.load("./my_capture")
create_demo(capture, output="demo.gif", fps=10, max_duration=15)
Interactive HTML Viewer
from openadapt_capture import Capture, create_html
capture = Capture.load("./my_capture")
create_html(capture, output="viewer.html", include_audio=True)
The HTML viewer includes:
- Timeline scrubber with event markers
- Frame-by-frame navigation
- Synchronized audio playback
- Event list with details panel
- Keyboard shortcuts (Space, arrows, Home/End)
Generate Demo from Command Line
uv run python scripts/generate_readme_demo.py --duration 10
Optional Extras
| Extra | Features |
|---|---|
| audio | Audio capture + Whisper transcription |
| privacy | PII scrubbing (openadapt-privacy) |
| all | Everything |
Training with OpenAdapt-ML
Captured recordings can be used to train vision-language models with openadapt-ml.
End-to-End Workflow
# 1. Capture a workflow demonstration
uv run python -c "
from openadapt_capture import Recorder

with Recorder('./my_capture', task_description='Turn off Night Shift') as recorder:
    input('Perform the task, then press Enter to stop...')
"

# 2. Train a model on the capture (requires openadapt-ml)
uv pip install openadapt-ml
uv run python -m openadapt_ml.cloud.local train \
    --capture ./my_capture \
    --open  # Opens training dashboard

# 3. Compare human vs model predictions
uv run python -m openadapt_ml.scripts.compare \
    --capture ./my_capture \
    --checkpoint checkpoints/model \
    --open
Cloud GPU Training
For faster training with cloud GPUs:
# Train on Lambda Labs A10 (~$0.75/hr)
uv run python -m openadapt_ml.cloud.lambda_labs train \
    --capture ./my_capture \
    --goal "Turn off Night Shift"
See the openadapt-ml documentation for cloud setup.
Data Format
OpenAdapt-ML converts captures to its Episode format automatically:
from openadapt_ml.ingest.capture import capture_to_episode
episode = capture_to_episode("./my_capture")
print(f"Loaded {len(episode.steps)} steps")
print(f"Instruction: {episode.instruction}")
The conversion maps capture event types to ML action types:
- mouse.singleclick / mouse.click -> CLICK
- mouse.doubleclick -> DOUBLE_CLICK
- mouse.drag -> DRAG
- mouse.scroll -> SCROLL
- key.type -> TYPE
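The mapping can be written down as a plain lookup table. This is a sketch of the correspondence listed here, not code from openadapt-ml; the names `CAPTURE_TO_ML_ACTION` and `to_ml_action` are hypothetical, and returning `None` for unmapped event types is an assumption.

```python
from typing import Optional

# Capture action type -> ML action type, as listed in this section
CAPTURE_TO_ML_ACTION = {
    "mouse.singleclick": "CLICK",
    "mouse.click": "CLICK",
    "mouse.doubleclick": "DOUBLE_CLICK",
    "mouse.drag": "DRAG",
    "mouse.scroll": "SCROLL",
    "key.type": "TYPE",
}


def to_ml_action(event_type: str) -> Optional[str]:
    """Look up the ML action type; returns None for unmapped event types."""
    return CAPTURE_TO_ML_ACTION.get(event_type)


action = to_ml_action("mouse.doubleclick")
```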
Development
uv sync --dev
uv run pytest
Related Projects
- openadapt-ml - Train and evaluate GUI automation models
- Windows Agent Arena - Benchmark for Windows GUI agents
License
MIT