GUI automation with ML - record, train, deploy, evaluate
OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)
OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.
Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.
Join us on Discord | Documentation | OpenAdapt.ai
Architecture
OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:
| Package | Description | Repository |
|---|---|---|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |
| openadapt-wright | Dev automation | openadapt-wright |
| openadapt-herald | Social media from git history | openadapt-herald |
| openadapt-crier | Telegram approval bot | openadapt-crier |
| openadapt-consilium | Multi-model consensus | openadapt-consilium |
| openadapt-desktop | Desktop GUI application | openadapt-desktop |
| openadapt-tray | System tray app | openadapt-tray |
| openadapt-agent | Production execution engine | openadapt-agent |
| openadapt-telemetry | Error tracking | openadapt-telemetry |
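Because the sub-packages are separate PyPI distributions, one way to see which of them are present in an environment is to query the installed package metadata. The sketch below uses only the standard library's `importlib.metadata`; the distribution names are copied from the table above, and `installed_subpackages` is a helper invented for this illustration, not part of OpenAdapt. The CLI's own `openadapt version` command is the supported way to get this information.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the package table above.
SUBPACKAGES = [
    "openadapt",
    "openadapt-capture",
    "openadapt-ml",
    "openadapt-evals",
    "openadapt-viewer",
    "openadapt-grounding",
    "openadapt-retrieval",
    "openadapt-privacy",
]


def installed_subpackages(names=SUBPACKAGES):
    """Map each distribution name to its installed version, or None if absent.

    Hypothetical helper for illustration; not an OpenAdapt API.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_subpackages().items():
        print(f"{name:24} {ver or 'not installed'}")
```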
Installation
Install what you need:
pip install openadapt              # Minimal CLI only
pip install "openadapt[capture]"   # GUI capture/recording
pip install "openadapt[ml]"        # ML training and inference
pip install "openadapt[evals]"     # Benchmark evaluation
pip install "openadapt[privacy]"   # PII/PHI scrubbing
pip install "openadapt[all]"       # Everything
Quote the extras as shown: shells such as zsh (the macOS default) treat unquoted square brackets as glob patterns.
Requirements: Python 3.10+
Quick Start
1. Record a demonstration
openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop
2. Train a model
openadapt train start --capture my-task --model qwen3vl-2b
3. Evaluate
openadapt eval run --checkpoint training_output/model.pt --benchmark waa
4. View recordings
openadapt capture view my-task
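The four steps above can also be assembled from a script. The sketch below only builds and prints the commands (a dry run), using exactly the subcommands and flags shown in this Quick Start; `quickstart_commands` is a name made up for this example. To actually run a step, each argument list could be passed to `subprocess.run(argv, check=True)`, keeping in mind that `capture start` records until it is interrupted with Ctrl+C.

```python
import shlex


def quickstart_commands(name, model, checkpoint, benchmark):
    """Return the Quick Start commands as argv lists, in pipeline order.

    Hypothetical helper for illustration; it only mirrors the commands
    documented above and does not execute anything.
    """
    return [
        ["openadapt", "capture", "start", "--name", name],
        ["openadapt", "train", "start", "--capture", name, "--model", model],
        ["openadapt", "eval", "run", "--checkpoint", checkpoint,
         "--benchmark", benchmark],
        ["openadapt", "capture", "view", name],
    ]


if __name__ == "__main__":
    commands = quickstart_commands(
        name="my-task",
        model="qwen3vl-2b",
        checkpoint="training_output/model.pt",
        benchmark="waa",
    )
    for argv in commands:
        # shlex.join quotes arguments safely for display or copy-paste.
        print(shlex.join(argv))
```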
Ecosystem
Core Platform Components
| Package | Description | Repository |
|---|---|---|
| openadapt | Meta-package with unified CLI | This repo |
| openadapt-capture | Event recording and storage | openadapt-capture |
| openadapt-ml | ML engine, training, inference | openadapt-ml |
| openadapt-evals | Benchmark evaluation | openadapt-evals |
| openadapt-viewer | HTML visualization | openadapt-viewer |
| openadapt-grounding | UI element localization | openadapt-grounding |
| openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval |
| openadapt-privacy | PII/PHI scrubbing | openadapt-privacy |
Applications and Tools
| Package | Description | Repository |
|---|---|---|
| openadapt-desktop | Desktop GUI application | openadapt-desktop |
| openadapt-tray | System tray app | openadapt-tray |
| openadapt-agent | Production execution engine | openadapt-agent |
| openadapt-wright | Dev automation | openadapt-wright |
| openadapt-herald | Social media from git history | openadapt-herald |
| openadapt-crier | Telegram approval bot | openadapt-crier |
| openadapt-consilium | Multi-model consensus | openadapt-consilium |
| openadapt-telemetry | Error tracking | openadapt-telemetry |
CLI Reference
openadapt capture start --name <name>    # Start recording
openadapt capture stop                   # Stop recording
openadapt capture list                   # List captures
openadapt capture view <name>            # Open capture viewer
openadapt train start --capture <name>   # Train model on capture
openadapt train status                   # Check training progress
openadapt train stop                     # Stop training
openadapt eval run --checkpoint <path>   # Evaluate trained model
openadapt eval run --agent api-claude    # Evaluate API agent
openadapt eval mock --tasks 10           # Run mock evaluation
openadapt serve --port 8080              # Start dashboard server
openadapt version                        # Show installed versions
openadapt doctor                         # Check system requirements
How It Works
See the full Architecture Evolution for detailed documentation.
Three-Phase Pipeline
OpenAdapt follows a streamlined Demonstrate → Learn → Execute pipeline:
1. DEMONSTRATE (Observation Collection)
- Capture: Record user actions and screenshots with openadapt-capture
- Privacy: Scrub PII/PHI from recordings with openadapt-privacy
- Store: Build a searchable demonstration library
2. LEARN (Policy Acquisition)
- Retrieval Path: Embed demonstrations, index them, and enable semantic search
- Training Path: Load demonstrations and fine-tune Vision-Language Models (VLMs)
- Abstraction: Progress from literal replay to template-based automation
3. EXECUTE (Agent Deployment)
- Observe: Take screenshots and gather accessibility information
- Policy: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
- Ground: Map intentions to specific UI coordinates with openadapt-grounding
- Act: Execute validated actions with safety gates
- Evaluate: Measure success with openadapt-evals and feed results back for improvement
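The Execute phase can be pictured as a single step function that wires the stages together. The following is a minimal sketch under stated assumptions, not OpenAdapt's implementation: `Intent`, `execute_step`, and every callable passed in are hypothetical stand-ins, with the policy, grounding, confirmation prompt, and input driver supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class Intent:
    """What the policy wants to do, before it is tied to screen coordinates."""
    kind: str                # e.g. "click" or "type"
    target: str              # description of the UI element
    text: str = ""           # text payload for "type" intents
    high_risk: bool = False  # triggers the confirmation gate


def execute_step(observe, policy, ground, confirm, act):
    """Run one observe -> policy -> ground -> gate -> act cycle.

    All five arguments are caller-supplied callables (hypothetical
    interfaces for this sketch). Returns a short status string.
    """
    observation = observe()               # Observe: screenshot, a11y info
    intent = policy(observation)          # Policy: decide WHAT to do
    coords = ground(observation, intent)  # Ground: decide WHERE to do it
    if coords is None:
        return "ungrounded"               # target element was not found
    if intent.high_risk and not confirm(intent):
        return "blocked"                  # safety gate rejected the action
    act(intent, coords)                   # Act: perform the validated action
    return "executed"


if __name__ == "__main__":
    # Stub components so the sketch runs without a real GUI.
    screen = {"elements": {"Save": (10, 20)}}
    status = execute_step(
        observe=lambda: screen,
        policy=lambda obs: Intent("click", "Save"),
        ground=lambda obs, intent: obs["elements"].get(intent.target),
        confirm=lambda intent: False,
        act=lambda intent, coords: print(intent.kind, "at", coords),
    )
    print(status)
```

Keeping the policy and grounding as separate callables mirrors the split described above: either one can be swapped (for example, a different VLM for the policy) without touching the other or the safety gate.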
Core Approach: Trajectory-Conditioned Disambiguation
Zero-shot VLMs fail on GUI tasks not due to lack of capability, but due to ambiguity in UI affordances. OpenAdapt resolves this by conditioning agents on human demonstrations — "show, don't tell."
| | No Retrieval | With Retrieval |
|---|---|---|
| No Fine-tuning | 46.7% (zero-shot baseline) | 100% (validated, n=45) |
| Fine-tuning | Standard SFT (baseline) | Demo-conditioned FT (planned) |
The bottom-right cell is OpenAdapt's unique value: training models to use demonstrations they haven't seen before, combining retrieval with fine-tuning for maximum accuracy. Phase 2 (retrieval-only prompting) is validated; Phase 3 (demo-conditioned fine-tuning) is in progress.
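To make retrieval-only prompting concrete, here is a toy sketch of the idea: find the stored demonstration most similar to the new task and prepend its steps to the prompt. It is illustrative only. The bag-of-words `embed` function is a deliberately simple stand-in for a real multimodal embedding model, the demo records are invented, and none of the function names come from openadapt-retrieval.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(task, demos, k=1):
    """Return the k demonstrations whose task text is most similar to `task`."""
    query = embed(task)
    ranked = sorted(
        demos, key=lambda d: cosine(query, embed(d["task"])), reverse=True
    )
    return ranked[:k]


def build_prompt(task, demos):
    """Prepend retrieved demonstrations to the task to condition the model."""
    lines = []
    for demo in demos:
        lines.append(f"Demonstration: {demo['task']}")
        lines.extend(f"  {i}. {step}" for i, step in enumerate(demo["steps"], 1))
    lines.append(f"Task: {task}")
    lines.append("Next action:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Invented demonstration library, for illustration only.
    library = [
        {"task": "enable dark mode in system settings",
         "steps": ["open System Settings", "click Appearance", "click Dark"]},
        {"task": "rename a file in finder",
         "steps": ["select file", "press Return", "type new name"]},
    ]
    task = "switch system settings to dark mode"
    print(build_prompt(task, retrieve(task, library, k=1)))
```

The resulting prompt would then be sent to whichever VLM serves as the policy; the model call itself is omitted here because it depends on the provider.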
Validated result: On a controlled macOS benchmark (45 System Settings tasks sharing a common navigation entry point), demo-conditioned prompting improved first-action accuracy from 46.7% to 100%. A length-matched control (+11.1 pp only) confirms the benefit is semantic, not token-length. See the research thesis for methodology and the publication roadmap for limitations.
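The percentages above can be sanity-checked against whole-task counts. The sketch below assumes every condition was scored on the same 45 tasks, which the text implies but does not state outright; `implied_count` is a helper invented for this check, and the counts it prints are derived from the reported percentages rather than taken from the underlying data.

```python
N_TASKS = 45  # benchmark size reported above


def implied_count(percent, n=N_TASKS):
    """Number of tasks that a percentage (or percentage-point gain) implies."""
    return round(percent / 100 * n)


baseline = implied_count(46.7)             # zero-shot first-action accuracy
with_demo = implied_count(100.0)           # demo-conditioned prompting
control_gain = implied_count(11.1)         # length-matched control, in pp
demo_gain = implied_count(100.0 - 46.7)    # demo-conditioning gain, in pp

print(f"baseline:      {baseline}/{N_TASKS}")       # 21/45
print(f"with demo:     {with_demo}/{N_TASKS}")      # 45/45
print(f"control gain:  +{control_gain} tasks")      # +5 tasks
print(f"demo gain:     +{demo_gain} tasks")         # +24 tasks
```

Read this way, the length-matched control accounts for roughly 5 of the 24 additional tasks, which is consistent with the claim that most of the benefit comes from the demonstration's content rather than from prompt length.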
Industry validation: OpenCUA (NeurIPS 2025 Spotlight, XLANG Lab) reused OpenAdapt's macOS accessibility capture code in its AgentNetTool, but it uses demos only for model training, not for runtime conditioning. No open-source computer-use agent (CUA) framework currently does demo-conditioned inference, which remains OpenAdapt's architectural differentiator.
Key Concepts
- Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
- Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
- Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
- Evaluation-Driven Feedback: Success traces become new training data
Terminology
| Term | Description |
|---|---|
| Observation | What the agent perceives (screenshot, accessibility tree) |
| Action | What the agent does (click, type, scroll, etc.) |
| Trajectory | Sequence of observation-action pairs |
| Demonstration | Human-provided example trajectory |
| Policy | Decision-making component that maps observations to actions |
| Grounding | Mapping intent to specific UI elements (coordinates) |
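One way to keep these terms straight is to see how they nest as data. The dataclasses below are an illustrative sketch only: the class and field names are chosen to match the table and are not the actual types used by openadapt-capture or openadapt-ml.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Observation:
    """What the agent perceives at one moment."""
    screenshot_path: str
    accessibility_tree: Optional[dict] = None


@dataclass
class Action:
    """What the agent does in response."""
    kind: str                                      # "click", "type", "scroll", ...
    coordinates: Optional[Tuple[int, int]] = None  # filled in by grounding
    text: str = ""


@dataclass
class Trajectory:
    """An ordered sequence of observation-action pairs."""
    steps: List[Tuple[Observation, Action]] = field(default_factory=list)

    def append(self, observation: Observation, action: Action) -> None:
        self.steps.append((observation, action))


@dataclass
class Demonstration(Trajectory):
    """A trajectory provided by a human as an example of a task."""
    task: str = ""


if __name__ == "__main__":
    demo = Demonstration(task="open settings")
    demo.append(Observation("frame_0.png"), Action("click", (100, 200)))
    print(len(demo.steps), demo.steps[0][1].kind)
```

In this picture, a Policy is any function from an `Observation` (plus optional demonstration context) to an `Action` without coordinates, and Grounding is the step that fills in `coordinates`.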
Demos
Legacy Version (v0.46.0) Examples:
- Twitter Demo - Early OpenAdapt demonstration
- Loom Video - Process automation walkthrough
Note: These demos show the legacy monolithic version. For current v1.0+ modular architecture examples, see the documentation.
Permissions
macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.
Windows: Run as Administrator if needed for input capture.
Legacy Version
The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.
To use the legacy version:
pip install openadapt==0.46.0
See docs/LEGACY_FREEZE.md for migration guide and details.
Contributing
- Join Discord
- Pick an issue from the relevant sub-package repository
- Submit a PR
For sub-package development:
git clone https://github.com/OpenAdaptAI/openadapt-ml # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"
Related Projects
- OpenAdaptAI/SoM - Set-of-Mark prompting
- OpenAdaptAI/pynput - Input monitoring fork
- OpenAdaptAI/atomacos - macOS accessibility
Support
- Discord: https://discord.gg/yF527cQbDG
- Issues: Use the relevant sub-package repository
- Architecture docs: GitHub Wiki
License
MIT License - see LICENSE for details.