
GUI automation with ML - record, train, deploy, evaluate


OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)


OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai


Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

The sub-packages and their roles are listed in the Ecosystem section below.

Installation

Install what you need:

pip install openadapt              # Minimal CLI only
pip install openadapt[capture]     # GUI capture/recording
pip install openadapt[ml]          # ML training and inference
pip install openadapt[evals]       # Benchmark evaluation
pip install openadapt[privacy]     # PII/PHI scrubbing
pip install openadapt[all]         # Everything
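
Extras can be combined in a single install using standard pip syntax, for example:

pip install "openadapt[capture,ml]"   # capture plus ML in one environment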

Requirements: Python 3.10+


Quick Start

1. Record a demonstration

openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop
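
You can also stop the recording from another terminal with openadapt capture stop (see the CLI Reference below).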

2. Train a model

openadapt train start --capture my-task --model qwen3vl-2b
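
Monitor progress with openadapt train status, or stop a run early with openadapt train stop.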

3. Evaluate

openadapt eval run --checkpoint training_output/model.pt --benchmark waa
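
If you don't have a trained checkpoint yet, run a mock evaluation with openadapt eval mock --tasks 10, or evaluate an API agent directly with openadapt eval run --agent api-claude.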

4. View recordings

openadapt capture view my-task
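
For a dashboard view, start the local server with openadapt serve --port 8080.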

Ecosystem

Core Platform Components

Package | Description | Repository
openadapt | Meta-package with unified CLI | This repo
openadapt-capture | Event recording and storage | openadapt-capture
openadapt-ml | ML engine, training, inference | openadapt-ml
openadapt-evals | Benchmark evaluation | openadapt-evals
openadapt-viewer | HTML visualization | openadapt-viewer
openadapt-grounding | UI element localization | openadapt-grounding
openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval
openadapt-privacy | PII/PHI scrubbing | openadapt-privacy

Applications and Tools

Package | Description | Repository
openadapt-desktop | Desktop GUI application | openadapt-desktop
openadapt-tray | System tray app | openadapt-tray
openadapt-agent | Production execution engine | openadapt-agent
openadapt-wright | Dev automation | openadapt-wright
openadapt-herald | Social media from git history | openadapt-herald
openadapt-crier | Telegram approval bot | openadapt-crier
openadapt-consilium | Multi-model consensus | openadapt-consilium
openadapt-telemetry | Error tracking | openadapt-telemetry

CLI Reference

openadapt capture start --name <name>    Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements

How It Works

See the full Architecture Evolution for detailed documentation.

Three-Phase Pipeline

OpenAdapt follows a streamlined Demonstrate → Learn → Execute pipeline (a CLI sketch of the full loop follows the phase descriptions below):

1. DEMONSTRATE (Observation Collection)

  • Capture: Record user actions and screenshots with openadapt-capture
  • Privacy: Scrub PII/PHI from recordings with openadapt-privacy
  • Store: Build a searchable demonstration library

2. LEARN (Policy Acquisition)

  • Retrieval Path: Embed demonstrations, index them, and enable semantic search
  • Training Path: Load demonstrations and fine-tune Vision-Language Models (VLMs)
  • Abstraction: Progress from literal replay to template-based automation

3. EXECUTE (Agent Deployment)

  • Observe: Take screenshots and gather accessibility information
  • Policy: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
  • Ground: Map intentions to specific UI coordinates with openadapt-grounding
  • Act: Execute validated actions with safety gates
  • Evaluate: Measure success with openadapt-evals and feed results back for improvement
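
In CLI terms, the three phases chain together using the commands documented above. A minimal end-to-end sketch (the capture name is arbitrary; the model, checkpoint path, and benchmark follow the examples in this README):

# DEMONSTRATE: record a task (stop with Ctrl+C or openadapt capture stop)
openadapt capture start --name my-task

# LEARN: fine-tune a model on the recorded demonstration
openadapt train start --capture my-task --model qwen3vl-2b
openadapt train status    # check progress until training completes

# EXECUTE and EVALUATE: score the trained checkpoint on a benchmark
openadapt eval run --checkpoint training_output/model.pt --benchmark waa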

Core Approach: Trajectory-Conditioned Disambiguation

Zero-shot VLMs fail on GUI tasks not due to lack of capability, but due to ambiguity in UI affordances. OpenAdapt resolves this by conditioning agents on human demonstrations — "show, don't tell."

               | No Retrieval               | With Retrieval
No Fine-tuning | 46.7% (zero-shot baseline) | 100% (validated, n=45)
Fine-tuning    | Standard SFT (baseline)    | Demo-conditioned FT (planned)

The bottom-right cell is OpenAdapt's unique value: training models to use demonstrations they haven't seen before, combining retrieval with fine-tuning for maximum accuracy. Phase 2 (retrieval-only prompting) is validated; Phase 3 (demo-conditioned fine-tuning) is in progress.

Validated result: On a controlled macOS benchmark (45 System Settings tasks sharing a common navigation entry point), demo-conditioned prompting improved first-action accuracy from 46.7% (21/45) to 100% (45/45). A length-matched control gained only 11.1 percentage points, confirming the benefit comes from the demonstration's content rather than from added prompt length. See the research thesis for methodology and the publication roadmap for limitations.

Industry validation: OpenCUA (NeurIPS 2025 Spotlight, XLANG Lab) reused OpenAdapt's macOS accessibility capture code in their AgentNetTool, but uses demos only for model training — not runtime conditioning. No open-source CUA framework currently does demo-conditioned inference, which remains OpenAdapt's architectural differentiator.

Key Concepts

  • Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
  • Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
  • Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
  • Evaluation-Driven Feedback: Success traces become new training data

Terminology

Term | Description
Observation | What the agent perceives (screenshot, accessibility tree)
Action | What the agent does (click, type, scroll, etc.)
Trajectory | Sequence of observation-action pairs
Demonstration | Human-provided example trajectory
Policy | Decision-making component that maps observations to actions
Grounding | Mapping intent to specific UI elements (coordinates)

Demos

The recorded demos illustrate the legacy monolithic version (v0.46.0). For examples using the current v1.0+ modular architecture, see the documentation.


Permissions

macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.

Windows: Run as Administrator if needed for input capture.


Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.

To use the legacy version:

pip install openadapt==0.46.0

See docs/LEGACY_FREEZE.md for the migration guide and details.


Contributing

  1. Join Discord
  2. Pick an issue from the relevant sub-package repository
  3. Submit a PR

For sub-package development:

git clone https://github.com/OpenAdaptAI/openadapt-ml  # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"
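
If the [dev] extras include a pytest setup, which is a common convention but not verified here for each sub-package, the tests can then be run from the repository root:

pytest    # assumes pytest is provided by the [dev] extras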

License

MIT License - see LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

openadapt-1.2.1.tar.gz (4.5 MB)

Built Distribution

openadapt-1.2.1-py3-none-any.whl (13.7 kB)

File details

Details for the file openadapt-1.2.1.tar.gz.

File metadata

  • Size: 4.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.10.19 Linux/6.14.0-1017-azure

File hashes

Algorithm | Hash digest
SHA256 | 99f44c109fa6d65b8d557c4629e9fa64647cde6f0f90ddf05b4c2ed32535420b
MD5 | 973de36a7d60b254892e8702f975cc7a
BLAKE2b-256 | 3c91889d3716f4e669c6cac6c186fbd5c0cca6f8bf5e3c5d9a69dcc160a8ffc6

File details

Details for the file openadapt-1.2.1-py3-none-any.whl.

File metadata

  • Size: 13.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.10.19 Linux/6.14.0-1017-azure

File hashes

Algorithm | Hash digest
SHA256 | 75219a604bd2bceaeb579f6028afc51c5b459222b1fb2b8fbead57c4f99851eb
MD5 | feacc0423b77261a3da77cdac6f9e9c2
BLAKE2b-256 | 91897b396b9762d2043d5a7612e382772b341b158bdadb0b53a216efb4c18410
