
GUI automation with ML - record, train, deploy, evaluate


OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)


OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai


Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

The sub-packages and their roles are listed in the Ecosystem section below.

Installation

Install what you need:

pip install openadapt              # Minimal CLI only
pip install openadapt[capture]     # GUI capture/recording
pip install openadapt[ml]          # ML training and inference
pip install openadapt[evals]       # Benchmark evaluation
pip install openadapt[privacy]     # PII/PHI scrubbing
pip install openadapt[all]         # Everything
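
Extras can be combined in a single install using standard pip syntax, for example:

pip install "openadapt[capture,ml]"   # capture plus ML in one environment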

Requirements: Python 3.10+


Quick Start

1. Record a demonstration

openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop
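
You can also stop the recording from another terminal with openadapt capture stop (see the CLI Reference below).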

2. Train a model

openadapt train start --capture my-task --model qwen3vl-2b
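
Monitor progress with openadapt train status, or stop a run early with openadapt train stop.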

3. Evaluate

openadapt eval run --checkpoint training_output/model.pt --benchmark waa
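
If you don't have a trained checkpoint yet, run a mock evaluation with openadapt eval mock --tasks 10, or evaluate an API agent directly with openadapt eval run --agent api-claude.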

4. View recordings

openadapt capture view my-task
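
For a dashboard view, start the local server with openadapt serve --port 8080.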

Ecosystem

Core Platform Components

Package | Description | Repository
openadapt | Meta-package with unified CLI | This repo
openadapt-capture | Event recording and storage | openadapt-capture
openadapt-ml | ML engine, training, inference | openadapt-ml
openadapt-evals | Benchmark evaluation | openadapt-evals
openadapt-viewer | HTML visualization | openadapt-viewer
openadapt-grounding | UI element localization | openadapt-grounding
openadapt-retrieval | Multimodal demo retrieval | openadapt-retrieval
openadapt-privacy | PII/PHI scrubbing | openadapt-privacy

Applications and Tools

Package | Description | Repository
openadapt-desktop | Desktop GUI application | openadapt-desktop
openadapt-tray | System tray app | openadapt-tray
openadapt-agent | Production execution engine | openadapt-agent
openadapt-wright | Dev automation | openadapt-wright
openadapt-herald | Social media from git history | openadapt-herald
openadapt-crier | Telegram approval bot | openadapt-crier
openadapt-consilium | Multi-model consensus | openadapt-consilium
openadapt-telemetry | Error tracking | openadapt-telemetry

CLI Reference

openadapt capture start --name <name>    Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements

How It Works

See the full Architecture Evolution for detailed documentation.

Three-Phase Pipeline

OpenAdapt follows a streamlined Demonstrate → Learn → Execute pipeline (a CLI sketch of the full loop follows the phase descriptions below):

1. DEMONSTRATE (Observation Collection)

  • Capture: Record user actions and screenshots with openadapt-capture
  • Privacy: Scrub PII/PHI from recordings with openadapt-privacy
  • Store: Build a searchable demonstration library

2. LEARN (Policy Acquisition)

  • Retrieval Path: Embed demonstrations, index them, and enable semantic search
  • Training Path: Load demonstrations and fine-tune Vision-Language Models (VLMs)
  • Abstraction: Progress from literal replay to template-based automation

3. EXECUTE (Agent Deployment)

  • Observe: Take screenshots and gather accessibility information
  • Policy: Use demonstration context to decide actions via VLMs (Claude, GPT-4o, Qwen3-VL)
  • Ground: Map intentions to specific UI coordinates with openadapt-grounding
  • Act: Execute validated actions with safety gates
  • Evaluate: Measure success with openadapt-evals and feed results back for improvement
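
In CLI terms, the three phases chain together using the commands documented above. A minimal end-to-end sketch (the capture name is arbitrary; the model, checkpoint path, and benchmark follow the examples in this README):

# DEMONSTRATE: record a task (stop with Ctrl+C or openadapt capture stop)
openadapt capture start --name my-task

# LEARN: fine-tune a model on the recorded demonstration
openadapt train start --capture my-task --model qwen3vl-2b
openadapt train status    # check progress until training completes

# EXECUTE and EVALUATE: score the trained checkpoint on a benchmark
openadapt eval run --checkpoint training_output/model.pt --benchmark waa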

Core Approach: Trajectory-Conditioned Disambiguation

Zero-shot VLMs fail on GUI tasks not due to lack of capability, but due to ambiguity in UI affordances. OpenAdapt resolves this by conditioning agents on human demonstrations — "show, don't tell."

               | No Retrieval               | With Retrieval
No Fine-tuning | 46.7% (zero-shot baseline) | 100% (validated, n=45)
Fine-tuning    | Standard SFT (baseline)    | Demo-conditioned FT (planned)

The bottom-right cell is OpenAdapt's unique value: training models to use demonstrations they haven't seen before, combining retrieval with fine-tuning for maximum accuracy. Phase 2 (retrieval-only prompting) is validated; Phase 3 (demo-conditioned fine-tuning) is in progress.

Validated result: On a controlled macOS benchmark (45 System Settings tasks sharing a common navigation entry point), demo-conditioned prompting improved first-action accuracy from 46.7% (21/45) to 100% (45/45). A length-matched control gained only 11.1 percentage points, confirming the benefit comes from the demonstration's content rather than from added prompt length. See the research thesis for methodology and the publication roadmap for limitations.

Industry validation: OpenCUA (NeurIPS 2025 Spotlight, XLANG Lab) reused OpenAdapt's macOS accessibility capture code in their AgentNetTool, but uses demos only for model training — not runtime conditioning. No open-source CUA framework currently does demo-conditioned inference, which remains OpenAdapt's architectural differentiator.

Key Concepts

  • Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
  • Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
  • Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
  • Evaluation-Driven Feedback: Success traces become new training data

Terminology

Term | Description
Observation | What the agent perceives (screenshot, accessibility tree)
Action | What the agent does (click, type, scroll, etc.)
Trajectory | Sequence of observation-action pairs
Demonstration | Human-provided example trajectory
Policy | Decision-making component that maps observations to actions
Grounding | Mapping intent to specific UI elements (coordinates)

Demos

The recorded demos illustrate the legacy monolithic version (v0.46.0). For examples using the current v1.0+ modular architecture, see the documentation.


Permissions

macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.

Windows: Run as Administrator if needed for input capture.


Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.

To use the legacy version:

pip install openadapt==0.46.0

See docs/LEGACY_FREEZE.md for the migration guide and details.


Contributing

  1. Join Discord
  2. Pick an issue from the relevant sub-package repository
  3. Submit a PR

For sub-package development:

git clone https://github.com/OpenAdaptAI/openadapt-ml  # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"
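
If the [dev] extras include a pytest setup, which is a common convention but not verified here for each sub-package, the tests can then be run from the repository root:

pytest    # assumes pytest is provided by the [dev] extras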

License

MIT License - see LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

openadapt-1.2.1.tar.gz (4.5 MB)

Built Distribution

openadapt-1.2.1-py3-none-any.whl (13.7 kB)

File details

Details for the file openadapt-1.2.1.tar.gz.

File metadata

  • Size: 4.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.10.19 Linux/6.14.0-1017-azure

File hashes

Algorithm | Hash digest
SHA256 | 99f44c109fa6d65b8d557c4629e9fa64647cde6f0f90ddf05b4c2ed32535420b
MD5 | 973de36a7d60b254892e8702f975cc7a
BLAKE2b-256 | 3c91889d3716f4e669c6cac6c186fbd5c0cca6f8bf5e3c5d9a69dcc160a8ffc6

File details

Details for the file openadapt-1.2.1-py3-none-any.whl.

File metadata

  • Size: 13.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.3.2 CPython/3.10.19 Linux/6.14.0-1017-azure

File hashes

Algorithm | Hash digest
SHA256 | 75219a604bd2bceaeb579f6028afc51c5b459222b1fb2b8fbead57c4f99851eb
MD5 | feacc0423b77261a3da77cdac6f9e9c2
BLAKE2b-256 | 91897b396b9762d2043d5a7612e382772b341b158bdadb0b53a216efb4c18410
