GUI automation with ML - record, train, deploy, evaluate

OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)

OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai


Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

Package               Description                     Repository
openadapt             Meta-package with unified CLI   This repo
openadapt-capture     Event recording and storage     openadapt-capture
openadapt-ml          ML engine, training, inference  openadapt-ml
openadapt-evals       Benchmark evaluation            openadapt-evals
openadapt-viewer      HTML visualization              openadapt-viewer
openadapt-grounding   UI element localization         openadapt-grounding
openadapt-retrieval   Multimodal demo retrieval       openadapt-retrieval
openadapt-privacy     PII/PHI scrubbing               openadapt-privacy
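
Because each sub-package is a separate PyPI distribution, you can check from Python which ones are present in the current environment. The following is a minimal sketch using only the standard library; the distribution names come from the table above, and the helper name installed_versions is illustrative, not part of OpenAdapt. It is not how `openadapt version` is implemented.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the package table above.
PACKAGES = [
    "openadapt",
    "openadapt-capture",
    "openadapt-ml",
    "openadapt-evals",
    "openadapt-viewer",
    "openadapt-grounding",
    "openadapt-retrieval",
    "openadapt-privacy",
]


def installed_versions(names=PACKAGES):
    """Return {distribution name: version string, or None if not installed}."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name:22} {ver or 'not installed'}")
```

Run as a script, this prints one line per package with either its version or "not installed".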

Installation

Install what you need:

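pip install openadapt                # Minimal CLI only
pip install "openadapt[capture]"     # GUI capture/recording
pip install "openadapt[ml]"          # ML training and inference
pip install "openadapt[evals]"       # Benchmark evaluation
pip install "openadapt[privacy]"     # PII/PHI scrubbing
pip install "openadapt[core]"        # capture, ml, viewer (full training workflow)
pip install "openadapt[all]"         # Everything
# The quotes keep shells such as zsh from treating the brackets as a glob pattern.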

Requirements: Python 3.10+
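
If you install from a script rather than by hand, the extras can be combined into a single requirement string. This is a sketch under assumptions: the helper names pip_install_args and python_ok are hypothetical, and the set of extras is limited to those named in this README. pip accepts several extras separated by commas.

```python
import sys

# Extras named in this README ("core" appears in the Installation Paths table).
KNOWN_EXTRAS = {"capture", "ml", "evals", "privacy", "core", "all"}


def python_ok(minimum=(3, 10)):
    """Check the interpreter against the stated requirement (Python 3.10+)."""
    return sys.version_info >= minimum


def pip_install_args(extras=()):
    """Build an argv list that installs openadapt with the given extras."""
    unknown = set(extras) - KNOWN_EXTRAS
    if unknown:
        raise ValueError(f"unknown extras: {sorted(unknown)}")
    requirement = "openadapt"
    if extras:
        requirement += "[" + ",".join(extras) + "]"
    # An argv list bypasses the shell, so the brackets need no quoting.
    return [sys.executable, "-m", "pip", "install", requirement]


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print(pip_install_args(["capture", "ml"]))
```

The returned list can be passed to subprocess.run(..., check=True) to perform the install.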


Quick Start

1. Record a demonstration

openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop

2. Train a model

openadapt train start --capture my-task --model qwen3vl-2b

3. Evaluate

openadapt eval run --checkpoint training_output/model.pt --benchmark waa

4. View recordings

openadapt capture view my-task
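
The non-interactive steps above (train, evaluate, view) can be chained from a script. The sketch below only uses the commands and flags shown in this Quick Start; the function names are hypothetical, and it assumes each command blocks until it finishes, which this README does not confirm. Recording is left out because it is stopped interactively with Ctrl+C.

```python
import shlex
import subprocess


def quick_start_commands(name, model="qwen3vl-2b",
                         checkpoint="training_output/model.pt",
                         benchmark="waa"):
    """Return the train, eval, and view commands from the Quick Start as argv lists."""
    return [
        ["openadapt", "train", "start", "--capture", name, "--model", model],
        ["openadapt", "eval", "run", "--checkpoint", checkpoint,
         "--benchmark", benchmark],
        ["openadapt", "capture", "view", name],
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print("$", shlex.join(argv))
        if not dry_run:
            # check=True stops the sequence at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(quick_start_commands("my-task"))  # dry run: prints only
```

Keeping dry_run=True as the default makes it easy to review the exact commands before anything is executed.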

CLI Reference

openadapt capture start --name <name>    Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements
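
When calling the CLI from another program, it helps to confirm the executable is on PATH before invoking it. A minimal sketch, assuming the command is installed under the name openadapt; the helper names are illustrative, and the meaning of the `openadapt doctor` exit code is an assumption, since this README does not document it.

```python
import shutil
import subprocess


def cli_available(executable="openadapt"):
    """True if the executable can be found on PATH."""
    return shutil.which(executable) is not None


def run_doctor(executable="openadapt"):
    """Run `<executable> doctor`; return its exit code, or None if not found."""
    if not cli_available(executable):
        return None
    # Assumption: a zero exit code means the checks passed.
    result = subprocess.run([executable, "doctor"],
                            capture_output=True, text=True)
    return result.returncode


if __name__ == "__main__":
    code = run_doctor()
    print("openadapt not found on PATH" if code is None else f"doctor exit code: {code}")
```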

How It Works

See the full Architecture Documentation for detailed diagrams.

flowchart TB
    %% Main workflow phases
    subgraph Record["1. RECORD"]
        direction TB
        DEMO[User Demo] --> CAPTURE[openadapt-capture]
    end

    subgraph Train["2. TRAIN"]
        direction TB
        DATA[Captured Data] --> ML[openadapt-ml]
    end

    subgraph Deploy["3. DEPLOY"]
        direction TB
        MODEL[Trained Model] --> AGENT[Agent Policy]
        AGENT --> REPLAY[Action Replay]
    end

    subgraph Evaluate["4. EVALUATE"]
        direction TB
        BENCH[Benchmarks] --> EVALS[openadapt-evals]
        EVALS --> METRICS[Metrics]
    end

    %% Main flow connections
    CAPTURE --> DATA
    ML --> MODEL
    AGENT --> BENCH

    %% Viewer - independent component
    VIEWER[openadapt-viewer]
    VIEWER -.->|"view at any phase"| Record
    VIEWER -.->|"view at any phase"| Train
    VIEWER -.->|"view at any phase"| Deploy
    VIEWER -.->|"view at any phase"| Evaluate

    %% Optional packages with integration points
    PRIVACY[openadapt-privacy]
    RETRIEVAL[openadapt-retrieval]
    GROUNDING[openadapt-grounding]

    PRIVACY -.->|"PII/PHI scrubbing"| CAPTURE
    RETRIEVAL -.->|"demo retrieval"| ML
    GROUNDING -.->|"UI localization"| REPLAY

    %% Styling
    classDef corePhase fill:#e1f5fe,stroke:#01579b
    classDef optionalPkg fill:#fff3e0,stroke:#e65100,stroke-dasharray: 5 5
    classDef viewerPkg fill:#e8f5e9,stroke:#2e7d32,stroke-dasharray: 3 3

    class Record,Train,Deploy,Evaluate corePhase
    class PRIVACY,RETRIEVAL,GROUNDING optionalPkg
    class VIEWER viewerPkg

OpenAdapt:

  • Records screenshots and user input events
  • Trains ML models on demonstrations
  • Generates and replays synthetic input via model completions
  • Evaluates agents on GUI automation benchmarks
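
The solid arrows in the flowchart above define an ordering of the stages. As an illustration only (this is not OpenAdapt code), the same edges can be written as a dependency graph and sorted with the standard library to get one valid execution order. Node labels are copied from the flowchart; the dashed optional and viewer edges are left out.

```python
from graphlib import TopologicalSorter

# Solid edges from the flowchart, written as node -> set of predecessors.
PREDECESSORS = {
    "openadapt-capture": {"User Demo"},
    "Captured Data": {"openadapt-capture"},
    "openadapt-ml": {"Captured Data"},
    "Trained Model": {"openadapt-ml"},
    "Agent Policy": {"Trained Model"},
    "Action Replay": {"Agent Policy"},
    "Benchmarks": {"Agent Policy"},
    "openadapt-evals": {"Benchmarks"},
    "Metrics": {"openadapt-evals"},
}


def pipeline_order():
    """Return the nodes in an order that respects every arrow."""
    return list(TopologicalSorter(PREDECESSORS).static_order())


if __name__ == "__main__":
    for step, node in enumerate(pipeline_order(), start=1):
        print(step, node)
```

The order is fixed up to Agent Policy; after that, Action Replay and the benchmark branch are independent of each other, so their relative position may vary.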

Key differentiators:

  1. Model agnostic - works with any LMM
  2. Auto-prompted from human demonstration (not user-prompted)
  3. Works with all desktop GUIs including virtualized and web
  4. Open source (MIT license)

Key Concepts

Meta-Package Structure

OpenAdapt v1.0+ uses a modular architecture where the main openadapt package acts as a meta-package that coordinates focused sub-packages:

  • Core Packages: Essential for the main workflow

    • openadapt-capture - Records screenshots and input events
    • openadapt-ml - Trains models on demonstrations
    • openadapt-evals - Evaluates agents on benchmarks
  • Optional Packages: Enhance specific workflow phases

    • openadapt-privacy - Integrates at Record phase for PII/PHI scrubbing
    • openadapt-retrieval - Integrates at Train phase for multimodal demo retrieval
    • openadapt-grounding - Integrates at Deploy phase for UI element localization
  • Independent Components:

    • openadapt-viewer - HTML visualization that works with any phase
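
The grouping above can be expressed as a small lookup table. This is a sketch for illustration: the SubPackage type and packages_for_phase helper are hypothetical, and the phases assigned to the core packages are taken from the flowchart in How It Works.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SubPackage:
    name: str
    role: str              # "core", "optional", or "independent"
    phase: Optional[str]   # workflow phase it attaches to; None means any phase


SUB_PACKAGES = [
    SubPackage("openadapt-capture", "core", "Record"),
    SubPackage("openadapt-ml", "core", "Train"),
    SubPackage("openadapt-evals", "core", "Evaluate"),
    SubPackage("openadapt-privacy", "optional", "Record"),
    SubPackage("openadapt-retrieval", "optional", "Train"),
    SubPackage("openadapt-grounding", "optional", "Deploy"),
    SubPackage("openadapt-viewer", "independent", None),
]


def packages_for_phase(phase):
    """Names of packages that take part in the given phase."""
    return [p.name for p in SUB_PACKAGES if p.phase in (phase, None)]


if __name__ == "__main__":
    for phase in ("Record", "Train", "Deploy", "Evaluate"):
        print(phase, packages_for_phase(phase))
```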

Two Paths to Automation

  1. Custom Training Path: Record demonstrations -> Train your own model -> Deploy agent

    • Best for: Repetitive tasks specific to your workflow
    • Requires: openadapt[core]
  2. API Agent Path: Use pre-trained LMM APIs (Claude, GPT-4V, etc.) -> Evaluate on benchmarks

    • Best for: General-purpose automation, rapid prototyping
    • Requires: openadapt[evals]

Installation Paths

Choose your installation based on your use case:

What do you want to do?
|
+-- Just evaluate API agents on benchmarks?
|   +-- pip install "openadapt[evals]"
|
+-- Train custom models on your demonstrations?
|   +-- pip install "openadapt[core]"
|
+-- Full suite with all optional packages?
|   +-- pip install "openadapt[all]"
|
+-- Minimal CLI only (add packages later)?
    +-- pip install openadapt

Installation       Included Packages                 Use Case
openadapt          CLI only                          Start minimal, add what you need
openadapt[evals]   + evals                           Benchmark API agents (Claude, GPT-4V)
openadapt[core]    + capture, ml, viewer             Full training workflow
openadapt[all]     + privacy, retrieval, grounding   Everything, including optional enhancements
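
The decision tree above maps directly onto a lookup. In this sketch the goal labels are hypothetical shorthand for the four questions in the tree, and install_command is an illustrative helper; only the install targets come from this README.

```python
import shlex

# Goal labels are hypothetical; the install targets come from the tree above.
INSTALL_TARGETS = {
    "evaluate-api-agents": "openadapt[evals]",
    "train-custom-models": "openadapt[core]",
    "full-suite": "openadapt[all]",
    "minimal-cli": "openadapt",
}


def install_command(goal):
    """Return a shell-safe pip command for the given goal."""
    try:
        target = INSTALL_TARGETS[goal]
    except KeyError:
        raise ValueError(f"unknown goal: {goal!r}") from None
    # shlex.quote adds quotes only when the target contains shell-special
    # characters such as the square brackets of an extra.
    return "pip install " + shlex.quote(target)


if __name__ == "__main__":
    for goal in INSTALL_TARGETS:
        print(f"{goal:22} {install_command(goal)}")
```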

Permissions

macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.

Windows: Run as Administrator if needed for input capture.
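
A launcher script can surface these reminders before recording starts. The sketch below is an assumption-laden illustration, not OpenAdapt behavior: permission_hint and is_windows_admin are hypothetical helpers. It cannot verify macOS permissions; it only prints the relevant note for the detected platform. The Windows check uses the shell32 IsUserAnAdmin call through ctypes.

```python
import platform


def permission_hint(system=None):
    """Return the permissions note from this README for the given platform."""
    system = system or platform.system()
    if system == "Darwin":  # platform.system() reports macOS as "Darwin"
        return ("Grant Accessibility, Screen Recording, and Input Monitoring "
                "permissions to your terminal.")
    if system == "Windows":
        return "Run as Administrator if needed for input capture."
    return None  # this README lists no extra steps for other platforms


def is_windows_admin():
    """True if running elevated on Windows; always False elsewhere."""
    if platform.system() != "Windows":
        return False
    import ctypes
    return bool(ctypes.windll.shell32.IsUserAnAdmin())


if __name__ == "__main__":
    hint = permission_hint()
    if hint:
        print(hint)
```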


Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.

To use the legacy version:

pip install openadapt==0.46.0

See docs/LEGACY_FREEZE.md for the migration guide and details.
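
To find out whether an environment still has the legacy release installed, you can inspect the installed version. This is a minimal sketch that assumes every pre-1.0 release is the monolithic codebase; this README only names v0.46.0 explicitly. The helper names are illustrative.

```python
from importlib.metadata import PackageNotFoundError, version


def is_legacy(version_string):
    """True for pre-1.0 version strings such as 0.46.0 (assumed monolithic)."""
    major = int(version_string.split(".")[0])
    return major < 1


def installed_is_legacy():
    """True/False for the installed openadapt, or None if it is not installed."""
    try:
        return is_legacy(version("openadapt"))
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("legacy install:", installed_is_legacy())
```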


Contributing

  1. Join Discord
  2. Pick an issue from the relevant sub-package repository
  3. Submit a PR

For sub-package development:

git clone https://github.com/OpenAdaptAI/openadapt-ml  # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"

License

MIT License - see LICENSE for details.
