
GUI automation with ML - record, train, deploy, evaluate


OpenAdapt: AI-First Process Automation with Large Multimodal Models (LMMs)


OpenAdapt is the open source software adapter between Large Multimodal Models (LMMs) and traditional desktop and web GUIs.

Record GUI demonstrations, train ML models, and evaluate agents - all from a unified CLI.

Join us on Discord | Documentation | OpenAdapt.ai


Architecture

OpenAdapt v1.0+ uses a modular meta-package architecture. The main openadapt package provides a unified CLI and depends on focused sub-packages via PyPI:

Package               Description                      Repository
openadapt             Meta-package with unified CLI    This repo
openadapt-capture     Event recording and storage      openadapt-capture
openadapt-ml          ML engine, training, inference   openadapt-ml
openadapt-evals       Benchmark evaluation             openadapt-evals
openadapt-viewer      HTML visualization               openadapt-viewer
openadapt-grounding   UI element localization          openadapt-grounding
openadapt-retrieval   Multimodal demo retrieval        openadapt-retrieval
openadapt-privacy     PII/PHI scrubbing                openadapt-privacy

Installation

Install what you need:

pip install openadapt                # Minimal CLI only
pip install "openadapt[capture]"     # GUI capture/recording
pip install "openadapt[ml]"          # ML training and inference
pip install "openadapt[evals]"       # Benchmark evaluation
pip install "openadapt[privacy]"     # PII/PHI scrubbing
pip install "openadapt[all]"         # Everything

(The extras are quoted so that shells like zsh do not expand the square brackets.)

Requirements: Python 3.10+
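Since each extra pulls in a separate sub-package, it can be handy to check at runtime which components are actually importable. The sketch below assumes the sub-packages import under underscored versions of their PyPI names (e.g. openadapt-capture as openadapt_capture); verify the real module names locally.

```python
# Sketch: detect which OpenAdapt sub-packages are importable in the
# current environment. Module names below are assumptions derived from
# the PyPI package names; adjust to match the installed distribution.
import importlib.util

SUBPACKAGES = [
    "openadapt_capture",
    "openadapt_ml",
    "openadapt_evals",
    "openadapt_privacy",
]

def installed_components(names=SUBPACKAGES):
    """Return the subset of module names that can be imported."""
    return [name for name in names if importlib.util.find_spec(name) is not None]
```

Calling installed_components() after a minimal install should return an empty list; after pip install "openadapt[all]" it should list every sub-package.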


Quick Start

1. Record a demonstration

openadapt capture start --name my-task
# Perform actions in your GUI, then press Ctrl+C to stop

2. Train a model

openadapt train start --capture my-task --model qwen3vl-2b

3. Evaluate

openadapt eval run --checkpoint training_output/model.pt --benchmark waa

4. View recordings

openadapt capture view my-task
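The four Quick Start steps can also be scripted. This is a minimal sketch that shells out to the same CLI commands shown above; it assumes `openadapt` is on PATH and does no error handling.

```python
# Sketch: drive the Quick Start pipeline from Python via subprocess.
# The argv lists mirror the CLI commands documented above.
import subprocess

def openadapt_argv(*args):
    """Build the argv list for one openadapt CLI invocation."""
    return ["openadapt", *args]

def run(*args):
    """Run the command and return its exit code."""
    return subprocess.run(openadapt_argv(*args)).returncode

quickstart = [
    openadapt_argv("capture", "start", "--name", "my-task"),
    openadapt_argv("train", "start", "--capture", "my-task", "--model", "qwen3vl-2b"),
    openadapt_argv("eval", "run", "--checkpoint", "training_output/model.pt",
                   "--benchmark", "waa"),
    openadapt_argv("capture", "view", "my-task"),
]
```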

CLI Reference

openadapt capture start --name <name>    Start recording
openadapt capture stop                    Stop recording
openadapt capture list                    List captures
openadapt capture view <name>             Open capture viewer

openadapt train start --capture <name>    Train model on capture
openadapt train status                    Check training progress
openadapt train stop                      Stop training

openadapt eval run --checkpoint <path>    Evaluate trained model
openadapt eval run --agent api-claude     Evaluate API agent
openadapt eval mock --tasks 10            Run mock evaluation

openadapt serve --port 8080               Start dashboard server
openadapt version                         Show installed versions
openadapt doctor                          Check system requirements

How It Works

See the full Architecture Evolution for detailed documentation.

Three-Phase Pipeline

flowchart TB
    %% ═══════════════════════════════════════════════════════════════════════
    %% DATA SOURCES (Multi-Source Ingestion)
    %% ═══════════════════════════════════════════════════════════════════════
    subgraph DataSources["Data Sources"]
        direction LR
        HUMAN["Human Demos"]
        SYNTH["Synthetic Data"]:::future
        BENCH_DATA["Benchmark Tasks"]
    end

    %% ═══════════════════════════════════════════════════════════════════════
    %% PHASE 1: DEMONSTRATE (Observation Collection)
    %% ═══════════════════════════════════════════════════════════════════════
    subgraph Demonstrate["1. DEMONSTRATE (Observation Collection)"]
        direction TB
        CAP["Capture<br/>openadapt-capture"]
        PRIV["Privacy<br/>openadapt-privacy"]
        STORE[("Demo Library")]

        CAP --> PRIV
        PRIV --> STORE
    end

    %% ═══════════════════════════════════════════════════════════════════════
    %% PHASE 2: LEARN (Policy Acquisition)
    %% ═══════════════════════════════════════════════════════════════════════
    subgraph Learn["2. LEARN (Policy Acquisition)"]
        direction TB

        subgraph RetrievalPath["Retrieval Path"]
            EMB["Embed"]
            IDX["Index"]
            SEARCH["Search"]
            EMB --> IDX --> SEARCH
        end

        subgraph TrainingPath["Training Path"]
            LOADER["Load"]
            TRAIN["Train"]
            CKPT[("Checkpoint")]
            LOADER --> TRAIN --> CKPT
        end

        subgraph ProcessMining["Process Mining"]
            ABSTRACT["Abstract"]:::future
            PATTERNS["Patterns"]:::future
            ABSTRACT --> PATTERNS
        end
    end

    %% ═══════════════════════════════════════════════════════════════════════
    %% PHASE 3: EXECUTE (Agent Deployment)
    %% ═══════════════════════════════════════════════════════════════════════
    subgraph Execute["3. EXECUTE (Agent Deployment)"]
        direction TB

        subgraph AgentCore["Agent Core"]
            OBS["Observe"]
            POLICY["Policy<br/>(Demo-Conditioned)"]
            GROUND["Grounding<br/>openadapt-grounding"]
            ACT["Act"]

            OBS --> POLICY
            POLICY --> GROUND
            GROUND --> ACT
        end

        subgraph SafetyGate["Safety Gate"]
            VALIDATE["Validate"]
            CONFIRM["Confirm"]:::future
            VALIDATE --> CONFIRM
        end

        subgraph Evaluation["Evaluation"]
            EVALS["Evals<br/>openadapt-evals"]
            METRICS["Metrics"]
            EVALS --> METRICS
        end

        ACT --> VALIDATE
        VALIDATE --> EVALS
    end

    %% ═══════════════════════════════════════════════════════════════════════
    %% THE ABSTRACTION LADDER (Side Panel)
    %% ═══════════════════════════════════════════════════════════════════════
    subgraph AbstractionLadder["Abstraction Ladder"]
        direction TB
        L0["Literal<br/>(Raw Events)"]
        L1["Symbolic<br/>(Semantic Actions)"]
        L2["Template<br/>(Parameterized)"]
        L3["Semantic<br/>(Intent)"]:::future
        L4["Goal<br/>(Task Spec)"]:::future

        L0 --> L1
        L1 --> L2
        L2 -.-> L3
        L3 -.-> L4
    end

    %% ═══════════════════════════════════════════════════════════════════════
    %% MODEL LAYER
    %% ═══════════════════════════════════════════════════════════════════════
    subgraph Models["Model Layer (VLMs)"]
        direction TB
        subgraph APIModels["API Models"]
            direction LR
            CLAUDE["Claude"]
            GPT["GPT-4o"]
            GEMINI["Gemini"]
        end
        subgraph OpenSource["Open Source / Fine-tuned"]
            direction LR
            QWEN3["Qwen3-VL"]
            UITARS["UI-TARS"]
            OPENCUA["OpenCUA"]
        end
    end

    %% ═══════════════════════════════════════════════════════════════════════
    %% MAIN DATA FLOW
    %% ═══════════════════════════════════════════════════════════════════════

    %% Data sources feed into phases
    HUMAN --> CAP
    SYNTH -.-> LOADER
    BENCH_DATA --> EVALS

    %% Demo library feeds learning
    STORE --> EMB
    STORE --> LOADER
    STORE -.-> ABSTRACT

    %% Learning outputs feed execution
    SEARCH -->|"demo context"| POLICY
    CKPT -->|"trained policy"| POLICY
    PATTERNS -.->|"templates"| POLICY

    %% Model connections
    POLICY --> Models
    GROUND --> Models

    %% ═══════════════════════════════════════════════════════════════════════
    %% FEEDBACK LOOPS (Evaluation-Driven)
    %% ═══════════════════════════════════════════════════════════════════════
    METRICS -->|"success traces"| STORE
    METRICS -.->|"training signal"| TRAIN

    %% Retrieval in BOTH training AND evaluation
    SEARCH -->|"eval conditioning"| EVALS

    %% ═══════════════════════════════════════════════════════════════════════
    %% STYLING
    %% ═══════════════════════════════════════════════════════════════════════

    %% Phase colors
    classDef phase1 fill:#3498DB,stroke:#1A5276,color:#fff
    classDef phase2 fill:#27AE60,stroke:#1E8449,color:#fff
    classDef phase3 fill:#9B59B6,stroke:#6C3483,color:#fff

    %% Component states
    classDef implemented fill:#2ECC71,stroke:#1E8449,color:#fff
    classDef future fill:#95A5A6,stroke:#707B7C,color:#fff,stroke-dasharray: 5 5
    classDef futureBlock fill:#f5f5f5,stroke:#95A5A6,stroke-dasharray: 5 5
    classDef safetyBlock fill:#E74C3C,stroke:#A93226,color:#fff

    %% Model layer
    classDef models fill:#F39C12,stroke:#B7950B,color:#fff

    %% Apply styles
    class CAP,PRIV,STORE phase1
    class EMB,IDX,SEARCH,LOADER,TRAIN,CKPT phase2
    class OBS,POLICY,GROUND,ACT,VALIDATE,EVALS,METRICS phase3
    class CLAUDE,GPT,GEMINI,QWEN3,UITARS,OPENCUA models
    class L0,L1,L2 implemented

Core Approach: Demo-Conditioned Prompting

OpenAdapt explores demonstration-conditioned automation - "show, don't tell":

Traditional Agent            OpenAdapt Agent
User writes prompts          User records demonstration
Ambiguous instructions       Grounded in actual UI
Requires prompt engineering  Reduced prompt engineering
Context-free                 Context from similar demos

Retrieval powers BOTH training AND evaluation: Similar demonstrations are retrieved as context for the VLM. In early experiments on a controlled macOS benchmark, this improved first-action accuracy from 46.7% to 100% - though all 45 tasks in that benchmark share the same navigation entry point. See the publication roadmap for methodology and limitations.
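The retrieval step can be sketched in a few lines. The toy bag-of-words embeddings below stand in for whatever multimodal encoder openadapt-retrieval actually uses; only the overall shape (embed, rank by cosine similarity, prepend top-k demos to the prompt) is the point.

```python
# Sketch of demo-conditioned prompting: rank stored demonstrations by
# similarity to the task and prepend the best matches to the prompt.
# Embeddings are toy word counts, purely for illustration.
import math
from collections import Counter

def embed(text):
    """Toy embedding: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(task, demos, k=2):
    """Return the k demos most similar to the task description."""
    q = embed(task)
    return sorted(demos, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

demos = [
    "open settings and enable dark mode",
    "export the report as pdf",
    "enable dark mode from the menu bar",
]
context = retrieve("turn on dark mode", demos)
prompt = "Relevant demonstrations:\n" + "\n".join(context) + "\nTask: turn on dark mode"
```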

Key Concepts

  • Policy/Grounding Separation: The Policy decides what to do; Grounding determines where to do it
  • Safety Gate: Runtime validation layer before action execution (confirm mode for high-risk actions)
  • Abstraction Ladder: Progressive generalization from literal replay to goal-level automation
  • Evaluation-Driven Feedback: Success traces become new training data
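The Policy/Grounding separation above can be illustrated with two stub functions: the Policy emits a semantic action naming a target element, and Grounding resolves that target to coordinates. All names here are illustrative, not the actual OpenAdapt API.

```python
# Sketch of the Policy/Grounding split: Policy decides *what* to do,
# Grounding resolves *where* on screen. Both are stubs for illustration.

def policy(observation):
    """Decide the next semantic action from an observation (stubbed)."""
    return {"action": "click", "target": "Save button"}

def ground(action, ui_elements):
    """Resolve a semantic target to concrete screen coordinates."""
    x, y = ui_elements[action["target"]]
    return {**action, "x": x, "y": y}

ui = {"Save button": (640, 410), "Cancel button": (720, 410)}
step = ground(policy({"screenshot": "..."}), ui)
# step now carries both the intent and the grounded coordinates
```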

Legend: Solid = Implemented | Dashed = Future
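The first rungs of the Abstraction Ladder can be sketched as two lifting steps: a literal raw event becomes a symbolic action by naming the UI element under the cursor, and the symbolic action becomes a template by parameterizing its target. Names and schemas are illustrative only.

```python
# Sketch of Literal -> Symbolic -> Template lifting on the
# Abstraction Ladder. Event and element schemas are illustrative.

raw_event = {"type": "mousedown", "x": 640, "y": 410}

def to_symbolic(event, ui_elements):
    """Literal -> Symbolic: name the element at the event coordinates."""
    for name, (x, y) in ui_elements.items():
        if (x, y) == (event["x"], event["y"]):
            return {"action": "click", "target": name}
    return {"action": "click", "target": None}

def to_template(symbolic):
    """Symbolic -> Template: parameterize the concrete target."""
    return {"action": symbolic["action"], "target": "{button}"}
```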


Terminology

Term           Description
Observation    What the agent perceives (screenshot, accessibility tree)
Action         What the agent does (click, type, scroll, etc.)
Trajectory     Sequence of observation-action pairs
Demonstration  Human-provided example trajectory
Policy         Decision-making component that maps observations to actions
Grounding      Mapping intent to specific UI elements (coordinates)
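The terms above compose naturally into simple data types. The dataclasses below are a sketch of that composition; the real OpenAdapt schemas may differ.

```python
# Sketch of the terminology as data types: a Trajectory is a sequence of
# (Observation, Action) pairs, and a Demonstration is a human-provided
# Trajectory. Field names are illustrative.
from dataclasses import dataclass, field

@dataclass
class Observation:
    screenshot_path: str
    accessibility_tree: dict

@dataclass
class Action:
    kind: str      # "click", "type", "scroll", ...
    payload: dict  # e.g. {"x": 100, "y": 200} or {"text": "hello"}

@dataclass
class Trajectory:
    steps: list = field(default_factory=list)  # (Observation, Action) pairs

    def append(self, obs, act):
        self.steps.append((obs, act))
```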


Permissions

macOS: Grant Accessibility, Screen Recording, and Input Monitoring permissions to your terminal. See permissions guide.

Windows: Run as Administrator if needed for input capture.


Legacy Version

The monolithic OpenAdapt codebase (v0.46.0) is preserved in the legacy/ directory.

To use the legacy version:

pip install openadapt==0.46.0

See docs/LEGACY_FREEZE.md for migration guide and details.


Contributing

  1. Join Discord
  2. Pick an issue from the relevant sub-package repository
  3. Submit a PR

For sub-package development:

git clone https://github.com/OpenAdaptAI/openadapt-ml  # or other sub-package
cd openadapt-ml
pip install -e ".[dev]"


License

MIT License - see LICENSE for details.
