
Model-agnostic, domain-agnostic ML engine for GUI automation agents


OpenAdapt-ML

Tests PyPI version Downloads Python 3.10+ License: MIT

The ML engine for OpenAdapt -- open-source desktop automation with demo-conditioned AI agents.

OpenAdapt-ML provides the GUI-specific ML layer for training and running vision-language model (VLM) agents that automate desktop tasks. It handles everything between raw screen recordings and a production policy API: canonical schemas for GUI trajectories, VLM adapters, supervised fine-tuning with TRL + Unsloth, grounding, and demo-conditioned inference.

Demos

Synthetic Login -- Qwen3-VL-2B fine-tuned on synthetic UI scenarios:

Login Demo Registration Demo

Key Features

  • GUI trajectory schemas -- Pydantic models for Episodes, Steps, Actions, and Observations with JSON Schema export and format converters (WAA, WebArena)
  • VLM adapters -- Unified interface for Qwen3-VL, Qwen2.5-VL, Claude, GPT, and Gemini with automatic device selection (CUDA / MPS / CPU)
  • Supervised fine-tuning -- TRL SFTTrainer with Unsloth optimizations for 2x faster training and 50% less VRAM via LoRA adapters
  • Runtime policy API -- AgentPolicy that predicts the next GUI action (CLICK, TYPE, DONE) from a screenshot and goal
  • Demo-conditioned inference -- Retrieval-augmented prompting using recorded demonstrations for trajectory-conditioned disambiguation
  • Grounding module -- Locate UI elements via Gemini vision API, oracle bounding boxes, or Set-of-Marks (SoM) overlays
  • Cloud GPU training -- One-command training pipelines for Lambda Labs and Azure
  • Synthetic data generation -- Configurable UI scenarios (login, registration) with layout jitter for rapid iteration
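As a sketch of the demo-conditioned inference idea above: retrieval-augmented prompting starts by finding the recorded demonstration whose embedding is closest to the current goal/screen embedding, then including it in the prompt. The names below are hypothetical illustrations, not the actual `openadapt_ml.retrieval` API:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_demo(query_embedding: list[float], demos: list[dict]) -> dict:
    """Return the recorded demo whose embedding is most similar to the query."""
    return max(demos, key=lambda d: cosine_similarity(query_embedding, d["embedding"]))

# Toy embeddings standing in for real screenshot/goal encodings.
demos = [
    {"name": "login", "embedding": [1.0, 0.0, 0.0]},
    {"name": "registration", "embedding": [0.0, 1.0, 0.0]},
]
best = retrieve_demo([0.9, 0.1, 0.0], demos)
print(best["name"])  # login
```

The retrieved demo's steps would then be serialized into the chat prompt so the policy can disambiguate between similar-looking UI states.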

Installation

# Core package
pip install openadapt-ml

# With training dependencies (TRL + datasets)
pip install openadapt-ml[training]

# With API-backed VLMs (Claude, GPT)
pip install openadapt-ml[api]

# Development (from source)
git clone https://github.com/OpenAdaptAI/openadapt-ml.git
cd openadapt-ml
uv sync

Quick Start

Run a smoke test

# Model-free policy demo (no GPU required)
uv run python -m openadapt_ml.scripts.demo_policy --backend dummy

Train on synthetic data

# Fine-tune Qwen3-VL on synthetic login scenario
uv run python -m openadapt_ml.scripts.train \
  --config configs/qwen3vl_synthetic.yaml

Train on real recordings

# Record a workflow with openadapt-capture, then train
uv run python -m openadapt_ml.scripts.train \
  --config configs/qwen3vl_capture.yaml \
  --capture ~/captures/my-workflow \
  --open  # Opens training dashboard in browser

End-to-end benchmark (train + eval + plot)

uv run python -m openadapt_ml.scripts.run_qwen_login_benchmark \
  --config configs/qwen3vl_synthetic_dev.yaml \
  --out-dir experiments/qwen_login/2b_dev

Use the policy API

from openadapt_ml.runtime.policy import AgentPolicy
from openadapt_ml.models.qwen_vl import QwenVLAdapter

adapter = QwenVLAdapter(model_name="Qwen/Qwen3-VL-2B-Instruct")
policy = AgentPolicy(adapter)

# Given an SFT-style sample (screenshot + goal + chat history):
output = policy.predict(sample)
print(output.action)   # Action(type=CLICK, coordinates={"x": 0.45, "y": 0.71})
print(output.thought)  # "Click the Login button"
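The example above prints normalized coordinates in [0, 1]. Assuming that convention, converting a prediction to screen pixels before dispatching a click could look like this (`to_pixels` is a hypothetical helper, not part of the library):

```python
def to_pixels(coords: dict[str, float], width: int, height: int) -> tuple[int, int]:
    """Convert normalized [0, 1] coordinates to integer pixel positions."""
    return round(coords["x"] * width), round(coords["y"] * height)

# The policy's normalized prediction mapped onto a 1920x1080 display.
x, y = to_pixels({"x": 0.45, "y": 0.71}, width=1920, height=1080)
print(x, y)  # 864 767
```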

Use the schema

from openadapt_ml.schema import Episode, Step, Action, Observation, ActionType

episode = Episode(
    episode_id="demo_001",
    instruction="Open Notepad and type Hello World",
    steps=[
        Step(
            step_index=0,
            observation=Observation(screenshot_path="step_0.png"),
            action=Action(type=ActionType.CLICK, coordinates={"x": 100, "y": 200}),
        ),
        Step(
            step_index=1,
            observation=Observation(screenshot_path="step_1.png"),
            action=Action(type=ActionType.TYPE, text="Hello World"),
        ),
    ],
    success=True,
)
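For training, episodes like this are flattened into per-step next-action samples (the role of `datasets/next_action.py` in the architecture below). A minimal stdlib sketch of the idea, using plain dicts rather than the library's Pydantic models:

```python
def episode_to_samples(episode: dict) -> list[dict]:
    """Flatten an episode into next-action samples: each sample pairs the
    current screenshot, the goal, and prior actions with the action taken."""
    samples = []
    for step in episode["steps"]:
        samples.append({
            "goal": episode["instruction"],
            "screenshot": step["observation"]["screenshot_path"],
            "history": [s["action"] for s in episode["steps"][: step["step_index"]]],
            "target_action": step["action"],
        })
    return samples

episode = {
    "instruction": "Open Notepad and type Hello World",
    "steps": [
        {"step_index": 0,
         "observation": {"screenshot_path": "step_0.png"},
         "action": {"type": "CLICK", "coordinates": {"x": 100, "y": 200}}},
        {"step_index": 1,
         "observation": {"screenshot_path": "step_1.png"},
         "action": {"type": "TYPE", "text": "Hello World"}},
    ],
}
samples = episode_to_samples(episode)
print(len(samples))                         # 2
print(samples[1]["target_action"]["type"])  # TYPE
```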

Architecture

openadapt_ml/
├── schema/              # Episode, Step, Action, Observation (Pydantic models)
│   ├── episode.py       #   Core dataclasses + JSON Schema export
│   └── converters.py    #   WAA/WebArena format converters
├── models/              # VLM adapters
│   ├── base_adapter.py  #   BaseVLMAdapter ABC
│   ├── qwen_vl.py       #   Qwen3-VL, Qwen2.5-VL
│   ├── api_adapter.py   #   Claude, GPT (inference-only)
│   └── dummy_adapter.py #   Fake adapter for testing
├── training/            # Fine-tuning pipeline
│   ├── trl_trainer.py   #   TRL SFTTrainer + Unsloth
│   ├── trainer.py       #   Training orchestration
│   └── viewer.py        #   Training dashboard (HTML)
├── runtime/             # Inference
│   ├── policy.py        #   AgentPolicy (screenshot -> action)
│   └── safety_gate.py   #   Action safety checks
├── datasets/            # Data loading
│   └── next_action.py   #   Episodes -> SFT chat samples
├── ingest/              # Data ingestion
│   ├── synthetic.py     #   Synthetic UI generation
│   ├── capture.py       #   openadapt-capture loader
│   └── loader.py        #   Generic episode loader
├── grounding/           # UI element localization
│   ├── base.py          #   OracleGrounder, GroundingModule ABC
│   └── detector.py      #   GeminiGrounder, SoM overlays
├── retrieval/           # Demo-conditioned inference
│   ├── retriever.py     #   Demo retrieval for RAG prompting
│   └── embeddings.py    #   Screenshot/action embeddings
├── benchmarks/          # ML-specific benchmark agents
│   └── agent.py         #   PolicyAgent, APIBenchmarkAgent, UnifiedBaselineAgent
├── cloud/               # Cloud GPU training
│   ├── lambda_labs.py   #   Lambda Labs integration
│   ├── local.py         #   Local training (CUDA/MPS)
│   └── ssh_tunnel.py    #   SSH tunnel management
├── segmentation/        # Recording segmentation pipeline
├── evals/               # Evaluation metrics (grounding, trajectory matching)
├── config.py            # Settings via pydantic-settings
└── scripts/             # CLI entry points (train, eval, compare, demo)
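As an illustration of the `runtime/safety_gate.py` role above, an action safety gate might reject out-of-bounds clicks or pathological text payloads before they reach the OS. This is a hypothetical sketch, not the module's actual checks:

```python
def is_safe(action: dict) -> bool:
    """Reject obviously unsafe actions before dispatch (illustrative rules)."""
    if action["type"] == "CLICK":
        c = action.get("coordinates", {})
        # Normalized coordinates must stay on screen.
        return 0.0 <= c.get("x", -1.0) <= 1.0 and 0.0 <= c.get("y", -1.0) <= 1.0
    if action["type"] == "TYPE":
        # Cap payload size to avoid runaway text injection.
        return len(action.get("text", "")) < 10_000
    # Terminal actions are always allowed.
    return action["type"] == "DONE"

print(is_safe({"type": "CLICK", "coordinates": {"x": 0.5, "y": 0.5}}))  # True
print(is_safe({"type": "CLICK", "coordinates": {"x": 1.5, "y": 0.5}}))  # False
```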

Benchmark Results

Synthetic Login (Qwen3-VL-2B with Set-of-Marks)

| Metric               | Score |
|----------------------|-------|
| Action Type Accuracy | 100%  |
| Element Accuracy     | 100%  |
| Episode Success Rate | 100%  |

Multi-Model Comparison (Synthetic Login, coordinate mode)

| Model             | Action Accuracy | Coord Error | Click Hit Rate |
|-------------------|-----------------|-------------|----------------|
| Qwen3-VL-2B FT    | 0.469           | 0.051       | 0.850          |
| Qwen3-VL-8B FT    | 0.286           | 0.004       | 1.000          |
| Claude Sonnet 4.5 | 0.121           | 0.757       | 0.000          |
| GPT-5.1           | 0.183           | 0.057       | 0.600          |

These results come from a controlled synthetic benchmark with only ~3 UI elements; they validate that the training pipeline works end-to-end, not real-world performance. Evaluation on standard benchmarks (WAA, WebArena) is ongoing via openadapt-evals.
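For reference, the coordinate metrics in the tables can be computed along these lines, assuming normalized coordinates and a distance-threshold definition of a click hit (the benchmark's exact definitions may differ):

```python
import math

def coord_error(pred: tuple[float, float], target: tuple[float, float]) -> float:
    """Euclidean distance between predicted and target click, in normalized units."""
    return math.dist(pred, target)

def click_hit(pred: tuple[float, float], target: tuple[float, float],
              radius: float = 0.05) -> bool:
    """A click 'hits' if it lands within `radius` of the target centre."""
    return coord_error(pred, target) <= radius

print(round(coord_error((0.45, 0.71), (0.47, 0.70)), 4))  # 0.0224
print(click_hit((0.45, 0.71), (0.47, 0.70)))              # True
```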

Cloud GPU Training

Lambda Labs

export LAMBDA_API_KEY=your_key_here

# One-command: launch, train, download, terminate
uv run python -m openadapt_ml.cloud.lambda_labs train \
  --capture ~/captures/my-workflow \
  --goal "Turn off Night Shift in System Settings"

Local (CUDA / Apple Silicon)

uv run python -m openadapt_ml.cloud.local train \
  --capture ~/captures/my-workflow --open

Ecosystem

OpenAdapt-ML is one component in the OpenAdapt stack:

| Package           | Purpose |
|-------------------|---------|
| openadapt-ml      | ML engine: schemas, VLM adapters, training, inference, grounding |
| openadapt-evals   | Evaluation infrastructure: VM management, pool orchestration, benchmark runners, oa-vm CLI |
| openadapt-capture | Lightweight GUI recording and demo sharing |
| OpenAdapt         | Desktop automation platform (end-user application) |

Looking for benchmark evaluation, Azure VM management, or the oa-vm CLI? Those live in openadapt-evals.

Documentation

Contributing

# Clone and install dev dependencies
git clone https://github.com/OpenAdaptAI/openadapt-ml.git
cd openadapt-ml
uv sync --extra dev --extra training

# Run tests
uv run pytest

# Lint
uv run ruff check .

We use Angular-style commits (feat:, fix:, docs:, etc.) with Python Semantic Release for automated versioning and PyPI publishing.

License

MIT

Download files

Download the file for your platform.

Source Distribution

openadapt_ml-0.7.0.tar.gz (5.5 MB)

Uploaded Source

Built Distribution


openadapt_ml-0.7.0-py3-none-any.whl (447.6 kB)

Uploaded Python 3

File details

Details for the file openadapt_ml-0.7.0.tar.gz.

File metadata

  • Download URL: openadapt_ml-0.7.0.tar.gz
  • Upload date:
  • Size: 5.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openadapt_ml-0.7.0.tar.gz
| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | 3135d0b7c4c947c1338dd6532f48fe3c8f53b882692a8e3f168c3c51c4df7428 |
| MD5         | 5829bd48a59b534f994ccedb93d799c3 |
| BLAKE2b-256 | 745a334bd41b5f7bb8a0795811bad074b24eb4b91696232dff139b9ee65b4c22 |


Provenance

The following attestation bundles were made for openadapt_ml-0.7.0.tar.gz:

Publisher: release.yml on OpenAdaptAI/openadapt-ml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file openadapt_ml-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: openadapt_ml-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 447.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openadapt_ml-0.7.0-py3-none-any.whl
| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | 52ebf7e10e79539f26b3d8d41b6fa06858414ac244e6de5662765a3aa25239eb |
| MD5         | 86e26e592d7f940b86239e6d6780b17c |
| BLAKE2b-256 | 8ad31c490cc6be6f874d07d7cd29e58833c62a066af96f54a0b8308306388a88 |


Provenance

The following attestation bundles were made for openadapt_ml-0.7.0-py3-none-any.whl:

Publisher: release.yml on OpenAdaptAI/openadapt-ml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
