Model-agnostic, domain-agnostic ML engine for GUI automation agents

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

abrichr

These details have not been verified by PyPI

Project description

OpenAdapt-ML

The ML engine for OpenAdapt -- open-source desktop automation with demo-conditioned AI agents.

OpenAdapt-ML provides the GUI-specific ML layer for training and running vision-language model (VLM) agents that automate desktop tasks. It handles everything between raw screen recordings and a production policy API: canonical schemas for GUI trajectories, VLM adapters, supervised fine-tuning with TRL + Unsloth, grounding, and demo-conditioned inference.

Demos

Synthetic Login -- Qwen3-VL-2B fine-tuned on synthetic UI scenarios:

Key Features

GUI trajectory schemas -- Pydantic models for Episodes, Steps, Actions, and Observations with JSON Schema export and format converters (WAA, WebArena)
VLM adapters -- Unified interface for Qwen3-VL, Qwen2.5-VL, Claude, GPT, and Gemini with automatic device selection (CUDA / MPS / CPU)
Supervised fine-tuning -- TRL SFTTrainer with Unsloth optimizations for 2x faster training and 50% less VRAM via LoRA adapters
Runtime policy API -- AgentPolicy that predicts the next GUI action (CLICK, TYPE, DONE) from a screenshot and goal
Demo-conditioned inference -- Retrieval-augmented prompting using recorded demonstrations for trajectory-conditioned disambiguation
Grounding module -- Locate UI elements via Gemini vision API, oracle bounding boxes, or Set-of-Marks (SoM) overlays
Cloud GPU training -- One-command training pipelines for Lambda Labs and Azure
Synthetic data generation -- Configurable UI scenarios (login, registration) with layout jitter for rapid iteration

Installation

# Core package
pip install openadapt-ml

# With training dependencies (TRL + datasets)
pip install openadapt-ml[training]

# With API-backed VLMs (Claude, GPT)
pip install openadapt-ml[api]

# Development (from source)
git clone https://github.com/OpenAdaptAI/openadapt-ml.git
cd openadapt-ml
uv sync

Quick Start

Run a smoke test

# Model-free policy demo (no GPU required)
uv run python -m openadapt_ml.scripts.demo_policy --backend dummy

Train on synthetic data

# Fine-tune Qwen3-VL on synthetic login scenario
uv run python -m openadapt_ml.scripts.train \
  --config configs/qwen3vl_synthetic.yaml

Train on real recordings

# Record a workflow with openadapt-capture, then train
uv run python -m openadapt_ml.scripts.train \
  --config configs/qwen3vl_capture.yaml \
  --capture ~/captures/my-workflow \
  --open  # Opens training dashboard in browser

End-to-end benchmark (train + eval + plot)

uv run python -m openadapt_ml.scripts.run_qwen_login_benchmark \
  --config configs/qwen3vl_synthetic_dev.yaml \
  --out-dir experiments/qwen_login/2b_dev

Use the policy API

from openadapt_ml.runtime.policy import AgentPolicy
from openadapt_ml.models.qwen_vl import QwenVLAdapter

adapter = QwenVLAdapter(model_name="Qwen/Qwen3-VL-2B-Instruct")
policy = AgentPolicy(adapter)

# Given an SFT-style sample (screenshot + goal + chat history):
output = policy.predict(sample)
print(output.action)   # Action(type=CLICK, coordinates={"x": 0.45, "y": 0.71})
print(output.thought)  # "Click the Login button"

Use the schema

from openadapt_ml.schema import Episode, Step, Action, Observation, ActionType

episode = Episode(
    episode_id="demo_001",
    instruction="Open Notepad and type Hello World",
    steps=[
        Step(
            step_index=0,
            observation=Observation(screenshot_path="step_0.png"),
            action=Action(type=ActionType.CLICK, coordinates={"x": 100, "y": 200}),
        ),
        Step(
            step_index=1,
            observation=Observation(screenshot_path="step_1.png"),
            action=Action(type=ActionType.TYPE, text="Hello World"),
        ),
    ],
    success=True,
)

Architecture

openadapt_ml/
├── schema/              # Episode, Step, Action, Observation (Pydantic models)
│   ├── episode.py       #   Core dataclasses + JSON Schema export
│   └── converters.py    #   WAA/WebArena format converters
├── models/              # VLM adapters
│   ├── base_adapter.py  #   BaseVLMAdapter ABC
│   ├── qwen_vl.py       #   Qwen3-VL, Qwen2.5-VL
│   ├── api_adapter.py   #   Claude, GPT (inference-only)
│   └── dummy_adapter.py #   Fake adapter for testing
├── training/            # Fine-tuning pipeline
│   ├── trl_trainer.py   #   TRL SFTTrainer + Unsloth
│   ├── trainer.py       #   Training orchestration
│   └── viewer.py        #   Training dashboard (HTML)
├── runtime/             # Inference
│   ├── policy.py        #   AgentPolicy (screenshot -> action)
│   └── safety_gate.py   #   Action safety checks
├── datasets/            # Data loading
│   └── next_action.py   #   Episodes -> SFT chat samples
├── ingest/              # Data ingestion
│   ├── synthetic.py     #   Synthetic UI generation
│   ├── capture.py       #   openadapt-capture loader
│   └── loader.py        #   Generic episode loader
├── grounding/           # UI element localization
│   ├── base.py          #   OracleGrounder, GroundingModule ABC
│   └── detector.py      #   GeminiGrounder, SoM overlays
├── retrieval/           # Demo-conditioned inference
│   ├── retriever.py     #   Demo retrieval for RAG prompting
│   └── embeddings.py    #   Screenshot/action embeddings
├── benchmarks/          # ML-specific benchmark agents
│   └── agent.py         #   PolicyAgent, APIBenchmarkAgent, UnifiedBaselineAgent
├── cloud/               # Cloud GPU training
│   ├── lambda_labs.py   #   Lambda Labs integration
│   ├── local.py         #   Local training (CUDA/MPS)
│   └── ssh_tunnel.py    #   SSH tunnel management
├── segmentation/        # Recording segmentation pipeline
├── evals/               # Evaluation metrics (grounding, trajectory matching)
├── config.py            # Settings via pydantic-settings
└── scripts/             # CLI entry points (train, eval, compare, demo)

Benchmark Results

Synthetic Login (Qwen3-VL-2B with Set-of-Marks)

Metric	Score
Action Type Accuracy	100%
Element Accuracy	100%
Episode Success Rate	100%

Multi-Model Comparison (Synthetic Login, coordinate mode)

Model	Action Accuracy	Coord Error	Click Hit Rate
Qwen3-VL-2B FT	0.469	0.051	0.850
Qwen3-VL-8B FT	0.286	0.004	1.000
Claude Sonnet 4.5	0.121	0.757	0.000
GPT-5.1	0.183	0.057	0.600

These are results on a controlled synthetic benchmark with ~3 UI elements. They validate that the training pipeline works, not real-world performance. Evaluation on standard benchmarks (WAA, WebArena) is ongoing via openadapt-evals.

Cloud GPU Training

Lambda Labs

export LAMBDA_API_KEY=your_key_here

# One-command: launch, train, download, terminate
uv run python -m openadapt_ml.cloud.lambda_labs train \
  --capture ~/captures/my-workflow \
  --goal "Turn off Night Shift in System Settings"

Local (CUDA / Apple Silicon)

uv run python -m openadapt_ml.cloud.local train \
  --capture ~/captures/my-workflow --open

Ecosystem

OpenAdapt-ML is one component in the OpenAdapt stack:

Package	Purpose
openadapt-ml	ML engine: schemas, VLM adapters, training, inference, grounding
openadapt-evals	Evaluation infrastructure: VM management, pool orchestration, benchmark runners, `oa-vm` CLI
openadapt-capture	Lightweight GUI recording and demo sharing
OpenAdapt	Desktop automation platform (end-user application)

Looking for benchmark evaluation, Azure VM management, or the oa-vm CLI? Those live in openadapt-evals.

Documentation

docs/design.md -- System design (schemas, adapters, training, runtime)
docs/cloud_gpu_training.md -- Lambda Labs and Azure training guide
docs/qwen_login_experiment.md -- Synthetic benchmark reproduction
docs/gemini_grounding.md -- Grounding module documentation

Contributing

# Clone and install dev dependencies
git clone https://github.com/OpenAdaptAI/openadapt-ml.git
cd openadapt-ml
uv sync --extra dev --extra training

# Run tests
uv run pytest

# Lint
uv run ruff check .

We use Angular-style commits (feat:, fix:, docs:, etc.) with Python Semantic Release for automated versioning and PyPI publishing.

License

MIT

Project details

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

abrichr

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.15.1

Mar 21, 2026

0.15.0

Mar 19, 2026

0.14.1

Mar 4, 2026

0.14.0

Mar 4, 2026

This version

0.13.0

Mar 3, 2026

0.12.0

Mar 3, 2026

0.11.2

Feb 25, 2026

0.11.1

Feb 24, 2026

0.11.0

Feb 24, 2026

0.10.1

Feb 24, 2026

0.10.0

Feb 24, 2026

0.9.0

Feb 24, 2026

0.8.0

Feb 24, 2026

0.7.1

Feb 18, 2026

0.7.0

Feb 18, 2026

0.6.0

Feb 17, 2026

0.5.0

Feb 13, 2026

0.4.2

Feb 13, 2026

0.4.1

Feb 13, 2026

0.4.0

Feb 6, 2026

0.3.1

Feb 5, 2026

0.3.0

Feb 5, 2026

0.2.2

Jan 29, 2026

0.2.1

Jan 29, 2026

0.2.0

Jan 9, 2026

0.1.0

Dec 16, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

openadapt_ml-0.13.0.tar.gz (5.8 MB view details)

Uploaded Mar 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

openadapt_ml-0.13.0-py3-none-any.whl (493.8 kB view details)

Uploaded Mar 3, 2026 Python 3

File details

Details for the file openadapt_ml-0.13.0.tar.gz.

File metadata

Download URL: openadapt_ml-0.13.0.tar.gz
Upload date: Mar 3, 2026
Size: 5.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openadapt_ml-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`cdd444fec25de37b2ce211f59d9a16ade5703b3d14928b4a97e4cf7a8fdcabe2`
MD5	`b638f3bbe9d32bb24678eb95b79afdce`
BLAKE2b-256	`2d476071b4f3e6dc4d77195080dc30859f2b9ae5f7230ac00eb2b3178f6e1b76`

See more details on using hashes here.

Provenance

The following attestation bundles were made for openadapt_ml-0.13.0.tar.gz:

Publisher: release.yml on OpenAdaptAI/openadapt-ml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: openadapt_ml-0.13.0.tar.gz
- Subject digest: cdd444fec25de37b2ce211f59d9a16ade5703b3d14928b4a97e4cf7a8fdcabe2
- Sigstore transparency entry: 1021025830
- Sigstore integration time: Mar 3, 2026
Source repository:
- Permalink: OpenAdaptAI/openadapt-ml@dff678ac7a217fb76e2a17eadd9d946d40fde59a
- Branch / Tag: refs/heads/main
- Owner: https://github.com/OpenAdaptAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@dff678ac7a217fb76e2a17eadd9d946d40fde59a
- Trigger Event: push

File details

Details for the file openadapt_ml-0.13.0-py3-none-any.whl.

File metadata

Download URL: openadapt_ml-0.13.0-py3-none-any.whl
Upload date: Mar 3, 2026
Size: 493.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for openadapt_ml-0.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2f91e33ae4fbe0b61943beed5bedd1975e4d7856d08b07be0416d15bb2fc0675`
MD5	`0fcc54279e8d03f85450e38c400af2d1`
BLAKE2b-256	`bef3c9b19853a715afb1f6d92f6344d6fee121ee81490edc90ce10382a077e26`

See more details on using hashes here.

Provenance

The following attestation bundles were made for openadapt_ml-0.13.0-py3-none-any.whl:

Publisher: release.yml on OpenAdaptAI/openadapt-ml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: openadapt_ml-0.13.0-py3-none-any.whl
- Subject digest: 2f91e33ae4fbe0b61943beed5bedd1975e4d7856d08b07be0416d15bb2fc0675
- Sigstore transparency entry: 1021025906
- Sigstore integration time: Mar 3, 2026
Source repository:
- Permalink: OpenAdaptAI/openadapt-ml@dff678ac7a217fb76e2a17eadd9d946d40fde59a
- Branch / Tag: refs/heads/main
- Owner: https://github.com/OpenAdaptAI
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@dff678ac7a217fb76e2a17eadd9d946d40fde59a
- Trigger Event: push

openadapt-ml 0.13.0

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

OpenAdapt-ML

Demos

Key Features

Installation

Quick Start

Run a smoke test

Train on synthetic data

Train on real recordings

End-to-end benchmark (train + eval + plot)

Use the policy API

Use the schema

Architecture

Benchmark Results

Synthetic Login (Qwen3-VL-2B with Set-of-Marks)

Multi-Model Comparison (Synthetic Login, coordinate mode)

Cloud GPU Training

Lambda Labs

Local (CUDA / Apple Silicon)

Ecosystem

Documentation

Contributing

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance