Model-agnostic, domain-agnostic ML engine for GUI automation agents
Project description
OpenAdapt-ML
The ML engine for OpenAdapt -- open-source desktop automation with demo-conditioned AI agents.
OpenAdapt-ML provides the GUI-specific ML layer for training and running vision-language model (VLM) agents that automate desktop tasks. It handles everything between raw screen recordings and a production policy API: canonical schemas for GUI trajectories, VLM adapters, supervised fine-tuning with TRL + Unsloth, grounding, and demo-conditioned inference.
Demos
Synthetic Login -- Qwen3-VL-2B fine-tuned on synthetic UI scenarios:
Key Features
- GUI trajectory schemas -- Pydantic models for Episodes, Steps, Actions, and Observations with JSON Schema export and format converters (WAA, WebArena)
- VLM adapters -- Unified interface for Qwen3-VL, Qwen2.5-VL, Claude, GPT, and Gemini with automatic device selection (CUDA / MPS / CPU)
- Supervised fine-tuning -- TRL SFTTrainer with Unsloth optimizations for 2x faster training and 50% less VRAM via LoRA adapters
- Runtime policy API --
AgentPolicythat predicts the next GUI action (CLICK,TYPE,DONE) from a screenshot and goal - Demo-conditioned inference -- Retrieval-augmented prompting using recorded demonstrations for trajectory-conditioned disambiguation
- Grounding module -- Locate UI elements via Gemini vision API, oracle bounding boxes, or Set-of-Marks (SoM) overlays
- Cloud GPU training -- One-command training pipelines for Lambda Labs and Azure
- Synthetic data generation -- Configurable UI scenarios (login, registration) with layout jitter for rapid iteration
Installation
# Core package
pip install openadapt-ml
# With training dependencies (TRL + datasets)
pip install openadapt-ml[training]
# With API-backed VLMs (Claude, GPT)
pip install openadapt-ml[api]
# Development (from source)
git clone https://github.com/OpenAdaptAI/openadapt-ml.git
cd openadapt-ml
uv sync
Quick Start
Run a smoke test
# Model-free policy demo (no GPU required)
uv run python -m openadapt_ml.scripts.demo_policy --backend dummy
Train on synthetic data
# Fine-tune Qwen3-VL on synthetic login scenario
uv run python -m openadapt_ml.scripts.train \
--config configs/qwen3vl_synthetic.yaml
Train on real recordings
# Record a workflow with openadapt-capture, then train
uv run python -m openadapt_ml.scripts.train \
--config configs/qwen3vl_capture.yaml \
--capture ~/captures/my-workflow \
--open # Opens training dashboard in browser
End-to-end benchmark (train + eval + plot)
uv run python -m openadapt_ml.scripts.run_qwen_login_benchmark \
--config configs/qwen3vl_synthetic_dev.yaml \
--out-dir experiments/qwen_login/2b_dev
Use the policy API
from openadapt_ml.runtime.policy import AgentPolicy
from openadapt_ml.models.qwen_vl import QwenVLAdapter
adapter = QwenVLAdapter(model_name="Qwen/Qwen3-VL-2B-Instruct")
policy = AgentPolicy(adapter)
# Given an SFT-style sample (screenshot + goal + chat history):
output = policy.predict(sample)
print(output.action) # Action(type=CLICK, coordinates={"x": 0.45, "y": 0.71})
print(output.thought) # "Click the Login button"
Use the schema
from openadapt_ml.schema import Episode, Step, Action, Observation, ActionType
episode = Episode(
episode_id="demo_001",
instruction="Open Notepad and type Hello World",
steps=[
Step(
step_index=0,
observation=Observation(screenshot_path="step_0.png"),
action=Action(type=ActionType.CLICK, coordinates={"x": 100, "y": 200}),
),
Step(
step_index=1,
observation=Observation(screenshot_path="step_1.png"),
action=Action(type=ActionType.TYPE, text="Hello World"),
),
],
success=True,
)
Architecture
openadapt_ml/
├── schema/ # Episode, Step, Action, Observation (Pydantic models)
│ ├── episode.py # Core dataclasses + JSON Schema export
│ └── converters.py # WAA/WebArena format converters
├── models/ # VLM adapters
│ ├── base_adapter.py # BaseVLMAdapter ABC
│ ├── qwen_vl.py # Qwen3-VL, Qwen2.5-VL
│ ├── api_adapter.py # Claude, GPT (inference-only)
│ └── dummy_adapter.py # Fake adapter for testing
├── training/ # Fine-tuning pipeline
│ ├── trl_trainer.py # TRL SFTTrainer + Unsloth
│ ├── trainer.py # Training orchestration
│ └── viewer.py # Training dashboard (HTML)
├── runtime/ # Inference
│ ├── policy.py # AgentPolicy (screenshot -> action)
│ └── safety_gate.py # Action safety checks
├── datasets/ # Data loading
│ └── next_action.py # Episodes -> SFT chat samples
├── ingest/ # Data ingestion
│ ├── synthetic.py # Synthetic UI generation
│ ├── capture.py # openadapt-capture loader
│ └── loader.py # Generic episode loader
├── grounding/ # UI element localization
│ ├── base.py # OracleGrounder, GroundingModule ABC
│ └── detector.py # GeminiGrounder, SoM overlays
├── retrieval/ # Demo-conditioned inference
│ ├── retriever.py # Demo retrieval for RAG prompting
│ └── embeddings.py # Screenshot/action embeddings
├── benchmarks/ # ML-specific benchmark agents
│ └── agent.py # PolicyAgent, APIBenchmarkAgent, UnifiedBaselineAgent
├── cloud/ # Cloud GPU training
│ ├── lambda_labs.py # Lambda Labs integration
│ ├── local.py # Local training (CUDA/MPS)
│ └── ssh_tunnel.py # SSH tunnel management
├── segmentation/ # Recording segmentation pipeline
├── evals/ # Evaluation metrics (grounding, trajectory matching)
├── config.py # Settings via pydantic-settings
└── scripts/ # CLI entry points (train, eval, compare, demo)
Benchmark Results
Synthetic Login (Qwen3-VL-2B with Set-of-Marks)
| Metric | Score |
|---|---|
| Action Type Accuracy | 100% |
| Element Accuracy | 100% |
| Episode Success Rate | 100% |
Multi-Model Comparison (Synthetic Login, coordinate mode)
| Model | Action Accuracy | Coord Error | Click Hit Rate |
|---|---|---|---|
| Qwen3-VL-2B FT | 0.469 | 0.051 | 0.850 |
| Qwen3-VL-8B FT | 0.286 | 0.004 | 1.000 |
| Claude Sonnet 4.5 | 0.121 | 0.757 | 0.000 |
| GPT-5.1 | 0.183 | 0.057 | 0.600 |
These are results on a controlled synthetic benchmark with ~3 UI elements. They validate that the training pipeline works, not real-world performance. Evaluation on standard benchmarks (WAA, WebArena) is ongoing via openadapt-evals.
Cloud GPU Training
Lambda Labs
export LAMBDA_API_KEY=your_key_here
# One-command: launch, train, download, terminate
uv run python -m openadapt_ml.cloud.lambda_labs train \
--capture ~/captures/my-workflow \
--goal "Turn off Night Shift in System Settings"
Local (CUDA / Apple Silicon)
uv run python -m openadapt_ml.cloud.local train \
--capture ~/captures/my-workflow --open
Ecosystem
OpenAdapt-ML is one component in the OpenAdapt stack:
| Package | Purpose |
|---|---|
| openadapt-ml | ML engine: schemas, VLM adapters, training, inference, grounding |
| openadapt-evals | Evaluation infrastructure: VM management, pool orchestration, benchmark runners, oa-vm CLI |
| openadapt-capture | Lightweight GUI recording and demo sharing |
| OpenAdapt | Desktop automation platform (end-user application) |
Looking for benchmark evaluation, Azure VM management, or the
oa-vmCLI? Those live in openadapt-evals.
Documentation
docs/design.md-- System design (schemas, adapters, training, runtime)docs/cloud_gpu_training.md-- Lambda Labs and Azure training guidedocs/qwen_login_experiment.md-- Synthetic benchmark reproductiondocs/gemini_grounding.md-- Grounding module documentation
Contributing
# Clone and install dev dependencies
git clone https://github.com/OpenAdaptAI/openadapt-ml.git
cd openadapt-ml
uv sync --extra dev --extra training
# Run tests
uv run pytest
# Lint
uv run ruff check .
We use Angular-style commits (feat:, fix:, docs:, etc.) with Python Semantic Release for automated versioning and PyPI publishing.
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file openadapt_ml-0.13.0.tar.gz.
File metadata
- Download URL: openadapt_ml-0.13.0.tar.gz
- Upload date:
- Size: 5.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cdd444fec25de37b2ce211f59d9a16ade5703b3d14928b4a97e4cf7a8fdcabe2
|
|
| MD5 |
b638f3bbe9d32bb24678eb95b79afdce
|
|
| BLAKE2b-256 |
2d476071b4f3e6dc4d77195080dc30859f2b9ae5f7230ac00eb2b3178f6e1b76
|
Provenance
The following attestation bundles were made for openadapt_ml-0.13.0.tar.gz:
Publisher:
release.yml on OpenAdaptAI/openadapt-ml
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
openadapt_ml-0.13.0.tar.gz -
Subject digest:
cdd444fec25de37b2ce211f59d9a16ade5703b3d14928b4a97e4cf7a8fdcabe2 - Sigstore transparency entry: 1021025830
- Sigstore integration time:
-
Permalink:
OpenAdaptAI/openadapt-ml@dff678ac7a217fb76e2a17eadd9d946d40fde59a -
Branch / Tag:
refs/heads/main - Owner: https://github.com/OpenAdaptAI
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@dff678ac7a217fb76e2a17eadd9d946d40fde59a -
Trigger Event:
push
-
Statement type:
File details
Details for the file openadapt_ml-0.13.0-py3-none-any.whl.
File metadata
- Download URL: openadapt_ml-0.13.0-py3-none-any.whl
- Upload date:
- Size: 493.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2f91e33ae4fbe0b61943beed5bedd1975e4d7856d08b07be0416d15bb2fc0675
|
|
| MD5 |
0fcc54279e8d03f85450e38c400af2d1
|
|
| BLAKE2b-256 |
bef3c9b19853a715afb1f6d92f6344d6fee121ee81490edc90ce10382a077e26
|
Provenance
The following attestation bundles were made for openadapt_ml-0.13.0-py3-none-any.whl:
Publisher:
release.yml on OpenAdaptAI/openadapt-ml
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
openadapt_ml-0.13.0-py3-none-any.whl -
Subject digest:
2f91e33ae4fbe0b61943beed5bedd1975e4d7856d08b07be0416d15bb2fc0675 - Sigstore transparency entry: 1021025906
- Sigstore integration time:
-
Permalink:
OpenAdaptAI/openadapt-ml@dff678ac7a217fb76e2a17eadd9d946d40fde59a -
Branch / Tag:
refs/heads/main - Owner: https://github.com/OpenAdaptAI
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
release.yml@dff678ac7a217fb76e2a17eadd9d946d40fde59a -
Trigger Event:
push
-
Statement type: