Skip to main content

Local UI-grounding specialist for hybrid AI agents. Drop in a screenshot + text target, get a strict JSON bbox. 2B params, MLX-native, Apache 2.0.

Project description

browserground (Python)

The local UI-grounding specialist for hybrid AI agents. Drop in a screenshot + text target, get a strict JSON bbox. 2B params. MLX-native on Apple Silicon. Apache 2.0.

This is the Python entry point. For the full-featured CLI (daemon, HTTP server, batch mode, eval), install the npm package: npm install -g browserground.

Install

# Apple Silicon (recommended) — uses the MLX 4-bit build, ~1-2s/call
pip install "browserground[mlx]"

# Or, CUDA / CPU (slower, ~10-14s/call on M-series via MPS)
pip install "browserground[transformers]"

Use

from browserground import ground, ground_bbox, click_xy

# Full result with timing + raw text
res = ground("screenshot.png", "the green Subscribe button")
print(res)
# {'bbox_2d': [344, 612, 478, 658], 'model_elapsed_s': 1.4, 'backend': 'mlx', ...}

# Just the bbox
bbox = ground_bbox("screenshot.png", "Submit button")

# Center coords for browser-use / Playwright / etc.
x, y = click_xy("screenshot.png", "the back arrow")

How it works

browserground is a Qwen3-VL-2B base + a LoRA fine-tune for UI grounding (rank 32, 26k training examples across macOS / Android / UIBert / web). Output is strict JSON ({"bbox_2d": [x1, y1, x2, y2]}), 100% parseable on the held-out eval. 60.0% on ScreenSpot-v2 (300 items, vs SeeClick's 55.1% at 9.6B params — that's 4.8× smaller).

browser-use / Skyvern integration

from browserground import click_xy

# Inside your browser-use action:
xy = click_xy("/tmp/page.png", "the green Subscribe button")
if xy:
    await page.mouse.click(*xy)

Plug-in templates: https://github.com/renezander030/browserground/tree/main/plugins.

Why this exists

Most agents send every screenshot to a frontier vision model just to find click coordinates. That's a $0.01–0.05 multimodal call, 20–50× per run. A 2B local specialist costs $0/call, runs on a laptop, doesn't send your screenshots anywhere. The hybrid pattern: cheap fast local specialist for the parser-style task, frontier model only for reasoning.

Links

License: Apache 2.0.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

browserground-0.3.0.tar.gz (6.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

browserground-0.3.0-py3-none-any.whl (8.1 kB view details)

Uploaded Python 3

File details

Details for the file browserground-0.3.0.tar.gz.

File metadata

  • Download URL: browserground-0.3.0.tar.gz
  • Upload date:
  • Size: 6.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for browserground-0.3.0.tar.gz
Algorithm Hash digest
SHA256 6d7e00436584315a155959ce8f3cfd5f096f25d969c12996cc02fc67adf88815
MD5 d9212be413a4e8d82b0844b7e5193101
BLAKE2b-256 0ef326fdd339ab05758064073e3c1b7b6019ca383d272779ab345fce0e77e7a1

See more details on using hashes here.

File details

Details for the file browserground-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: browserground-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 8.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for browserground-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4397a7f17800985e647e66a1ba61b921fa4bded100a9f16bb4922f49ac889397
MD5 69bf2f899366aa3587681264a3ec914a
BLAKE2b-256 0151d4184c93f80622a91e658e4e36d54f47a1fbb533b7cdd4d86bffa3e86a29

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page