Skip to main content

Vision first browser agents based on Websight-7B, a custom model

Project description

Websight

Minimal Python package for calling the Websight VLM and a thin browser Agent.

Layout

src/websight/
  __init__.py
  agent/
    agent.py         # Agent(run/execute_action)
    browser.py       # Playwright wrapper
  model/
    websight.py      # websight_call(prompt, history, image)
    actions.py       # Action + parse_action
    prompts.py       # system prompt for websight_call
    llm.py           # simple OpenRouter LLM helpers
scripts/
  manual_image_demo.py
eval/
  showdown/...
tests/

Install (editable) and run tests

uv run --frozen -- python -V  # ensure Python is available
PYTHONPATH=src uv run --group test pytest -q tests

Quickstart

Programmatic use:

from websight import websight_call

# image_base64 may include the 'data:image/png;base64,' prefix or raw base64
action = websight_call(
    prompt="Click the Login button",
    history=[],  # list of (reasoning, action_str) pairs from prior steps
    image_base64="data:image/png;base64,<...>",
)
print(action.action, action.args)

Agent (with a real browser via Playwright):

PYTHONPATH=src uv run python websight.py --task "Go to https://example.com and click More" --show-browser

Manual image demo (no browser):

PYTHONPATH=src uv run python scripts/manual_image_demo.py \
  --image data/showdown_clicks/images/0b1c958b929acdbf.png \
  --max-new-tokens 512

Environment

Web requests to LLMs use OpenRouter. Set:

export OPENROUTER_API_KEY=...

The Websight model will be loaded from Hugging Face via transformers pipeline: tanvirb/websight-7B.

Packaging (src layout)

This repository is configured for a src layout with setuptools.

[build-system]
requires = ["setuptools>=68", "wheel"]
build-backend = "setuptools.build_meta"

[tool.setuptools]
packages = {find = {where = ["src"]}}

Build locally (artifacts in dist/):

uv build

Do not publish yet. When ready, you can publish with uv publish after setting the appropriate token.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

websight-0.1.0.tar.gz (9.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

websight-0.1.0-py3-none-any.whl (9.1 kB view details)

Uploaded Python 3

File details

Details for the file websight-0.1.0.tar.gz.

File metadata

  • Download URL: websight-0.1.0.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.4

File hashes

Hashes for websight-0.1.0.tar.gz
Algorithm Hash digest
SHA256 a819d4529b4dd02ca501e9e54e37b72029c43e587ae679fb069132ad798f444b
MD5 32f24ae3d7412e100d4037dc2c12e07b
BLAKE2b-256 8202fed1407b6102b7d08477260a4e32872904336de41496406ef6a403c501c8

See more details on using hashes here.

File details

Details for the file websight-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: websight-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.6.4

File hashes

Hashes for websight-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e0f1506a0fed9049ff2fede8085263101629d2f2b13e85f1bdb66f5776095755
MD5 2458ea51428cfa7171c0a1b786589791
BLAKE2b-256 79133372e08a9a54b18cd19b6152c7a07bd91d652988e09d50da52657eb7d6db

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page