Image to Layout — screenshot OCR and semantic UI reconstruction

imgl

AI Cost Tracking

  • 🤖 LLM usage: $2.0516 (2 commits)
  • 👤 Human dev: ~$200 (2.0h @ $100/h, 30min dedup)

Generated on 2026-06-08 using openrouter/qwen/qwen3-coder-next


Image to Layout — convert screenshots into semantic UI models with OCR text and element bounding boxes.

Installation

pip install -e .              # from repo
pip install -e ".[capture]"   # + screen capture (mss)
pip install -e ".[diagnose]"   # + numpy, needed by img2nl (img2nl itself is installed locally, see below)
pip install -e ".[full]"      # capture + dev extras (vql/img2vql are not on PyPI)

# Local siblings (not on PyPI):
pip install -e ~/github/wronai/img2nl[analyze]
pip install -e ~/github/oqlos/vql
pip install -e ~/github/oqlos/vql/packages/img2vql

To use uri2vql adopt-imgl, install imgl in the same virtual environment as uri2vql:

pip install -e ~/github/semcod/imgl
# or: pip install -e ~/github/oqlos/vql/packages/uri2vql[imgl]

System dependency for OCR:

# Debian/Ubuntu
sudo apt install tesseract-ocr tesseract-ocr-pol

# macOS
brew install tesseract tesseract-lang

Development install:

pip install -e ".[dev]"

Usage

Python API

from imgl import analyze, scene_to_json

scene = analyze("screen.png", lang="eng+pol")
print(scene_to_json(scene))

CLI

# Use an existing screenshot (recommended on GNOME/Wayland):
imgl diagnose /tmp/screen.png
imgl vql /tmp/screen.png -o layout.vql.json

# Or capture a screenshot (on Wayland this needs the vql portal backend; mss alone produces a black image):
pip install -e ~/github/oqlos/vql   # portal/grim backends
imgl capture --interactive -o screen.png
imgl diagnose screen.png            # must show worth_analyzing: true

# Analyze / export (aborts on a blank image unless --allow-blank is passed)
imgl analyze /tmp/screen.png --json
imgl analyze screen.png -o screen.imgl.json --lang eng+pol
imgl html screen.png -o screen.html --embed-image
imgl svg screen.png --mode overlay -o screen.svg
imgl svg screen.png --mode wireframe -o screen.svg
imgl vql screen.png -o layout.vql.json --with-grid

Interactive shell (pick action from catalog)

imgl interact /tmp/screen.png -o layout.vql.json
# enter an option number or a natural-language command: "kliknij Save" (click Save), "mapa" (map), "lista" (list), "quit"
# annotated image with the same numbers as in the shell:
imgl annotate screen.png --open
imgl interact screen.png --annotate --open
# cleaner list (OCR noise filtered out):
imgl interact screen.png
# vision LLM (requires OPENROUTER_API_KEY and pip install -e ".[llm]"):
imgl interact screen.png --llm --annotate --open
# optionally execute the action on the desktop:
imgl interact /tmp/screen.png --execute   # requires xdotool or ydotool

URI DSL (vql://window/imgl?action=...):

action     description
analyze    OCR + layout → VQL JSON (default)
list       list of interactive elements
annotate   PNG of the screenshot with numbered boxes
click      text=, element_id=, window=
type       value=, label=, text=

Via uri2vql (when installed):

uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&lang=eng'
uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&action=list'
uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&action=click&text=Save'
# For Polish+English OCR in URI use encoded plus: lang=eng%2Bpol
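The note about the encoded plus can be reproduced with the Python standard library. The sketch below builds the same kind of query string with urllib.parse; the helper name imgl_uri is hypothetical and not part of imgl or uri2vql, and whether uri2vql decodes a raw "+" as a space is an assumption based on how standard form-style query parsers behave.

```python
from urllib.parse import parse_qs, urlencode


def imgl_uri(image, file, **params):
    """Build a vql://window/imgl URI (hypothetical helper, not an imgl API)."""
    query = {"image": image, "file": file, **params}
    # safe="/" keeps path slashes readable; "+" is still escaped as %2B.
    return "vql://window/imgl?" + urlencode(query, safe="/")


uri = imgl_uri("/tmp/screen.png", "layout.vql.json", lang="eng+pol")
print(uri)

# Standard query parsing turns a raw "+" into a space, which would
# corrupt the Tesseract language list; the encoded form survives.
raw_plus = parse_qs("lang=eng+pol")["lang"]
encoded_plus = parse_qs("lang=eng%2Bpol")["lang"]
print(raw_plus, encoded_plus)
```

Building the query with urlencode rather than by hand avoids forgetting the escape when other parameters, such as text=, contain spaces or special characters.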

NL → URI (nlp2uri / imgl built-in):

# in the imgl interact shell: "kliknij Save" (click Save), "wpisz test w search" (type test into search), "2", "lista" (list)

HTML / SVG export

from imgl import analyze, scene_to_html, scene_to_svg

scene = analyze("screen.png")
html = scene_to_html(scene, embed_image=True)
svg = scene_to_svg(scene, mode="overlay", background="screen.png")

The HTML export uses absolutely positioned elements carrying data-type, data-id, and data-text attributes, so elements can be targeted by text with selectors such as button[data-text="Save"].
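As a rough illustration of text-based lookup over such markup, the sketch below scans an HTML string for tags whose data-text matches a given value, using only the standard library. The sample markup is hypothetical: it follows the attribute names described above, but the actual structure produced by scene_to_html may differ.

```python
from html.parser import HTMLParser

# Hypothetical markup shaped like the description above; real imgl output may differ.
SAMPLE_HTML = """
<div style="position:relative;width:800px;height:600px">
  <button data-type="button" data-id="button-0" data-text="Save"
          style="position:absolute;left:290px;top:196px;width:40px;height:20px">Save</button>
  <span data-type="label" data-id="label-0" data-text="Username"
        style="position:absolute;left:100px;top:90px;width:80px;height:16px">Username</span>
</div>
"""


class DataTextFinder(HTMLParser):
    """Collect (tag, attributes) for every tag whose data-text equals `wanted`."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("data-text") == self.wanted:
            self.matches.append((tag, attr_map))


finder = DataTextFinder("Save")
finder.feed(SAMPLE_HTML)
for tag, attr_map in finder.matches:
    print(tag, attr_map["data-id"], attr_map["data-type"])
```

In a browser automation tool the same lookup would normally be a single CSS selector; the parser here only shows that the attributes carry enough information to find an element by its text.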

SVG supports wireframe (flat debug view) and overlay (boxes on top of screenshot).

Output format

analyze() returns a Scene with:

  • windows — detected UI windows/panels (local heuristics or optional img2vql)
  • elements — classified UI elements: button, input, label, text, toolbar
  • ocr_boxes — raw OCR word boxes with confidence scores

Example JSON:

{
  "version": "1.0",
  "scene": {"width": 800, "height": 600, "source_image": "screen.png"},
  "windows": [{
    "id": "win-screen",
    "bbox": {"x": 0, "y": 0, "w": 800, "h": 600},
    "title": null,
    "z": 0,
    "elements": [
      {"id": "text-0", "type": "text", "text": "Save", "bbox": {"x": 100, "y": 50, "w": 40, "h": 16}}
    ]
  }],
  "ocr_boxes": [],
  "metadata": {"ocr_backend": "tesseract", "lang": "eng+pol"}
}
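A consumer of this JSON needs only the standard library. The sketch below loads the example document shown above and flattens the elements nested under each window into rows; the helper name iter_elements is an illustration, not an imgl function, and it assumes the nesting matches the example.

```python
import json

# The example document from above, embedded so the sketch is self-contained.
SCENE_JSON = """
{
  "version": "1.0",
  "scene": {"width": 800, "height": 600, "source_image": "screen.png"},
  "windows": [{
    "id": "win-screen",
    "bbox": {"x": 0, "y": 0, "w": 800, "h": 600},
    "title": null,
    "z": 0,
    "elements": [
      {"id": "text-0", "type": "text", "text": "Save",
       "bbox": {"x": 100, "y": 50, "w": 40, "h": 16}}
    ]
  }],
  "ocr_boxes": [],
  "metadata": {"ocr_backend": "tesseract", "lang": "eng+pol"}
}
"""


def iter_elements(doc):
    """Yield (window_id, element) pairs for every element in every window."""
    for window in doc["windows"]:
        for element in window.get("elements", []):
            yield window["id"], element


doc = json.loads(SCENE_JSON)
rows = []
for window_id, element in iter_elements(doc):
    box = element["bbox"]
    rows.append((window_id, element["id"], element["type"], element["text"],
                 (box["x"], box["y"], box["w"], box["h"])))

for row in rows:
    print(row)
```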

Configuration

from imgl import ImglConfig, analyze

scene = analyze("screen.png", config=ImglConfig(
    lang="eng+pol",
    use_img2vql=True,      # use img2vql when installed, else local detect
    detect_inputs=True,
    label_proximity_px=40,
))

VQL export

from imgl import analyze, scene_to_vql, write_vql_program

scene = analyze("screen.png")
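# build the VQL program in memory, including the optional grid
program = scene_to_vql(scene, include_grid=True, grid=12)
# write the VQL export for the same scene to disk
write_vql_program(scene, "layout.vql.json")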

The exported program contains the layers windows, ui_elements (with OCR text in metadata), text_regions, and, optionally, screen_regions.

Text-based actions

from imgl import analyze, actions

scene = analyze("screen.png")
ui = actions(scene)

ui.click("button", text="Save")
# {"action": "click", "x": 310, "y": 206, ...}

ui.type_into("alice", label="Username")
# {"action": "type", "x": 245, "y": 99, "text": "alice", ...}

CLI:

imgl find screen.png --type button --text Save --click
imgl find screen.png --label Username --type-into alice
imgl find screen.png --list
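To make the idea behind text-based actions concrete, the sketch below resolves a type and text query to a click point over plain dictionaries shaped like the JSON elements above. It is not imgl's implementation: the functions find_element and click_action are hypothetical, the sample bounding boxes are made up, and clicking the center of the bounding box is an assumption.

```python
def find_element(elements, kind=None, text=None):
    """Return the first element matching the type and (case-insensitive) text."""
    for element in elements:
        if kind is not None and element["type"] != kind:
            continue
        if text is not None and element.get("text", "").lower() != text.lower():
            continue
        return element
    return None


def click_action(element):
    """Describe a click at the center of the element's bounding box (assumed target)."""
    box = element["bbox"]
    return {
        "action": "click",
        "x": box["x"] + box["w"] // 2,
        "y": box["y"] + box["h"] // 2,
        "element_id": element["id"],
    }


# Made-up elements in the documented shape.
elements = [
    {"id": "label-0", "type": "label", "text": "Username",
     "bbox": {"x": 100, "y": 90, "w": 80, "h": 16}},
    {"id": "button-0", "type": "button", "text": "Save",
     "bbox": {"x": 290, "y": 196, "w": 40, "h": 20}},
]

target = find_element(elements, kind="button", text="save")
action = click_action(target)
print(action)
```

Returning a description of the action instead of performing it keeps the lookup testable without a desktop session; execution can then be delegated to a separate tool.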

Roadmap

  • nlp2uri phrases for vql://window/imgl
  • koru desktop bridge for action execution

License

Licensed under Apache-2.0.
