Image to Layout — screenshot OCR and semantic UI reconstruction
imgl
AI Cost Tracking
- 🤖 LLM usage: $2.0516 (2 commits)
- 👤 Human dev: ~$200 (2.0h @ $100/h, 30min dedup)
Generated on 2026-06-08 using openrouter/qwen/qwen3-coder-next
Image to Layout — convert screenshots into semantic UI models with OCR text and element bounding boxes.
Installation
pip install -e . # from repo
pip install -e ".[capture]" # + screen capture (mss)
pip install -e ".[diagnose]" # numpy for img2nl (install img2nl locally)
pip install -e ".[full]" # capture + dev (no PyPI vql/img2vql)
# Local siblings (not on PyPI):
pip install -e ~/github/wronai/img2nl[analyze]
pip install -e ~/github/oqlos/vql
pip install -e ~/github/oqlos/vql/packages/img2vql
For uri2vql adopt-imgl, install imgl in the same venv as uri2vql:
pip install -e ~/github/semcod/imgl
# or: pip install -e ~/github/oqlos/vql/packages/uri2vql[imgl]
System dependency for OCR:
# Debian/Ubuntu
sudo apt install tesseract-ocr tesseract-ocr-pol
# macOS
brew install tesseract tesseract-lang
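Before running OCR it can help to confirm that the Tesseract binary and the requested language packs are actually present. The following is a minimal sketch that relies only on the `tesseract --list-langs` command; the helper names (`parse_list_langs`, `missing_languages`, `installed_languages`) are illustrative and not part of imgl.

```python
import shutil
import subprocess
from typing import Optional


def parse_list_langs(output: str) -> list:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def missing_languages(wanted: str, installed: list) -> list:
    """Return the codes from a spec such as "eng+pol" that are not installed."""
    return [code for code in wanted.split("+") if code not in installed]


def installed_languages() -> Optional[list]:
    """Return installed Tesseract languages, or None if the binary is absent."""
    if shutil.which("tesseract") is None:
        return None
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True, text=True, check=False,
    )
    # Read both streams: depending on the release, the list may land on
    # stdout or stderr (assumption; warnings on stderr would need filtering).
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    langs = installed_languages()
    if langs is None:
        print("tesseract not found on PATH")
    else:
        print("missing:", missing_languages("eng+pol", langs))
```

An empty `missing` list suggests that `lang="eng+pol"` should be usable; a non-empty one points at the language package to install.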
Development install:
pip install -e ".[dev]"
Usage
Python API
from imgl import analyze, scene_to_json
scene = analyze("screen.png", lang="eng+pol")
print(scene_to_json(scene))
CLI
# Use an existing screenshot (recommended on GNOME/Wayland):
imgl diagnose /tmp/screen.png
imgl vql /tmp/screen.png -o layout.vql.json
# Or capture (needs vql portal on Wayland — mss alone gives black screen):
pip install -e ~/github/oqlos/vql # portal/grim backends
imgl capture --interactive -o screen.png
imgl diagnose screen.png # must show worth_analyzing: true
# analyze / export (aborts on blank unless --allow-blank)
imgl analyze /tmp/screen.png --json
imgl analyze screen.png -o screen.imgl.json --lang eng+pol
imgl html screen.png -o screen.html --embed-image
imgl svg screen.png --mode overlay -o screen.svg
imgl svg screen.png --mode wireframe -o screen.svg
imgl vql screen.png -o layout.vql.json --with-grid
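The commands above note that an `mss` capture on Wayland can come back black and that `analyze` aborts on blank input. How imgl makes that decision is not described here. As a rough illustration only, a blank-frame check could look at brightness variation; the function name, the flat grayscale input, and the threshold below are all assumptions.

```python
from statistics import pstdev


def looks_blank(gray_pixels, min_stddev=2.0):
    """Heuristic: a frame with almost no brightness variation is treated as blank.

    `gray_pixels` is a flat sequence of 0-255 grayscale values.
    `min_stddev` is an illustrative threshold, not a value taken from imgl.
    """
    if not gray_pixels:
        return True
    return pstdev(gray_pixels) < min_stddev


black_frame = [0] * 10_000        # what a failed Wayland capture might look like
ui_frame = [0, 255] * 5_000       # high-contrast content
print(looks_blank(black_frame))   # True
print(looks_blank(ui_frame))      # False
```

A check like this only catches uniform frames; `imgl diagnose` remains the authoritative way to see `worth_analyzing`.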
Interactive shell (pick action from catalog)
imgl interact /tmp/screen.png -o layout.vql.json
# enter an option number or a natural-language command: "kliknij Save" (click Save), "mapa" (map), "lista" (list), "quit"
# annotated image with the same numbers as in the shell:
imgl annotate screen.png --open
imgl interact screen.png --annotate --open
# cleaner list (OCR noise filter):
imgl interact screen.png
# vision LLM (requires OPENROUTER_API_KEY + pip install -e ".[llm]"):
imgl interact screen.png --llm --annotate --open
# optionally execute the action on the desktop:
imgl interact /tmp/screen.png --execute # requires xdotool or ydotool
URI DSL (vql://window/imgl?action=...):
| action | description |
|---|---|
| analyze | OCR + layout → VQL JSON (default) |
| list | list of interactive elements |
| annotate | PNG of the screenshot with numbered boxes |
| click | text=, element_id=, window= |
| type | value=, label=, text= |
Via uri2vql (when installed):
uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&lang=eng'
uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&action=list'
uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&action=click&text=Save'
# For Polish+English OCR in URI use encoded plus: lang=eng%2Bpol
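The `%2B` requirement follows from ordinary query-string rules, where a raw `+` stands for a space. A small sketch using Python's standard `urllib.parse` shows both building such a URI and what happens without the encoding; the helper `imgl_uri` is hypothetical and not part of imgl or uri2vql.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def imgl_uri(image, file, **params):
    """Build a vql://window/imgl URI with a safely encoded query string."""
    query = {"image": image, "file": file, **params}
    # safe="/" keeps path separators readable; "+" is still encoded as %2B.
    return "vql://window/imgl?" + urlencode(query, safe="/")


uri = imgl_uri("/tmp/screen.png", "layout.vql.json", lang="eng+pol")
print(uri)
# vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&lang=eng%2Bpol

# Python's standard parser turns a raw "+" into a space,
# which is why the literal plus has to travel as %2B.
print(parse_qs("lang=eng+pol")["lang"])       # ['eng pol']
print(parse_qs(urlsplit(uri).query)["lang"])  # ['eng+pol']
```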
NL → URI (nlp2uri / imgl built-in):
# in the imgl interact shell: "kliknij Save" (click Save), "wpisz test w search" (type test into search), "2", "lista" (list)
HTML / SVG export
from imgl import analyze, scene_to_html, scene_to_svg
scene = analyze("screen.png")
html = scene_to_html(scene, embed_image=True)
svg = scene_to_svg(scene, mode="overlay", background="screen.png")
HTML uses absolutely positioned elements with data-type, data-id, data-text attributes
for text-based automation (button[data-text="Save"]).
SVG supports wireframe (flat debug view) and overlay (boxes on top of screenshot).
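To illustrate how the `data-*` attributes support text-based lookup, the sketch below scans an HTML string for tags whose `data-text` matches a given value, using only the standard library. The sample markup is hypothetical: the real tag names and inline styles emitted by `scene_to_html` may differ.

```python
from html.parser import HTMLParser


class DataTextFinder(HTMLParser):
    """Collect attributes of every tag whose data-text equals the wanted text."""

    def __init__(self, wanted_text):
        super().__init__()
        self.wanted_text = wanted_text
        self.matches = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if attr_map.get("data-text") == self.wanted_text:
            self.matches.append({"tag": tag, **attr_map})


# Hypothetical markup: tag names and inline styles are illustrative only.
sample_html = """
<div data-type="label" data-id="label-0" data-text="Username"
     style="position:absolute; left:40px; top:90px"></div>
<button data-type="button" data-id="button-1" data-text="Save"
        style="position:absolute; left:290px; top:196px">Save</button>
"""

finder = DataTextFinder("Save")
finder.feed(sample_html)
print(finder.matches[0]["data-id"])  # button-1
```

In a browser automation tool the same lookup would simply be the CSS selector `button[data-text="Save"]`.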
Output format
analyze() returns a Scene with:
- windows: detected UI windows/panels (local heuristics or optional img2vql)
- elements: classified UI elements: button, input, label, text, toolbar
- ocr_boxes: raw OCR word boxes with confidence scores
Example JSON:
{
"version": "1.0",
"scene": {"width": 800, "height": 600, "source_image": "screen.png"},
"windows": [{
"id": "win-screen",
"bbox": {"x": 0, "y": 0, "w": 800, "h": 600},
"title": null,
"z": 0,
"elements": [
{"id": "text-0", "type": "text", "text": "Save", "bbox": {"x": 100, "y": 50, "w": 40, "h": 16}}
]
}],
"ocr_boxes": [],
"metadata": {"ocr_backend": "tesseract", "lang": "eng+pol"}
}
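The exported JSON can be consumed without imgl installed. The sketch below loads the example document and walks every element in every window. It assumes that `x`/`y` in a `bbox` denote the top-left corner; `iter_elements` and `corners` are illustrative helpers, not imgl functions.

```python
import json

scene_json = """
{
  "version": "1.0",
  "scene": {"width": 800, "height": 600, "source_image": "screen.png"},
  "windows": [{
    "id": "win-screen",
    "bbox": {"x": 0, "y": 0, "w": 800, "h": 600},
    "title": null,
    "z": 0,
    "elements": [
      {"id": "text-0", "type": "text", "text": "Save",
       "bbox": {"x": 100, "y": 50, "w": 40, "h": 16}}
    ]
  }],
  "ocr_boxes": [],
  "metadata": {"ocr_backend": "tesseract", "lang": "eng+pol"}
}
"""


def iter_elements(doc):
    """Yield (window_id, element) pairs for every element in every window."""
    for window in doc.get("windows", []):
        for element in window.get("elements", []):
            yield window["id"], element


def corners(bbox):
    """Convert an x/y/w/h box into (left, top, right, bottom)."""
    return (bbox["x"], bbox["y"], bbox["x"] + bbox["w"], bbox["y"] + bbox["h"])


doc = json.loads(scene_json)
for window_id, element in iter_elements(doc):
    print(window_id, element["type"], element["text"], corners(element["bbox"]))
# win-screen text Save (100, 50, 140, 66)
```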
Configuration
from imgl import ImglConfig, analyze
scene = analyze("screen.png", config=ImglConfig(
lang="eng+pol",
use_img2vql=True, # use img2vql when installed, else local detect
detect_inputs=True,
label_proximity_px=40,
))
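The exact semantics of `label_proximity_px` are not spelled out above. One plausible reading is a maximum pixel distance within which a label is paired with an input field. The sketch below illustrates that reading only; `gap` and `nearest_input` are hypothetical helpers and the pairing rule is an assumption, not imgl's documented behaviour.

```python
def gap(a, b):
    """Shortest axis-aligned distance in pixels between two x/y/w/h boxes."""
    dx = max(b["x"] - (a["x"] + a["w"]), a["x"] - (b["x"] + b["w"]), 0)
    dy = max(b["y"] - (a["y"] + a["h"]), a["y"] - (b["y"] + b["h"]), 0)
    return (dx * dx + dy * dy) ** 0.5


def nearest_input(label, inputs, label_proximity_px=40):
    """Return the closest input within the threshold, or None."""
    candidates = [(gap(label["bbox"], item["bbox"]), item) for item in inputs]
    in_range = [pair for pair in candidates if pair[0] <= label_proximity_px]
    if not in_range:
        return None
    return min(in_range, key=lambda pair: pair[0])[1]


label = {"id": "label-0", "text": "Username",
         "bbox": {"x": 40, "y": 90, "w": 70, "h": 16}}
inputs = [
    {"id": "input-0", "bbox": {"x": 120, "y": 86, "w": 250, "h": 26}},   # 10 px to the right
    {"id": "input-1", "bbox": {"x": 120, "y": 300, "w": 250, "h": 26}},  # far below
]
print(nearest_input(label, inputs)["id"])  # input-0
```

Under this reading, a smaller threshold avoids pairing labels with unrelated fields in dense forms, while a larger one tolerates loose layouts.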
VQL export
from imgl import analyze, scene_to_vql, write_vql_program
scene = analyze("screen.png")
program = scene_to_vql(scene, include_grid=True, grid=12)
write_vql_program(scene, "layout.vql.json")
Layers: windows, ui_elements (with OCR text in metadata), text_regions, optional screen_regions.
Text-based actions
from imgl import analyze, actions
scene = analyze("screen.png")
ui = actions(scene)
ui.click("button", text="Save")
# {"action": "click", "x": 310, "y": 206, ...}
ui.type_into("alice", label="Username")
# {"action": "type", "x": 245, "y": 99, "text": "alice", ...}
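The action dictionaries shown above carry screen coordinates. A minimal sketch of how such coordinates could be derived from the exported JSON is to match an element by text and aim at the centre of its `bbox`. The helpers `find_element` and `click_action`, the case-insensitive match, and the `element_id` key are assumptions for illustration; imgl's own matching rules may differ.

```python
def find_element(elements, kind=None, text=None):
    """Return the first element matching the given type and/or text."""
    for element in elements:
        if kind is not None and element.get("type") != kind:
            continue
        if text is not None and (element.get("text") or "").casefold() != text.casefold():
            continue
        return element
    return None


def click_action(element):
    """Build a click descriptor aimed at the centre of the element's bbox."""
    box = element["bbox"]
    return {
        "action": "click",
        "x": box["x"] + box["w"] // 2,
        "y": box["y"] + box["h"] // 2,
        "element_id": element["id"],  # hypothetical key
    }


elements = [
    {"id": "text-0", "type": "text", "text": "Save",
     "bbox": {"x": 100, "y": 50, "w": 40, "h": 16}},
]
target = find_element(elements, text="save")
print(click_action(target))
# {'action': 'click', 'x': 120, 'y': 58, 'element_id': 'text-0'}
```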
CLI:
imgl find screen.png --type button --text Save --click
imgl find screen.png --label Username --type-into alice
imgl find screen.png --list
Roadmap
- nlp2uri phrases for vql://window/imgl
- koru desktop bridge for action execution
License
Licensed under Apache-2.0.
File details
Details for the file imgl-0.7.1.tar.gz.
File metadata
- Download URL: imgl-0.7.1.tar.gz
- Upload date:
- Size: 72.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d4ef6755e982ab1a885f84161d324bd5e9daeefce9e42cad90413ff32fbde992 |
| MD5 | 042ad54b05295cc20e0fc559d3edd41d |
| BLAKE2b-256 | 14720c39e034870644a0725e6f8c05827c0ea7f886b7adf9a8dd2a642ae6ad73 |
File details
Details for the file imgl-0.7.1-py3-none-any.whl.
File metadata
- Download URL: imgl-0.7.1-py3-none-any.whl
- Upload date:
- Size: 70.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ccd7df1ac805a5ccf9cf3a25c665868705d9e7ae489706726a28f5caa7551eb5 |
| MD5 | ee5a216a06506c97840d2e2dbb0acb1a |
| BLAKE2b-256 | dec2c5d0c266895374c1cfc9659347684caf93fd8a646123de77d60fe2056fac |