Image to Layout — screenshot OCR and semantic UI reconstruction
ImgL - Image to Layout — convert screenshots into semantic UI models with OCR text and element bounding boxes.
AI Cost Tracking
- 🤖 LLM usage: $9.8356 (6 commits)
- 👤 Human dev: ~$438 (4.4h @ $100/h, 30min dedup)
Generated on 2026-06-09 using openrouter/qwen/qwen3-coder-next
Installation
pip install -e . # from repo
pip install -e ".[capture]" # mss (X11 fallback)
pip install -e ".[diagnose]" # numpy for img2nl (install img2nl locally)
pip install -e ".[full]" # capture + diagnose + dev + llm + web
# Local siblings (not on PyPI) — mirror capture on Wayland:
make install-dev # .[dev,llm,capture] + vdisplay when ~/github/wronai/vdisplay exists
imgl install vdisplay # pip install -e ~/github/wronai/vdisplay[pillow]
pip install -e ~/github/wronai/vdisplay[pillow] # same as above
pip install -e ~/github/wronai/img2nl[analyze]
pip install -e ~/github/oqlos/vql
pip install -e ~/github/oqlos/vql/packages/img2vql
For uri2vql adopt-imgl, install imgl in the same venv as uri2vql:
pip install -e ~/github/semcod/imgl
# or: pip install -e ~/github/oqlos/vql/packages/uri2vql[imgl]
System dependency for OCR:
# Debian/Ubuntu
sudo apt install tesseract-ocr tesseract-ocr-pol
# macOS
brew install tesseract tesseract-lang
Development install:
pip install -e ".[dev]"
pip install -e ".[llm]" # vision LLM catalog (OpenRouter)
Makefile (quick start)
make help # list of commands
make install-full # imgl + capture + llm + control + web
make capture-interactive # vdisplay mirror → screen.png (portal fallback on Wayland)
make doctor-full FORMAT=markdown
make execute-llm PROMPT='wpisz test w Chat input' # Polish NL prompt: "type test into Chat input"
make demo-key # dsl2imgl KEY ctrl+Return (dry-run)
make demo-chat # type into Chat input + ctrl+enter (dry-run)
make serve-rest # rest2imgl :8219
make serve-web # imgl serve :8008
make test-dsl2imgl # Phase 4 tests (Schema/Protobuf/ES)
Koru integration: cd ~/github/semcod/koru && make install-imgl-bridge
Documentation
| Topic | Link |
|---|---|
| Index | docs/README.md |
| Capture (mirror, portal, Wayland) | docs/capture.md |
| Architecture (imgl / vql / nlp2uri) | docs/architecture.md |
| Control layer *2imgl | docs/control-layer.md |
| NL from the shell (chat input, Enter/Ctrl+Enter) | docs/nl-shell-examples.md |
| Voice + browser | docs/voice-browser.md |
| Web UI (port 8008) | docs/web-ui.md |
| Control packages | packages/README.md |
Examples
Full documentation with examples for different systems, applications, and configurations:
| Topic | Link |
|---|---|
| GNOME/Wayland | examples/platforms/gnome-wayland |
| Window selection / crops | examples/workflows/window-picker |
| GitHub in the browser | examples/applications/github-browser |
| IDE (Windsurf/VS Code) | examples/applications/ide-editor |
| Per-window LLM | examples/configurations/per-window-llm |
| NL → URI (nlp2uri) | examples/integrations/nlp2uri |
| Agent loop | examples/workflows/multi-step-agent |
| Web UI (port 8008) | examples/workflows/web-ui |
Quick demo:
examples/scripts/demo-windows.sh screen.png
examples/scripts/demo-nlp2uri.py screen.png region-top
Usage
Python API
from imgl import analyze, scene_to_json
scene = analyze("screen.png", lang="eng+pol")
print(scene_to_json(scene))
CLI
# Use an existing screenshot (recommended on GNOME/Wayland):
imgl diagnose /tmp/screen.png
imgl vql /tmp/screen.png -o layout.vql.json
# Capture (vdisplay mirror built into imgl[capture]; no GNOME dialog):
make install-dev # vdisplay + mss in the capture extra
make capture-interactive # mirror capture → screen.png
imgl capture -o screen.png --verify # same thing without make
imgl capture --portal -o screen.png # fallback: GNOME region picker
imgl diagnose screen.png # must show worth_analyzing: true
# analyze / export (aborts on blank unless --allow-blank)
imgl analyze /tmp/screen.png --json
imgl analyze screen.png -o screen.imgl.json --lang eng+pol
imgl html screen.png -o screen.html --embed-image
imgl svg screen.png --mode overlay -o screen.svg
imgl svg screen.png --mode wireframe -o screen.svg
imgl vql screen.png -o layout.vql.json --with-grid
Web UI (manual + agent, port 8008)
pip install -e ".[web,llm,capture]"
imgl serve --port 8008
# with desktop execution and LLM:
imgl serve --port 8008 --execute --llm --capture-on-start
Open http://127.0.0.1:8008 for a numbered screenshot preview, an action list with thumbnails, NL input, and the agent loop (capture → act → capture).
Details: docs/web-ui.md, docs/voice-browser.md.
Control layer (REST / DSL / NL, port 8219)
External control (shell, curl, MCP, voice assistant):
make install-control # imgl install control
make capture-interactive # or: imgl capture -o screen.png --verify
make serve-rest # http://127.0.0.1:8219
# DSL
dsl2imgl exec 'KEY ctrl+Return EXECUTE 0'
dsl2imgl exec 'TYPE "hello" IN "Chat input" IMAGE screen.png WINDOW region-bottom EXECUTE 0'
# NL
nlp2imgl apply "wpisz opisz projekt w Chat input" --image screen.png --window region-bottom # Polish: type "opisz projekt" (describe the project) into Chat input
nlp2imgl apply "naciśnij ctrl+enter" --execute # Polish: press ctrl+enter
From Koru (in koru/.venv, not imgl/.venv):
cd ~/github/semcod/koru && make install-imgl-bridge
make imgl-capture imgl-chat
koru imgl execute "wpisz test w Chat input" --window region-bottom --dry-run
Full examples: docs/nl-shell-examples.md, docs/control-layer.md.
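The two DSL commands shown above happen to be alternating keyword/value pairs, so they can be tokenized with the standard library alone. The sketch below is illustrative and is not the dsl2imgl parser: the real grammar is not documented here, and `parse_dsl` is a hypothetical helper that assumes every keyword takes exactly one value.

```python
import shlex


def parse_dsl(command: str) -> dict[str, str]:
    """Split a DSL command into KEYWORD -> value pairs (illustrative only)."""
    # shlex.split honours double quotes, so "Chat input" stays one token.
    tokens = shlex.split(command)
    if len(tokens) % 2 != 0:
        raise ValueError(f"expected KEYWORD value pairs, got {tokens!r}")
    return dict(zip(tokens[0::2], tokens[1::2]))


key_cmd = parse_dsl('KEY ctrl+Return EXECUTE 0')
type_cmd = parse_dsl(
    'TYPE "hello" IN "Chat input" IMAGE screen.png WINDOW region-bottom EXECUTE 0'
)

print(key_cmd)
print(type_cmd)
```

Something like this can be handy for logging or validating a command before handing it to `dsl2imgl exec`; the semantics of each keyword remain those described in docs/control-layer.md.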
Window discovery (regions in the screenshot)
On complex screenshots (browser + IDE), pick a region first:
imgl windows screen.png --export-crops --annotate --open
# → screen.region-top.png, screen.region-bottom.png (+ .numbered.png)
imgl interact screen.png --llm --window region-top # GitHub
imgl interact screen.png --llm --window region-bottom # IDE
Interactive window selection (when there is more than one region):
imgl interact screen.png --llm
# → list of windows → enter a number (1, 2) or "podglad" (preview)
Interactive shell (pick action from catalog)
imgl interact /tmp/screen.png -o layout.vql.json
# option number, or NL: "kliknij Save" (click Save), "mapa" (map), "lista" (list), "okna" (windows), "quit"
# numbered image:
imgl annotate screen.png --open
imgl interact screen.png --annotate --open
# OCR noise filter (enabled by default):
imgl interact screen.png
# vision LLM (OPENROUTER_API_KEY + pip install -e ".[llm]"):
imgl interact screen.png --llm --window region-top --annotate --open
# desktop execution (Linux, xdotool/ydotool):
imgl interact /tmp/screen.png --execute
URI DSL (vql://window/imgl?action=...):
| action | description |
|---|---|
| analyze | OCR + layout → VQL JSON (default) |
| list | list of interactive elements |
| annotate | PNG of the screenshot with numbered boxes |
| click | text=, element_id=, window= |
| type | value=, label=, text= |
Via uri2vql (when installed):
uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&lang=eng'
uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&action=list'
uri2vql query 'vql://window/imgl?image=/tmp/screen.png&file=layout.vql.json&action=click&text=Save'
# For Polish+English OCR in URI use encoded plus: lang=eng%2Bpol
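The encoded plus matters because standard query-string decoding turns a raw `+` into a space. A minimal sketch of building such a URI with Python's `urllib.parse` follows; `build_imgl_uri` is a hypothetical helper, and whether uri2vql decodes queries exactly this way is an assumption.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_imgl_uri(**params: str) -> str:
    """Build a vql://window/imgl URI; '/' stays readable, '+' becomes %2B."""
    return "vql://window/imgl?" + urlencode(params, safe="/")


uri = build_imgl_uri(image="/tmp/screen.png", file="layout.vql.json", lang="eng+pol")
print(uri)

# Round trip: the encoded plus survives decoding ...
decoded = parse_qs(urlsplit(uri).query)
print(decoded["lang"])

# ... whereas a raw '+' is read as a space by standard decoding.
raw = parse_qs("lang=eng+pol")
print(raw["lang"])
```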
NL → URI (nlp2uri / imgl built-in):
# in the imgl interact shell: "kliknij Save" (click Save), "wpisz test w search" (type test into search), "2", "lista" (list)
HTML / SVG export
from imgl import analyze, scene_to_html, scene_to_svg
scene = analyze("screen.png")
html = scene_to_html(scene, embed_image=True)
svg = scene_to_svg(scene, mode="overlay", background="screen.png")
HTML uses absolutely positioned elements with data-type, data-id, data-text attributes
for text-based automation (button[data-text="Save"]).
SVG supports wireframe (flat debug view) and overlay (boxes on top of screenshot).
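To illustrate the `button[data-text="Save"]` style of lookup without a browser, the sketch below scans an HTML fragment with the standard-library parser. The fragment is hypothetical: only the `data-type`, `data-id`, and `data-text` attribute names come from this README, and the actual markup produced by `scene_to_html` may differ.

```python
from html.parser import HTMLParser


class DataTextFinder(HTMLParser):
    """Collect the attributes of every tag whose data-text matches a target."""

    def __init__(self, tag: str, text: str) -> None:
        super().__init__()
        self.tag = tag
        self.text = text
        self.matches: list[dict] = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == self.tag and attr_map.get("data-text") == self.text:
            self.matches.append(attr_map)


# Hypothetical export fragment, written by hand for this example.
SAMPLE_HTML = """
<div data-type="window" data-id="win-screen">
  <button data-type="button" data-id="button-0" data-text="Save"
          style="position:absolute;left:100px;top:50px">Save</button>
  <button data-type="button" data-id="button-1" data-text="Cancel"
          style="position:absolute;left:160px;top:50px">Cancel</button>
</div>
"""

finder = DataTextFinder("button", "Save")
finder.feed(SAMPLE_HTML)
matched_ids = [m["data-id"] for m in finder.matches]
print(matched_ids)
```

In a browser-automation tool the same lookup would normally be a single CSS selector; the parser version is only meant to show what that selector matches.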
Output format
analyze() returns a Scene with:
- windows: detected UI windows/panels (local heuristics or optional img2vql)
- elements: classified UI elements: button, input, label, text, toolbar
- ocr_boxes: raw OCR word boxes with confidence scores
Example JSON:
{
"version": "1.0",
"scene": {"width": 800, "height": 600, "source_image": "screen.png"},
"windows": [{
"id": "win-screen",
"bbox": {"x": 0, "y": 0, "w": 800, "h": 600},
"title": null,
"z": 0,
"elements": [
{"id": "text-0", "type": "text", "text": "Save", "bbox": {"x": 100, "y": 50, "w": 40, "h": 16}}
]
}],
"ocr_boxes": [],
"metadata": {"ocr_backend": "tesseract", "lang": "eng+pol"}
}
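Because the export is plain JSON, it can be consumed without importing imgl. The sketch below loads the example above and derives a center point per element, which is the usual target for a click. `bbox_center` and `element_centers` are hypothetical helpers, and treating `bbox` values as absolute image coordinates is an assumption (the example window sits at 0,0, so it makes no difference here).

```python
import json

SCENE_JSON = """
{
  "version": "1.0",
  "scene": {"width": 800, "height": 600, "source_image": "screen.png"},
  "windows": [{
    "id": "win-screen",
    "bbox": {"x": 0, "y": 0, "w": 800, "h": 600},
    "title": null,
    "z": 0,
    "elements": [
      {"id": "text-0", "type": "text", "text": "Save",
       "bbox": {"x": 100, "y": 50, "w": 40, "h": 16}}
    ]
  }],
  "ocr_boxes": [],
  "metadata": {"ocr_backend": "tesseract", "lang": "eng+pol"}
}
"""


def bbox_center(bbox: dict) -> tuple[int, int]:
    """Integer center of an {x, y, w, h} box."""
    return bbox["x"] + bbox["w"] // 2, bbox["y"] + bbox["h"] // 2


def element_centers(scene: dict) -> dict[str, tuple[int, int]]:
    """Map element id -> center point across all windows."""
    centers = {}
    for window in scene["windows"]:
        for element in window["elements"]:
            centers[element["id"]] = bbox_center(element["bbox"])
    return centers


scene = json.loads(SCENE_JSON)
centers = element_centers(scene)
print(centers)
```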
Configuration
from imgl import ImglConfig, analyze
scene = analyze("screen.png", config=ImglConfig(
lang="eng+pol",
use_img2vql=True, # use img2vql when installed, else local detect
detect_inputs=True,
label_proximity_px=40,
))
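The README does not spell out how `label_proximity_px` is applied. One plausible reading, shown below purely as an assumption, is that a label is attached to an input when the gap between their bounding boxes is at most that many pixels, with the closest label winning. `box_gap` and `nearest_label` are hypothetical helpers and the coordinates are made up for the example.

```python
import math


def box_gap(a: dict, b: dict) -> float:
    """Distance between the nearest edges of two {x, y, w, h} boxes (0 if they overlap)."""
    dx = max(b["x"] - (a["x"] + a["w"]), a["x"] - (b["x"] + b["w"]), 0)
    dy = max(b["y"] - (a["y"] + a["h"]), a["y"] - (b["y"] + b["h"]), 0)
    return math.hypot(dx, dy)


def nearest_label(input_box: dict, labels: list[dict], max_gap_px: int = 40):
    """Return the text of the closest label within max_gap_px, or None."""
    best = None
    for label in labels:
        gap = box_gap(label["bbox"], input_box)
        if gap <= max_gap_px and (best is None or gap < best[0]):
            best = (gap, label["text"])
    return best[1] if best else None


labels = [
    {"text": "Username", "bbox": {"x": 100, "y": 90, "w": 80, "h": 18}},
    {"text": "Password", "bbox": {"x": 100, "y": 140, "w": 80, "h": 18}},
]
username_input = {"x": 200, "y": 88, "w": 160, "h": 24}
far_input = {"x": 500, "y": 400, "w": 160, "h": 24}

near_result = nearest_label(username_input, labels)
far_result = nearest_label(far_input, labels)
print(near_result, far_result)
```

Under this reading, raising the threshold links labels that sit further from their fields at the cost of more false pairings on dense forms.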
VQL export
from imgl import analyze, scene_to_vql, write_vql_program
scene = analyze("screen.png")
program = scene_to_vql(scene, include_grid=True, grid=12)
write_vql_program(scene, "layout.vql.json")
Layers: windows, ui_elements (with OCR text in metadata), text_regions, optional screen_regions.
Text-based actions
from imgl import analyze, actions
scene = analyze("screen.png")
ui = actions(scene)
ui.click("button", text="Save")
# {"action": "click", "x": 310, "y": 206, ...}
ui.type_into("alice", label="Username")
# {"action": "type", "x": 245, "y": 99, "text": "alice", ...}
CLI:
imgl find screen.png --type button --text Save --click
imgl find screen.png --label Username --type-into alice
imgl find screen.png --list
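The action dictionaries shown above carry enough information to drive a desktop tool. As a rough sketch, and not imgl's own executor, the snippet below turns such a dictionary into xdotool argument lists and only prints them. `to_xdotool` is a hypothetical helper, and clicking a field before typing into it is an assumption about the intended behavior.

```python
import shlex


def to_xdotool(action: dict) -> list[list[str]]:
    """Translate an action dict into xdotool argument lists (nothing is executed)."""
    x, y = str(action["x"]), str(action["y"])
    # Move the pointer to the target and left-click it.
    commands = [["xdotool", "mousemove", x, y, "click", "1"]]
    if action["action"] == "type":
        # After focusing the field with a click, type the text.
        commands.append(["xdotool", "type", action["text"]])
    elif action["action"] != "click":
        raise ValueError(f"unsupported action: {action['action']!r}")
    return commands


click = {"action": "click", "x": 310, "y": 206}
typing = {"action": "type", "x": 245, "y": 99, "text": "alice"}

for act in (click, typing):
    for argv in to_xdotool(act):
        print(shlex.join(argv))
```

Passing each list to `subprocess.run` would perform the action on an X11 session; on Wayland a ydotool equivalent would be needed, as noted for `--execute`.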
Roadmap
See TODO.md.
- uri2vql: window_scope in the vql://window/imgl handler
- dsl2imgl Phase 4: JSON Schema + Protobuf + EventStore
- Web UI: microphone (Web Speech API), KEY action in the panel
- koru desktop bridge for action execution
License
Licensed under Apache-2.0.