Eyes for AI Agents — a machine-graded visual feedback loop coding agents consume to self-correct before claiming done.
AgentVision — Eyes for AI Agents 👁️
Problem: AI coding agents are blind — they write a UI, chart, SVG or PDF and never see the result, shipping breakage they can't perceive.
Result: AgentVision gives them eyes — render → see → report → fix — catching overflow, low contrast, broken images and typos.
So your agent self-corrects before it claims done.
AgentVision is a provider-agnostic framework that closes the visual feedback loop for AI coding agents:
render → perceive → report → (agent fixes) → re-render → diff
It is not human-reviewed visual regression (Percy/Applitools/Argos) and not browser
automation (browser-use/Playwright). It is a machine-graded visual critique loop an agent
consumes to self-correct before claiming done — with a verdict (pass/warn/fail) and
actionable, coordinate-grounded issues.
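The control flow of that loop can be sketched in a few lines. This is a toy illustration, not AgentVision's implementation: `Report`, `run_loop`, `toy_perceive` and `toy_fix` are hypothetical names invented here, and the "artifact" is a plain string standing in for a rendered page.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Report:
    """Hypothetical stand-in for a perception report."""
    verdict: str                                   # "pass", "warn" or "fail"
    issues: set[str] = field(default_factory=set)


def run_loop(
    artifact: str,
    perceive: Callable[[str], Report],        # render + grade (hypothetical hook)
    fix: Callable[[str, Report], str],        # the agent's revision step (hypothetical hook)
    max_iter: int = 3,
) -> tuple[str, Report, list[set[str]]]:
    """Revise until the verdict is no longer "fail" or max_iter is reached."""
    resolved_per_iter: list[set[str]] = []
    report = perceive(artifact)
    for _ in range(max_iter):
        if report.verdict != "fail":
            break
        artifact = fix(artifact, report)
        new_report = perceive(artifact)
        # The "diff" step: which issues disappeared after this revision.
        resolved_per_iter.append(report.issues - new_report.issues)
        report = new_report
    return artifact, report, resolved_per_iter


# Toy hooks: any known defect word in the string counts as an issue.
KNOWN_DEFECTS = {"overflow", "low-contrast"}


def toy_perceive(artifact: str) -> Report:
    issues = {word for word in artifact.split() if word in KNOWN_DEFECTS}
    return Report("fail" if issues else "pass", issues)


def toy_fix(artifact: str, report: Report) -> str:
    target = sorted(report.issues)[0]          # fix one issue per iteration
    return " ".join(word for word in artifact.split() if word != target)


final, report, resolved = run_loop("hero overflow low-contrast", toy_perceive, toy_fix)
print(report.verdict, resolved)
```

The point of the sketch is the stopping rule: the agent only stops once a grader, not the agent itself, reports something other than `fail`.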
The 60-second pitch
pip install "agentvision[render]"
playwright install chromium # see `agentvision doctor` if Chromium won't launch
agentvision demo # no API key required
agentvision demo renders a deliberately broken page, prints a FAIL report (overflow +
low-contrast + a 404 image — all DOM/CV-grounded, no LLM key needed), then loops against the
fixed version and prints "what changed: 3 issues resolved → PASS." That command is the
product.
What makes it trustworthy
Findings are grounded in sources we can actually trust:
- DOM geometry (`getBoundingClientRect` + scroll offset): precise element boxes.
- Computed-style contrast (`getComputedStyle`): real WCAG ratios, with a `confidence` flag (it degrades honestly over gradients/images/pseudo-elements rather than lying).
- OCR word boxes (Tesseract): precise text locations.
- Console / network / 4xx capture: the #1 "looks fine in code, broken live" cause.
A vision LLM (Claude/OpenAI/Gemini) adds semantic critique on top. Its pixel boxes are
treated as advisory (bbox_precise: false), never marketed as pixel-accurate.
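For reference, a WCAG contrast ratio is computed from the relative luminance of the two colors. The sketch below follows the formula published in WCAG 2.x; it is not AgentVision's code, the function names are made up for illustration, and it ignores the hard cases (gradients, images, pseudo-elements) that the `confidence` flag exists for.

```python
def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light, per WCAG 2.x."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: tuple[int, int, int]) -> float:
    """Relative luminance of an sRGB color (0.0 = black, 1.0 = white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: tuple[int, int, int], bg: tuple[int, int, int]) -> float:
    """WCAG contrast ratio, from 1.0 (identical) up to 21.0 (black on white)."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


WHITE = (255, 255, 255)
for name, color in {"#767676": (118, 118, 118), "#777777": (119, 119, 119)}.items():
    ratio = contrast_ratio(color, WHITE)
    # 4.5:1 is the WCAG AA threshold for normal-size text.
    print(f"{name} on white: {ratio:.2f} ({'ok' if ratio >= 4.5 else 'low contrast'})")
```

The two grays differ by a single step per channel yet land on opposite sides of the AA threshold, which is why a computed ratio is more dependable than a visual impression.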
Match the intent, not just avoid defects
A typo-free, well-laid-out artifact can still be the wrong thing — an infographic that shows the wrong stages, a page missing the panel you asked for, a generated image that ignored half the prompt. Give AgentVision the intent and it grades the render against it, so PASS means "matches what I set out to build," not merely "defect-free":
# Does the render match the thought? (text claims grade deterministically via OCR)
agentvision conform ./infographic.png \
--brief "launch infographic for AgentVision" \
--expect 'must: title reads "AgentVision"' \
--expect 'should: shows 4 stages left to right'
For AI-generated artifacts the fix is a better prompt, not code — so the generative loop generate → see → grade vs intent → refine prompt → regenerate runs until it matches. The image generator is a hook you supply; AgentVision never bundles an image-gen dependency:
agentvision generate --generator mypkg.gen:make_image \
--brief "minimalist infographic, dark background, no typos" --max-iter 4 -o final.png
See docs/conformance.md. Express intent three ways — a free-text
brief (eyes extract the checklist), an explicit checklist (--expect, deterministic),
or a reference image (--reference). Claims are must: / should: / nice:.
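If you build checklists programmatically, the `priority: text` shape of a claim is easy to validate before passing it to `--expect`. The helper below is a hypothetical sketch, not part of AgentVision: `Claim` and `parse_claim` are invented names, and rejecting a claim with no recognized prefix is an assumption (the real CLI may treat it differently).

```python
from dataclasses import dataclass

PRIORITIES = ("must", "should", "nice")


@dataclass(frozen=True)
class Claim:
    priority: str   # one of PRIORITIES
    text: str       # the statement to check against the render


def parse_claim(raw: str) -> Claim:
    """Split 'must: title reads "X"' into a priority and the claim text."""
    prefix, sep, rest = raw.partition(":")     # only the first colon separates
    priority = prefix.strip().lower()
    text = rest.strip()
    if not sep or priority not in PRIORITIES:
        raise ValueError(f"claim needs a must:/should:/nice: prefix: {raw!r}")
    if not text:
        raise ValueError(f"claim has no text: {raw!r}")
    return Claim(priority, text)


claims = [
    parse_claim('must: title reads "AgentVision"'),
    parse_claim("should: shows 4 stages left to right"),
]
for claim in claims:
    print(claim.priority, "->", claim.text)
```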
Eyes → brain: the handoff
In anatomy the eyes are only the afferent half — the retina perceives, the optic nerve
carries the signal to the brain, the brain decides, the hand acts, the eyes look again.
AgentVision is that afferent pathway for an agent: it perceives and hands a clean signal back
to the brain (whatever does your reasoning/planning/memory) — it deliberately doesn't
decide for you. Any perception call distills to a Handoff:
agentvision analyze ./page.html --handoff
{ "perceived": "fail", "next_action": "revise", "matches_intent": false,
"todo": ["[overflow] hero text overflows on the right",
"[intent/must] a \"Checkout\" button is visible"],
"open_questions": ["Verify: uses the brand's dark theme"] }
next_action (done / revise / review) drives the brain's loop; todo is the work-list;
open_questions is what perception couldn't confirm (never dropped). Available as
report.to_handoff(), the MCP perceive_handoff tool, POST /handoff, and a handoff.json
per loop iteration — provider- and brain-agnostic. See docs/handoff.md.
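On the brain side, consuming a Handoff can be as small as a dispatch on `next_action`. The sketch below parses the JSON shown above; `plan` is a hypothetical helper, and treating `review` as "stop and escalate" is an assumption, since the exact semantics live in docs/handoff.md.

```python
import json

# The example Handoff from above, verbatim.
RAW_HANDOFF = r'''
{ "perceived": "fail", "next_action": "revise", "matches_intent": false,
  "todo": ["[overflow] hero text overflows on the right",
           "[intent/must] a \"Checkout\" button is visible"],
  "open_questions": ["Verify: uses the brand's dark theme"] }
'''

VALID_ACTIONS = {"done", "revise", "review"}


def plan(handoff: dict) -> dict:
    """Turn a Handoff into a decision for a (hypothetical) agent loop."""
    action = handoff["next_action"]
    if action not in VALID_ACTIONS:
        raise ValueError(f"unexpected next_action: {action!r}")
    return {
        # Only "revise" sends the agent back to edit and re-render.
        "continue_loop": action == "revise",
        "work": list(handoff.get("todo", [])) if action == "revise" else [],
        # Open questions are carried along in every case, never dropped.
        "unconfirmed": list(handoff.get("open_questions", [])),
    }


decision = plan(json.loads(RAW_HANDOFF))
print(decision["continue_loop"], len(decision["work"]), decision["unconfirmed"])
```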
Eyes & Brain — AgentVision × Verel
AgentVision is the eyes. It pairs with Verel, the brain — an agent framework where nothing is "done" until a grader returns a verdict. The eyes perceive and grade intent; the brain decides with attestation and compounds only verified work into memory; then the eyes look again.
They ship and version independently (pip install agentvision, pip install verel) yet work
in sync: AgentVision plugs into Verel as its verel.senses perception organ — mapped onto a
unified verdict bus (vision alongside tests, lint and types), with intent conformance
recorded in the brain's memory each iteration. AgentVision stays brain-agnostic; Verel is the
reference brain. See docs/handoff.md.
Many faces, one core
| Surface | Who it's for |
|---|---|
| Library (`import agentvision`) | Python apps, custom harnesses |
| CLI (`agentvision …`) | Any agent that can run a shell command; CI |
| Claude Code Skill | Claude agents (auto-invokes the loop before claiming done) |
| MCP server (`agentvision-mcp`) | Cursor, Claude, any MCP-capable host |
| REST service (`agentvision-serve`) | Non-MCP / networked / CI agents |
| Integration recipes | Cursor rules, Aider, generic "agent contract" |
⚠️ "Provider-agnostic" describes the API surface, not behavior. The framework can't force a non-Claude agent into the loop — it gives every agent the means. The Claude Code Skill is the one surface that makes an agent use it proactively; MCP is the first-class cross-host path; the recipes cover the rest.
Vision backends
Pluggable and selectable via --backend / AGENTVISION_VISION_BACKEND:
- `anthropic` (default model `claude-haiku-4-5`, upgradable to Sonnet/Opus)
- `openai`, `gemini`
- `local`: CV/OCR heuristics only, no API key, no egress (great for CI / air-gapped)
Install
pip install "agentvision[all]" # everything
pip install "agentvision[render]" # just rendering + the no-key local loop
pip install "agentvision[render,anthropic]" # + Claude analysis
System dependencies (Chromium, Tesseract, poppler) and a doctor that checks them:
agentvision doctor # attempts a real Chromium launch; lists every missing lib
agentvision doctor --fix # installs the Chromium browser binary
On a bare RHEL/CentOS box, playwright install-deps does not work (apt-only). See
docs/quickstart.md for the dnf line, or use the bundled
Dockerfile which bakes the deps in.
Usage
# Analyze a file/URL/HTML string and print a structured report
agentvision analyze ./index.html --backend local --json
# Run the self-correcting loop
agentvision loop ./dashboard.html --max-iter 3
# Responsive contact sheet across breakpoints
agentvision sheet ./index.html --breakpoints 375,768,1280,1920
# Visual regression against a named baseline
agentvision baseline ./index.html --name home
agentvision regress ./index.html --name home
Live pages, SPAs & dashboards (polling, websockets, canvas/WebGL):
# localhost dev server, wait for the data to render, freeze animation, machine output
agentvision analyze http://localhost:5173 --allow-local \
--wait-for "#dashboard" --settle-ms 800 --quiet
--nav-wait defaults to load (polling pages never go idle); --freeze (default on) pauses
animations + requestAnimationFrame so canvas/WebGL pages capture without hanging; --quiet
prints only JSON (logs to stderr, exit codes 0 pass/warn · 2 fail · 3 error).
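A harness that shells out to the CLI can branch on those exit codes. The wrapper below is a sketch under stated assumptions: `interpret_exit_code` and `analyze_quiet` are hypothetical helper names, only the flags shown above are used, and the JSON on stdout is returned unparsed because its schema is not described here.

```python
import subprocess

# Exit codes as documented above: 0 pass/warn, 2 fail, 3 error.
EXIT_MEANING = {0: "pass_or_warn", 2: "fail", 3: "error"}


def interpret_exit_code(code: int) -> str:
    """Map an agentvision exit code to a coarse status string."""
    return EXIT_MEANING.get(code, "unknown")


def analyze_quiet(target: str, *extra_args: str) -> tuple[str, str]:
    """Run `agentvision analyze <target> --quiet` and return (status, raw stdout).

    Requires the agentvision CLI on PATH. With --quiet, stdout carries only
    JSON and logs go to stderr, so stdout can be handed to a JSON parser.
    """
    proc = subprocess.run(
        ["agentvision", "analyze", target, "--quiet", *extra_args],
        capture_output=True,
        text=True,
        check=False,   # a non-zero exit is a verdict here, not a crash
    )
    return interpret_exit_code(proc.returncode), proc.stdout


if __name__ == "__main__":
    status, raw_json = analyze_quiet("./index.html", "--backend", "local")
    print(status)
```

Note that `0` covers both pass and warn, so a caller that needs to tell them apart has to read the verdict from the JSON rather than from the exit code.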
Library:
import asyncio
from agentvision import load_settings
from agentvision.core.loop import LoopSession
async def main():
    settings = load_settings(vision_backend="local")
    session = LoopSession("examples/broken_layout.html", settings=settings)
    result = await session.iterate()
    print(result.report.verdict, [i.message for i in result.report.issues])
asyncio.run(main())
Documentation
- Quickstart · The Loop · Conformance · Handoff (eyes→brain) · Backends · Adapters · Integrations · Vision
What we do not claim (honesty)
- Pixel-accurate vision-model bounding boxes (they're advisory).
- WCAG verdicts on rasterized non-HTML (heuristic only).
- Bit-reproducible screenshots / deterministic LLM reports.
- Uniform provider-agnostic behavior (only the API surface is uniform).
License
MIT © Amit Patole