Grounded agentic framework for medical vision-language models: viewer-level image tools, multi-turn tool use, and PubMed/Open-i literature retrieval
Project description
GAZE (Grounded Agentic Zero-shot Evaluation) is a modular Python framework for multi-turn agentic vision-language model (VLM) systems, built for medical image analysis.
A radiologist rarely reads a scan in a single glance: they zoom, adjust the window, compare regions, and consult the literature before writing a report. A vision-language model, by contrast, reads an image once and produces text in a single forward pass. GAZE closes that gap by giving a VLM viewer-level tools (zoom, windowing, contrast, edge detection) and literature retrieval (PubMed, Open-i), then running it as a multi-turn loop with schema-validated outputs and full tool-call traces for auditability. It applies to any visual reasoning task, not only medical imaging.
Features
- Multi-turn agentic loop -- JSON-structured tool-calling with configurable turn limits, schema validation, and automatic error recovery
- 25 built-in tools (23 visual + 2 search) -- visual manipulation (zoom, crop, contrast, threshold, flip, rotate, etc.) and literature/image retrieval (PubMed, Open-i)
- Task processors -- abstract base class with dependency injection for prompts, schemas, and validation
- Model adapters -- OpenAI API (including OpenRouter), LM Studio for local models, HuggingFace Transformers
- Verifiers integration -- reward functions and multi-turn environments for RL training via verifiers
Tools at a glance
The model can call these during reasoning (multi-turn mode); the full set of 25 is in the tool reference.
| Category | Representative tools |
|---|---|
| Inspect | zoom, crop, rotate, flip_horizontal |
| Enhance | adjust_contrast, adjust_brightness, window_level, equalize_histogram |
| Analyze | threshold, detect_edges, morphological, symmetry_diff |
| Retrieve | search_web (PubMed), search_images (Open-i) |
Installation
pip install gaze-vlm
With extras for specific examples:
pip install gaze-vlm[nova] # NOVA brain-MRI benchmark
pip install gaze-vlm[gemex] # GEMeX visual grounding
pip install gaze-vlm[agentclinic] # AgentClinic diagnostic reasoning
pip install gaze-vlm[pubmedqa] # PubMedQA text-only QA
pip install gaze-vlm[vqa-rad] # VQA-RAD radiology VQA
pip install gaze-vlm[medmarks] # MedMarks-compatible NOVA environment
pip install gaze-vlm[verifiers] # RL reward functions
For development:
git clone https://github.com/liamchalcroft/gaze.git
cd gaze
uv sync
Quick start
Subclass AgenticProcessorBase and implement four methods:
import asyncio
from pathlib import Path
from gaze import AgenticProcessorBase
class MyProcessor(AgenticProcessorBase):
def get_system_prompt(self, images, metadata):
return "You are a medical imaging expert."
def get_user_message(self, images, metadata):
return f"Analyze this scan. History: {metadata.get('history', '')}"
def get_response_schema(self):
return {
"type": "json_schema",
"json_schema": {
"name": "analysis",
"strict": True,
"schema": {
"type": "object",
"properties": {
"findings": {"type": "string"},
"continue": {"type": "boolean"},
},
"required": ["findings", "continue"],
"additionalProperties": False,
},
},
}
def validate_response(self, response):
return "findings" in response
async def main():
# `async with` releases shared search/HTTP resources on exit.
async with MyProcessor(model_name="openai/gpt-4o", use_tools=True) as processor:
result = await processor.analyze(
images=Path("scan.jpg"),
metadata={"modality": "MRI", "history": "Patient presents with headache"},
)
print(result.final_response)
asyncio.run(main())
The model returns JSON each turn with "continue": true to keep reasoning or "continue": false when done. result.final_response is the validated JSON from the last turn, for example:
{"findings": "No acute intracranial abnormality.", "continue": false}
Architecture
src/gaze/
base.py AgenticProcessorBase -- subclass this
types.py ToolCall, ToolResult, Turn, AgenticResult (all frozen)
config.py Frozen dataclasses: GazeConfig, SearchConfig, etc.
exceptions.py GazeError hierarchy
models/ AdapterProtocol, OpenAIAdapter, LMStudioAdapter, HuggingFaceAdapter
tools/ Tool, ToolRegistry, 23 visual tools, 2 search tools
retrieval/ PubMed (NCBI E-utilities), Open-i image search
prompts/ Jinja2 templates via minijinja
verifiers/ RL reward functions and multi-turn environments
utils/ IoU, JSON extraction, type coercion, confidence clamping
The import path is gaze (the package lives under src/gaze/).
Examples
Five complete example applications are included:
| Example | Task | Dataset |
|---|---|---|
nova/ |
Brain MRI analysis (caption + diagnosis + localization) | c-i-ber/Nova |
gemex_thinkvg/ |
Visual grounding with chain-of-thought | MIMIC-CXR (PhysioNet) |
agentclinic_nejm/ |
Multi-turn diagnostic reasoning | AgentClinic NEJM |
pubmedqa/ |
Medical Q&A (text-only) | PubMedQA |
vqa_rad/ |
Radiology VQA | VQA-RAD |
Each example includes a CLI, evaluation metrics, and run scripts for local models.
Local models (LM Studio)
All examples support local model inference via LM Studio:
uv run python -m examples.nova.src.cli \
--model qwen3.5-a3b \
--base-url http://localhost:1234/v1 \
--mode single_turn \
--max-samples 5
Environment variables
| Variable | Required | Description |
|---|---|---|
OPENROUTER_API_KEY or OPENAI_API_KEY |
Yes (for cloud models) | Model API access |
NCBI_API_KEY |
No | Higher PubMed rate limits |
NCBI_EMAIL |
No | PubMed API compliance |
GAZE_ALLOW_CUSTOM_BASE_URL |
No | Set to 1 to send API keys to a non-allowlisted model host |
Development
uv sync # Install dependencies
make check # Quality gate: lint + format + typecheck + lockfile + tests
make check-nova # Torch-gated + example tests (installs the nova extra)
uv run ruff check . # Lint
uv run ruff format . # Format
uv run pyright src/ # Type check
uv run pytest tests/ -x # Run tests
Stability and versioning
GAZE follows Semantic Versioning. While the project is pre-1.0, minor releases may include breaking changes to the public API; each is recorded in the Changelog. The public API is the set of names exported from the top-level gaze package (gaze.__all__); anything underscore-prefixed or imported from a submodule is internal and may change without notice. From 1.0 onward, removals will ship with a deprecation warning for at least one minor release.
Documentation
- Documentation site
- Getting started
- Tool reference
- Configuration
- Verifiers integration
- Contributing
- Changelog
Citation
If you use GAZE in your research, please cite:
@article{alim2026gaze,
title = {{GAZE}: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain {MRI}},
author = {Alim, Duaa and Alim, Mogtaba and Chalcroft, Liam},
journal = {arXiv preprint arXiv:2605.00876},
year = {2026},
note = {Accepted at AIiH 2026},
}
The preprint is available at arXiv:2605.00876.
License
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file gaze_vlm-0.1.0.tar.gz.
File metadata
- Download URL: gaze_vlm-0.1.0.tar.gz
- Upload date:
- Size: 303.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
bc81e09315dee9d6aaa1f58b2b79546b0d62f39d1bbfd1612af13fb28507d7bf
|
|
| MD5 |
5b644a52b71091096db05d8f464961f7
|
|
| BLAKE2b-256 |
52fbc40e270a1b9cbc799c80fa90cc57d889f9ddd0216bff32f5e7940b4e698a
|
Provenance
The following attestation bundles were made for gaze_vlm-0.1.0.tar.gz:
Publisher:
publish.yml on liamchalcroft/gaze
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
gaze_vlm-0.1.0.tar.gz -
Subject digest:
bc81e09315dee9d6aaa1f58b2b79546b0d62f39d1bbfd1612af13fb28507d7bf - Sigstore transparency entry: 1712839743
- Sigstore integration time:
-
Permalink:
liamchalcroft/gaze@bb2782df6e762c6eb82d376a24a85767407cf595 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/liamchalcroft
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@bb2782df6e762c6eb82d376a24a85767407cf595 -
Trigger Event:
release
-
Statement type:
File details
Details for the file gaze_vlm-0.1.0-py3-none-any.whl.
File metadata
- Download URL: gaze_vlm-0.1.0-py3-none-any.whl
- Upload date:
- Size: 137.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.13
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a5fa3646fa97b3f289494ea644ff9cf22098dfdf06f0e4be5a1d00cc39da24b4
|
|
| MD5 |
0f0191e187183a7b8439e5faeed7c038
|
|
| BLAKE2b-256 |
08d8391ab856a309e2a59cb22d7031559e23c481eb8559c340caa399ae2e1875
|
Provenance
The following attestation bundles were made for gaze_vlm-0.1.0-py3-none-any.whl:
Publisher:
publish.yml on liamchalcroft/gaze
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
gaze_vlm-0.1.0-py3-none-any.whl -
Subject digest:
a5fa3646fa97b3f289494ea644ff9cf22098dfdf06f0e4be5a1d00cc39da24b4 - Sigstore transparency entry: 1712839796
- Sigstore integration time:
-
Permalink:
liamchalcroft/gaze@bb2782df6e762c6eb82d376a24a85767407cf595 -
Branch / Tag:
refs/tags/v0.1.0 - Owner: https://github.com/liamchalcroft
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@bb2782df6e762c6eb82d376a24a85767407cf595 -
Trigger Event:
release
-
Statement type: