gaze-vlm

Grounded agentic framework for medical vision-language models: viewer-level image tools, multi-turn tool use, and PubMed/Open-i literature retrieval

These details have not been verified by PyPI

Project description

GAZE: Grounded Agentic Zero-shot Evaluation

GAZE (Grounded Agentic Zero-shot Evaluation) is a modular Python framework for multi-turn agentic vision-language model (VLM) systems, built for medical image analysis.

A radiologist rarely reads a scan in a single glance: they zoom, adjust the window, compare regions, and consult the literature before writing a report. A vision-language model, by contrast, reads an image once and produces text in a single forward pass. GAZE closes that gap by giving a VLM viewer-level tools (zoom, windowing, contrast, edge detection) and literature retrieval (PubMed, Open-i), then running it as a multi-turn loop with schema-validated outputs and full tool-call traces for auditability. It applies to any visual reasoning task, not only medical imaging.

Features

Multi-turn agentic loop -- JSON-structured tool-calling with configurable turn limits, schema validation, and automatic error recovery
25 built-in tools (23 visual + 2 search) -- visual manipulation (zoom, crop, contrast, threshold, flip, rotate, etc.) and literature/image retrieval (PubMed, Open-i)
Task processors -- abstract base class with dependency injection for prompts, schemas, and validation
Model adapters -- OpenAI API (including OpenRouter), LM Studio for local models, HuggingFace Transformers
Verifiers integration -- reward functions and multi-turn environments for RL training via verifiers

Tools at a glance

The model can call these during reasoning (multi-turn mode); the full set of 25 is in the tool reference.

Category	Representative tools
Inspect	`zoom`, `crop`, `rotate`, `flip_horizontal`
Enhance	`adjust_contrast`, `adjust_brightness`, `window_level`, `equalize_histogram`
Analyze	`threshold`, `detect_edges`, `morphological`, `symmetry_diff`
Retrieve	`search_web` (PubMed), `search_images` (Open-i)

Installation

pip install gaze-vlm

With extras for specific examples:

pip install gaze-vlm[nova]          # NOVA brain-MRI benchmark
pip install gaze-vlm[gemex]         # GEMeX visual grounding
pip install gaze-vlm[agentclinic]   # AgentClinic diagnostic reasoning
pip install gaze-vlm[pubmedqa]      # PubMedQA text-only QA
pip install gaze-vlm[vqa-rad]       # VQA-RAD radiology VQA
pip install gaze-vlm[medmarks]      # MedMarks-compatible NOVA environment
pip install gaze-vlm[verifiers]     # RL reward functions

For development:

git clone https://github.com/liamchalcroft/gaze.git
cd gaze
uv sync

Quick start

Subclass AgenticProcessorBase and implement four methods:

import asyncio
from pathlib import Path
from gaze import AgenticProcessorBase

class MyProcessor(AgenticProcessorBase):
    def get_system_prompt(self, images, metadata):
        return "You are a medical imaging expert."

    def get_user_message(self, images, metadata):
        return f"Analyze this scan. History: {metadata.get('history', '')}"

    def get_response_schema(self):
        return {
            "type": "json_schema",
            "json_schema": {
                "name": "analysis",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "findings": {"type": "string"},
                        "continue": {"type": "boolean"},
                    },
                    "required": ["findings", "continue"],
                    "additionalProperties": False,
                },
            },
        }

    def validate_response(self, response):
        return "findings" in response

async def main():
    # `async with` releases shared search/HTTP resources on exit.
    async with MyProcessor(model_name="openai/gpt-4o", use_tools=True) as processor:
        result = await processor.analyze(
            images=Path("scan.jpg"),
            metadata={"modality": "MRI", "history": "Patient presents with headache"},
        )
        print(result.final_response)

asyncio.run(main())

The model returns JSON each turn with "continue": true to keep reasoning or "continue": false when done. result.final_response is the validated JSON from the last turn, for example:

{"findings": "No acute intracranial abnormality.", "continue": false}

Architecture

src/gaze/
    base.py          AgenticProcessorBase -- subclass this
    types.py         ToolCall, ToolResult, Turn, AgenticResult (all frozen)
    config.py        Frozen dataclasses: GazeConfig, SearchConfig, etc.
    exceptions.py    GazeError hierarchy
    models/          AdapterProtocol, OpenAIAdapter, LMStudioAdapter, HuggingFaceAdapter
    tools/           Tool, ToolRegistry, 23 visual tools, 2 search tools
    retrieval/       PubMed (NCBI E-utilities), Open-i image search
    prompts/         Jinja2 templates via minijinja
    verifiers/       RL reward functions and multi-turn environments
    utils/           IoU, JSON extraction, type coercion, confidence clamping

The import path is gaze (the package lives under src/gaze/).

Examples

Five complete example applications are included:

Example	Task	Dataset
`nova/`	Brain MRI analysis (caption + diagnosis + localization)	c-i-ber/Nova
`gemex_thinkvg/`	Visual grounding with chain-of-thought	MIMIC-CXR (PhysioNet)
`agentclinic_nejm/`	Multi-turn diagnostic reasoning	AgentClinic NEJM
`pubmedqa/`	Medical Q&A (text-only)	PubMedQA
`vqa_rad/`	Radiology VQA	VQA-RAD

Each example includes a CLI, evaluation metrics, and run scripts for local models.

Local models (LM Studio)

All examples support local model inference via LM Studio:

uv run python -m examples.nova.src.cli \
  --model qwen3.5-a3b \
  --base-url http://localhost:1234/v1 \
  --mode single_turn \
  --max-samples 5

Environment variables

Variable	Required	Description
`OPENROUTER_API_KEY` or `OPENAI_API_KEY`	Yes (for cloud models)	Model API access
`NCBI_API_KEY`	No	Higher PubMed rate limits
`NCBI_EMAIL`	No	PubMed API compliance
`GAZE_ALLOW_CUSTOM_BASE_URL`	No	Set to `1` to send API keys to a non-allowlisted model host

Development

uv sync                          # Install dependencies
make check                       # Quality gate: lint + format + typecheck + lockfile + tests
make check-nova                  # Torch-gated + example tests (installs the nova extra)
uv run ruff check .              # Lint
uv run ruff format .             # Format
uv run pyright src/              # Type check
uv run pytest tests/ -x          # Run tests

Stability and versioning

GAZE follows Semantic Versioning. While the project is pre-1.0, minor releases may include breaking changes to the public API; each is recorded in the Changelog. The public API is the set of names exported from the top-level gaze package (gaze.__all__); anything underscore-prefixed or imported from a submodule is internal and may change without notice. From 1.0 onward, removals will ship with a deprecation warning for at least one minor release.

Documentation

Citation

If you use GAZE in your research, please cite:

@article{alim2026gaze,
  title   = {{GAZE}: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain {MRI}},
  author  = {Alim, Duaa and Alim, Mogtaba and Chalcroft, Liam},
  journal = {arXiv preprint arXiv:2605.00876},
  year    = {2026},
  note    = {Accepted at AIiH 2026},
}

The preprint is available at arXiv:2605.00876.

License

MIT

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

This version

0.1.1

Jun 3, 2026

0.1.0

Jun 3, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gaze_vlm-0.1.1.tar.gz (304.2 kB view details)

Uploaded Jun 3, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

gaze_vlm-0.1.1-py3-none-any.whl (137.9 kB view details)

Uploaded Jun 3, 2026 Python 3

File details

Details for the file gaze_vlm-0.1.1.tar.gz.

File metadata

Download URL: gaze_vlm-0.1.1.tar.gz
Upload date: Jun 3, 2026
Size: 304.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for gaze_vlm-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`78de4f36b43da36ac9661784b8f2728ddbd920c9dd8ac528d1a086c81e460bdb`
MD5	`dc27d7595e744b47bd6cd3012fe1e6ba`
BLAKE2b-256	`563ec3cc352b7bf12868e887678960ba4e4f067762c2b99bd02299973706f197`

See more details on using hashes here.

Provenance

The following attestation bundles were made for gaze_vlm-0.1.1.tar.gz:

Publisher: publish.yml on liamchalcroft/gaze

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gaze_vlm-0.1.1.tar.gz
- Subject digest: 78de4f36b43da36ac9661784b8f2728ddbd920c9dd8ac528d1a086c81e460bdb
- Sigstore transparency entry: 1712972635
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: liamchalcroft/gaze@e69f342dc13ff563839cee5140cc40939c70d0c3
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/liamchalcroft
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e69f342dc13ff563839cee5140cc40939c70d0c3
- Trigger Event: release

File details

Details for the file gaze_vlm-0.1.1-py3-none-any.whl.

File metadata

Download URL: gaze_vlm-0.1.1-py3-none-any.whl
Upload date: Jun 3, 2026
Size: 137.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for gaze_vlm-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`35c213719a6f553c9fece74f47a63862adad5807d547be4bcb6da54e14b9818c`
MD5	`30f36d694a3cdc48b75dfa075465a941`
BLAKE2b-256	`0f86213adcc862c4b9944dc6e0fb885c9468e6ec6ec05e62f451ff9bcc2f5854`

See more details on using hashes here.

Provenance

The following attestation bundles were made for gaze_vlm-0.1.1-py3-none-any.whl:

Publisher: publish.yml on liamchalcroft/gaze

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gaze_vlm-0.1.1-py3-none-any.whl
- Subject digest: 35c213719a6f553c9fece74f47a63862adad5807d547be4bcb6da54e14b9818c
- Sigstore transparency entry: 1712972657
- Sigstore integration time: Jun 3, 2026
Source repository:
- Permalink: liamchalcroft/gaze@e69f342dc13ff563839cee5140cc40939c70d0c3
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/liamchalcroft
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@e69f342dc13ff563839cee5140cc40939c70d0c3
- Trigger Event: release

gaze-vlm 0.1.1

Navigation

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Project description

Features

Tools at a glance

Installation

Quick start

Architecture

Examples

Local models (LM Studio)

Environment variables

Development

Stability and versioning

Documentation

Citation

License

Project details

Verified details

Project links

GitHub Statistics

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

Provenance

File details

File metadata

File hashes

Provenance