Skip to main content

Grounded agentic framework for medical vision-language models: viewer-level image tools, multi-turn tool use, and PubMed/Open-i literature retrieval

Project description

GAZE: Grounded Agentic Zero-shot Evaluation

CI PyPI Python 3.10+ License: MIT codecov OpenSSF Scorecard Docs

GAZE (Grounded Agentic Zero-shot Evaluation) is a modular Python framework for multi-turn agentic vision-language model (VLM) systems, built for medical image analysis.

A radiologist rarely reads a scan in a single glance: they zoom, adjust the window, compare regions, and consult the literature before writing a report. A vision-language model, by contrast, reads an image once and produces text in a single forward pass. GAZE closes that gap by giving a VLM viewer-level tools (zoom, windowing, contrast, edge detection) and literature retrieval (PubMed, Open-i), then running it as a multi-turn loop with schema-validated outputs and full tool-call traces for auditability. It applies to any visual reasoning task, not only medical imaging.

Features

  • Multi-turn agentic loop -- JSON-structured tool-calling with configurable turn limits, schema validation, and automatic error recovery
  • 25 built-in tools (23 visual + 2 search) -- visual manipulation (zoom, crop, contrast, threshold, flip, rotate, etc.) and literature/image retrieval (PubMed, Open-i)
  • Task processors -- abstract base class with dependency injection for prompts, schemas, and validation
  • Model adapters -- OpenAI API (including OpenRouter), LM Studio for local models, HuggingFace Transformers
  • Verifiers integration -- reward functions and multi-turn environments for RL training via verifiers

Tools at a glance

The model can call these during reasoning (multi-turn mode); the full set of 25 is in the tool reference.

Category Representative tools
Inspect zoom, crop, rotate, flip_horizontal
Enhance adjust_contrast, adjust_brightness, window_level, equalize_histogram
Analyze threshold, detect_edges, morphological, symmetry_diff
Retrieve search_web (PubMed), search_images (Open-i)

Installation

pip install gaze-vlm

With extras for specific examples:

pip install gaze-vlm[nova]          # NOVA brain-MRI benchmark
pip install gaze-vlm[gemex]         # GEMeX visual grounding
pip install gaze-vlm[agentclinic]   # AgentClinic diagnostic reasoning
pip install gaze-vlm[pubmedqa]      # PubMedQA text-only QA
pip install gaze-vlm[vqa-rad]       # VQA-RAD radiology VQA
pip install gaze-vlm[medmarks]      # MedMarks-compatible NOVA environment
pip install gaze-vlm[verifiers]     # RL reward functions

For development:

git clone https://github.com/liamchalcroft/gaze.git
cd gaze
uv sync

Quick start

Subclass AgenticProcessorBase and implement four methods:

import asyncio
from pathlib import Path
from gaze import AgenticProcessorBase

class MyProcessor(AgenticProcessorBase):
    def get_system_prompt(self, images, metadata):
        return "You are a medical imaging expert."

    def get_user_message(self, images, metadata):
        return f"Analyze this scan. History: {metadata.get('history', '')}"

    def get_response_schema(self):
        return {
            "type": "json_schema",
            "json_schema": {
                "name": "analysis",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "findings": {"type": "string"},
                        "continue": {"type": "boolean"},
                    },
                    "required": ["findings", "continue"],
                    "additionalProperties": False,
                },
            },
        }

    def validate_response(self, response):
        return "findings" in response

async def main():
    # `async with` releases shared search/HTTP resources on exit.
    async with MyProcessor(model_name="openai/gpt-4o", use_tools=True) as processor:
        result = await processor.analyze(
            images=Path("scan.jpg"),
            metadata={"modality": "MRI", "history": "Patient presents with headache"},
        )
        print(result.final_response)

asyncio.run(main())

The model returns JSON each turn with "continue": true to keep reasoning or "continue": false when done. result.final_response is the validated JSON from the last turn, for example:

{"findings": "No acute intracranial abnormality.", "continue": false}

Architecture

src/gaze/
    base.py          AgenticProcessorBase -- subclass this
    types.py         ToolCall, ToolResult, Turn, AgenticResult (all frozen)
    config.py        Frozen dataclasses: GazeConfig, SearchConfig, etc.
    exceptions.py    GazeError hierarchy
    models/          AdapterProtocol, OpenAIAdapter, LMStudioAdapter, HuggingFaceAdapter
    tools/           Tool, ToolRegistry, 23 visual tools, 2 search tools
    retrieval/       PubMed (NCBI E-utilities), Open-i image search
    prompts/         Jinja2 templates via minijinja
    verifiers/       RL reward functions and multi-turn environments
    utils/           IoU, JSON extraction, type coercion, confidence clamping

The import path is gaze (the package lives under src/gaze/).

Examples

Five complete example applications are included:

Example Task Dataset
nova/ Brain MRI analysis (caption + diagnosis + localization) c-i-ber/Nova
gemex_thinkvg/ Visual grounding with chain-of-thought MIMIC-CXR (PhysioNet)
agentclinic_nejm/ Multi-turn diagnostic reasoning AgentClinic NEJM
pubmedqa/ Medical Q&A (text-only) PubMedQA
vqa_rad/ Radiology VQA VQA-RAD

Each example includes a CLI, evaluation metrics, and run scripts for local models.

Local models (LM Studio)

All examples support local model inference via LM Studio:

uv run python -m examples.nova.src.cli \
  --model qwen3.5-a3b \
  --base-url http://localhost:1234/v1 \
  --mode single_turn \
  --max-samples 5

Environment variables

Variable Required Description
OPENROUTER_API_KEY or OPENAI_API_KEY Yes (for cloud models) Model API access
NCBI_API_KEY No Higher PubMed rate limits
NCBI_EMAIL No PubMed API compliance
GAZE_ALLOW_CUSTOM_BASE_URL No Set to 1 to send API keys to a non-allowlisted model host

Development

uv sync                          # Install dependencies
make check                       # Quality gate: lint + format + typecheck + lockfile + tests
make check-nova                  # Torch-gated + example tests (installs the nova extra)
uv run ruff check .              # Lint
uv run ruff format .             # Format
uv run pyright src/              # Type check
uv run pytest tests/ -x          # Run tests

Stability and versioning

GAZE follows Semantic Versioning. While the project is pre-1.0, minor releases may include breaking changes to the public API; each is recorded in the Changelog. The public API is the set of names exported from the top-level gaze package (gaze.__all__); anything underscore-prefixed or imported from a submodule is internal and may change without notice. From 1.0 onward, removals will ship with a deprecation warning for at least one minor release.

Documentation

Citation

If you use GAZE in your research, please cite:

@article{alim2026gaze,
  title   = {{GAZE}: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain {MRI}},
  author  = {Alim, Duaa and Alim, Mogtaba and Chalcroft, Liam},
  journal = {arXiv preprint arXiv:2605.00876},
  year    = {2026},
  note    = {Accepted at AIiH 2026},
}

The preprint is available at arXiv:2605.00876.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gaze_vlm-0.1.0.tar.gz (303.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gaze_vlm-0.1.0-py3-none-any.whl (137.9 kB view details)

Uploaded Python 3

File details

Details for the file gaze_vlm-0.1.0.tar.gz.

File metadata

  • Download URL: gaze_vlm-0.1.0.tar.gz
  • Upload date:
  • Size: 303.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for gaze_vlm-0.1.0.tar.gz
Algorithm Hash digest
SHA256 bc81e09315dee9d6aaa1f58b2b79546b0d62f39d1bbfd1612af13fb28507d7bf
MD5 5b644a52b71091096db05d8f464961f7
BLAKE2b-256 52fbc40e270a1b9cbc799c80fa90cc57d889f9ddd0216bff32f5e7940b4e698a

See more details on using hashes here.

Provenance

The following attestation bundles were made for gaze_vlm-0.1.0.tar.gz:

Publisher: publish.yml on liamchalcroft/gaze

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gaze_vlm-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: gaze_vlm-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 137.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for gaze_vlm-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a5fa3646fa97b3f289494ea644ff9cf22098dfdf06f0e4be5a1d00cc39da24b4
MD5 0f0191e187183a7b8439e5faeed7c038
BLAKE2b-256 08d8391ab856a309e2a59cb22d7031559e23c481eb8559c340caa399ae2e1875

See more details on using hashes here.

Provenance

The following attestation bundles were made for gaze_vlm-0.1.0-py3-none-any.whl:

Publisher: publish.yml on liamchalcroft/gaze

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page