Skip to main content

Grounded agentic framework for medical vision-language models: viewer-level image tools, multi-turn tool use, and PubMed/Open-i literature retrieval

Project description

GAZE: Grounded Agentic Zero-shot Evaluation

CI PyPI Python 3.10+ License: MIT codecov OpenSSF Scorecard Docs

GAZE (Grounded Agentic Zero-shot Evaluation) is a modular Python framework for multi-turn agentic vision-language model (VLM) systems, built for medical image analysis.

A radiologist rarely reads a scan in a single glance: they zoom, adjust the window, compare regions, and consult the literature before writing a report. A vision-language model, by contrast, reads an image once and produces text in a single forward pass. GAZE closes that gap by giving a VLM viewer-level tools (zoom, windowing, contrast, edge detection) and literature retrieval (PubMed, Open-i), then running it as a multi-turn loop with schema-validated outputs and full tool-call traces for auditability. It applies to any visual reasoning task, not only medical imaging.

Features

  • Multi-turn agentic loop -- JSON-structured tool-calling with configurable turn limits, schema validation, and automatic error recovery
  • 25 built-in tools (23 visual + 2 search) -- visual manipulation (zoom, crop, contrast, threshold, flip, rotate, etc.) and literature/image retrieval (PubMed, Open-i)
  • Task processors -- abstract base class with dependency injection for prompts, schemas, and validation
  • Model adapters -- OpenAI API (including OpenRouter), LM Studio for local models, HuggingFace Transformers
  • Verifiers integration -- reward functions and multi-turn environments for RL training via verifiers

Tools at a glance

The model can call these during reasoning (multi-turn mode); the full set of 25 is in the tool reference.

Category Representative tools
Inspect zoom, crop, rotate, flip_horizontal
Enhance adjust_contrast, adjust_brightness, window_level, equalize_histogram
Analyze threshold, detect_edges, morphological, symmetry_diff
Retrieve search_web (PubMed), search_images (Open-i)

Installation

pip install gaze-vlm

With extras for specific examples:

pip install gaze-vlm[nova]          # NOVA brain-MRI benchmark
pip install gaze-vlm[gemex]         # GEMeX visual grounding
pip install gaze-vlm[agentclinic]   # AgentClinic diagnostic reasoning
pip install gaze-vlm[pubmedqa]      # PubMedQA text-only QA
pip install gaze-vlm[vqa-rad]       # VQA-RAD radiology VQA
pip install gaze-vlm[medmarks]      # MedMarks-compatible NOVA environment
pip install gaze-vlm[verifiers]     # RL reward functions

For development:

git clone https://github.com/liamchalcroft/gaze.git
cd gaze
uv sync

Quick start

Subclass AgenticProcessorBase and implement four methods:

import asyncio
from pathlib import Path
from gaze import AgenticProcessorBase

class MyProcessor(AgenticProcessorBase):
    def get_system_prompt(self, images, metadata):
        return "You are a medical imaging expert."

    def get_user_message(self, images, metadata):
        return f"Analyze this scan. History: {metadata.get('history', '')}"

    def get_response_schema(self):
        return {
            "type": "json_schema",
            "json_schema": {
                "name": "analysis",
                "strict": True,
                "schema": {
                    "type": "object",
                    "properties": {
                        "findings": {"type": "string"},
                        "continue": {"type": "boolean"},
                    },
                    "required": ["findings", "continue"],
                    "additionalProperties": False,
                },
            },
        }

    def validate_response(self, response):
        return "findings" in response

async def main():
    # `async with` releases shared search/HTTP resources on exit.
    async with MyProcessor(model_name="openai/gpt-4o", use_tools=True) as processor:
        result = await processor.analyze(
            images=Path("scan.jpg"),
            metadata={"modality": "MRI", "history": "Patient presents with headache"},
        )
        print(result.final_response)

asyncio.run(main())

The model returns JSON each turn with "continue": true to keep reasoning or "continue": false when done. result.final_response is the validated JSON from the last turn, for example:

{"findings": "No acute intracranial abnormality.", "continue": false}

Architecture

src/gaze/
    base.py          AgenticProcessorBase -- subclass this
    types.py         ToolCall, ToolResult, Turn, AgenticResult (all frozen)
    config.py        Frozen dataclasses: GazeConfig, SearchConfig, etc.
    exceptions.py    GazeError hierarchy
    models/          AdapterProtocol, OpenAIAdapter, LMStudioAdapter, HuggingFaceAdapter
    tools/           Tool, ToolRegistry, 23 visual tools, 2 search tools
    retrieval/       PubMed (NCBI E-utilities), Open-i image search
    prompts/         Jinja2 templates via minijinja
    verifiers/       RL reward functions and multi-turn environments
    utils/           IoU, JSON extraction, type coercion, confidence clamping

The import path is gaze (the package lives under src/gaze/).

Examples

Five complete example applications are included:

Example Task Dataset
nova/ Brain MRI analysis (caption + diagnosis + localization) c-i-ber/Nova
gemex_thinkvg/ Visual grounding with chain-of-thought MIMIC-CXR (PhysioNet)
agentclinic_nejm/ Multi-turn diagnostic reasoning AgentClinic NEJM
pubmedqa/ Medical Q&A (text-only) PubMedQA
vqa_rad/ Radiology VQA VQA-RAD

Each example includes a CLI, evaluation metrics, and run scripts for local models.

Local models (LM Studio)

All examples support local model inference via LM Studio:

uv run python -m examples.nova.src.cli \
  --model qwen3.5-a3b \
  --base-url http://localhost:1234/v1 \
  --mode single_turn \
  --max-samples 5

Environment variables

Variable Required Description
OPENROUTER_API_KEY or OPENAI_API_KEY Yes (for cloud models) Model API access
NCBI_API_KEY No Higher PubMed rate limits
NCBI_EMAIL No PubMed API compliance
GAZE_ALLOW_CUSTOM_BASE_URL No Set to 1 to send API keys to a non-allowlisted model host

Development

uv sync                          # Install dependencies
make check                       # Quality gate: lint + format + typecheck + lockfile + tests
make check-nova                  # Torch-gated + example tests (installs the nova extra)
uv run ruff check .              # Lint
uv run ruff format .             # Format
uv run pyright src/              # Type check
uv run pytest tests/ -x          # Run tests

Stability and versioning

GAZE follows Semantic Versioning. While the project is pre-1.0, minor releases may include breaking changes to the public API; each is recorded in the Changelog. The public API is the set of names exported from the top-level gaze package (gaze.__all__); anything underscore-prefixed or imported from a submodule is internal and may change without notice. From 1.0 onward, removals will ship with a deprecation warning for at least one minor release.

Documentation

Citation

If you use GAZE in your research, please cite:

@article{alim2026gaze,
  title   = {{GAZE}: Grounded Agentic Zero-shot Evaluation with Viewer-Level Tools and Literature Retrieval on Rare Brain {MRI}},
  author  = {Alim, Duaa and Alim, Mogtaba and Chalcroft, Liam},
  journal = {arXiv preprint arXiv:2605.00876},
  year    = {2026},
  note    = {Accepted at AIiH 2026},
}

The preprint is available at arXiv:2605.00876.

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gaze_vlm-0.1.1.tar.gz (304.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gaze_vlm-0.1.1-py3-none-any.whl (137.9 kB view details)

Uploaded Python 3

File details

Details for the file gaze_vlm-0.1.1.tar.gz.

File metadata

  • Download URL: gaze_vlm-0.1.1.tar.gz
  • Upload date:
  • Size: 304.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for gaze_vlm-0.1.1.tar.gz
Algorithm Hash digest
SHA256 78de4f36b43da36ac9661784b8f2728ddbd920c9dd8ac528d1a086c81e460bdb
MD5 dc27d7595e744b47bd6cd3012fe1e6ba
BLAKE2b-256 563ec3cc352b7bf12868e887678960ba4e4f067762c2b99bd02299973706f197

See more details on using hashes here.

Provenance

The following attestation bundles were made for gaze_vlm-0.1.1.tar.gz:

Publisher: publish.yml on liamchalcroft/gaze

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gaze_vlm-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: gaze_vlm-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 137.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for gaze_vlm-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 35c213719a6f553c9fece74f47a63862adad5807d547be4bcb6da54e14b9818c
MD5 30f36d694a3cdc48b75dfa075465a941
BLAKE2b-256 0f86213adcc862c4b9944dc6e0fb885c9468e6ec6ec05e62f451ff9bcc2f5854

See more details on using hashes here.

Provenance

The following attestation bundles were made for gaze_vlm-0.1.1-py3-none-any.whl:

Publisher: publish.yml on liamchalcroft/gaze

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page