
A utility library for dataset generation and clustering


HEDGE: Hallucination Estimation via Dense Geometric Entropy

HEDGE provides the code and Python package that accompany the paper "HEDGE: Hallucination Estimation via Dense Geometric Entropy for Medical VQA with Vision-Language Models." The library offers utilities for sampling answers from multimodal models, clustering them through logical and embedding-based strategies, and computing hallucination detection metrics across benchmarks such as VQA-RAD and KvasirVQA.
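The "logical" clustering strategy mentioned above is commonly implemented via bidirectional entailment: two answers share a cluster only if each one entails the other. The sketch below illustrates that idea under stated assumptions and is not hedge_bench's code; `apply_nli_clustering` may group answers differently. `cluster_by_bidirectional_entailment`, `toy_entails`, and `_normalize` are hypothetical names, and `toy_entails` (a normalized string match) merely stands in for a real NLI model such as `microsoft/deberta-large-mnli` so the example runs offline.

```python
from typing import Callable, List


def cluster_by_bidirectional_entailment(
    answers: List[str],
    entails: Callable[[str, str], bool],
) -> List[int]:
    """Greedy clustering: an answer joins the first cluster whose
    representative it mutually entails; otherwise it starts a new cluster."""
    representatives: List[str] = []  # first answer seen in each cluster
    labels: List[int] = []
    for answer in answers:
        for cluster_id, rep in enumerate(representatives):
            # Logical equivalence: entailment must hold in both directions.
            if entails(answer, rep) and entails(rep, answer):
                labels.append(cluster_id)
                break
        else:
            # No existing cluster matched, so open a new one.
            representatives.append(answer)
            labels.append(len(representatives) - 1)
    return labels


def _normalize(text: str) -> List[str]:
    """Lowercase, drop punctuation, and split into words."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.split()


def toy_entails(premise: str, hypothesis: str) -> bool:
    """Hypothetical stand-in for an NLI model: answers 'entail' each other
    only when they match after normalization."""
    return _normalize(premise) == _normalize(hypothesis)


if __name__ == "__main__":
    samples = ["Yes", "yes.", "No", "There is a fracture", "there is a fracture!"]
    print(cluster_by_bidirectional_entailment(samples, toy_entails))
```

Comparing each answer only against a cluster's first member keeps the number of entailment checks small, at the cost of making the result depend on answer order.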

Installation

The utilities published in this repository are available on PyPI as hedge-bench.

pip install hedge-bench

You can also install the package directly from the GitHub repository with pip install git+https://github.com/SushantGautam/HEDGE.git.

Quickstart

The snippet below shows a minimal end-to-end example adapted from tmp_test.py, demonstrating how to generate answer samples, apply both embedding- and NLI-based clustering, and evaluate hallucination detection metrics.

from datasets import load_dataset
from transformers import pipeline

from hedge_bench.utils import (
    PROMPT_VARIANTS,
    add_hallucination_labels_vllm,
    apply_nli_clustering,
    compute_roc_aucs,
    generate_and_cache_dataset,
    generate_answers,
    optimize_and_apply_embed_clustering,
)

# 1) Prepare a small VQA-RAD subset (first 10 test questions)
n_samples = 3
vqa_rad_test = load_dataset("flaviagiammarino/vqa-rad", split="test").select(range(10))
vqa_dict = [
    {"idx": i, "image": sample["image"], "question": sample["question"], "answer": sample["answer"]}
    for i, sample in enumerate(vqa_rad_test)
]

generated = generate_and_cache_dataset(
    dataset_id="vqa_rad_test",
    num_samples=n_samples,
    vqa_dict=vqa_dict,
    force_regenerate=False,
    n_jobs=40,
)

# 2) Sample answers from a vision-language model
answers = generate_answers(
    generated,
    n_answers_high=n_samples,
    min_temp=0.1,
    max_temp=1.0,
    prompt_variants=PROMPT_VARIANTS,
    model="Qwen/Qwen2.5-VL-7B-Instruct",
)

# 3) Label hallucinations using a VLM judge and cluster by embeddings
answers = add_hallucination_labels_vllm(answers)
answers_embed, threshold, _ = optimize_and_apply_embed_clustering(answers)

# 4) Optionally cluster with an NLI model as well, then compute ROC AUCs
nli = pipeline("text-classification", model="microsoft/deberta-large-mnli", top_k=None, truncation=True)
answers_clustered = apply_nli_clustering(answers_embed, nli, batch_size=768)

aucs = compute_roc_aucs(answers_clustered)
print(f"Embedding clustering optimal threshold = {threshold:.3f}")
print(aucs)
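The quickstart prints the similarity threshold chosen by `optimize_and_apply_embed_clustering`. How the library applies that threshold is not documented here; as an assumption, the sketch below shows one simple greedy scheme in which an answer joins the first cluster whose representative embedding has cosine similarity at or above the threshold. The 2-D vectors are toy stand-ins for sentence embeddings, and `cluster_by_cosine_threshold` is a hypothetical helper.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def cluster_by_cosine_threshold(
    embeddings: List[Sequence[float]], threshold: float
) -> List[int]:
    """Greedy clustering: join the first cluster whose representative
    (its first member) has cosine similarity >= threshold, else open a new one."""
    reps: List[Sequence[float]] = []
    labels: List[int] = []
    for emb in embeddings:
        for cluster_id, rep in enumerate(reps):
            if cosine(emb, rep) >= threshold:
                labels.append(cluster_id)
                break
        else:
            reps.append(emb)
            labels.append(len(reps) - 1)
    return labels


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real sentence embeddings.
    toy = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.6, 0.8)]
    print(cluster_by_cosine_threshold(toy, threshold=0.9))  # stricter: more clusters
    print(cluster_by_cosine_threshold(toy, threshold=0.5))  # looser: fewer clusters
```

Raising the threshold splits answers into more clusters and lowering it merges them, so the chosen value directly shapes any cluster-based uncertainty score computed afterwards.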

Project layout

  • hedge_bench/algorithms.py – reference implementations of uncertainty estimators, clustering strategies, and scoring utilities.
  • hedge_bench/utils.py – high-level helper functions for dataset caching, answer generation, labeling, and evaluation (as used in the quickstart example).
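As an illustration of the kind of uncertainty estimator and scoring utility described above (not the module's actual code, and not the HEDGE estimator from the paper), the sketch below computes a discrete Shannon entropy over the cluster labels of one question's sampled answers, in the spirit of semantic entropy, and a rank-based ROC AUC measuring how well those entropies separate hallucinated from correct answers. `cluster_entropy` and `roc_auc` are hypothetical names; the AUC uses the Mann-Whitney formulation with ties counted as one half.

```python
import math
from collections import Counter
from typing import List, Sequence


def cluster_entropy(labels: Sequence[int]) -> float:
    """Shannon entropy (nats) of the empirical distribution of cluster labels
    for one question's sampled answers; higher means more disagreement."""
    n = len(labels)
    return sum((c / n) * math.log(n / c) for c in Counter(labels).values())


def roc_auc(scores: Sequence[float], is_hallucination: Sequence[bool]) -> float:
    """Probability that a randomly chosen hallucinated item scores higher than
    a randomly chosen correct one (ties count 0.5). Needs both classes present."""
    pos = [s for s, y in zip(scores, is_hallucination) if y]
    neg = [s for s, y in zip(scores, is_hallucination) if not y]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical cluster labels for the sampled answers to four questions.
    per_question: List[List[int]] = [[0, 0, 0], [0, 1, 2], [0, 0, 1], [0, 1, 1]]
    # Hypothetical ground-truth hallucination flags for the same questions.
    flags = [False, True, True, False]
    entropies = [cluster_entropy(cluster_ids) for cluster_ids in per_question]
    print([round(e, 3) for e in entropies])
    print(roc_auc(entropies, flags))
```

A single-cluster question gets zero entropy, while answers spread over many clusters score high; the AUC then summarizes how consistently higher entropy coincides with a hallucination label.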

Citation

If you use HEDGE in your work, please cite the associated paper.

HEDGE: Hallucination Estimation via Dense Geometric Entropy for Medical VQA with Vision-Language Models

License

This project is released under the MIT License. See LICENSE for details.



Download files


Source Distribution

hedge_bench-0.1.2.tar.gz (14.9 kB)

Built Distribution


hedge_bench-0.1.2-py3-none-any.whl (14.2 kB)

File details

Details for the file hedge_bench-0.1.2.tar.gz.

File metadata

  • Download URL: hedge_bench-0.1.2.tar.gz
  • Size: 14.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hedge_bench-0.1.2.tar.gz
  • SHA256: 3ba73614b474bd58a4ebcf4bb47c51222f36fa5a0b6ac10f790aa86ed3f95bb6
  • MD5: 126e81bf30d136359c7e1ebaf402d2f6
  • BLAKE2b-256: 3649b80a16a803e996a3d417488ec01c86838198417c352c027fc137583071aa
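To check a downloaded archive against the published SHA256 digest for hedge_bench-0.1.2.tar.gz, Python's standard `hashlib` is sufficient. The sketch below reads the file in chunks; `sha256_of` and `verify_sha256` are hypothetical helper names, and the file location in the usage block is an assumption about where the download was saved. pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Path, expected: str) -> bool:
    """Compare a file's SHA256 digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()


if __name__ == "__main__":
    # Assumes the sdist was downloaded into the current directory.
    archive = Path("hedge_bench-0.1.2.tar.gz")
    expected = "3ba73614b474bd58a4ebcf4bb47c51222f36fa5a0b6ac10f790aa86ed3f95bb6"
    if archive.exists():
        print("OK" if verify_sha256(archive, expected) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```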


Provenance

The following attestation bundles were made for hedge_bench-0.1.2.tar.gz:

Publisher: publish.yml on SushantGautam/HEDGE


File details

Details for the file hedge_bench-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: hedge_bench-0.1.2-py3-none-any.whl
  • Size: 14.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for hedge_bench-0.1.2-py3-none-any.whl
  • SHA256: 5d745ccf7f605093c4ce32546779c01df50b281e3899f37b92635dab8c6e4304
  • MD5: 1884c3d8062f8c86a7038fca139d2361
  • BLAKE2b-256: 6fca818bd4eab500a13501b3bc376736e86e3b817463672cc7347c1216526021


Provenance

The following attestation bundles were made for hedge_bench-0.1.2-py3-none-any.whl:

Publisher: publish.yml on SushantGautam/HEDGE

