Semantic Consistency-Based Uncertainty Quantification for Radiology Report Generation

SCUQ-RRG

License: MIT

Code for the NAACL 2025 paper "Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation".

Install

pip install scuq-rrg

For full functionality (the GREEN model and RadGraph), install from source instead:

git clone --recurse-submodules https://github.com/Heimerd1nger/SCUQ-RRG.git
cd SCUQ-RRG
pip install -e .
pip install -e third_party/GREEN/   # green_score (PyPI version has incompatible API)
pip install radgraph                # sentence-level UQ
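
After installing, it can be useful to confirm which optional backends are importable before running either scorer. The snippet below is a minimal sketch using only the standard library. The import names `green_score` and `radgraph` are assumptions taken from the install comments above, and `check_optional_dependencies` is a hypothetical helper that is not part of `scuq`.

```python
from importlib.util import find_spec

# Assumed top-level import names, taken from the install comments above.
OPTIONAL_MODULES = {
    "green_score": "report-level UQ (GREEN)",
    "radgraph": "sentence-level UQ (RadGraph)",
}


def check_optional_dependencies(modules=OPTIONAL_MODULES):
    """Return {module_name: True/False} depending on whether it can be imported."""
    return {name: find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, available in check_optional_dependencies().items():
        status = "found" if available else "missing"
        print(f"{name:12s} {status:8s} needed for {OPTIONAL_MODULES[name]}")
```

`find_spec` only locates the module without importing it, so the check stays fast even when the GREEN model weights are not downloaded yet.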

Usage

Report-Level Uncertainty (VRO-GREEN)

Measures report-level factual uncertainty by comparing a greedy-decoded report against multiple sampled reports using the GREEN metric.

from scuq import ReportUncertaintyScorer

scorer = ReportUncertaintyScorer(
    model_id_or_path="StanfordAIMI/GREEN-radllama2-7b",
    cuda=True,
)

# greedy_report: the reference (greedy-decoded) report
# sampled_reports: list of 10 stochastically sampled reports
greedy_report = "The lungs are clear. No pleural effusion. Cardiomediastinal silhouette is normal."
sampled_reports = [
    "Lungs are clear bilaterally. No effusion or pneumothorax.",
    "Clear lungs. Heart size normal. No acute findings.",
    # ... (typically 10 samples)
]

result = scorer.score(greedy_report, sampled_reports)
print(f"Uncertainty: {result.uncertainty:.3f}")   # e.g. 0.596
print(f"Mean GREEN:  {result.mean_green:.3f}")    # e.g. 0.404
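
The two example values above sum to 1, which suggests the report-level uncertainty is one minus the mean pairwise score between the greedy report and each sample. The sketch below illustrates that aggregation only. It swaps GREEN for a toy token-overlap (Jaccard) similarity so that it runs without any model, and `jaccard_similarity` and `report_uncertainty` are hypothetical names, not functions from `scuq`.

```python
from statistics import mean


def jaccard_similarity(a: str, b: str) -> float:
    """Toy stand-in for GREEN: token-set overlap in [0, 1]."""
    tokens_a = set(a.lower().split())
    tokens_b = set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def report_uncertainty(greedy_report, sampled_reports, similarity=jaccard_similarity):
    """Return (uncertainty, mean_score), assuming uncertainty = 1 - mean score."""
    if not sampled_reports:
        raise ValueError("at least one sampled report is required")
    mean_score = mean(similarity(greedy_report, s) for s in sampled_reports)
    return 1.0 - mean_score, mean_score


if __name__ == "__main__":
    uncertainty, mean_score = report_uncertainty(
        "The lungs are clear. No pleural effusion.",
        ["Lungs are clear bilaterally. No effusion.", "Clear lungs. No acute findings."],
    )
    print(f"Uncertainty: {uncertainty:.3f}")
    print(f"Mean score:  {mean_score:.3f}")
```

Passing a different `similarity` callable is where a real factuality metric would plug in; the aggregation step stays the same.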

Sentence-Level Uncertainty (VRO-RadGraph)

Identifies the most uncertain sentence in a report using RadGraph entity consistency.

from scuq import SentenceUncertaintyScorer

scorer = SentenceUncertaintyScorer()

greedy_report = (
    "No pneumothorax. "
    "Possible left lower lobe opacity suggesting pneumonia. "
    "Mild cardiomegaly. "
    "No pleural effusion. "
    "Stable appearance compared to prior. "
    "No acute osseous abnormality."
)
sampled_reports = [
    "No pneumothorax or effusion. Heart size normal.",
    "Bilateral lungs clear. No acute findings.",
    # ...
]

result = scorer.score(greedy_report, sampled_reports)
# Per-sentence uncertainty scores (0 = certain, 1 = uncertain):
# [0.05, 0.60, 0.80, 0.40, 0.28, 0.10]
print(f"Most uncertain: '{result.flagged_sentence}'")
print(f"Sentence scores: {[round(s, 2) for s in result.uncertainty_scores]}")
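
To make the idea of per-sentence consistency concrete, here is a rough sketch that scores each sentence of the greedy report by how poorly its content is supported across the samples, then flags the highest-scoring sentence. It is an illustration under strong assumptions: lowercase word sets stand in for RadGraph entities, the sentence splitter is a simple regex, and all function names are hypothetical rather than part of `scuq`.

```python
import re


def split_sentences(report: str) -> list[str]:
    """Naive sentence splitter on terminal punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", report.strip()) if s.strip()]


def content_tokens(text: str) -> set[str]:
    """Toy stand-in for RadGraph entities: the set of lowercase words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def sentence_uncertainties(greedy_report: str, sampled_reports: list[str]) -> list[float]:
    """Score each sentence as 1 - mean fraction of its tokens found in each sample."""
    scores = []
    for sentence in split_sentences(greedy_report):
        tokens = content_tokens(sentence)
        if not tokens or not sampled_reports:
            scores.append(1.0)  # nothing to compare against: treat as uncertain
            continue
        support = [len(tokens & content_tokens(s)) / len(tokens) for s in sampled_reports]
        scores.append(1.0 - sum(support) / len(support))
    return scores


def flag_most_uncertain(greedy_report: str, sampled_reports: list[str]):
    """Return (flagged_sentence, scores) for the least supported sentence."""
    sentences = split_sentences(greedy_report)
    if not sentences:
        raise ValueError("greedy_report contains no sentences")
    scores = sentence_uncertainties(greedy_report, sampled_reports)
    worst = max(range(len(scores)), key=scores.__getitem__)
    return sentences[worst], scores
```

Word overlap ignores negation and anatomy, which is exactly what an entity-and-relation extractor such as RadGraph is meant to capture, so treat the numbers from this sketch as illustrative only.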

Data Format

The experiment scripts expect two parallel inputs:

  • greedy_reports: list of N strings (greedy-decoded reports)
  • sampled_reports: list of N lists, each with 10 sampled strings

See example/example_data.ipynb for the exact pickle/CSV format used in experiments.
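
Before launching a long run, the two inputs described above can be checked for shape. This is a minimal sketch of such a check; `validate_inputs` is a hypothetical helper, and the default of 10 samples per report simply mirrors the description above.

```python
def validate_inputs(greedy_reports, sampled_reports, samples_per_report=10):
    """Check that inputs are N strings and N lists of `samples_per_report` strings.

    Returns N on success; raises ValueError or TypeError on the first problem.
    """
    if len(greedy_reports) != len(sampled_reports):
        raise ValueError(
            f"{len(greedy_reports)} greedy reports but {len(sampled_reports)} sample lists"
        )
    for i, (greedy, samples) in enumerate(zip(greedy_reports, sampled_reports)):
        if not isinstance(greedy, str):
            raise TypeError(f"greedy_reports[{i}] is not a string")
        if len(samples) != samples_per_report:
            raise ValueError(
                f"sampled_reports[{i}] has {len(samples)} items, expected {samples_per_report}"
            )
        if not all(isinstance(s, str) for s in samples):
            raise TypeError(f"sampled_reports[{i}] contains a non-string item")
    return len(greedy_reports)


if __name__ == "__main__":
    greedy = ["Lungs are clear.", "Mild cardiomegaly."]
    sampled = [["Clear lungs."] * 10, ["Heart size enlarged."] * 10]
    print(f"Validated {validate_inputs(greedy, sampled)} reports")
```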

Running Experiments

Report Scores

python -m src.uq.VroGreen \
  --exp_name chexpert-plus \
  --chexpert_file_path data/batch_chexpert_mimix_cxr_num3858.pkl \
  --output_base_path results \
  --num_samples 3858 --batch_size 16

Sentence UQ

python -m src.uq.VroRadSent \
  --exp CheXpertPlus_mimiccxr \
  --chexpert_file data/batch_chexpert_mimix_cxr_num3858.pkl \
  --num_samples 3858 --output_dir results/exp_result

Abstention

python src/abstention/report_abstention.py \
  --exp ChexpertPlus \
  --green_scores_path data/green_scores-3858.pkl \
  --green_uncertainty_path results/chexpert-plus/green_uncertainty-3858.csv \
  --u_lexicalsim_path data/uq/lexicalUQ.csv \
  --output_base_path results
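
For readers unfamiliar with abstention, the general idea is to withhold the reports with the highest uncertainty and measure quality on the rest. The sketch below shows a generic version of that evaluation: it is not taken from `report_abstention.py`, the name `abstention_curve` is hypothetical, and the scores and uncertainties in the usage example are made-up illustrative values.

```python
def abstention_curve(scores, uncertainties, retain_fractions=(1.0, 0.75, 0.5, 0.25)):
    """Mean quality score on the most certain fraction of reports.

    scores[i] is a quality score (higher is better) and uncertainties[i] the
    uncertainty for the same report. For each fraction, keep that share of
    reports with the lowest uncertainty and average their scores.
    """
    if len(scores) != len(uncertainties):
        raise ValueError("scores and uncertainties must have the same length")
    if not scores:
        raise ValueError("at least one report is required")
    # Indices sorted from most certain to least certain.
    order = sorted(range(len(scores)), key=lambda i: uncertainties[i])
    curve = {}
    for fraction in retain_fractions:
        keep = max(1, round(fraction * len(order)))
        kept = order[:keep]
        curve[fraction] = sum(scores[i] for i in kept) / keep
    return curve


if __name__ == "__main__":
    toy_scores = [0.9, 0.5, 0.7, 0.1]         # illustrative values only
    toy_uncertainty = [0.1, 0.6, 0.3, 0.9]    # illustrative values only
    for fraction, mean_score in abstention_curve(toy_scores, toy_uncertainty).items():
        print(f"retain {fraction:.0%}: mean score {mean_score:.3f}")
```

If the uncertainty measure is informative, the mean score should rise as the retained fraction shrinks, which is the pattern an abstention experiment looks for.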

Calibration (RCE)

python src/misc/cal_rce.py \
  --scores_path data/green_scores-3858.pkl \
  --green_uncertainty_path results/chexpert-plus/green_uncertainty-3858.pkl \
  --u_nll_path data/uq/u_nll.csv \
  --u_lexicalsim_path data/uq/lexicalUQ.csv

Citation

@inproceedings{wang2025semantic,
  title={Semantic Consistency-Based Uncertainty Quantification for Factuality in Radiology Report Generation},
  author={Wang, Chenyu and Bhatt, Parth and Shrivastava, Harshit and Bittencourt, Lucas and Kalra, Mannudeep K. and Gichoya, Judy W. and Celi, Leo Anthony and Peng, Yuyin and Patel, Bhavik N. and Trivedi, Hari},
  booktitle={Proceedings of the 2025 Annual Conference of the North American Chapter of the Association for Computational Linguistics},
  year={2025}
}
