Merlin: A Vision Language Foundation Model for 3D Computed Tomography
Merlin is a 3D VLM for computed tomography that leverages both structured electronic health records (EHR) and unstructured radiology reports for pretraining.
⚡️ Installation
To install Merlin, you can simply run:
pip install merlin-vlm
For an editable installation, use the following commands to clone and install this repository.
git clone https://github.com/StanfordMIMI/Merlin.git
cd Merlin
pip install -e .
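After either installation route, one quick sanity check is to query the installed package metadata. The sketch below uses only the Python standard library; the helper name `installed_version` is hypothetical, and the distribution name `merlin-vlm` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="merlin-vlm"):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("merlin-vlm is not installed in this environment")
    else:
        print(f"merlin-vlm version: {found}")
```

Returning `None` instead of raising keeps the check usable in setup scripts that need to branch on whether the package is present.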
🚀 Inference on a demo CT scan
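import os

import torch

import merlin
from merlin import Merlin
from merlin.data import DataLoader, download_sample_data

# Run on the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

model = Merlin()
model.eval()
model.to(device)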
data_dir = os.path.join(os.path.dirname(merlin.__file__), "abct_data")
cache_dir = data_dir.replace("abct_data", "abct_data_cache")
datalist = [
    {
        "image": download_sample_data(data_dir),  # function returns local path to nifti file
        "text": "Lower thorax: A small low-attenuating fluid structure is noted in the right cardiophrenic angle in keeping with a tiny pericardial cyst. "
        "Liver and biliary tree: Normal. Gallbladder: Normal. Spleen: Normal. Pancreas: Normal. Adrenal glands: Normal. "
        "Kidneys and ureters: Symmetric enhancement and excretion of the bilateral kidneys, with no striated nephrogram to suggest pyelonephritis. "
        "Urothelial enhancement bilaterally, consistent with urinary tract infection. No renal/ureteral calculi. No hydronephrosis. "
        "Gastrointestinal tract: Normal. Normal gas-filled appendix. Peritoneal cavity: No free fluid. "
        "Bladder: Marked urothelial enhancement consistent with cystitis. Uterus and ovaries: Normal. "
        "Vasculature: Patent. Lymph nodes: Normal. Abdominal wall: Normal. "
        "Musculoskeletal: Degenerative change of the spine.",
    },
]
dataloader = DataLoader(
    datalist=datalist,
    cache_dir=cache_dir,
    batchsize=8,
    shuffle=True,
    num_workers=0,
)

for batch in dataloader:
    # The model takes a batch of CT volumes and the matching report texts.
    outputs = model(
        batch["image"].to(device),
        batch["text"],
    )
    print("\n================== Output Shapes ==================")
    print(f"Contrastive image embeddings shape: {outputs[0].shape}")
    print(f"Phenotype predictions shape: {outputs[1].shape}")
    print(f"Contrastive text embeddings shape: {outputs[2].shape}")
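The contrastive image and text embeddings printed above can be compared with cosine similarity to gauge how closely a report matches a scan. The following is a minimal sketch in plain Python. It assumes both embeddings have the same length and have already been converted to lists of floats (for example with `.tolist()` on one row of each tensor). The function name `cosine_similarity` and the toy vectors are illustrative and not part of the Merlin API.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length sequences of floats."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy stand-ins for one image embedding and one text embedding.
image_embedding = [1.0, 0.0, 2.0]
text_embedding = [2.0, 0.0, 4.0]
print(cosine_similarity(image_embedding, text_embedding))
```

Values near 1 indicate embeddings pointing in nearly the same direction, and values near 0 indicate unrelated directions. On full batches, a vectorized tensor operation would be the more practical choice.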
📎 Citation
If you find this repository useful for your work, please cite the original paper:
@article{blankemeier2024merlin,
title={Merlin: A vision language foundation model for 3d computed tomography},
author={Blankemeier, Louis and Cohen, Joseph Paul and Kumar, Ashwin and Van Veen, Dave and Gardezi, Syed Jamal Safdar and Paschali, Magdalini and Chen, Zhihong and Delbrouck, Jean-Benoit and Reis, Eduardo and Truyts, Cesar and others},
journal={Research Square},
pages={rs--3},
year={2024}
}
Download files
File details
Details for the file merlin_vlm-0.0.1.tar.gz.
File metadata
- Download URL: merlin_vlm-0.0.1.tar.gz
- Upload date:
- Size: 12.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8399fb238254d4dfac82716a299f69cfeaf4a4fca157c85c0a77c30adc8ca903 |
| MD5 | f13b77c212ad72fe9e788f4c795292e1 |
| BLAKE2b-256 | ced081234d07e205c06c46c0ca9420cf1b83c02efb82446b308ef0b967c45625 |
File details
Details for the file merlin_vlm-0.0.1-py3-none-any.whl.
File metadata
- Download URL: merlin_vlm-0.0.1-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c9675028940b48e9150f3b982a431657b07c3b5e8808cc320338fa34bd86a126 |
| MD5 | 354bd6b3c15bc74a074b493ca7869e39 |
| BLAKE2b-256 | 8d7b6e6d41095ee8169d7ad62780246976706db850b86a0ecfdab399965ad4fd |
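The SHA256 digests listed above can be used to check a downloaded file before installing it. Below is a minimal sketch using the standard library `hashlib` module. The helper names `sha256_of_file` and `matches_digest` are hypothetical, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, with the source distribution saved in the current directory, `matches_digest("merlin_vlm-0.0.1.tar.gz", "8399fb238254d4dfac82716a299f69cfeaf4a4fca157c85c0a77c30adc8ca903")` should return `True` for an intact download.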