Free-Text Promptable Universal 3D Medical Image Segmentation

VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation

This repository contains the official implementation of our paper:

VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation

VoxTell is a 3D vision–language segmentation model that directly maps free-form text prompts, from single words to full clinical sentences, to volumetric masks. By leveraging multi-stage vision–language fusion, VoxTell achieves state-of-the-art performance on anatomical and pathological structures across CT, PET, and MRI modalities, excelling on familiar concepts while generalizing to related unseen classes.

Authors: Maximilian Rokuss*, Moritz Langenberg*, Yannick Kirchhoff, Fabian Isensee, Benjamin Hamm, Constantin Ulrich, Sebastian Regnery, Lukas Bauer, Efthimios Katsigiannopulos, Tobias Norajitra, Klaus Maier-Hein
Paper: arXiv:2511.11450 (https://arxiv.org/abs/2511.11450)

Overview

VoxTell is trained on a large-scale, multi-modality 3D medical imaging dataset, aggregating 158 public sources with over 62,000 volumetric images. The data covers:

Brain, head & neck, thorax, abdomen, pelvis
Musculoskeletal system and extremities
Vascular structures, major organs, substructures, and lesions

This rich semantic diversity enables language-conditioned 3D reasoning, allowing VoxTell to generate volumetric masks from flexible textual descriptions, from coarse anatomical labels to fine-grained pathological findings.

Architecture

VoxTell combines 3D image encoding with text-prompt embeddings and multi-stage vision–language fusion:

Image Encoder: Processes 3D volumetric input into latent feature representations
Prompt Encoder: Embeds text prompts using the frozen Qwen3-Embedding-4B model
Prompt Decoder: Transforms text queries and image latents into multi-scale text features
Image Decoder: Fuses visual and textual information at multiple resolutions using MaskFormer-style query-image fusion with deep supervision
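
To make the query-image fusion step more concrete, here is a toy sketch of the general MaskFormer-style idea: each text-derived query vector is compared with every per-voxel feature vector via a dot product, giving one mask per prompt. This is not VoxTell's implementation; the function name, tensor shapes, and the 0.5 threshold are illustrative assumptions, and the real model fuses at multiple resolutions.

```python
import numpy as np


def query_image_fusion(queries: np.ndarray, features: np.ndarray) -> np.ndarray:
    """Toy MaskFormer-style fusion (illustrative, not VoxTell's code).

    queries:  (num_prompts, channels) text-derived query vectors
    features: (channels, x, y, z) per-voxel image features
    returns:  (num_prompts, x, y, z) mask probabilities
    """
    # Dot product between every query and every voxel feature vector
    logits = np.einsum("nc,cxyz->nxyz", queries, features)
    # Sigmoid turns logits into independent per-prompt probabilities
    return 1.0 / (1.0 + np.exp(-logits))


rng = np.random.default_rng(0)
queries = rng.normal(size=(3, 8))         # 3 hypothetical prompts, 8 channels
features = rng.normal(size=(8, 4, 5, 6))  # tiny hypothetical feature volume
masks = query_image_fusion(queries, features) > 0.5
print(masks.shape)  # (3, 4, 5, 6)
```

Because each prompt gets its own sigmoid output, masks for different prompts are independent and may overlap.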

🛠 Installation

1. Create a Virtual Environment

VoxTell supports Python 3.10+ and works with Conda, pip, or any other virtual environment manager. Here's an example using Conda:

conda create -n voxtell python=3.12
conda activate voxtell

2. Install PyTorch

⚠️ Temporary Compatibility Warning: There is a known issue with PyTorch 2.9.0 that causes out-of-memory (OOM) errors during inference in nnInteractive (related to 3D convolutions; tracked in a PyTorch issue). Until this is resolved, please use PyTorch 2.8.0 or earlier.

Install PyTorch compatible with your CUDA version. For example, for Ubuntu with a modern NVIDIA GPU:

pip install torch==2.8.0 torchvision==0.23.0 --index-url https://download.pytorch.org/whl/cu126

For other configurations (macOS, CPU, different CUDA versions), please refer to the PyTorch Get Started page.
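
If you want to guard against the version issue in your own scripts, a small check along these lines may help. It is a minimal sketch: the helper names are hypothetical, and treating every release after 2.8.x as "not recommended" is an assumption based on the warning above, not on a statement about which later versions are affected.

```python
def is_recommended_torch(version: str) -> bool:
    """Return True if `version` is 2.8.x or earlier."""
    # Drop local build suffixes such as "+cu126" before parsing
    release = version.split("+")[0]
    major, minor = (int(part) for part in release.split(".")[:2])
    return (major, minor) <= (2, 8)


def check_installed_torch() -> None:
    """Print a hint if the installed PyTorch is newer than recommended."""
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed")
        return
    if not is_recommended_torch(torch.__version__):
        print(f"PyTorch {torch.__version__} may hit the OOM issue; "
              "consider 2.8.0 or earlier")


if __name__ == "__main__":
    check_installed_torch()
```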

3. Install VoxTell

Install via pip (you can also use uv):

pip install voxtell

or install directly from the repository:

git clone https://github.com/MIC-DKFZ/VoxTell
cd VoxTell
pip install -e .

🚀 Getting Started

Command-Line Interface (CLI)

VoxTell provides a convenient command-line interface for running predictions:

voxtell-predict -i input.nii.gz -o output_folder -m /path/to/model -p "liver" "spleen" "kidney"

Single prompt:

voxtell-predict -i case001.nii.gz -o output_folder -m /path/to/model -p "liver"
# Output: output_folder/case001_liver.nii.gz

Multiple prompts (saves individual files by default):

voxtell-predict -i case001.nii.gz -o output_folder -m /path/to/model -p "liver" "spleen" "right kidney"
# Outputs: 
#   output_folder/case001_liver.nii.gz
#   output_folder/case001_spleen.nii.gz
#   output_folder/case001_right_kidney.nii.gz
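
When scripting around the CLI, it can be handy to predict where each mask will be written. The helper below is a sketch whose naming rule is inferred only from the example outputs above (case name, underscore, prompt with spaces replaced by underscores); the function name is hypothetical and the actual CLI may treat other characters differently.

```python
from pathlib import Path


def expected_output_path(input_file: str, output_folder: str, prompt: str) -> Path:
    """Guess the per-prompt output path (naming inferred from the examples)."""
    name = Path(input_file).name
    # Strip the NIfTI extension to get the case identifier
    for suffix in (".nii.gz", ".nii"):
        if name.endswith(suffix):
            name = name[: -len(suffix)]
            break
    # Spaces in the prompt appear as underscores in the example filenames
    return Path(output_folder) / f"{name}_{prompt.replace(' ', '_')}.nii.gz"


print(expected_output_path("case001.nii.gz", "output_folder", "right kidney"))
```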

Save combined multi-label file:

voxtell-predict -i case001.nii.gz -o output_folder -m /path/to/model -p "liver" "spleen" --save-combined
# Output: output_folder/case001.nii.gz (multi-label: 1=liver, 2=spleen)
# ⚠️ WARNING: Overlapping structures will be overwritten by later prompts
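
The overwrite behavior described in the warning can be illustrated with a few lines of NumPy. This is a sketch of the labeling scheme stated above (label 1 for the first prompt, 2 for the second, later prompts winning on overlap), not the CLI's actual code; the function name is hypothetical.

```python
import numpy as np


def combine_masks(masks: np.ndarray) -> np.ndarray:
    """Merge (num_prompts, x, y, z) binary masks into one label map."""
    combined = np.zeros(masks.shape[1:], dtype=np.uint8)
    for index, mask in enumerate(masks):
        # Prompt i gets label i + 1; later prompts overwrite earlier ones
        combined[mask > 0] = index + 1
    return combined


liver = np.array([[[1, 1, 0]]])
spleen = np.array([[[0, 1, 1]]])
print(combine_masks(np.stack([liver, spleen])))  # [[[1 2 2]]]
```

The middle voxel belongs to both masks and ends up with label 2, which is why individual files are the safer choice when structures may overlap.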

CLI Options

| Argument | Short | Required | Description |
|---|---|---|---|
| `--input` | `-i` | Yes | Path to input NIfTI file |
| `--output` | `-o` | Yes | Path to output folder |
| `--model` | `-m` | Yes | Path to VoxTell model directory |
| `--prompts` | `-p` | Yes | Text prompt(s) for segmentation |
| `--device` | | No | Device to use: `cuda` (default) or `cpu` |
| `--gpu` | | No | GPU device ID (default: 0) |
| `--save-combined` | | No | Save multi-label file instead of individual files |
| `--verbose` | | No | Enable verbose output |
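
For batch processing, the CLI can be driven from Python. The sketch below assembles an argument list using only the options documented in the table; the helper name and its keyword arguments are hypothetical conveniences, not part of the VoxTell package.

```python
import shlex


def build_voxtell_command(input_file, output_folder, model_dir, prompts,
                          device=None, gpu=None, save_combined=False,
                          verbose=False):
    """Assemble a voxtell-predict argument list from the documented options."""
    if not prompts:
        raise ValueError("at least one text prompt is required")
    cmd = ["voxtell-predict", "-i", input_file, "-o", output_folder,
           "-m", model_dir, "-p", *prompts]
    # Optional arguments are added only when set, so CLI defaults apply otherwise
    if device is not None:
        cmd += ["--device", device]
    if gpu is not None:
        cmd += ["--gpu", str(gpu)]
    if save_combined:
        cmd.append("--save-combined")
    if verbose:
        cmd.append("--verbose")
    return cmd


cmd = build_voxtell_command("case001.nii.gz", "output_folder", "/path/to/model",
                            ["liver", "right kidney"], save_combined=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list rather than a shell string keeps multi-word prompts such as "right kidney" intact as single arguments.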

Python API

For more control or integration into Python workflows, use the Python API:

import torch
from voxtell.inference.predictor import VoxTellPredictor
from nnunetv2.imageio.nibabel_reader_writer import NibabelIOWithReorient

# Select device
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Load image
image_path = "/path/to/your/image.nii.gz"
img, _ = NibabelIOWithReorient().read_images([image_path])

# Define text prompts
text_prompts = ["liver", "right kidney", "left kidney", "spleen"]

# Initialize predictor
predictor = VoxTellPredictor(
    model_dir="/path/to/voxtell_model_directory",
    device=device,
)

# Run prediction
# Output shape: (num_prompts, x, y, z)
voxtell_seg = predictor.predict_single_image(img, text_prompts)
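
Since the prediction is an array of shape (num_prompts, x, y, z), simple measurements follow directly from it. The sketch below computes a volume per prompt; it assumes binary masks and that you supply the voxel spacing in millimetres yourself (for example from the image header), and the function name is hypothetical.

```python
import numpy as np


def mask_volumes_ml(seg, prompts, spacing_mm):
    """Return {prompt: volume in millilitres} for (num_prompts, x, y, z) masks."""
    voxel_mm3 = float(np.prod(spacing_mm))
    volumes = {}
    for prompt, mask in zip(prompts, seg):
        # Count foreground voxels, then convert mm^3 to mL (1 mL = 1000 mm^3)
        volumes[prompt] = np.count_nonzero(mask) * voxel_mm3 / 1000.0
    return volumes


# Synthetic stand-in for a prediction with two prompts
demo_seg = np.zeros((2, 10, 10, 10), dtype=np.uint8)
demo_seg[0, :5] = 1  # 500 voxels for the first prompt
print(mask_volumes_ml(demo_seg, ["liver", "spleen"], spacing_mm=(1.0, 1.0, 2.0)))
```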

Optional: Visualize Results

You can visualize the segmentation results using napari:

pip install "napari[all]"

import napari
import numpy as np

# Create a napari viewer and add the original image
viewer = napari.Viewer() 
viewer.add_image(img, name='Image')

# Add segmentation results as label layers for each prompt
for i, prompt in enumerate(text_prompts):
    viewer.add_labels(voxtell_seg[i].astype(np.uint8), name=prompt)

# Run napari
napari.run()

Important: Image Orientation and Spacing

⚠️ Image Orientation (Critical): For correct anatomical localization (e.g., distinguishing left from right), images must be in RAS orientation. VoxTell was trained on data reoriented with the NibabelIOWithReorient reader used in the Python API example. Orientation mismatches can be a source of error. A quick check: if a simple prompt like "liver" fails and segments parts of the spleen instead, the orientation is likely wrong. Make sure your image metadata is correct.
Image Spacing: To keep inference fast, the model does not resample images to a standardized spacing. Performance may degrade on images with very uncommon voxel spacings (e.g., super high-resolution brain MRI). In such cases, consider resampling the image to a more typical clinical spacing (e.g., 1.5×1.5×1.5 mm³) before segmentation.
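
Both checks can be sketched in plain Python. The first helper derives approximate orientation codes from a 4×4 voxel-to-world affine, assuming the usual NIfTI convention of RAS+ world coordinates; it is a simplification that picks the dominant direction per axis, and a library routine such as nibabel's `aff2axcodes` is more robust for oblique acquisitions. The second computes the array shape you would get after resampling to a target spacing. Function names and the example numbers are illustrative assumptions.

```python
def axis_codes(affine):
    """Approximate orientation codes from a 4x4 voxel-to-world affine."""
    labels = (("L", "R"), ("P", "A"), ("I", "S"))
    codes = []
    for col in range(3):
        column = [affine[row][col] for row in range(3)]
        # World axis that this voxel axis mostly points along
        dominant = max(range(3), key=lambda row: abs(column[row]))
        negative, positive = labels[dominant]
        codes.append(positive if column[dominant] > 0 else negative)
    return tuple(codes)


def resampled_shape(shape, spacing_mm, target_mm=(1.5, 1.5, 1.5)):
    """Shape after resampling to target spacing, keeping the physical extent."""
    return tuple(max(1, round(n * s / t))
                 for n, s, t in zip(shape, spacing_mm, target_mm))


lps_affine = [[-1, 0, 0, 0], [0, -1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
print(axis_codes(lps_affine))  # ('L', 'P', 'S')
print(resampled_shape((512, 512, 100), (0.75, 0.75, 3.0)))  # (256, 256, 200)
```

An image whose codes are not ('R', 'A', 'S') needs reorienting before inference unless the reader already does this for you.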

🗺️ Roadmap

Paper Published: arXiv:2511.11450
Code Release: Official implementation published
PyPI Package: Package downloadable via pip
Napari Plugin: Integration into the napari viewer
Model Release: Public availability of pretrained weights
Fine-Tuning: Support and scripts for custom fine-tuning

Citation

@misc{rokuss2025voxtell,
      title={VoxTell: Free-Text Promptable Universal 3D Medical Image Segmentation}, 
      author={Maximilian Rokuss and Moritz Langenberg and Yannick Kirchhoff and Fabian Isensee and Benjamin Hamm and Constantin Ulrich and Sebastian Regnery and Lukas Bauer and Efthimios Katsigiannopulos and Tobias Norajitra and Klaus Maier-Hein},
      year={2025},
      eprint={2511.11450},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.11450}, 
}

📬 Contact

For questions, issues, or collaborations, please contact:

📧 maximilian.rokuss@dkfz-heidelberg.de / moritz.langenberg@dkfz-heidelberg.de

Version 0.1.0, released Dec 17, 2025
