
Cell and nucleus segmentation for whole slide images (H&E and MIF)


VitaminP: Cell & Nuclei Segmentation for H&E and Multiplex IF

Requires Python 3.8+ and PyTorch.

VitaminP is a cross-modal deep learning framework for cell and nuclei segmentation in H&E and multiplex immunofluorescence (MIF) whole slide images. Built on DINOv2 vision transformers, it learns from paired H&E–MIF data to infer cytoplasmic boundaries that are invisible in standard brightfield microscopy — enabling whole-cell segmentation directly from H&E.

VitaminP was trained on 14 public datasets spanning 34 cancer types and more than 7 million annotated instances.


📦 Installation

pip install vitaminp

⚠️ If you see a NumPy/OpenCV conflict: pip install "numpy<2" --force-reinstall
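
To check ahead of time whether the installed NumPy is a 2.x release, a small standard-library sketch like the following may help. The helper names `numpy_major` and `numpy_needs_downgrade` are illustrative and not part of VitaminP, and the check assumes the conflict mentioned above is tied to NumPy 2.x, as the suggested `numpy<2` pin implies.

```python
from importlib.metadata import PackageNotFoundError, version


def numpy_major(installed: str) -> int:
    """Return the major component of a version string such as '1.26.4'."""
    return int(installed.split(".")[0])


def numpy_needs_downgrade() -> bool:
    """True when NumPy is installed and its major version is 2 or higher."""
    try:
        return numpy_major(version("numpy")) >= 2
    except PackageNotFoundError:
        # NumPy is not installed at all, so there is nothing to downgrade.
        return False


if __name__ == "__main__":
    if numpy_needs_downgrade():
        print('NumPy 2.x detected; consider: pip install "numpy<2" --force-reinstall')
    else:
        print("NumPy is older than 2.0 or not installed.")
```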


🗺️ Which Model Should I Use?

| Model | Input | Best for | Speed |
|-------|-------|----------|-------|
| flex | H&E or MIF (any channel) | General purpose; most users start here | ⚡⚡⚡ Fastest |
| dual | H&E + MIF (paired) | Best whole-cell accuracy when both modalities are available | ⚡⚡ |
| syn | H&E only | H&E whole-cell segmentation when no MIF is available | ⚡⚡ |

Which branches should I run?

| Goal | branches= |
|------|-----------|
| Nuclei only | ['he_nuclei'] |
| Cells only | ['he_cell'] |
| Both (recommended) | ['he_nuclei', 'he_cell'] (nuclei constrain cells for better accuracy) |
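
The table above can be kept in code as a small lookup, which avoids typos in branch names when scripts are parameterised by goal. This is only a convenience sketch: `BRANCHES_BY_GOAL` and `branches_for` are hypothetical names, not part of the VitaminP API, and the branch strings are copied from the table.

```python
from typing import Dict, List

# Goal -> value to pass as `branches=` (copied from the table above).
BRANCHES_BY_GOAL: Dict[str, List[str]] = {
    "nuclei": ["he_nuclei"],
    "cells": ["he_cell"],
    "both": ["he_nuclei", "he_cell"],  # recommended
}


def branches_for(goal: str = "both") -> List[str]:
    """Return a fresh list of branch names for the given goal."""
    try:
        return list(BRANCHES_BY_GOAL[goal])
    except KeyError:
        valid = ", ".join(sorted(BRANCHES_BY_GOAL))
        raise ValueError(f"unknown goal {goal!r}; expected one of: {valid}") from None
```

The result can then be passed straight through, for example `predictor.predict(..., branches=branches_for("both"))`.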

🚀 Quick Start

import vitaminp

print(vitaminp.available_models())    # list all available models
model = vitaminp.load_model('flex')   # downloads weights once, then reuses the local cache

📖 Usage

1. Flex — General Purpose (H&E or MIF)

H&E input (most common):

import vitaminp
from vitaminp.inference import WSIPredictor

model = vitaminp.load_model('flex', device='cuda')

predictor = WSIPredictor(
    model=model,
    device='cuda',
    patch_size=512,
    overlap=64,
    target_mpp=0.4250,
    magnification=20,
    batch_size=32,        # lower to 4-8 if out of memory
    tissue_dilation=1,
)

results = predictor.predict(
    wsi_path='slide.svs',
    output_dir='results/',
    branches=['he_nuclei', 'he_cell'],  # or just ['he_nuclei'] or ['he_cell']
    filter_tissue=True,
    tissue_threshold=0.10,
    clean_overlaps=True,
    save_geojson=True,
    min_area_um=10.0,
)

print(f"✅ Nuclei: {results['he_nuclei']['num_detections']}")
print(f"✅ Cells:  {results['he_cell']['num_detections']}")
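
To make the spatial parameters above more concrete, the sketch below works through the arithmetic they imply: how much a slide would be rescaled to reach `target_mpp`, where overlapping patches would start along one axis, and how an area threshold in square microns translates to pixels. It is a generic sliding-window illustration, not VitaminP's actual tiling code. The function names are hypothetical, and it assumes `min_area_um` is expressed in µm².

```python
from typing import List


def scale_factor(native_mpp: float, target_mpp: float = 0.4250) -> float:
    """Resize factor that brings a slide scanned at native_mpp to target_mpp."""
    return native_mpp / target_mpp


def patch_origins(length: int, patch_size: int = 512, overlap: int = 64) -> List[int]:
    """Start offsets of overlapping patches that cover one image axis."""
    if not 0 <= overlap < patch_size:
        raise ValueError("overlap must be non-negative and smaller than patch_size")
    if length <= patch_size:
        return [0]
    stride = patch_size - overlap
    origins = list(range(0, length - patch_size, stride))
    origins.append(length - patch_size)  # final patch sits flush with the edge
    return origins


def min_area_pixels(min_area_um2: float, mpp: float = 0.4250) -> float:
    """Convert an area threshold from square microns to pixels at the given mpp."""
    return min_area_um2 / (mpp ** 2)
```

With the defaults shown, a 1000-pixel axis yields patch origins `[0, 448, 488]`, a slide scanned at 0.25 µm/pixel would be resized by a factor of about 0.59, and a 10 µm² threshold corresponds to roughly 55 pixels.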

MIF input: pass a ChannelConfig so the model knows which channel carries the nuclear stain and which channels carry membrane markers:

import vitaminp
from vitaminp.inference import WSIPredictor
from vitaminp.inference.channel_config import ChannelConfig

model = vitaminp.load_model('flex', device='cuda')

config = ChannelConfig(
    nuclear_channel=2,                  # e.g. DAPI
    membrane_channel=[0, 1],            # e.g. cell markers
    membrane_combination='max',
    channel_names={0: 'CellMarker1', 1: 'CellMarker2', 2: 'DAPI'}
)

predictor = WSIPredictor(
    model=model,
    device='cuda',
    patch_size=512,
    overlap=64,
    target_mpp=0.4250,
    magnification=20,
    mif_channel_config=config,          # required for MIF input
    batch_size=16,
)

results = predictor.predict(
    wsi_path='mif_image.tif',
    output_dir='results/',
    branches=['he_nuclei', 'he_cell'],
    filter_tissue=True,
    clean_overlaps=True,
    save_geojson=True,
    save_visualization=True,
    detection_threshold=0.2,
    min_area_um=5.0,
)

print(f"✅ Nuclei: {results['he_nuclei']['num_detections']}")
print(f"✅ Cells:  {results['he_cell']['num_detections']}")
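
The `membrane_combination='max'` setting suggests that several membrane channels are merged into one by taking the per-pixel maximum. The sketch below illustrates that idea on tiny nested lists using only the standard library. It is an assumption about what the option means, not VitaminP's implementation, and `combine_membrane` is a hypothetical helper (real code would more likely use NumPy arrays).

```python
from typing import List, Sequence

Image2D = List[List[float]]


def combine_membrane(channels: Sequence[Image2D], indices: Sequence[int]) -> Image2D:
    """Merge the selected channels into one image by per-pixel maximum."""
    if not indices:
        raise ValueError("at least one membrane channel index is required")
    selected = [channels[i] for i in indices]
    return [
        [max(pixels) for pixels in zip(*rows)]  # same pixel across channels
        for rows in zip(*selected)              # same row across channels
    ]


# Three 2x2 channels: two membrane markers (0, 1) and a nuclear stain (2).
channels = [
    [[1, 5], [0, 2]],
    [[4, 3], [9, 1]],
    [[7, 7], [7, 7]],
]
membrane = combine_membrane(channels, indices=[0, 1])
nuclear = channels[2]
```

Taking the maximum keeps a boundary visible wherever any one marker is bright, which is a common reason to prefer it over averaging when markers stain different cell populations.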

2. Dual — Paired H&E + MIF (best whole-cell accuracy)

Use this when you have co-registered H&E and MIF from the same tissue section. The model fuses both signals to resolve cytoplasmic boundaries that are ambiguous in H&E alone.

import vitaminp
from vitaminp.inference import WSIPredictor
from vitaminp.inference.channel_config import ChannelConfig

model = vitaminp.load_model('dual', device='cuda')

config = ChannelConfig(
    nuclear_channel=2,
    membrane_channel=[0, 1],
    membrane_combination='max',
    channel_names={0: 'CellMarker1', 1: 'CellMarker2', 2: 'DAPI'}
)

predictor = WSIPredictor(
    model=model,
    device='cuda',
    patch_size=512,
    overlap=64,
    target_mpp=0.4250,
    magnification=20,
    mif_channel_config=config,
    batch_size=4,
)

results = predictor.predict(
    wsi_path='he_image.png',            # H&E
    wsi_path_mif='mif_image.png',       # co-registered MIF
    output_dir='results/',
    branches=['he_nuclei', 'he_cell'],
    filter_tissue=True,
    clean_overlaps=True,
    save_geojson=True,
    save_visualization=True,
    detection_threshold=0.2,
    min_area_um=5.0,
)

print(f"✅ H&E nuclei: {results['he_nuclei']['num_detections']}")
print(f"✅ H&E cells:  {results['he_cell']['num_detections']}")

📊 Output Files

results/
├── he_nuclei_detections.geojson    # QuPath-compatible annotations
├── he_cell_detections.geojson
├── he_nuclei_boundaries.png        # Visualization overlay
└── he_nuclei_centroids.csv         # Centroid coordinates

GeoJSON output is directly compatible with QuPath.
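
For a quick sanity check outside QuPath, the GeoJSON files can also be read with Python's `json` module. The sketch below counts features by geometry type. The exact layout VitaminP writes is not documented here, so the loader accepts either a `FeatureCollection` or a bare list of features; `load_features` and `geometry_counts` are hypothetical helper names.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Any, Dict, List


def load_features(path: str) -> List[Dict[str, Any]]:
    """Load GeoJSON features from a FeatureCollection, a list, or a single Feature."""
    data = json.loads(Path(path).read_text(encoding="utf-8"))
    if isinstance(data, dict):
        if data.get("type") == "FeatureCollection":
            return list(data.get("features", []))
        return [data]  # a single Feature object
    return list(data)  # a bare list of Feature objects


def geometry_counts(features: List[Dict[str, Any]]) -> Counter:
    """Count features by geometry type; missing geometries count as 'unknown'."""
    return Counter(
        (feature.get("geometry") or {}).get("type", "unknown")
        for feature in features
    )
```

For example, `geometry_counts(load_features('results/he_nuclei_detections.geojson'))` gives a per-type tally that can be compared against the `num_detections` value returned by `predict`.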


🎯 Common Recipes

Batch Processing

import glob
from pathlib import Path
import vitaminp
from vitaminp.inference import WSIPredictor

model = vitaminp.load_model('flex', device='cuda')
predictor = WSIPredictor(model=model, device='cuda', batch_size=32)

for slide_path in glob.glob('slides/*.svs'):
    name = Path(slide_path).stem
    results = predictor.predict(
        wsi_path=slide_path,
        output_dir=f'results/{name}',
        branches=['he_nuclei', 'he_cell'],
        save_geojson=True,
        min_area_um=10.0,
    )
    print(f"{name}: {results['he_nuclei']['num_detections']} nuclei")
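
On large cohorts a single unreadable slide should not abort the whole run. The variant below is a sketch of the same loop that records failures and collects per-slide counts for a summary CSV. It relies only on the `predict` arguments and the `results[branch]['num_detections']` layout shown above; `run_batch` and `write_summary` are hypothetical helpers, and `predictor` is whatever `WSIPredictor` instance you have already built.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable, List, Sequence, TextIO

DEFAULT_BRANCHES = ("he_nuclei", "he_cell")


def run_batch(predictor, slide_paths: Iterable[str], out_root: str = "results",
              branches: Sequence[str] = DEFAULT_BRANCHES) -> List[Dict[str, object]]:
    """Run predict() on every slide, recording counts or the failure reason."""
    rows = []
    for slide_path in sorted(slide_paths):
        name = Path(slide_path).stem
        row: Dict[str, object] = {"slide": name, "status": "ok"}
        try:
            results = predictor.predict(
                wsi_path=str(slide_path),
                output_dir=str(Path(out_root) / name),
                branches=list(branches),
                save_geojson=True,
                min_area_um=10.0,
            )
            for branch in branches:
                row[branch] = results[branch]["num_detections"]
        except Exception as exc:  # keep going; the error is recorded per slide
            row["status"] = f"failed: {exc}"
        rows.append(row)
    return rows


def write_summary(rows: List[Dict[str, object]], handle: TextIO,
                  branches: Sequence[str] = DEFAULT_BRANCHES) -> None:
    """Write one CSV row per slide; failed slides get empty count cells."""
    writer = csv.DictWriter(handle, fieldnames=["slide", "status", *branches], restval="")
    writer.writeheader()
    writer.writerows(rows)
```

A typical call would be `rows = run_batch(predictor, glob.glob('slides/*.svs'))` followed by `write_summary(rows, open('summary.csv', 'w', newline=''))`.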

Image Without MPP Metadata

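# Reuses the `predictor` created in the examples above.
results = predictor.predict(
    wsi_path='image.png',
    mpp_override=0.4250,   # resolution in µm/pixel, supplied because the file carries none
    branches=['he_nuclei'],
)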
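
If the resolution is known only as a pixel density (for example from scanner documentation or a TIFF resolution tag), it can be converted to microns per pixel before being passed as `mpp_override`. The sketch below shows the unit arithmetic; `mpp_from_resolution` is a hypothetical helper and not part of VitaminP.

```python
MICRONS_PER_UNIT = {"cm": 10_000.0, "inch": 25_400.0}


def mpp_from_resolution(pixels_per_unit: float, unit: str = "cm") -> float:
    """Convert a pixel density (pixels per cm or inch) to microns per pixel."""
    if unit not in MICRONS_PER_UNIT:
        raise ValueError(f"unit must be one of {sorted(MICRONS_PER_UNIT)}")
    if pixels_per_unit <= 0:
        raise ValueError("pixels_per_unit must be positive")
    return MICRONS_PER_UNIT[unit] / pixels_per_unit


# An image stored at 20,000 pixels per centimetre has 0.5 µm pixels.
mpp = mpp_from_resolution(20_000, unit="cm")
```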

🔧 Troubleshooting

| Problem | Fix |
|---------|-----|
| CUDA out of memory | Lower batch_size to 4–8 |
| No MPP in metadata | Pass mpp_override with the image resolution in µm/pixel, e.g. mpp_override=0.4250 |
| Too many false positives | Raise detection_threshold (e.g. 0.7) and min_area_um (e.g. 10.0) |
| NumPy/OpenCV error | pip install "numpy<2" --force-reinstall |
| MIF channels wrong | Set mif_channel_config with the correct channel indices |
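
The out-of-memory fix can be automated by retrying with progressively smaller batch sizes. The sketch below is one possible pattern rather than a VitaminP feature: `predict_with_backoff` is a hypothetical helper, `make_predictor` is a callable you supply that builds a `WSIPredictor` for a given `batch_size`, and the check relies on PyTorch reporting CUDA exhaustion as a `RuntimeError` whose message contains "out of memory".

```python
from typing import Any, Callable, Dict, Sequence, Tuple


def predict_with_backoff(make_predictor: Callable[[int], Any],
                         predict_kwargs: Dict[str, Any],
                         batch_sizes: Sequence[int] = (32, 16, 8, 4)) -> Tuple[int, Any]:
    """Try each batch size in turn; return the first size that works and its results."""
    last_error = None
    for batch_size in batch_sizes:
        try:
            predictor = make_predictor(batch_size)
            return batch_size, predictor.predict(**predict_kwargs)
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower():
                raise  # unrelated failure: do not mask it
            last_error = exc
    raise RuntimeError(f"out of memory at every batch size in {list(batch_sizes)}") from last_error
```

A factory such as `lambda bs: WSIPredictor(model=model, device='cuda', batch_size=bs)` would fit the `make_predictor` argument. Depending on the PyTorch version, cached GPU memory may also need to be released between attempts.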

📚 Citation

If you use VitaminP in your research, please cite:

@article{shokrollahi2025vitaminp,
  title   = {Vitamin-P: vision transformer assisted multi-modality integration 
             network for pathology cell segmentation},
  author  = {Shokrollahi, Yasin and Pinao Gonzales, Karina and Barrientos Toro, Elizve 
             and Acosta, Paul and Chen, Pingjun and Yuan, Yinyin and Pan, Xiaoxi},
  journal = {arXiv},
  year    = {2025}
}

📄 License

MIT License — see LICENSE file.

🙋 Support

  • 🐛 Issues: GitHub Issues
  • 📧 Contact: MD Anderson Cancer Center — Department of Translational Molecular Pathology

Made with ❤️ at MD Anderson Cancer Center
