
Deep learning tools for digital histology

ArXiv | Docs | Slideflow Studio | Cite

🔬 Overview

Slideflow Studio: a visualization tool for interacting with models and whole-slide images.

Slideflow is a deep learning library for digital pathology, offering a user-friendly interface for model development.

Designed at the University of Chicago for both medical researchers and AI enthusiasts, Slideflow aims to provide an accessible, easy-to-use interface for developing state-of-the-art pathology models. It offers a scalable platform for digital biomarker development that bridges the gap between ever-evolving, sophisticated methods and the needs of a clinical researcher. For developers, Slideflow provides multiple endpoints for integration with other packages and external training paradigms, allowing you to combine highly optimized, pathology-specific processes with the latest ML methodologies.

🚀 Features

Full documentation with example tutorials can be found at slideflow.dev.

Requirements

  • Python >= 3.7 (<3.10 if using cuCIM)
  • Tensorflow 2.5-2.11 or PyTorch 1.9-2.1
    • GAN functions require PyTorch <1.13

Optional

  • Libvips >= 8.9 (alternative slide reader, adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files).
  • Linear solver (for preserved-site cross-validation)
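
The Python version constraints listed above can be checked before installing. The following is a minimal sketch using only the standard library; `check_python` is a hypothetical helper written for this example and is not part of Slideflow.

```python
import sys


def check_python(version=None, use_cucim=True):
    """Return a list of problems with the Python version (empty if OK).

    Encodes the stated requirements: Python >= 3.7, and < 3.10 when
    cuCIM is used as the slide reader.
    """
    major_minor = tuple((version or sys.version_info)[:2])
    problems = []
    if major_minor < (3, 7):
        problems.append("Python >= 3.7 is required")
    if use_cucim and major_minor >= (3, 10):
        problems.append("Python < 3.10 is required when using cuCIM")
    return problems


if __name__ == "__main__":
    # Check the running interpreter and report any mismatch.
    for problem in check_python():
        print("WARNING:", problem)
```

Passing `use_cucim=False` skips the upper bound, which only applies to the cuCIM slide reader.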

📥 Installation

Slideflow can be installed from PyPI, run as a Docker container, or run from source.

Method 1: Install via pip

pip3 install --upgrade setuptools pip wheel
pip3 install slideflow[cucim] cupy-cuda11x

The cupy package name depends on the installed CUDA version; see here for installation instructions. cupy is not required if using Libvips.
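
As a rough illustration of how the package name follows the CUDA version, the sketch below maps a CUDA version string to the matching CuPy wheel name. `cupy_package` is a hypothetical helper; it only covers the `cupy-cuda11x` and `cupy-cuda12x` wheels, so consult the CuPy installation instructions for other CUDA versions.

```python
def cupy_package(cuda_version: str) -> str:
    """Map a CUDA version string such as '11.8' to a CuPy wheel name.

    Only CUDA 11.x and 12.x are handled in this sketch; other versions
    use different package names and are rejected here.
    """
    major = int(cuda_version.split(".")[0])
    if major in (11, 12):
        return f"cupy-cuda{major}x"
    raise ValueError(f"No wheel name known to this sketch for CUDA {cuda_version}")


if __name__ == "__main__":
    # Build the install command for a machine with CUDA 11.8.
    print("pip3 install slideflow[cucim]", cupy_package("11.8"))
```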

Method 2: Docker image

Alternatively, pre-configured Docker images are available with OpenSlide/Libvips and the latest version of either Tensorflow or PyTorch. To install with the Tensorflow backend:

docker pull jamesdolezal/slideflow:latest-tf
docker run -it --gpus all jamesdolezal/slideflow:latest-tf

To install with the PyTorch backend:

docker pull jamesdolezal/slideflow:latest-torch
docker run -it --shm-size=2g --gpus all jamesdolezal/slideflow:latest-torch

Method 3: From source

To run from source, clone this repository, install the conda development environment, and build a wheel:

git clone https://github.com/jamesdolezal/slideflow
cd slideflow
conda env create -f environment.yml
conda activate slideflow
python setup.py bdist_wheel
pip install dist/slideflow* cupy-cuda11x

⚙️ Configuration

Deep learning (PyTorch vs. Tensorflow)

Slideflow supports both PyTorch and Tensorflow, defaulting to PyTorch if both are available. You can specify the backend to use with the environment variable SF_BACKEND. For example:

export SF_BACKEND=tensorflow
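
In a notebook or script, the same variable can be set from Python. This is a minimal sketch that assumes Slideflow reads SF_BACKEND when it is imported, so the variable is set first, and that the accepted values are `torch` and `tensorflow`. `set_backend` is a hypothetical helper, not a Slideflow function.

```python
import os

# Assumed accepted values for SF_BACKEND.
VALID_BACKENDS = {"torch", "tensorflow"}


def set_backend(name: str) -> None:
    """Set SF_BACKEND for this process, rejecting unknown names."""
    if name not in VALID_BACKENDS:
        raise ValueError(f"Unknown backend {name!r}; expected one of {sorted(VALID_BACKENDS)}")
    os.environ["SF_BACKEND"] = name


set_backend("tensorflow")
# Import Slideflow only after the variable is set, for example:
# import slideflow as sf
```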

Slide reading (cuCIM vs. Libvips)

By default, Slideflow reads whole-slide images using cuCIM. Although much faster than other OpenSlide-based frameworks, cuCIM supports fewer slide scanner formats. Slideflow also includes a Libvips backend, which adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files. You can set the active slide backend with the environment variable SF_SLIDE_BACKEND:

export SF_SLIDE_BACKEND=libvips
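
One way to decide between the two readers is to look at the file extensions in a slide collection. The sketch below chooses Libvips whenever a file uses one of the formats listed above and falls back to cuCIM otherwise. `pick_slide_backend` is a hypothetical helper, and the value `cucim` for SF_SLIDE_BACKEND is an assumption.

```python
import os
from pathlib import Path

# Formats that, per the text above, need the Libvips backend.
LIBVIPS_ONLY = {".scn", ".mrxs", ".ndpi", ".vms", ".vmu"}


def pick_slide_backend(filenames):
    """Return 'libvips' if any file needs it, else 'cucim' (assumed name)."""
    extensions = {Path(name).suffix.lower() for name in filenames}
    return "libvips" if extensions & LIBVIPS_ONLY else "cucim"


# Example: a mixed collection with one Hamamatsu (*.ndpi) slide.
backend = pick_slide_backend(["case1.svs", "case2.ndpi"])
os.environ["SF_SLIDE_BACKEND"] = backend  # set before importing slideflow
```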

Getting started

Slideflow experiments are organized into Projects, which supervise storage of whole-slide images, extracted tiles, and patient-level annotations. The fastest way to get started is to use one of our preconfigured projects, which will automatically download slides from the Genomic Data Commons:

import slideflow as sf

P = sf.create_project(
    root='/project/destination',
    cfg=sf.project.LungAdenoSquam,
    download=True
)

After the slides have been downloaded and verified, you can skip to Extract tiles from slides.

Alternatively, to create a new custom project, supply the location of patient-level annotations (CSV), slides, and a destination for TFRecords to be saved:

import slideflow as sf
P = sf.create_project(
  '/project/path',
  annotations="/patient/annotations.csv",
  slides="/slides/directory",
  tfrecords="/tfrecords/directory"
)

Ensure that the annotations file has a slide column for each annotation entry with the filename (without extension) of the corresponding slide.
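
This requirement can be checked before creating the project. The following sketch, using only the standard library, lists annotation entries whose `slide` value has no matching file in the slides directory. `missing_slides` is a hypothetical helper, not part of Slideflow, and it assumes all slides sit directly in one directory.

```python
import csv
from pathlib import Path


def missing_slides(annotations_csv, slides_dir):
    """Return 'slide' values that have no matching file in slides_dir.

    Slide names are compared against filenames without their extension.
    """
    available = {p.stem for p in Path(slides_dir).iterdir() if p.is_file()}
    with open(annotations_csv, newline="") as f:
        reader = csv.DictReader(f)
        if "slide" not in (reader.fieldnames or []):
            raise ValueError("annotations file has no 'slide' column")
        return [
            row["slide"]
            for row in reader
            if row["slide"] and row["slide"] not in available
        ]
```

For example, `missing_slides("/patient/annotations.csv", "/slides/directory")` returns an empty list when every annotated slide is present.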

Extract tiles from slides

Next, whole-slide images are segmented into smaller image tiles and saved in *.tfrecords format. Extract tiles from slides at a given magnification (tile width in microns) and resolution (tile width in pixels) using sf.Project.extract_tiles():

P.extract_tiles(
  tile_px=299,  # Tile size, in pixels
  tile_um=302   # Tile size, in microns
)
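
The two parameters together fix the physical size of each pixel. As a worked example, the sketch below computes microns per pixel for the values above and converts it to an approximate optical magnification. The conversion assumes the common rule of thumb that 10x corresponds to about 1 micron per pixel, which varies by scanner; both function names are hypothetical.

```python
def microns_per_pixel(tile_px: int, tile_um: float) -> float:
    """Physical width of one pixel, in microns."""
    return tile_um / tile_px


def approx_magnification(mpp: float) -> float:
    """Rough optical magnification, assuming 10x is about 1.0 micron/pixel."""
    return 10.0 / mpp


mpp = microns_per_pixel(tile_px=299, tile_um=302)
print(f"{mpp:.3f} microns/pixel, roughly {approx_magnification(mpp):.0f}x")
```

Halving `tile_um` while keeping `tile_px` fixed halves the microns per pixel, which corresponds to roughly doubling the magnification.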

If slides are on a network drive or a spinning HDD, tile extraction can be accelerated by buffering slides to an SSD or ramdisk:

P.extract_tiles(
  ...,
  buffer="/mnt/ramdisk"
)

Training models

Once tiles are extracted, models can be trained. Start by configuring a set of hyperparameters:

params = sf.ModelParams(
  tile_px=299,
  tile_um=302,
  batch_size=32,
  model='xception',
  learning_rate=0.0001,
  ...
)

Models can then be trained using these parameters. Slideflow supports categorical, multi-categorical, continuous, and time-series outcomes, and the training process is highly configurable. For example, to train models in cross-validation to predict the outcome 'category1' as stored in the project annotations file:

P.train(
  'category1',
  params=params,
  save_predictions=True,
  multi_gpu=True
)

Evaluation, heatmaps, mosaic maps, and more

Slideflow includes a host of additional tools, including model evaluation and prediction, heatmaps, analysis of layer activations, mosaic maps, and more. See our full documentation for more details and tutorials.

📚 Publications

Slideflow has been used by:

🔓 License

This code is made available under the GPLv3 License and is available for non-commercial academic purposes.

🔗 Reference

If you find our work useful for your research, or if you use parts of this code, please consider citing as follows:

Dolezal, J. M., Kochanny, S., Dyer, E., et al. Slideflow: Deep Learning for Digital Histopathology with Real-Time Whole-Slide Visualization. ArXiv [q-Bio.QM] (2023). http://arxiv.org/abs/2304.04142

@misc{dolezal2023slideflow,
      title={Slideflow: Deep Learning for Digital Histopathology with Real-Time Whole-Slide Visualization},
      author={James M. Dolezal and Sara Kochanny and Emma Dyer and Andrew Srisuwananukorn and Matteo Sacco and Frederick M. Howard and Anran Li and Prajval Mohan and Alexander T. Pearson},
      year={2023},
      eprint={2304.04142},
      archivePrefix={arXiv},
      primaryClass={q-bio.QM}
}
