slideflow

Deep learning tools for digital histology


Slideflow is a deep learning library for digital pathology that provides a unified API for building, training, and testing models using Tensorflow or PyTorch.

Slideflow includes tools for whole-slide image processing and tile extraction, customizable deep learning model training with dozens of supported architectures, explainability tools (heatmaps, mosaic maps, GANs, and saliency maps), analysis of activations from model layers, uncertainty quantification, and more. Its fast, optimized whole-slide image processing tools cover background filtering, blur/artifact detection, stain normalization, and efficient storage in *.tfrecords format. Model training is highly configurable, with a drop-in API for training custom architectures. For external training loops, Slideflow can serve as an image processing backend, providing an optimized tf.data.Dataset or torch.utils.data.DataLoader that reads and processes slide images and performs real-time stain normalization.

Slideflow Workbench: a visualization tool for interacting with models and whole-slide images, new in version 1.3.

Slideflow has been used by:

Dolezal et al, Modern Pathology, 2020
Rosenberg et al, Journal of Clinical Oncology [abstract], 2020
Howard et al, Nature Communications, 2021
Dolezal et al, Nature Communications, 2022
Storozuk et al, Modern Pathology [abstract], 2022
Partin et al [arXiv], 2022
Dolezal et al [abstract], 2022
Howard et al [bioRxiv], 2022
Dolezal et al [arXiv], 2022

Full documentation with example tutorials can be found at slideflow.dev.

Requirements

Python >= 3.7 (<3.10 if using cuCIM)
Tensorflow 2.5-2.9 or PyTorch 1.9-1.12
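The Python constraint above can be checked before installing. The sketch below is a hypothetical helper, not part of Slideflow, and encodes only the version rule stated in this list (Python >= 3.7, and < 3.10 when cuCIM is used).

```python
import sys

def python_supported(version=sys.version_info[:2], use_cucim=True):
    """Return True if a (major, minor) Python version meets the stated requirement.

    Hypothetical helper: Python >= 3.7, and < 3.10 when the cuCIM slide reader is used.
    """
    version = tuple(version)
    if version < (3, 7):
        return False
    if use_cucim and version >= (3, 10):
        return False
    return True

print(python_supported((3, 9)))                    # True
print(python_supported((3, 10)))                   # False: cuCIM needs < 3.10
print(python_supported((3, 10), use_cucim=False))  # True
```

Calling python_supported() with no arguments checks the running interpreter.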

Optional

Libvips >= 8.9 (alternative slide reader, adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files).
QuPath (for pathologist ROIs)
Linear solver (for preserved-site cross-validation)
- CPLEX 20.1.0 with Python API
- or Pyomo with Bonmin solver

Installation

Slideflow can be installed with PyPI, as a Docker container, or run from source.

Method 1: Install via pip

pip3 install --upgrade setuptools pip wheel
pip3 install "slideflow[cucim]" cupy-cuda11x

The name of the cupy package depends on the installed CUDA version; see the CuPy installation documentation for details. cupy is not required if using Libvips.
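To pick the right package name, a small lookup like the one below may help. It is a hypothetical helper that assumes CuPy's wheel naming scheme (cupy-cuda11x for CUDA 11.2 and later 11.x, cupy-cuda12x for CUDA 12.x, version-specific names such as cupy-cuda102 for older toolkits); confirm against the CuPy documentation for your release.

```python
def cupy_package(cuda_version: str) -> str:
    """Map a CUDA version string such as "11.7" to an assumed CuPy wheel name."""
    major, minor = (int(part) for part in cuda_version.split(".")[:2])
    if major >= 12:
        # Assumption: CUDA 12 and later use a single "x" wheel per major version.
        return f"cupy-cuda{major}x"
    if major == 11 and minor >= 2:
        return "cupy-cuda11x"
    # Assumption: older toolkits use major+minor in the package name.
    return f"cupy-cuda{major}{minor}"

print(cupy_package("11.7"))  # cupy-cuda11x
print(cupy_package("11.1"))  # cupy-cuda111
print(cupy_package("10.2"))  # cupy-cuda102
```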

Method 2: Docker image

Alternatively, pre-configured Docker images are available with OpenSlide/Libvips and the latest version of either Tensorflow or PyTorch. To install with the Tensorflow backend:

docker pull jamesdolezal/slideflow:latest-tf
docker run -it --gpus all jamesdolezal/slideflow:latest-tf

To install with the PyTorch backend:

docker pull jamesdolezal/slideflow:latest-torch
docker run -it --shm-size=2g --gpus all jamesdolezal/slideflow:latest-torch

Method 3: From source

To run from source, clone this repository, install the conda development environment, and build a wheel:

git clone https://github.com/jamesdolezal/slideflow
cd slideflow
conda env create -f environment.yml
conda activate slideflow
python setup.py bdist_wheel
pip install dist/slideflow* cupy-cuda11x

Configuration

Deep learning (Tensorflow vs. PyTorch)

Slideflow supports both Tensorflow and PyTorch, defaulting to Tensorflow if both are available. You can specify the backend with the environment variable SF_BACKEND. For example:

export SF_BACKEND=torch
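The selection rule described above (an explicit SF_BACKEND wins; otherwise Tensorflow is preferred when installed) can be sketched as follows. This illustrates the documented behavior only; resolve_backend is a hypothetical name, not Slideflow's own code.

```python
import importlib.util
import os

def resolve_backend(environ=os.environ):
    """Pick a deep learning backend following the documented rule (sketch)."""
    requested = environ.get("SF_BACKEND")
    if requested:
        return requested
    # find_spec returns None when a package is not installed.
    if importlib.util.find_spec("tensorflow") is not None:
        return "tensorflow"
    if importlib.util.find_spec("torch") is not None:
        return "torch"
    raise RuntimeError("Neither Tensorflow nor PyTorch is installed.")

print(resolve_backend({"SF_BACKEND": "torch"}))  # torch
```

From a script or notebook, the variable can likewise be set with os.environ["SF_BACKEND"] = "torch" before slideflow is first imported, assuming the backend is read at import time.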

Slide reading (cuCIM vs. Libvips)

By default, Slideflow reads whole-slide images using cuCIM. Although much faster than OpenSlide-based frameworks, cuCIM supports fewer slide scanner formats. Slideflow also includes a Libvips backend, which adds support for *.scn, *.mrxs, *.ndpi, *.vms, and *.vmu files. You can set the active slide backend with the environment variable SF_SLIDE_BACKEND:

export SF_SLIDE_BACKEND=libvips
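Whether a dataset needs the Libvips backend can be decided from file extensions alone. The helper below is hypothetical and uses only the format list given above; it defaults to cuCIM for every other extension, which is an assumption, so formats neither reader supports are not detected.

```python
from pathlib import Path

# Formats listed above as requiring the Libvips backend.
LIBVIPS_ONLY = {".scn", ".mrxs", ".ndpi", ".vms", ".vmu"}

def required_slide_backend(slide_path: str) -> str:
    """Return 'libvips' if the file format needs Libvips, otherwise 'cucim'."""
    suffix = Path(slide_path).suffix.lower()
    return "libvips" if suffix in LIBVIPS_ONLY else "cucim"

slides = ["case01.svs", "case02.ndpi", "case03.mrxs"]
backends = {required_slide_backend(s) for s in slides}
print(sorted(backends))  # ['cucim', 'libvips']
```

If the resulting set contains 'libvips', setting SF_SLIDE_BACKEND=libvips covers the whole dataset.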

Getting started

Slideflow experiments are organized into Projects, which manage storage of whole-slide images, extracted tiles, and patient-level annotations. The fastest way to get started is to use one of our preconfigured projects, which automatically download slides from the Genomic Data Commons. Download one of our dataset folders and supply its *.json file to the project creation function:

import slideflow as sf

P = sf.project.create(
  '/project/destination',
  cfg='datasets/thyroid_brs/thyroid_brs.json',
  download=True,
  md5=True
)

After the slides have been downloaded and verified, you can skip to Extract tiles from slides.
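With md5=True, downloaded slides are checked against expected MD5 checksums. If slides are copied or downloaded by other means, a comparable check can be done with the standard library. This sketch does not call Slideflow; md5_stream and md5_file are hypothetical names.

```python
import hashlib
import io

def md5_stream(fileobj, chunk_size=1 << 20):
    """Compute the MD5 digest of a binary file object, reading it in chunks."""
    digest = hashlib.md5()
    for chunk in iter(lambda: fileobj.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()

def md5_file(path):
    """Compute the MD5 digest of a file on disk, such as a downloaded slide."""
    with open(path, "rb") as f:
        return md5_stream(f)

print(md5_stream(io.BytesIO(b"abc")))  # 900150983cd24fb0d6963f7d28e17f72
```

Reading in chunks keeps memory use flat, which matters for multi-gigabyte slide files.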

Alternatively, to create a new custom project, create an instance of the slideflow.Project class and supply patient-level annotations in CSV format:

import slideflow as sf
P = sf.Project(
  '/project/path',
  annotations="/patient/annotations.csv"
)

Once the project is created, add a new dataset source with paths to the whole-slide images, tumor region of interest (ROI) files (if applicable), and the directories where extracted tiles and tfrecords should be stored. This only needs to be done once.

P.add_source(
  name="TCGA",
  slides="/slides/directory",
  roi="/roi/directory",
  tiles="/tiles/directory",
  tfrecords="/tfrecords/directory"
)

Slideflow will attempt to automatically associate slide names with the patient identifiers in your annotations file. Once this step completes, double-check that the annotations file has a slide column giving, for each annotation entry, the filename (without extension) of the corresponding slide.
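That double-check can be scripted. The helper below is hypothetical: it compares an annotations CSV's slide column against slide filenames with their extensions removed. The patient and category1 column names in the sample are illustrative assumptions.

```python
import csv
import io
from pathlib import Path

def unmatched_slides(annotations_csv, slide_paths):
    """Return (rows with an empty slide column, slide files absent from the annotations)."""
    rows = list(csv.DictReader(annotations_csv))
    annotated = {row.get("slide", "").strip() for row in rows}
    on_disk = {Path(p).stem for p in slide_paths}  # filename without extension
    missing_rows = [row for row in rows if not row.get("slide", "").strip()]
    missing_files = sorted(on_disk - annotated)
    return missing_rows, missing_files

annotations = io.StringIO(
    "patient,slide,category1\n"
    "P001,P001-slide1,A\n"
    "P002,,B\n"
)
rows, files = unmatched_slides(
    annotations, ["/slides/P001-slide1.svs", "/slides/P003-slide1.svs"]
)
print([r["patient"] for r in rows])  # ['P002']
print(files)                         # ['P003-slide1']
```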

Extract tiles from slides

Next, whole-slide images are segmented into smaller image tiles and saved in *.tfrecords format. Extract tiles from slides at a given magnification (tile width in microns) and resolution (tile width in pixels) using sf.Project.extract_tiles():

P.extract_tiles(
  tile_px=299,  # Tile size, in pixels
  tile_um=302   # Tile size, in microns
)
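The two tile parameters together fix the effective resolution: tile_um / tile_px is the number of microns each output pixel covers. The arithmetic below uses the example values above; the slide dimensions and base resolution (0.25 microns/pixel) are hypothetical, and the tile count assumes a non-overlapping grid before any background filtering.

```python
import math

tile_px = 299   # tile width in pixels (from the example above)
tile_um = 302   # tile width in microns (from the example above)

# Effective resolution of each extracted tile, in microns per pixel.
um_per_px = tile_um / tile_px
print(round(um_per_px, 3))  # 1.01

# Hypothetical slide: 40,000 x 30,000 pixels scanned at 0.25 microns/pixel.
slide_w_px, slide_h_px, slide_mpp = 40_000, 30_000, 0.25

# Each tile spans tile_um / slide_mpp pixels at full resolution,
# which is then resized down to tile_px.
tile_px_full_res = tile_um / slide_mpp
print(tile_px_full_res)  # 1208.0

# Upper bound on non-overlapping tiles before background filtering.
n_cols = math.floor(slide_w_px / tile_px_full_res)
n_rows = math.floor(slide_h_px / tile_px_full_res)
print(n_cols * n_rows)  # 792
```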

If slides are on a network drive or a spinning HDD, tile extraction can be accelerated by buffering slides to an SSD or ramdisk:

P.extract_tiles(
  ...,
  buffer="/mnt/ramdisk"
)
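Conceptually, buffering copies each slide to fast local storage, reads it from there, and deletes the copy afterwards. The context manager below is a hypothetical illustration of that idea, not the code Slideflow runs when buffer= is passed to extract_tiles().

```python
import contextlib
import shutil
import tempfile
from pathlib import Path

@contextlib.contextmanager
def buffered(slide_path, buffer_dir):
    """Copy a slide into buffer_dir, yield the copy's path, then delete the copy."""
    src = Path(slide_path)
    dst = Path(buffer_dir) / src.name
    shutil.copy2(src, dst)
    try:
        yield dst
    finally:
        if dst.exists():
            dst.unlink()

# Demonstration with temporary directories standing in for the slide drive and ramdisk.
with tempfile.TemporaryDirectory() as slides, tempfile.TemporaryDirectory() as ramdisk:
    slide = Path(slides) / "case01.svs"
    slide.write_bytes(b"fake slide bytes")
    with buffered(slide, ramdisk) as local_copy:
        was_buffered = local_copy.exists() and local_copy.parent == Path(ramdisk)
    cleaned_up = not (Path(ramdisk) / "case01.svs").exists()
print(was_buffered, cleaned_up)  # True True
```

The try/finally ensures the buffered copy is removed even if reading the slide fails, so a small ramdisk does not fill up.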

Training models

Once tiles are extracted, models can be trained. Start by configuring a set of hyperparameters:

params = sf.ModelParams(
  tile_px=299,
  tile_um=302,
  batch_size=32,
  model='xception',
  learning_rate=0.0001,
  ...
)
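When comparing several configurations, a grid of keyword-argument dictionaries can be built with the standard library. The search space below is hypothetical (for example, whether 'resnet50' is an accepted model name should be checked in the documentation); only the keys shown in the ModelParams example above are used.

```python
import itertools

# Hypothetical search space; keys mirror the ModelParams arguments shown above.
search_space = {
    "model": ["xception", "resnet50"],
    "learning_rate": [1e-4, 1e-5],
    "batch_size": [32, 64],
}
fixed = {"tile_px": 299, "tile_um": 302}

combos = [
    {**fixed, **dict(zip(search_space, values))}
    for values in itertools.product(*search_space.values())
]
print(len(combos))  # 8
print(combos[0])
# {'tile_px': 299, 'tile_um': 302, 'model': 'xception', 'learning_rate': 0.0001, 'batch_size': 32}
```

Each dictionary could then be unpacked as sf.ModelParams(**combo).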

Models can then be trained with these parameters to predict categorical, multi-categorical, continuous, or time-series outcomes, and the training process is highly configurable. For example, to train models in cross-validation to predict the outcome 'category1', as stored in the project annotations file:

P.train(
  'category1',
  params=params,
  save_predictions=True,
  multi_gpu=True
)
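In cross-validation on histology data, the important property is that all slides from one patient land in the same fold; otherwise tiles from a validation patient can leak into training. The sketch below shows a patient-level k-fold split in plain Python. It is illustrative only and is not Slideflow's cross-validation code; patient_kfold is a hypothetical name.

```python
import random
from collections import defaultdict

def patient_kfold(slide_to_patient, k=3, seed=42):
    """Split slides into k folds so that no patient appears in two folds."""
    patients = sorted(set(slide_to_patient.values()))
    random.Random(seed).shuffle(patients)
    # Assign patients round-robin to folds after shuffling.
    fold_of_patient = {p: i % k for i, p in enumerate(patients)}
    folds = defaultdict(list)
    for slide, patient in sorted(slide_to_patient.items()):
        folds[fold_of_patient[patient]].append(slide)
    return [folds[i] for i in range(k)]

slides = {
    "s1": "P1", "s2": "P1",  # two slides from the same patient
    "s3": "P2", "s4": "P3",
    "s5": "P4", "s6": "P5",
}
folds = patient_kfold(slides, k=3)
for i, val_slides in enumerate(folds):
    train = [s for s in slides if s not in val_slides]
    print(f"fold {i}: validate on {val_slides}, train on {train}")
```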

Evaluation, heatmaps, mosaic maps, and more

Slideflow includes a host of additional tools, including model evaluation and prediction, heatmaps, mosaic maps, analysis of layer activations, and more. See our full documentation for more details and tutorials.

License

This code is made available under the GPLv3 License and is available for non-commercial academic purposes.

Reference

If you find our work useful for your research, or if you use parts of this code, please consider citing as follows:

James Dolezal, Sara Kochanny, & Frederick Howard. (2022). Slideflow: A Unified Deep Learning Pipeline for Digital Histology (1.3.0). Zenodo. https://doi.org/10.5281/zenodo.7183188

@software{james_dolezal_2022_7183188,
  author       = {James Dolezal and
                  Sara Kochanny and
                  Frederick Howard},
  title        = {{Slideflow: A Unified Deep Learning Pipeline for
                   Digital Histology}},
  month        = oct,
  year         = 2022,
  publisher    = {Zenodo},
  version      = {1.3.0},
  doi          = {10.5281/zenodo.7183188},
  url          = {https://doi.org/10.5281/zenodo.7183188}
}

Release history

2.3.1 (Jan 11, 2024)
2.3.0 (Dec 21, 2023)
2.2.2 (Dec 20, 2023)
2.2.1.post1 (Nov 24, 2023)
2.2.1 (Nov 24, 2023)
2.2.0 (Oct 31, 2023)
2.1.1 (Oct 2, 2023)
2.1.0 (Aug 2, 2023)
2.0.5 (May 25, 2023)
2.0.4 (May 17, 2023)
2.0.3.post1 (Apr 23, 2023)
2.0.3 (Apr 21, 2023)
2.0.2.post1 (Apr 17, 2023)
2.0.2 (Apr 17, 2023)
2.0.1 (Apr 12, 2023)
2.0.0 (Apr 9, 2023)
2.0.0b1, pre-release (Apr 5, 2023)
1.5.4 (Apr 5, 2023)
1.5.3 (Mar 19, 2023)
1.5.2 (Feb 21, 2023)
1.5.1 (Feb 10, 2023)
1.5.0 (Feb 7, 2023)
1.4.4 (Feb 4, 2023)
1.4.3 (Feb 2, 2023), this version
1.4.2 (Jan 25, 2023)
1.4.1 (Dec 21, 2022)
1.4.0.post1 (Dec 2, 2022)
1.4.0 (Dec 2, 2022)
1.4.0rc0, pre-release (Dec 1, 2022)
1.3.3 (Nov 23, 2022)
1.3.2 (Nov 10, 2022)
1.3.1 (Oct 22, 2022)
1.3.0 (Oct 10, 2022)
1.3.0rc0, pre-release (Oct 7, 2022)
1.2.5 (Sep 1, 2022)
1.2.4 (Aug 11, 2022)
1.2.3 (Jul 22, 2022)
1.2.2 (Jul 18, 2022)
1.2.1 (Jul 15, 2022)
1.2.0 (Jul 15, 2022)
1.1.4 (Jul 14, 2022)
1.1.3 (May 4, 2022)
1.1.2 (Apr 22, 2022)
1.1.1 (Apr 19, 2022)
1.1.0 (Apr 16, 2022)
1.1.0rc2, pre-release (Apr 15, 2022)
1.1.0rc1, pre-release (Apr 8, 2022)
1.0.8 (Jul 15, 2022)
1.0.7 (May 4, 2022)
1.0.6 (Feb 16, 2022)
1.0.5 (Dec 28, 2021)
1.0.4 (Nov 21, 2021)
1.0.3 (Nov 18, 2021)
1.0.2 (Nov 17, 2021)
1.0.1 (Nov 16, 2021)
1.0.0 (Nov 13, 2021)

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

slideflow-1.4.3-py3-none-any.whl (1.5 MB)

Uploaded Feb 2, 2023 Python 3

Hashes for slideflow-1.4.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`00cb4e19754709cf3170682aeb1092d3600cec5155fd206742e290ab65ec02c0`
MD5	`bd9efa5d41dde5486939968d76b54489`
BLAKE2b-256	`4660325e1c70423ed0654ecf84a2aa4e605d97da58cafefec57e78759ff15080`