PICAI Baselines

Baseline AI Models for Prostate Cancer Detection in MRI

This repository contains utilities to set up and train deep learning-based detection models for clinically significant prostate cancer (csPCa) in MRI. These models serve as the official baseline AI solutions for the PI-CAI challenge. Currently, three models are provided and supported: U-Net, nnU-Net and nnDetection.

All three solutions share the same starting point, with respect to their expected folder structure and data preparation pipeline.

Issues

Please feel free to raise any issues you encounter here.

Installation

picai_baseline can be pip-installed:

pip install picai_baseline

Alternatively, picai_baseline can be installed from source:

git clone https://github.com/DIAGNijmegen/picai_baseline
cd picai_baseline
pip install -e .

Installing from source ensures the scripts are present locally, so you can run the provided Python scripts. The -e (editable) option additionally lets you modify the baseline solutions, and cloning the repository ensures the latest version is installed.

General Setup

We define setup steps that are shared between the different baseline algorithms. To follow the baseline algorithm tutorials, this setup must be completed first.

Folder Structure

We define three main folders that must be prepared a priori:

  • /input/ contains one of the PI-CAI datasets. This can be the Public Training and Development Dataset, the Private Training Dataset, the Hidden Validation and Tuning Cohort, or the Hidden Testing Cohort.
    • /input/images/ contains the imaging files. For the Public Training and Development Dataset, these can be retrieved here.
    • /input/labels/ contains the annotations. For the Public Training and Development Dataset, these can be retrieved here.
  • /workdir/ stores intermediate results, such as preprocessed images and annotations.
    • /workdir/results/[model name]/ stores model checkpoints/weights during training, which allows training to be paused and resumed.
  • /output/ stores training output, such as trained model weights and preprocessing plan.
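The folder layout above can be created with a few lines of Python. This is a minimal sketch, not part of picai_baseline: the helper name `create_folder_structure`, the `root` argument and the placeholder model name are assumptions made for illustration.

```python
from pathlib import Path
from typing import List


def create_folder_structure(root: Path, model_name: str = "example_model") -> List[Path]:
    """Create the input, workdir and output folders below `root`.

    `model_name` is a hypothetical placeholder for the [model name] folder.
    """
    folders = [
        root / "input" / "images",                  # imaging files
        root / "input" / "labels",                  # annotations
        root / "workdir" / "results" / model_name,  # checkpoints during training
        root / "output",                            # final training output
    ]
    for folder in folders:
        # parents=True creates intermediate folders, exist_ok=True allows re-runs
        folder.mkdir(parents=True, exist_ok=True)
    return folders
```

Calling `create_folder_structure(Path("/"))` would produce the absolute paths used in this tutorial; passing any other root keeps the same layout below a directory of your choice.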

Data Preparation

Unless specified otherwise, this tutorial assumes that the PI-CAI: Public Training and Development Dataset will be downloaded and unpacked. Before downloading the dataset, read its documentation and dedicated forum post (for all updates/fixes, if any). To download and unpack the dataset, run the following commands:

# download all folds
curl -C - "https://zenodo.org/record/6624726/files/picai_public_images_fold0.zip?download=1" --output picai_public_images_fold0.zip
curl -C - "https://zenodo.org/record/6624726/files/picai_public_images_fold1.zip?download=1" --output picai_public_images_fold1.zip
curl -C - "https://zenodo.org/record/6624726/files/picai_public_images_fold2.zip?download=1" --output picai_public_images_fold2.zip
curl -C - "https://zenodo.org/record/6624726/files/picai_public_images_fold3.zip?download=1" --output picai_public_images_fold3.zip
curl -C - "https://zenodo.org/record/6624726/files/picai_public_images_fold4.zip?download=1" --output picai_public_images_fold4.zip

# unzip all folds
unzip picai_public_images_fold0.zip -d /input/images/
unzip picai_public_images_fold1.zip -d /input/images/
unzip picai_public_images_fold2.zip -d /input/images/
unzip picai_public_images_fold3.zip -d /input/images/
unzip picai_public_images_fold4.zip -d /input/images/

In case unzip is not installed, you can use Docker to unzip the files:

docker run --cpus=2 --memory=8gb --rm -v /path/to/input:/input joeranbosma/picai_nnunet:latest unzip /input/picai_public_images_fold0.zip -d /input/images/
docker run --cpus=2 --memory=8gb --rm -v /path/to/input:/input joeranbosma/picai_nnunet:latest unzip /input/picai_public_images_fold1.zip -d /input/images/
docker run --cpus=2 --memory=8gb --rm -v /path/to/input:/input joeranbosma/picai_nnunet:latest unzip /input/picai_public_images_fold2.zip -d /input/images/
docker run --cpus=2 --memory=8gb --rm -v /path/to/input:/input joeranbosma/picai_nnunet:latest unzip /input/picai_public_images_fold3.zip -d /input/images/
docker run --cpus=2 --memory=8gb --rm -v /path/to/input:/input joeranbosma/picai_nnunet:latest unzip /input/picai_public_images_fold4.zip -d /input/images/

Please follow the instructions here to set up the Docker container.
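If neither unzip nor Docker is available, the archives can also be unpacked with Python's standard zipfile module. The sketch below assumes the five archives were downloaded under the file names used in the curl commands above; the helper name `unpack_folds` and its arguments are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import List


def unpack_folds(archive_dir: Path, target_dir: Path, num_folds: int = 5) -> List[Path]:
    """Extract picai_public_images_fold{0..num_folds-1}.zip into target_dir.

    Returns the archives that were found and extracted.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for fold in range(num_folds):
        archive = archive_dir / f"picai_public_images_fold{fold}.zip"
        if not archive.is_file():
            # Skip archives that have not been downloaded (yet)
            print(f"Missing archive, skipping: {archive}")
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target_dir)
        extracted.append(archive)
    return extracted
```

For the layout in this tutorial, the call would be `unpack_folds(Path("."), Path("/input/images"))` when run from the download directory.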

Also, collect the training annotations via the following command:

git clone https://github.com/DIAGNijmegen/picai_labels /input/labels/

Cross-Validation Splits

We have prepared 5-fold cross-validation splits of all 1500 cases in the PI-CAI: Public Training and Development Dataset. We have ensured there is no patient overlap between training/validation splits. You can load these splits as follows:

from picai_baseline.splits.picai import train_splits, valid_splits

for fold, ds_config in train_splits.items():
    print(f"Training fold {fold} has cases: {ds_config['subject_list']}")

for fold, ds_config in valid_splits.items():
    print(f"Validation fold {fold} has cases: {ds_config['subject_list']}")
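To check for yourself that no patient appears in both the training and validation split of a fold, a sketch like the following can be used. It assumes that case identifiers have the form `<patient_id>_<study_id>` and that each split is a dictionary with a `subject_list` entry, as in the snippet above; the helper names are hypothetical and not part of picai_baseline.

```python
from typing import Dict, Iterable, Set


def patient_ids(subject_list: Iterable[str]) -> Set[str]:
    """Map case identifiers to patient identifiers.

    Assumption: a case identifier looks like '<patient_id>_<study_id>'.
    """
    return {str(case).split("_")[0] for case in subject_list}


def overlapping_patients(train_splits: Dict, valid_splits: Dict) -> Dict:
    """Return, per fold, the patients present in both training and validation."""
    overlap = {}
    for fold, train_config in train_splits.items():
        train_patients = patient_ids(train_config["subject_list"])
        valid_patients = patient_ids(valid_splits[fold]["subject_list"])
        overlap[fold] = train_patients & valid_patients
    return overlap
```

With picai_baseline installed, passing the imported `train_splits` and `valid_splits` to `overlapping_patients` should yield an empty set for every fold.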

Additionally, we prepared 5-fold cross-validation splits of all cases with an expert-derived csPCa annotation. These splits are subsets of the splits above. You can load these splits as follows:

from picai_baseline.splits.picai_nnunet import train_splits, valid_splits

When using picai_eval from the command line, we recommend saving the splits to disk. You can then pass these to picai_eval, so that it can verify that all cases were found. You can export the labelled cross-validation splits using:

python -m picai_baseline.splits.picai_nnunet --output "/workdir/splits/picai_nnunet"

Data Preprocessing

We follow the nnU-Net Raw Data Archive format to prepare our dataset for usage. For this, you can use the picai_prep module. Note that picai_prep should be installed automatically when installing picai_baseline, and it is also installed within the picai_nnunet and picai_nndetection Docker containers.

To convert the dataset in /input/ into the nnU-Net Raw Data Archive format, and store it in /workdir/nnUNet_raw_data, please follow the instructions provided here, or set your target paths in prepare_data.py and execute it:

python src/picai_baseline/prepare_data.py

To adapt/modify the preprocessing pipeline or its default specifications, please make changes to the prepare_data.py script accordingly.

Alternatively, you can use Docker to run the Python script:

docker run --cpus=2 --memory=16gb --rm \
    -v /path/to/input/:/input/ \
    -v /path/to/workdir/:/workdir/ \
    -v /path/to/picai_baseline:/scripts/picai_baseline/ \
    joeranbosma/picai_nnunet:latest python3 /scripts/picai_baseline/src/picai_baseline/prepare_data.py
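After preprocessing, it can be useful to verify that every annotation has its matching image files. The sketch below assumes the nnU-Net (v1) raw data layout, in which a task folder holds `imagesTr/<case>_<channel, 4 digits>.nii.gz` and `labelsTr/<case>.nii.gz`, and it assumes three input channels per case. The helper name, the task folder you pass in and the channel count are assumptions to adjust to your own setup.

```python
from pathlib import Path
from typing import Dict, List


def find_incomplete_cases(task_dir: Path, num_channels: int = 3) -> Dict[str, List[str]]:
    """List cases in labelsTr that lack one or more channel files in imagesTr.

    Assumes nnU-Net (v1) naming: imagesTr/<case>_0000.nii.gz, _0001, ...
    """
    suffix = ".nii.gz"
    incomplete = {}
    for label_path in sorted((task_dir / "labelsTr").glob(f"*{suffix}")):
        case_id = label_path.name[: -len(suffix)]
        expected = [f"{case_id}_{channel:04d}{suffix}" for channel in range(num_channels)]
        missing = [name for name in expected if not (task_dir / "imagesTr" / name).is_file()]
        if missing:
            incomplete[case_id] = missing
    return incomplete
```

An empty dictionary means that all labelled cases have the expected number of image files; otherwise the result names the missing files per case.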

Baseline Algorithms

We provide end-to-end training pipelines for csPCa detection/diagnosis in 3D. Each baseline includes a template to encapsulate the trained AI model in a Docker container and to upload it to the grand-challenge.org platform as an "algorithm".

U-Net

We include a baseline U-Net to provide a playground environment for participants and kickstart their development cycle. The U-Net baseline generates quick results with minimal complexity, but does so at the expense of sub-optimal performance and low flexibility in adapting to any other task.

→ Read the full documentation here.

nnU-Net

nnU-Net [1] is a performant framework for medical image segmentation, which is straightforward to adapt for csPCa detection.

→ Read the full documentation here.

nnDetection

The nnDetection framework [2] is geared towards medical object detection. Setting up nnDetection and tweaking its implementation is not as straightforward as for the nnU-Net or U-Net baselines, but it can provide a strong csPCa detection model.

→ Read the full documentation here.

References

[1] Fabian Isensee, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen and Klaus H. Maier-Hein. "nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation". Nature Methods 18.2 (2021): 203-211.

[2] Michael Baumgartner, Paul F. Jaeger, Fabian Isensee, Klaus H. Maier-Hein. "nnDetection: A Self-configuring Method for Medical Object Detection". International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2021.

[3] Joeran Bosma, Anindo Saha, Matin Hosseinzadeh, Ilse Slootweg, Maarten de Rooij, Henkjan Huisman. "Semi-supervised learning with report-guided lesion annotation for deep learning-based prostate cancer detection in bpMRI". arXiv:2112.05151.

[4] Joeran Bosma, Natalia Alves and Henkjan Huisman. "Performant and Reproducible Deep Learning-Based Cancer Detection Models for Medical Imaging". Under Review.

If you are using this codebase or some part of it, please cite the following article:

A. Saha, J. J. Twilt, J. S. Bosma, B. van Ginneken, D. Yakar, M. Elschot, J. Veltman, J. J. Fütterer, M. de Rooij, H. Huisman, "Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI-CAI Challenge (Study Protocol)", DOI: 10.5281/zenodo.6667655

BibTeX:

@ARTICLE{PICAI_BIAS,
    author = {Anindo Saha and Jasper J. Twilt and Joeran S. Bosma and Bram van Ginneken and Derya Yakar and Mattijs Elschot and Jeroen Veltman and Jurgen Fütterer and Maarten de Rooij and Henkjan Huisman},
    title  = {{Artificial Intelligence and Radiologists at Prostate Cancer Detection in MRI: The PI-CAI Challenge (Study Protocol)}},
    year   = {2022},
    doi    = {10.5281/zenodo.6667655}
}
