3D Neural Network for Lung Cancer Risk Prediction on CT Volumes

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

3D Neural Network for Lung Cancer Risk Prediction on CT Volumes

Overview

This repository contains my implementation of the "full-volume" model from the paper:

End-to-end lung cancer screening with three-dimensional deep learning on low-dose chest computed tomography.
Ardila, D., Kiraly, A.P., Bharadwaj, S. et al. Nat Med 25, 954–961 (2019).

Model Workflow

The model uses a three-dimensional (3D) CNN to perform end-to-end analysis of whole-CT volumes, using LDCT volumes with pathology-confirmed cancer as training data. The CNN architecture is an Inflated 3D ConvNet (I3D) (Carreira and Zisserman).

Data

We use the NLST dataset which contains chest LDCT volumes with pathology-confirmed cancer evaluations. For description and access to the dataset refer to the NCI website.

Example cases

Setup

pip install lungs

Running the code

Inference

import lungs
lungs.predict('path/to/data')

From command line (with preprocessed data):

python main.py --preprocessed --input path/to/preprocessed/data

Training

The inputs to the training procedure are training and validation .list files containing coupled data - a volume path and its label in each row. These .list files need to be generated beforehand using preprocess.py, as described in the next section.

import lungs
# train with default hyperparameters
lungs.train(train='path/to/train.list', val='path/to/val.list')

The main.py module contains training (fine-tuning) and inference procedures. The inputs are preprocessed CT volumes, as produced by preprocess.py. For usage example, refer to the arguments' description and default values in the bottom of main.py.

Data Preprocessing

Preprocess volumes

Each CT volume in NLST is a directory of DICOM files (each .dcm file is one slice of the CT). The preprocess.py module accepts a directory path/to/data containing multiple such directories (volumes). It performs several preprocessing steps, and writes each preprocessed volume as an .npz file in path/to/data_preprocssed. These steps are based on this tutorial, and include:

Resampling to a 1.5mm voxel size (slow)
Coarse lung segmentation – used to compute lung center for alignment and reduction of problem space

To save storage space, the following preprocessing steps are performed online (during training/inference):

Windowing – clip pixel values to focus on lung volume
RGB normalization

Example usage:

from lungs import preprocess
# Step 1: Preprocess all volumes, will save them to '/path/to/dataset_preprocessed'
preprocess.preprocess_all('/path/to/dataset')

Create train/val `.list` files

Once the preprocessed data is ready, the next step is to split it randomly into train/val sets, and save each set as a .list file of paths/labels, required for the training procedure.

Example usage:

from lungs import preprocess
# Step 2: Split preprocessed data into train/val coupled `.list` files
preprocess.split(positives='/path/to/dataset_preprocessed/positives',
                 negatives='/path/to/dataset_preprocessed/negatives',
                 lists_dir='/path/to/write/lists',
                 split_ratio=0.7)

Provided checkpoint

By default, if the ckpt argument is not given, the model is initialized using our best fine-tuned checkpoint. Due to limited storage and compute time, our checkpoint was trained on a small subset of NLST containing 1,045 volumes (34% positive).

Note that in order to obtain a general purpose prediction model, one would have to train on the full NLST dataset. Steps include:

Gainng access to the NLST dataset
Downloading ~6TB of positive and negative volumes (requires storage and a few days for downloading)
Preprocessing (see Data Preprocessing section above)
Training (requires a capable GPU)

Even though we used a samll subset of NLST, we still achieved a state-of-the-art AUC score of 0.892 on a validation set of 448 volumes. This is comparable to the original paper's AUC for the full-volume model (see the paper's supplemtary material), trained on 47,974 volumes (1.34% positive).

To train this model we first initialized by bootstrapping the filters from the ImageNet pre-trained 2D Inception-v1 model into 3D, as described in the I3D paper. It was then fine-tuned on the preprocessed CT volumes to predict cancer within 1 year (binary classification). Each of these volumes was a large region cropped around the center of the bounding box, as determined by lung segmentation in the preprocessing step.

For the training setup, we set the dropout keep_prob to 0.7, and trained in mini-batches of size of 2 (due to limited GPU memory). We used tf.train.AdamOptimizer with a small learning rate of 5e-5, (due to the small batch size) and stopped the training before overfitting started around epoch 37. The focal loss function from the paper is provided in the code, but we did not experience improved results using it, compared to cross-entropy loss which was used instead. The likely reason is that our dataset was more balanced than the original paper's.

The follwoing plots show loss, AUC, and accuracy progression during training, along with ROC curves for selected epochs:

Acknowledgments

The author thanks the National Cancer Institute for access to NCI's data collected by the National Screening Trial (NLST). The statements contained herein are solely those of the author and do not represent or imply concurrence or endorsement by NCI.

Project details

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: GNU General Public License v3 (GPLv3)
Operating System
- OS Independent
Programming Language
- Python :: 3

Release history Release notifications | RSS feed

This version

0.1.5

Jun 27, 2020

0.1.3

Jun 25, 2020

0.1.2

Jun 25, 2020

0.1.1

Jun 24, 2020

0.1.0

Jun 24, 2020

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lungs-0.1.5.tar.gz (22.0 kB view details)

Uploaded Jun 27, 2020 Source

Built Distribution

lungs-0.1.5-py3-none-any.whl (33.6 kB view details)

Uploaded Jun 27, 2020 Python 3

File details

Details for the file lungs-0.1.5.tar.gz.

File metadata

Download URL: lungs-0.1.5.tar.gz
Upload date: Jun 27, 2020
Size: 22.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.8

File hashes

Hashes for lungs-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`aaabd8d8642566744ffc35d2a504d1abb42e48d52cc4dfd93a854ba3b4591def`
MD5	`6e32d509290ea04f2f07f28199000ff6`
BLAKE2b-256	`cfbc57d356f30accc9af7e093254f95236d7ae49c34f0c22e28c57e676ac8862`

See more details on using hashes here.

File details

Details for the file lungs-0.1.5-py3-none-any.whl.

File metadata

Download URL: lungs-0.1.5-py3-none-any.whl
Upload date: Jun 27, 2020
Size: 33.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/47.3.1 requests-toolbelt/0.9.1 tqdm/4.46.1 CPython/3.6.8

File hashes

Hashes for lungs-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8266edda3593258c6d63ed763a5132e0729d9ef0b56b4300385e4a555b297409`
MD5	`deca1776817c7cfd262f7d228312e46e`
BLAKE2b-256	`9b2df757c5ea93cc68599de5b82c3e90bb21bbd37f30a6e2ec7258d7c8500865`

See more details on using hashes here.

lungs 0.1.5

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

3D Neural Network for Lung Cancer Risk Prediction on CT Volumes

Overview

Data

Setup

Running the code

Inference

Training

Data Preprocessing

Preprocess volumes

Create train/val `.list` files

Provided checkpoint

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

lungs 0.1.5

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

3D Neural Network for Lung Cancer Risk Prediction on CT Volumes

Overview

Data

Setup

Running the code

Inference

Training

Data Preprocessing

Preprocess volumes

Create train/val .list files

Provided checkpoint

Acknowledgments

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Create train/val `.list` files