TorchXRayVision: A library of chest X-ray datasets and models
Project description
torchxrayvision
A library for chest X-ray datasets and models. Including pre-trainined models.
This code is still under development
Getting started
pip install torchxrayvision
import torchxrayvision as xrv
These are default pathologies:
xrv.datasets.default_pathologies
['Atelectasis',
'Consolidation',
'Infiltration',
'Pneumothorax',
'Edema',
'Emphysema',
'Fibrosis',
'Effusion',
'Pneumonia',
'Pleural_Thickening',
'Cardiomegaly',
'Nodule',
'Mass',
'Hernia',
'Lung Lesion',
'Fracture',
'Lung Opacity',
'Enlarged Cardiomediastinum']
models (demo notebook)
Specify weights for pretrained models (currently all DenseNet121)
Note: Each pretrained model has 18 outputs. The all
model has every output trained. However, for the other weights some targets are not trained and will predict randomly becuase they do not exist in the training dataset. The only valid outputs are listed in the field {dataset}.pathologies
on the dataset that corresponds to the weights.
model = xrv.models.DenseNet(weights="all")
model = xrv.models.DenseNet(weights="kaggle") # RSNA Pneumonia Challenge
model = xrv.models.DenseNet(weights="nih") # NIH chest X-ray8
model = xrv.models.DenseNet(weights="pc") # PadChest (University of Alicante)
model = xrv.models.DenseNet(weights="chex") # CheXpert (Stanford)
model = xrv.models.DenseNet(weights="minix_nb") # MIMIC-CXR (MIT)
model = xrv.models.DenseNet(weights="minix_ch") # MIMIC-CXR (MIT)
datasets (demo notebook)
Only stats for PA/AP views are shown. Datasets may include more.
transform = torchvision.transforms.Compose([xrv.datasets.XRayCenterCrop(),
xrv.datasets.XRayResizer(224)])
d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="path to stage_2_train_images_jpg",
transform=transform)
d_chex = xrv.datasets.CheX_Dataset(imgpath="path to CheXpert-v1.0-small",
csvpath="path to CheXpert-v1.0-small/train.csv",
transform=transform)
d_nih = xrv.datasets.NIH_Dataset(imgpath="path to NIH images")
d_nih2 = xrv.datasets.NIH_Google_Dataset(imgpath="path to NIH images")
d_pc = xrv.datasets.PC_Dataset(imgpath="path to image folder")
d_covid19 = xrv.datasets.COVID19_Dataset() # specify imgpath and csvpath for the dataset
National Library of Medicine Tuberculosis Datasets paper
d_nlmtb = xrv.datasets.NLMTB_Dataset(imgpath="path to MontgomerySet or ChinaSet_AllFiles")
Using MontgomerySet data:
NLMTB_Dataset num_samples=138 views=['PA']
{'Tuberculosis': {0: 80, 1: 58}}
or using ChinaSet_AllFiles data:
NLMTB_Dataset num_samples=662 views=['PA', 'AP']
{'Tuberculosis': {0: 326, 1: 336}}
dataset tools
relabel_dataset will align labels to have the same order as the pathologies argument.
xrv.datasets.relabel_dataset(xrv.datasets.default_pathologies , d_nih) # has side effects
specify a subset of views (demo notebook)
d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="...",
views=["PA","AP","AP Supine"])
specify only 1 image per patient
d_kaggle = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="...",
unique_patients=True)
obtain summary statistics per dataset
d_chex = xrv.datasets.CheX_Dataset(imgpath="CheXpert-v1.0-small",
csvpath="CheXpert-v1.0-small/train.csv",
views=["PA","AP"], unique_patients=False)
CheX_Dataset num_samples=191010 views=['PA', 'AP']
{'Atelectasis': {0.0: 17621, 1.0: 29718},
'Cardiomegaly': {0.0: 22645, 1.0: 23384},
'Consolidation': {0.0: 30463, 1.0: 12982},
'Edema': {0.0: 29449, 1.0: 49674},
'Effusion': {0.0: 34376, 1.0: 76894},
'Enlarged Cardiomediastinum': {0.0: 26527, 1.0: 9186},
'Fracture': {0.0: 18111, 1.0: 7434},
'Lung Lesion': {0.0: 17523, 1.0: 7040},
'Lung Opacity': {0.0: 20165, 1.0: 94207},
'Pleural Other': {0.0: 17166, 1.0: 2503},
'Pneumonia': {0.0: 18105, 1.0: 4674},
'Pneumothorax': {0.0: 54165, 1.0: 17693},
'Support Devices': {0.0: 21757, 1.0: 99747}}
Pathology masks (demo notebook)
d_rsna = xrv.datasets.RSNA_Pneumonia_Dataset(imgpath="stage_2_train_images_jpg",
views=["PA","AP"],
pathology_masks=True)
d_rsna.csv.has_masks.value_counts()
False 20672
True 6012
sample["pathology_masks"]
it also works with data_augmentation if you pass in data_aug=data_transforms
to the dataloader. The random seed is matched to align calls for the image and the mask.
Citation
Joseph Paul Cohen, Joseph Viviano, Mohammad Hashir, and Hadrien Bertrand.
TorchXrayVision: A library of chest X-ray datasets and models.
https://github.com/mlmed/torchxrayvision, 2020
and this paper https://arxiv.org/abs/2002.02497
Cohen, J. P., Hashir, M., Brooks, R., & Bertrand, H.
On the limits of cross-domain generalization in automated X-ray prediction.
Medical Imaging with Deep Learning 2020 (Online: https://arxiv.org/abs/2002.02497)
@inproceedings{cohen2020limits,
title={On the limits of cross-domain generalization in automated X-ray prediction},
author={Cohen, Joseph Paul and Hashir, Mohammad and Brooks, Rupert and Bertrand, Hadrien},
booktitle={Medical Imaging with Deep Learning}
year={2020},
url={https://arxiv.org/abs/2002.02497}
}
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for torchxrayvision-0.0.17-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 65175655b6bda203526b65eb4e417290267b7e1f37407d4be7e34dfb6b60c3bb |
|
MD5 | 2d2555ab63815f4c48afbeeb94f5e3d0 |
|
BLAKE2b-256 | ab99bae999ca5b1208af9ad1609436112fb4c38492fca6c393a16ef2ac8ff07c |