UMIE_datasets

🤩 About the Project

Warning: This project is currently in an alpha stage and may be subject to major changes.

This repository presents a suite of unified scripts to standardize, preprocess, and integrate 882,774 images from 20 open-source medical imaging datasets, spanning modalities such as X-ray, CT, and MR. The scripts allow fast and seamless download of a diverse medical dataset. We create a unified set of annotations so that the datasets can be merged together without mislabelling. Each dataset is preprocessed with a custom scikit-learn pipeline whose steps are reusable across datasets. The code is designed so that preprocessing a new dataset is simple: it only requires reusing the available pipeline steps and customizing them by setting the appropriate pipeline parameters.
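The sketch below illustrates the idea of reusable, parameterized steps. It uses only the standard library. It does not use scikit-learn or the project's actual step classes. `MiniPipeline`, `keep_extension` and `set_label` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable

# A record describes one image file; a step maps a list of records to a new list.
Record = dict[str, Any]
Step = Callable[[list[Record]], list[Record]]


def keep_extension(ext: str) -> Step:
    """Reusable step factory: keep only records whose path ends with `ext`."""
    def step(records: list[Record]) -> list[Record]:
        return [r for r in records if r["path"].endswith(ext)]
    return step


def set_label(label: str) -> Step:
    """Reusable step factory: attach the same label to every record."""
    def step(records: list[Record]) -> list[Record]:
        return [{**r, "label": label} for r in records]
    return step


@dataclass
class MiniPipeline:
    """Runs named steps in order, loosely mirroring scikit-learn's Pipeline."""
    steps: list[tuple[str, Step]] = field(default_factory=list)

    def run(self, records: list[Record]) -> list[Record]:
        for _name, step in self.steps:
            records = step(records)
        return records


# Customizing for a dataset means choosing steps and their parameters, not new code.
pipeline = MiniPipeline([("filter", keep_extension(".png")), ("label", set_label("normal"))])
print(pipeline.run([{"path": "a.png"}, {"path": "b.dcm"}]))
# prints [{'path': 'a.png', 'label': 'normal'}]
```

The real project chains scikit-learn transformers. The design point this sketch shares with it is that the steps stay generic and only their parameters change per dataset.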

The labels and segmentation masks were unified to be compliant with the RadLex ontology.
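The sketch below maps per-dataset label names onto one shared vocabulary, so that merged datasets agree on class names. The source names are assumed to be the class folder names of the Brain Tumor Classification and Brain MRI Images for Brain Tumor Detection datasets. The unified names, `UNIFIED_LABELS` and `unify` are hypothetical and are not the project's RadLex encodings.

```python
# Assumed source class names: the folder names used by the two Kaggle datasets.
# The unified names on the right are placeholders, not actual RadLex terms.
UNIFIED_LABELS: dict[str, dict[str, str]] = {
    "brain_tumor_classification": {
        "glioma_tumor": "glioma",
        "meningioma_tumor": "meningioma",
        "pituitary_tumor": "pituitary_tumor",
        "no_tumor": "normal",
    },
    "brain_tumor_detection": {
        "yes": "tumor",
        "no": "normal",
    },
}


def unify(dataset: str, source_label: str) -> str:
    """Translate a dataset-specific label into the shared vocabulary."""
    try:
        return UNIFIED_LABELS[dataset][source_label]
    except KeyError:
        # Fail loudly instead of silently mislabelling an unmapped class.
        raise ValueError(f"no unified label for {source_label!r} in {dataset!r}") from None


# "no" and "no_tumor" end up as the same class after unification.
print(unify("brain_tumor_detection", "no"), unify("brain_tumor_classification", "no_tumor"))
# prints: normal normal
```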

[Figure: Preprocessing modules]

Datasets

| uid | Dataset | Modality | Task |
|---|---|---|---|
| 0 | KITS-23 | CT | classification, segmentation |
| 1 | CoronaHack | X-ray | classification |
| 2 | Alzheimer's Dataset | MRI | classification |
| 3 | Brain Tumor Classification | MRI | classification |
| 4 | COVID-19 Detection X-Ray | X-ray | classification |
| 5 | Finding and Measuring Lungs in CT Data | CT | segmentation |
| 6 | Brain CT Images with Intracranial Hemorrhage Masks | CT | classification |
| 7 | Liver and Liver Tumor Segmentation | CT | classification, segmentation |
| 8 | Brain MRI Images for Brain Tumor Detection | MRI | classification |
| 9 | Knee Osteoarthritis Dataset with Severity Grading | X-ray | classification |
| 10 | Brain Tumor Progression | MRI | segmentation |
| 11 | Chest X-ray 14 | X-ray | classification |
| 12 | COCA - Coronary Calcium and chest CTs | CT | segmentation |
| 13 | BrainMetShare | MRI | segmentation |

Using the datasets

Installing requirements

```
poetry install
```

Creating the dataset

Due to the copyright restrictions of the source datasets, we cannot redistribute the files directly. To obtain the full dataset, download the source datasets yourself and run the preprocessing scripts.

0. KITS-23

Clone the KITS-23 repository.
Enter the KITS-23 directory and install the packages with pip.
```
cd kits23
pip3 install -e .
```
Run the following command to download the data to the dataset/ folder.
```
kits23_download_data
```

Fill in source_path and target_path in KITS23Pipeline() in config/runner_config.py, e.g.:

```python
KITS23Pipeline(
    path_args={
        "source_path": "kits23/dataset",  # Path to the dataset directory in the KITS23 repo
        "target_path": TARGET_PATH,
        "labels_path": "kits23/dataset/kits23.json",  # Path to kits23.json
    },
    dataset_args=dataset_config.KITS23,
),
```
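Every pipeline reads from paths you fill in by hand, so a quick pre-flight check can catch typos before a long run. The sketch below assumes `path_args` is a plain dict like the one in the KITS-23 example. `missing_paths` is a hypothetical helper, not part of the project. `target_path` is skipped because it is an output location.

```python
from pathlib import Path


def missing_paths(path_args: dict[str, str],
                  input_keys: tuple[str, ...] = ("source_path", "masks_path", "labels_path")) -> list[str]:
    """Return 'key=value' entries for configured input paths that do not exist.

    target_path is deliberately not checked because it is an output folder.
    """
    return [
        f"{key}={path_args[key]}"
        for key in input_keys
        if key in path_args and not Path(path_args[key]).exists()
    ]


kits_args = {
    "source_path": "kits23/dataset",
    "target_path": "umie_output",  # hypothetical output folder
    "labels_path": "kits23/dataset/kits23.json",
}
problems = missing_paths(kits_args)
if problems:
    print("Fix these paths before running the pipeline:", problems)
```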

1. CoronaHack - Chest X-Ray Dataset

1. Go to the CoronaHack page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Set source_path in CoronaHackPipeline() in config/runner_config.py to the location of the extracted archive folder.
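Extracting the archive and pointing source_path at the result is the same for every Kaggle dataset below. If you prefer to script the extraction, the sketch below uses Python's standard zipfile module. The function name `extract_archive` is hypothetical, and any unzip tool works just as well.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path: str, dest_dir: str) -> Path:
    """Extract a downloaded archive.zip into dest_dir and return that folder."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)
    return dest
```

For example, `str(extract_archive("Downloads/archive.zip", "data/coronahack/archive"))` gives the value to use as source_path. Both paths here are placeholders.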

2. Alzheimer's Dataset (4 class of Images)

1. Go to the Alzheimer's Dataset page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Set source_path in AlzheimersPipeline() in config/runner_config.py to the location of the extracted archive folder.

3. Brain Tumor Classification (MRI)

1. Go to the Brain Tumor Classification page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Set source_path in BrainTumorClassificationPipeline() in config/runner_config.py to the location of the extracted archive folder.

4. COVID-19 Detection X-Ray

1. Go to the COVID-19 Detection X-Ray page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Delete the TrainData folder. Augmented data should not be included at this stage.
6. Set source_path in COVID19DetectionPipeline() in config/runner_config.py to the location of the extracted archive folder.
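If you want to script the TrainData removal, the sketch below uses shutil. It assumes TrainData sits directly inside the extracted archive folder. `remove_augmented` is a hypothetical helper, not part of the project.

```python
import shutil
from pathlib import Path


def remove_augmented(archive_dir: str, folder: str = "TrainData") -> bool:
    """Delete the augmented-data folder if present; return True if it was removed."""
    target = Path(archive_dir) / folder
    if target.is_dir():
        shutil.rmtree(target)  # removes the folder and everything inside it
        return True
    return False
```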

5. Finding and Measuring Lungs in CT Data

1. Go to the Finding and Measuring Lungs in CT Data page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Set source_path in FindingAndMeasuringLungsPipeline() in config/runner_config.py to the location of the archive/2d_images folder.
6. Set masks_path to the location of the archive/2d_masks folder.
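Images and masks come from two parallel folders, so it can help to confirm that every image has a matching mask before running the pipeline. This sketch assumes the files are .tif and that an image and its mask share the same file name. Check both assumptions against your download. `pair_images_with_masks` is a hypothetical helper.

```python
from pathlib import Path


def pair_images_with_masks(images_dir: str, masks_dir: str, pattern: str = "*.tif"):
    """Match each image to the mask with the same file name.

    Returns (pairs, unmatched_images).
    """
    masks = {p.name: p for p in Path(masks_dir).glob(pattern)}
    pairs, unmatched = [], []
    for image in sorted(Path(images_dir).glob(pattern)):
        mask = masks.get(image.name)
        if mask is None:
            unmatched.append(image)  # image with no mask of the same name
        else:
            pairs.append((image, mask))
    return pairs, unmatched
```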

6. Brain CT Images with Intracranial Hemorrhage Masks

1. Go to the Brain With Intracranial Hemorrhage page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Set source_path in BrainWithIntracranialHemorrhagePipeline() in config/runner_config.py to the location of the extracted archive folder.
6. Set masks_path to the same path as source_path.
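In config terms, the key point is that `masks_path` and `source_path` hold the same value. The sketch below shows the `path_args` dictionary, following the keys in the KITS-23 example. The folder paths are hypothetical placeholders, and the other pipeline arguments are omitted.

```python
# Hypothetical location of the extracted archive folder.
ARCHIVE_DIR = "data/brain_ich/archive"

path_args = {
    "source_path": ARCHIVE_DIR,
    "target_path": "data/umie_output",  # hypothetical output folder
    "masks_path": ARCHIVE_DIR,  # images and masks live in the same folder
}
```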

7. Liver and Liver Tumor Segmentation (LITS)

1. Go to the Liver and Liver Tumor Segmentation page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Set source_path in the LITS pipeline entry in config/runner_config.py to the location of the extracted archive folder.
6. Set masks_path as well.

8. Brain MRI Images for Brain Tumor Detection

1. Go to the Brain MRI Images for Brain Tumor Detection page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Set source_path in BrainTumorDetectionPipeline() in config/runner_config.py to the location of the extracted archive folder.

9. Knee Osteoarthritis Dataset with Severity Grading

1. Go to the Knee Osteoarthritis Dataset with Severity Grading page on Kaggle.
2. Log in to your Kaggle account.
3. Download the data.
4. Extract archive.zip.
5. Set source_path in the Knee Osteoarthritis pipeline entry in config/runner_config.py to the location of the extracted archive folder.

10. Brain-Tumor-Progression

Go to the Brain-Tumor-Progression dataset on The Cancer Imaging Archive (TCIA).

11. Chest X-ray 14

1. Go to the Chest X-ray 14 page.
2. Create an account.
3. Download the images folder and Data_Entry_2017_v2020.csv.
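Chest X-ray 14 is multi-label, so one image can carry several findings. The sketch below inspects the label file with the standard csv module. It assumes the CSV has "Image Index" and "Finding Labels" columns, with findings separated by "|"; verify this against your copy. `read_findings` and `label_counts` are hypothetical helpers.

```python
import csv
from collections import Counter


def read_findings(csv_path: str) -> dict[str, list[str]]:
    """Map each image file name to its list of findings (assumed column names)."""
    findings = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            findings[row["Image Index"]] = row["Finding Labels"].split("|")
    return findings


def label_counts(findings: dict[str, list[str]]) -> Counter:
    """Count how many images carry each finding."""
    return Counter(label for labels in findings.values() for label in labels)
```

For example, `label_counts(read_findings("Data_Entry_2017_v2020.csv")).most_common(5)` shows the most frequent findings.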

12. COCA - Coronary Calcium and chest CTs

1. Go to the COCA - Coronary Calcium and chest CTs page.
2. Log in or sign up for a Stanford AIMI account.
3. Fill in your contact details.
4. Download the data with azcopy.
5. Set source_path to the location of the cocacoronarycalciumandchestcts-2/Gated_release_final/patient folder.
6. Set masks_path to the location of cocacoronarycalciumandchestcts-2/Gated_release_final/calcium_xml, which holds the XML calcium annotations.

13. BrainMetShare

1. Go to the BrainMetShare page.
2. Log in or sign up for a Stanford AIMI account.
3. Fill in your contact details.
4. Download the data with azcopy.

To preprocess a dataset that is not listed above, look in the preprocessing folder. It contains reusable steps for converting imaging formats, extracting masks, creating file trees, and more. Check the config file to see which masks and label encodings are available, and append new labels and mask encodings if needed.
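The sketch below appends label encodings without disturbing existing ones. The `label_encodings` contents and the `add_labels` helper are hypothetical. The project's real encodings live in its config files and may be structured differently.

```python
# Hypothetical encoding table; names and ids are placeholders.
label_encodings = {"normal": 0, "tumor": 1}


def add_labels(encodings: dict[str, int], new_labels: list[str]) -> dict[str, int]:
    """Return a copy of `encodings` with unseen labels appended at the next free id.

    Existing ids are never changed, so already-processed datasets keep their labels.
    """
    updated = dict(encodings)
    next_id = max(updated.values(), default=-1) + 1
    for label in new_labels:
        if label not in updated:
            updated[label] = next_id
            next_id += 1
    return updated
```

Appending rather than renumbering is the safer choice here. Renumbering would silently change the meaning of labels in datasets that were already exported.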

Overall, the dataset should contain **882,774** images in .png format:

- CT: 500k+
- X-ray: 250k+
- MRI: 100k+
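To sanity-check the output, you could count the generated .png files. This sketch assumes all output images sit under a single target folder. If masks are also saved as .png in the same folder, they will be counted too, so narrow the search to your actual layout. `count_pngs` is a hypothetical helper.

```python
from pathlib import Path

EXPECTED_TOTAL = 882_774  # total image count stated in this README


def count_pngs(target_path: str) -> int:
    """Count .png files anywhere under target_path (case-sensitive match)."""
    return sum(1 for _ in Path(target_path).rglob("*.png"))
```

For example, `print(count_pngs(TARGET_PATH), "/", EXPECTED_TOTAL)`.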

🎯 Roadmap

- dcm
- jpg
- nii
- tif
- Shared RadLex ontology
- Hugging Face datasets
- Data dashboards

:wave: Contributors

:handshake: Contact

Barbara Klaudel

TheLion.AI

Development

Pre-commits

Install pre-commit: https://pre-commit.com/#installation

If you are using VS Code, install the extension https://marketplace.visualstudio.com/items?itemName=MarkLarah.pre-commit-vscode

To dry-run the pre-commit hooks and check whether your code passes, run:

```
pre-commit run --all-files
```

Adding python packages

Dependencies are handled by Poetry. To add a new dependency, run:

```
poetry add <package_name>
```

Debugging

To modify and debug the app, developing inside containers can be useful.

Testing

```
run_tests.sh
```

Project details

Release history

- 0.1.4 (this version): Aug 26, 2024
- 0.1.3: Aug 26, 2024
- 0.1.2: Aug 26, 2024
- 0.1.1: Aug 26, 2024
- 0.1.0: Aug 26, 2024

Download files

Source Distribution

umie_datasets-0.1.4.tar.gz (35.0 kB)

Uploaded Aug 26, 2024 Source

Built Distribution

umie_datasets-0.1.4-py3-none-any.whl (58.8 kB)

Uploaded Aug 26, 2024 Python 3

File details

Details for the file umie_datasets-0.1.4.tar.gz.

File metadata

Download URL: umie_datasets-0.1.4.tar.gz
Upload date: Aug 26, 2024
Size: 35.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.6.1 CPython/3.10.10 Darwin/23.5.0

File hashes

Hashes for umie_datasets-0.1.4.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `59896649f008c0fa556303ddf90666dc907dcfe0063669250dd08fef680a3e63` |
| MD5 | `c41b0446eebe8b8ce3a325a8e5aa77f5` |
| BLAKE2b-256 | `dce6eddce12c84524f82d502bfdf3d3ce20daa24466ae72870c82743e34d8956` |
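The sketch below verifies a downloaded file against the SHA256 digest listed above, using Python's hashlib. The `sha256_of` helper name is ours, not part of any package. pip can also enforce hashes itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib

EXPECTED_SHA256 = "59896649f008c0fa556303ddf90666dc907dcfe0063669250dd08fef680a3e63"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA256 hex digest, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

For example, check that `sha256_of("umie_datasets-0.1.4.tar.gz") == EXPECTED_SHA256` after downloading the source distribution.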

File details

Details for the file umie_datasets-0.1.4-py3-none-any.whl.

File metadata

Download URL: umie_datasets-0.1.4-py3-none-any.whl
Upload date: Aug 26, 2024
Size: 58.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/1.6.1 CPython/3.10.10 Darwin/23.5.0

File hashes

Hashes for umie_datasets-0.1.4-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `a71bbbdaf1f3885440b21e5dec95675d3137f707b35b86cccc081440ced772b5` |
| MD5 | `2bce2884a298e1317f7ab3b3d3b50669` |
| BLAKE2b-256 | `1716e346f665de8e8fe91e553e881b6f69d482bb05dd4f4b1bb99df755830b54` |
