This package splits a medical imaging dataset into test and train sets in a patient-aware and stratified manner.

Covid_Patient_Aware_Image_Split

Images of the same patient must not be split between the test and train sets. Otherwise the model is evaluated on patients it has already seen during training, and the test score overstates how well it generalizes. This repository splits a sample Covid/Normal classification dataset into test and train sets in a patient-aware and stratified manner. The metadata file is used to group the images by Patient ID. For example, all the images colored green in the screenshot below belong to the same patient and should all go into either the test split or the train split.

Screenshot

Grouping is strict: the images of one patient are never split across the two sets. Stratification, by contrast, is approximate and is done as well as the grouping allows. The code also assumes that all images of one patient share the same stratification category (diagnosis), meaning that all the images from the same Patient ID are either Covid or NonCovid.
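The idea described above can be sketched in plain Python. This is a minimal illustration and not the package's own code: the function name `patient_aware_split`, the `(image, patient_id, label)` record layout, and the `test_fraction` and `seed` parameters are all assumptions made for the example. Patients, not images, are shuffled and divided within each diagnosis, so grouping is strict while the class balance is only approximate.

```python
import random
from collections import defaultdict


def patient_aware_split(records, test_fraction=0.2, seed=0):
    """Split (image, patient_id, label) records so no patient spans both sets.

    Hypothetical helper for illustration. Within each label, roughly
    `test_fraction` of the patients (not images) go to the test set.
    """
    images_by_patient = defaultdict(list)
    label_of_patient = {}
    for image, patient_id, label in records:
        images_by_patient[patient_id].append(image)
        # Assumption from the text: every patient has a single diagnosis.
        if label_of_patient.setdefault(patient_id, label) != label:
            raise ValueError(f"patient {patient_id} has mixed labels")

    # Collect the patients that belong to each diagnosis.
    patients_by_label = defaultdict(list)
    for patient_id, label in label_of_patient.items():
        patients_by_label[label].append(patient_id)

    rng = random.Random(seed)
    split = {"train": [], "test": []}
    for label, patients in sorted(patients_by_label.items()):
        patients.sort()          # fixed starting order for reproducibility
        rng.shuffle(patients)
        n_test = round(len(patients) * test_fraction)
        for i, patient_id in enumerate(patients):
            subset = "test" if i < n_test else "train"
            # All images of a patient follow the patient into one subset.
            for image in images_by_patient[patient_id]:
                split[subset].append((image, patient_id, label))
    return split
```

Because whole patients are assigned, the share of images per class in the test set can drift from `test_fraction` when patients contribute different numbers of images; that is the approximate part of the stratification.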

To split the images into four folders (train/Covid, train/NonCovid, test/Covid, test/NonCovid) inside the splitted folder:

split_to_folders.py
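As a rough picture of the resulting folder layout, the sketch below copies images into `<output>/<subset>/<label>/` directories. It is not the contents of split_to_folders.py: the function name `copy_into_folders` and the shape of its `split` argument (a mapping from subset name to `(image_name, label)` pairs) are assumptions for the example.

```python
import shutil
from pathlib import Path


def copy_into_folders(split, source_dir, output_dir="splitted"):
    """Copy each image to output_dir/<subset>/<label>/ (hypothetical helper).

    `split` maps a subset name ("train" or "test") to a list of
    (image_name, label) pairs; this layout is assumed for illustration.
    """
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    for subset, items in split.items():
        for image_name, label in items:
            target = output_dir / subset / label
            # Create e.g. splitted/train/Covid on first use.
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source_dir / image_name, target / image_name)
```

Copying rather than moving leaves the original dataset untouched, so the split can be regenerated with a different seed or test fraction.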

To split images into a dictionary:

split_into_dictionary.py

To split images into a torch DataLoader:

split_into_dataloader.py

Source distribution: patient_aware_splitter-0.0.1.tar.gz (4.0 kB)