
This package splits a medical imaging dataset into train and test sets in a patient-aware, stratified manner.

Project description

Covid_Patient_Aware_Image_Split

Images of the same patient must not be split between the train and test sets. Otherwise the model is tested on images of patients it has already seen during training, and the test score overestimates how well it generalizes to new patients (a form of data leakage that hides overfitting). This repository splits a sample Covid/Normal classification dataset into train and test sets in a patient-aware, stratified manner. The metadata file is used to group the images by Patient ID. For example, all the images colored green in the screenshot belong to the same patient and must go either entirely into the train split or entirely into the test split.

Screenshot
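The grouping step can be sketched in Python as follows. This is a minimal illustration, not the package's code: it assumes the metadata is a CSV with hypothetical columns image, patient_id and label, and the helper names group_by_patient and check_no_leakage are made up for this example. The leakage check returns every patient whose images ended up on both sides of a split, so an empty set means the split is patient-aware.

```python
import csv
import io
from collections import defaultdict


def group_by_patient(metadata_csv):
    """Map each patient ID to the list of (image, label) pairs it owns.

    Assumes hypothetical CSV columns: image, patient_id, label.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(metadata_csv):
        groups[row["patient_id"]].append((row["image"], row["label"]))
    return dict(groups)


def check_no_leakage(train_images, test_images, image_to_patient):
    """Return the patient IDs present in both splits (empty set = no leakage)."""
    train_patients = {image_to_patient[img] for img in train_images}
    test_patients = {image_to_patient[img] for img in test_images}
    return train_patients & test_patients


# Tiny in-memory metadata file standing in for the real one.
metadata = io.StringIO(
    "image,patient_id,label\n"
    "a.png,P1,Covid\n"
    "b.png,P1,Covid\n"
    "c.png,P2,NonCovid\n"
)
groups = group_by_patient(metadata)
image_to_patient = {img: pid for pid, rows in groups.items() for img, _ in rows}

# A naive image-level split puts patient P1 on both sides:
print(check_no_leakage(["a.png", "c.png"], ["b.png"], image_to_patient))  # {'P1'}
```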

Grouping must be strict, so that no patient's images are ever split across the two sets, whereas stratification is approximate, i.e. the class proportions are matched as closely as the patient groups allow. The code also assumes that all images of a patient share the same stratification category (diagnosis): all images with the same Patient ID are either Covid or NonCovid.
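Because each patient has a single diagnosis, one way to satisfy both requirements is to stratify at the patient level: shuffle the patients of each class separately and send a fixed fraction of them to the test set. The sketch below is not necessarily the method this package uses; the function name and its test_fraction and seed parameters are hypothetical. It raises an error if the one-diagnosis-per-patient assumption is violated.

```python
import random
from collections import defaultdict


def patient_aware_stratified_split(groups, test_fraction=0.2, seed=0):
    """Split whole patients into train/test, stratified by diagnosis.

    groups: dict mapping patient_id -> list of (image, label) pairs.
    Returns (train_images, test_images).
    """
    patients_by_label = defaultdict(list)
    for pid, rows in groups.items():
        labels = {label for _, label in rows}
        if len(labels) != 1:
            # Assumption from the README: one diagnosis per patient.
            raise ValueError(f"patient {pid} has mixed labels: {sorted(labels)}")
        patients_by_label[labels.pop()].append(pid)

    rng = random.Random(seed)
    train, test = [], []
    for label in sorted(patients_by_label):
        pids = sorted(patients_by_label[label])  # fixed order so the seed is reproducible
        rng.shuffle(pids)
        n_test = round(len(pids) * test_fraction)  # counted in patients, not images
        for i, pid in enumerate(pids):
            target = test if i < n_test else train
            target.extend(img for img, _ in groups[pid])  # a patient's images stay together
    return train, test


groups = {
    "P1": [("a.png", "Covid"), ("b.png", "Covid")],
    "P2": [("c.png", "Covid")],
    "P3": [("d.png", "NonCovid")],
    "P4": [("e.png", "NonCovid"), ("f.png", "NonCovid")],
}
train, test = patient_aware_stratified_split(groups, test_fraction=0.5)
print(train, test)
```

Since whole patients are assigned, the per-class image counts in each split follow test_fraction only approximately when patients contribute different numbers of images, which is why stratification can only be approximate. scikit-learn's StratifiedGroupKFold provides a comparable grouped, stratified split.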

To split the images into four folders (train/Covid, train/NonCovid, test/Covid, test/NonCovid) inside the splitted folder:

split_to_folders.py
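Writing a split to disk in the train/Covid, train/NonCovid, test/Covid, test/NonCovid layout could look like the sketch below. It is an illustration rather than the contents of split_to_folders.py; copy_to_folders, its arguments and the filename-to-label mapping are hypothetical. Files are copied rather than moved, so the source dataset is left untouched.

```python
import shutil
from pathlib import Path


def copy_to_folders(splits, image_labels, src_dir, out_dir="splitted"):
    """Copy every image to out_dir/<split>/<label>/<filename>.

    splits: hypothetical dict such as {"train": [...], "test": [...]} of filenames.
    image_labels: dict mapping filename -> label, e.g. "Covid" or "NonCovid".
    """
    src_dir, out_dir = Path(src_dir), Path(out_dir)
    for split_name, images in splits.items():
        for image in images:
            dest = out_dir / split_name / image_labels[image]
            dest.mkdir(parents=True, exist_ok=True)  # create class folders on demand
            shutil.copy2(src_dir / image, dest / image)  # copy, keeping file metadata
    return out_dir
```

Folders are created on demand, so a class with no images in a given split gets no folder there.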

To split images into a dictionary:

split_into_dictionary.py
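A dictionary version of the same split might nest image filenames by split and then by class. The layout below is an assumption that mirrors the folder structure; the dictionary produced by split_into_dictionary.py may be organized differently, and split_to_dict is a hypothetical name.

```python
from collections import defaultdict


def split_to_dict(splits, image_labels):
    """Return {split: {label: [images]}}, mirroring the folder layout."""
    result = {}
    for split_name, images in splits.items():
        by_label = defaultdict(list)
        for image in images:
            by_label[image_labels[image]].append(image)
        result[split_name] = dict(by_label)
    return result


print(split_to_dict({"train": ["a.png", "c.png"], "test": ["b.png"]},
                    {"a.png": "Covid", "b.png": "Covid", "c.png": "NonCovid"}))
# {'train': {'Covid': ['a.png'], 'NonCovid': ['c.png']}, 'test': {'Covid': ['b.png']}}
```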

To split the images into a PyTorch DataLoader:

split_into_dataloader.py
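For PyTorch, each split needs a map-style dataset, i.e. an object with __len__ and __getitem__ that returns (sample, label) pairs, which torch.utils.data.DataLoader can then batch and shuffle. The class below is a hypothetical sketch, not the package's implementation. It duck-types the map-style protocol so it runs without PyTorch installed; the default class names and the caller-supplied loader callable are assumptions.

```python
class SplitImageDataset:
    """Map-style dataset over one split (hypothetical sketch).

    Implements only __len__ and __getitem__, the protocol a map-style
    dataset needs for torch.utils.data.DataLoader.
    """

    def __init__(self, images, image_labels, loader, class_names=("Covid", "NonCovid")):
        self.images = list(images)
        self.image_labels = image_labels  # filename -> class name
        self.loader = loader  # callable: filename -> sample (e.g. an image tensor)
        # Assumed class order: Covid -> 0, NonCovid -> 1.
        self.class_to_idx = {name: i for i, name in enumerate(class_names)}

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.images[index]
        label = self.class_to_idx[self.image_labels[image]]
        return self.loader(image), label
```

With PyTorch available, it would be used as DataLoader(SplitImageDataset(train, image_labels, loader=...), batch_size=8, shuffle=True). The loader should return tensors of identical shape (for example, after resizing) so that the default collate function can stack them into a batch. Subclassing torch.utils.data.Dataset is the more conventional choice when PyTorch is already a dependency.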



Download files


Source Distribution

patient_aware_splitter-0.0.1.tar.gz (4.0 kB)

Uploaded Source

File details

Details for the file patient_aware_splitter-0.0.1.tar.gz.

File metadata

  • Download URL: patient_aware_splitter-0.0.1.tar.gz
  • Upload date:
  • Size: 4.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.7.4

File hashes

Hashes for patient_aware_splitter-0.0.1.tar.gz
Algorithm Hash digest
SHA256 47460c3d87e72ccc1bcf5c813a4da35cd4bc01778d193360e5ba42c6f2b1f574
MD5 bdcfd86fcc7be4f2db9378acc84e33de
BLAKE2b-256 6a022b2a70c5dec5c2cb7ee1a1f89111525411645772891ec6506f53afdd2342

