Skip to main content

Medical data formatting and pre-processing module whose main objective is to build an HDF5 dataset containing all medical images of patients (DICOM format) and their associated segmentations. The HDF5 dataset is then easier to use to perform tasks on the medical data, such as machine learning tasks.

Project description

Medical data formatting and pre-processing module

This package is a medical data formatting and pre-processing module whose main objective is to build an HDF5 dataset containing all medical images of patients (DICOM format) and their associated segmentations. The HDF5 dataset is then easier to use to perform tasks on the medical data, such as machine learning tasks.

Anyone who is willing to contribute is welcome to do so.

What is the purpose of this module?

Digital Imaging and Communications in Medicine (DICOM) is the international standard for medical images and related information. It defines the formats for medical images that can be exchanged with the data and quality necessary for clinical use. With the rapid development of artificial intelligence in the last few years, especially deep learning, medical images are increasingly used for understanding or prediction purposes. The working group DICOM WG-23 on Artificial Intelligence / Application Hosting is currently working to identify or develop the DICOM mechanisms to support AI workflows, concentrating on the clinical context. Moreover, their future roadmap and objectives includes working on the concern that current DICOM mechanisms might not be adequate to cover some use cases, particularly bulk analysis of large repository data, e.g. for training deep learning neural networks.

The purpose of this module is therefore to provide the necessary tools to facilitate the use of medical images in an AI workflow. This goal is accomplished by using the HDF file format to create a dataset containing all medical images of patients (DICOM format) and their associated segmentations. Currently, the accepted file formats for segmentations are .nrrd and .seg.nrrd (3D slicer segmentation format).

Installation

Latest stable version :

pip install dicom2hdf

Latest (possibly unstable) version :

pip install git+https://github.com/MaxenceLarose/dicom2hdf

Getting started

The easiest way to import the package is to use :

from dicom2hdf import *

This will import the useful classes PatientDataset and PatientDataGenerator. These two classes represent two different ways of using the package. The following examples will present both procedures.

Example using the patient dataset class

from dicom2hdf import *

dataset = PatientDataset(
    path_to_dataset=PathName.PATH_TO_PATIENT_DATASET,
)

dataset.create_hdf5_dataset(
    path_to_patients_folder=PathName.PATH_TO_PATIENTS_FOLDER,
    path_to_segmentations_folder=PathName.PATH_TO_SEGMENTATIONS_FOLDER,
    series_descriptions=os.path.join(PathName.PATH_TO_DATA_FOLDER, "series_descriptions.json"),
    organs=os.path.join(PathName.PATH_TO_DATA_FOLDER, "organs.json"),
    images_folder_name=FolderName.IMAGES_FOLDER_NAME,
    verbose=True,
    overwrite_dataset=True
)

Example using the patient data generator class

from dicom2hdf import *

patient_data_generator = PatientDataGenerator(
    paths_to_patients_folder_and_segmentations=paths_to_patients_folder_and_segmentations,
    verbose=verbose,
    series_descriptions=series_descriptions,
    organs=organs
)

for patient_dataset in patient_data_generator:
    patient_name = patient_dataset.patient_name
    
    for patient_image_data in enumerate(patient_dataset.data):
        dicom_header = patient_image_data.image.dicom_header
        simple_itk_image = patient_image_data.image.simple_itk_image
        numpy_array_image = sitk.GetArrayFromImage(simple_itk_image)
        
        """Perform some tasks on images on-the-fly."""

License

This code is provided under the Apache License 2.0.

Citation

@article{dicom2hdf,
  title={DICOM to HDF python module},
  author={Maxence Larose},
  year={2022},
  publisher={Université Laval},
  url={https://github.com/MaxenceLarose/dicom2hdf},
}

Contact

Maxence Larose, B. Ing., maxence.larose.1@ulaval.ca

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dicom2hdf-0.0.3.tar.gz (26.9 kB view hashes)

Uploaded Source

Built Distribution

dicom2hdf-0.0.3-py3-none-any.whl (37.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page