A package that streamlines retrieving data from the UK-Biobank.

Project description

UKB-UnPAC

Welcome to UKB-UnPAC!

UKB-UnPAC is a Python package dedicated to the extraction of curated image data from the UK-Biobank. It was developed by researchers in Trinity College Dublin’s Discipline of Radiation Therapy as part of a project funded by the World Cancer Research Fund (WCRF). Once access had been granted, the funded project aimed to extract radiomics information from the Dixon weighted MRIs found in the UK-Biobank. Given the vastness of the data available, an automated pipeline was needed to streamline the acquisition of useful data from the UK-Biobank. This gave rise to UKB-UnPAC!

What is UKB-UnPAC?

UKB-UnPAC offers an all-in-one system designed to reduce the workload of extracting image data from the UK-Biobank, and it can be adapted for any large-scale radiological dataset. UKB-UnPAC has 4 main scripts, which are its namesake, that allow researchers to:

  • UnZip downloaded DICOM files.
  • Parse those DICOM files based on researcher set variables.
  • Acquire only anatomically relevant images from the UK-Biobank imaging datasets.
  • Convert these images into a more research friendly format.

Schematic diagram of the UKB-UnPAC process, highlighting inputs, outputs, and the 4 key processes

Each of the processes is described in more detail below. Usage instructions can be found in (LINK)

UnZipper

When initially downloaded, the MRI images from the UK-Biobank (UKB) come as compressed zip files (.zip). Many applications cannot read this file type directly, which makes the images awkward to use for research. To overcome this, the UnZipper script was created. This function iteratively unzips each file into a readable directory, ready for use in research activities. Each unzipped directory retains the name of its zip file, so zipped and unzipped data can be matched by name. After all files have been unzipped, the function deletes the original zip files to reduce data storage.
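
The unzip-then-delete behaviour described above can be sketched with the Python standard library. This is a minimal illustration and not the package's own code: the function name unzip_all and its delete_archives parameter are hypothetical.

```python
import zipfile
from pathlib import Path


def unzip_all(root, delete_archives=True):
    """Extract every .zip directly under `root` into a directory of the same name.

    Hypothetical helper for illustration; not part of the UKB-UnPAC API.
    """
    root = Path(root)
    pairs = []
    for archive in sorted(root.glob("*.zip")):
        target = archive.with_suffix("")  # "abc.zip" -> directory "abc"
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        pairs.append((archive, target))
    if delete_archives:
        # Archives are removed only after every extraction has succeeded.
        for archive, _ in pairs:
            archive.unlink()
    return [target for _, target in pairs]
```

Deleting only after the full loop means a corrupt archive raises an error before any original data is removed.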

Parser

The parser function allows researchers to filter which data to keep, based on a user-set variable. Researchers need to create a CSV file listing the patient IDs to be included in the study, and pass the name of the patient ID column when calling the function. The function works by comparing the names of the unzipped image directories against the list in the CSV file, so directory names and the patient IDs in the CSV must match! Any directory not found in your CSV list will be deleted.
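
Because this step deletes data, it can help to see the comparison spelled out. The sketch below is an assumption-laden illustration rather than the package's implementation: the name keep_listed_directories, the id_column argument, and the dry_run safety switch are all hypothetical.

```python
import csv
import shutil
from pathlib import Path


def keep_listed_directories(root, csv_path, id_column, dry_run=True):
    """Remove every directory under `root` whose name is not listed in the CSV.

    Hypothetical helper for illustration; not part of the UKB-UnPAC API.
    Returns the names of the directories that were (or would be) removed.
    """
    with open(csv_path, newline="") as handle:
        wanted = {row[id_column].strip() for row in csv.DictReader(handle)}
    removed = []
    for entry in sorted(Path(root).iterdir()):
        # Exact name match only: "1001" in the CSV keeps the directory "1001".
        if entry.is_dir() and entry.name not in wanted:
            removed.append(entry.name)
            if not dry_run:
                shutil.rmtree(entry)
    return removed
```

Running once with dry_run=True and checking the returned list before deleting anything is a cheap guard against a mismatched ID column.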

Acquirer

The acquirer function, as illustrated above, reads the metadata of the MRI and splits the full-body MRI into its component parts based on the series name and the series number. These criteria can be changed so that scans of different anatomical areas are retained. After reading the metadata, the acquirer function organizes the files into a neater directory structure, based on the work of Dr. Alexander Weston, PhD (Accessed at: https://towardsdatascience.com/a-python-script-to-sort-dicom-files-f1623a7f40b8 on 17/04/2024). The acquirer function then discards the unspecified images to reduce data storage requirements. The final product is a directory containing a 3D image of only the region of interest, in a specified weighting from a specified series.
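
A rough sketch of filtering by series name and series number is given below. It assumes the third-party pydicom library for reading headers and assumes the DICOM files carry a .dcm extension; the function names and the layout of the output folders are hypothetical and do not reflect the package's actual code.

```python
import shutil
from pathlib import Path


def read_series_info(path):
    """Return (SeriesDescription, SeriesNumber) from a DICOM header via pydicom."""
    import pydicom  # third-party; imported lazily so the sorting logic runs without it

    header = pydicom.dcmread(path, stop_before_pixels=True)
    return str(header.get("SeriesDescription", "")), int(header.get("SeriesNumber", -1))


def acquire_series(patient_dir, wanted, read_info=read_series_info):
    """Keep .dcm files whose (description, number) pair is in `wanted`; delete the rest.

    Hypothetical helper for illustration; not part of the UKB-UnPAC API.
    """
    patient_dir = Path(patient_dir)
    kept = []
    for path in sorted(patient_dir.glob("*.dcm")):
        description, number = read_info(path)
        if (description, number) in wanted:
            # One sub-directory per retained series, e.g. "03_<description>".
            series_dir = patient_dir / f"{number:02d}_{description}"
            series_dir.mkdir(exist_ok=True)
            destination = series_dir / path.name
            shutil.move(str(path), str(destination))
            kept.append(destination)
        else:
            path.unlink()
    return kept
```

The set passed as wanted would hold the series names and numbers for the anatomical region of interest; those values depend on the dataset and are not specified here.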

Converter

The objective of this project was to use segmented images to train a U-Net-based neural network. To achieve this, both the images and the segments must be stored in the same format. The converter script therefore creates a NIfTI copy of every image in a directory. This function retains both the DICOM files and the newly generated NIfTI file.
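
One way to produce a NIfTI copy while leaving the DICOM files in place is sketched below. It assumes the third-party SimpleITK library for the conversion itself, which the package may or may not use; the function names, the .dcm extension, and the output file naming are hypothetical.

```python
from pathlib import Path


def dicom_series_to_nifti(series_dir, output_path):
    """Write the DICOM series in `series_dir` as one NIfTI file using SimpleITK."""
    import SimpleITK as sitk  # third-party; imported lazily, only needed for real data

    reader = sitk.ImageSeriesReader()
    reader.SetFileNames(sitk.ImageSeriesReader.GetGDCMSeriesFileNames(str(series_dir)))
    sitk.WriteImage(reader.Execute(), str(output_path))


def convert_all(root, convert=dicom_series_to_nifti):
    """Create a .nii.gz next to the DICOM files in every directory that holds them.

    Hypothetical helper for illustration; not part of the UKB-UnPAC API.
    """
    root = Path(root)
    candidates = [root, *sorted(d for d in root.rglob("*") if d.is_dir())]
    written = []
    for series_dir in candidates:
        if not any(series_dir.glob("*.dcm")):
            continue  # nothing to convert in this directory
        output_path = series_dir / f"{series_dir.name}.nii.gz"
        convert(series_dir, output_path)  # the DICOM files are left untouched
        written.append(output_path)
    return written
```

Passing the converter in as an argument keeps the directory walk independent of the imaging library, so a different DICOM-to-NIfTI tool could be substituted.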

Flow chart of the information contained in a UK-Biobank download with the image data relevant to this study highlighted

The above diagram shows the extent of the imaging data available on the UKB vs the data needed (highlighted nodes). UnPAC allows the researcher to dictate how much of this data they store.

Application

We believe that this package and the included functions will be extremely beneficial to the research community. The most evident application is the analysis of the UKB; however, with small alterations to the code and an understanding of the image formats, these functions could be applied to any large-scale 3D radiological database. They automate a repetitive task in this type of research and complete it in a fraction of the time, saving resources. The entire pipeline, from start to finish, takes approximately 11.1 seconds per patient, a speed that could not be achieved manually.

In version 3.0.0 the entire package has been optimised, greatly reducing the time taken to under 5 seconds per patient.

For more information, see the usage document, which explains how to apply the package to your own research.
