De-Identification of Medical Imaging Data: A Comprehensive Tool for Ensuring Patient Privacy

Python 3.11.2 Code style: black License Open Source Love Docker PyPI - Version


[!IMPORTANT]
The package is now available on PyPI: pip install mede

[!NOTE]
MEDE now supports the Enhanced DICOM format!


This repository contains MEDE, a comprehensive tool for de-identifying medical imaging data. It enables users to anonymize a wide variety of medical imaging types, including:

  • Magnetic Resonance Imaging (MRI)
  • Computed Tomography (CT)
  • Ultrasound (US)
  • Whole Slide Images (WSI)
  • MRI raw data (twix)

Overview

This tool combines multiple anonymization steps, including metadata deidentification, defacing, and skull-stripping, while being faster than current state-of-the-art deidentification tools.

Computation Times


Getting Started

You can install the anonymization tool directly via pip or Docker.

Installation via pip

Our tool is available via pip. You can install it with the following command:

pip install mede

Additional Dependencies for Text Removal

[!WARNING]
Since version 0.0.11, MEDE uses EasyOCR instead of Tesseract for text removal. The text removal feature requires EasyOCR and its dependencies, which are installed automatically with MEDE 0.0.11 and later.

We also implement a manual text removal feature that can be used to remove any remaining text from the images. This feature is optional and can be enabled with the additional --refine flag. This will open an interactive window where you can manually select any remaining text/artifacts and remove them by drawing a bounding box around them.

To draw a bounding box, click and hold the left mouse button, drag to create a rectangle around the text/artifact you want to remove, and then release the mouse button. The selected area will be filled with black pixels to effectively remove the text/artifact from the image.

To reset the image to its original state, press the r key while the interactive window is open.

To save the changes, press space or enter.
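
The bounding-box removal described above can be illustrated with a minimal sketch. This is not MEDE's implementation: the function name `black_out_box`, the list-of-rows image representation, and the `(x, y)` coordinate convention are assumptions chosen for illustration. It shows how a dragged rectangle can be filled with black pixels while the original image is kept untouched so that a reset remains possible.

```python
def black_out_box(image, start, end):
    """Return a copy of `image` with the dragged rectangle set to 0 (black).

    `image` is a list of rows of pixel intensities. `start` and `end` are the
    (x, y) positions where the mouse button was pressed and released.
    Hypothetical helper for illustration only.
    """
    height = len(image)
    width = len(image[0]) if height else 0

    # The drag can go in any direction, so sort the corners first.
    x0, x1 = sorted((start[0], end[0]))
    y0, y1 = sorted((start[1], end[1]))

    # Clamp the box to the image so a drag past the border is harmless.
    x0, y0 = max(x0, 0), max(y0, 0)
    x1, y1 = min(x1, width - 1), min(y1, height - 1)

    # Work on a copy: the untouched original is what a reset would restore.
    result = [row[:] for row in image]
    for y in range(y0, y1 + 1):
        for x in range(x0, x1 + 1):
            result[y][x] = 0
    return result


original = [[9] * 5 for _ in range(4)]
edited = black_out_box(original, start=(3, 2), end=(1, 1))
for row in edited:
    print(row)
```

Returning a copy instead of editing in place is one simple way to support the reset behaviour: discarding `edited` and going back to `original` undoes every box drawn so far.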


The following installation is only necessary if you have an older version of MEDE (before 0.0.11) installed and want to use its Tesseract-based text removal feature.

In that case, you also need to install Google's Tesseract OCR engine. Follow the Tesseract installation instructions for your operating system.

  • On Ubuntu:

    sudo apt install tesseract-ocr
    sudo apt install libtesseract-dev
    
  • On macOS (via Homebrew):

    brew install tesseract
    

Installation via Docker

Alternatively, this tool is distributed as a Docker image (morrempe/mede). Images are available for the amd64 and arm64 architectures and can be run on Linux and macOS hosts.

Steps:

  1. Pull the Docker image:

    docker pull morrempe/mede:[tag]   # Replace [tag] with either arm64 or amd64
    
  2. Run the Docker container with your data directory attached as a volume.
    Your data will be available in the /data folder inside the container:

    docker run --rm -it -v [Path/to/your/data]:/data morrempe/mede:[tag]
    
  3. Run the script with the corresponding CLI parameters:

    mede-deidentify [your flags]
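
For scripted setups, the tag choice and the two Docker commands from the steps above could be assembled as in the following sketch. The helper names `docker_tag` and `docker_commands`, the architecture mapping, and the example path `/home/user/scans` are assumptions for illustration and are not part of MEDE. Only the image name and the two tags come from the steps above.

```python
import platform
import shlex

IMAGE = "morrempe/mede"

# Common values reported by platform.machine(), mapped to the two image tags.
_TAGS = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "arm64": "arm64",
    "aarch64": "arm64",
}


def docker_tag(machine=None):
    """Pick the image tag for an architecture (defaults to this machine)."""
    machine = (machine or platform.machine()).lower()
    try:
        return _TAGS[machine]
    except KeyError:
        raise ValueError(f"no MEDE image tag known for architecture {machine!r}")


def docker_commands(data_dir, machine=None):
    """Return the `docker pull` and `docker run` argument lists."""
    image = f"{IMAGE}:{docker_tag(machine)}"
    pull = ["docker", "pull", image]
    run = ["docker", "run", "--rm", "-it", "-v", f"{data_dir}:/data", image]
    return pull, run


pull, run = docker_commands("/home/user/scans", machine="aarch64")
print(shlex.join(pull))
print(shlex.join(run))
```

The argument lists can be passed to `subprocess.run` as they are, which avoids shell quoting problems with data paths that contain spaces.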
    

Usage

De-Identification CLI

The mede-deidentify command-line interface (CLI) allows you to de-identify medical imaging data with various options. Below is the detailed usage guide:

mede-deidentify [-h] [-v | --verbose] [-t | --text-removal] [--refine]
                [-i INPUT] [-o OUTPUT] [--gpu GPU] [-s | --skull_strip]
                [-de | --deface] [-tw | --twix] [-w | --wsi] [-r | --rename]
                [-p PROCESSES]
                [-d {basicProfile,cleanDescOpt,cleanGraphOpt,cleanStructContOpt,
                     rtnDevIdOpt,rtnInstIdOpt,rtnLongFullDatesOpt,
                     rtnLongModifDatesOpt,rtnPatCharsOpt,rtnSafePrivOpt,
                     rtnUIDsOpt} ...]

Options

| Option | Description |
|---|---|
| -h, --help | Show the help message and exit. |
| -v, --verbose | Enable verbose output. |
| -i INPUT, --input INPUT | Path to the input data. |
| -o OUTPUT, --output OUTPUT | Path to save the output data. |
| --gpu GPU | Specify the GPU device number (default: 0). |
| -s, --skull_strip | Perform skull stripping. |
| -de, --deface | Perform defacing. |
| -tw, --twix | Process MRI raw data (twix format) and anonymize metadata. |
| -w, --wsi | Process Whole Slide Images (WSI). |
| -t, --text-removal | Perform text removal. |
| --refine | Enable interactive refinement of text removal results. |
| -r, --rename | Rename files during processing. |
| -p PROCESSES, --processes PROCESSES | Number of processes to use for multiprocessing. |
| -d, --deidentification-profile | Specify one or more DICOM deidentification profiles to apply (see below). |

De-Identification Profiles

The -d or --deidentification-profile option allows you to specify one or more DICOM deidentification profiles. Available profiles include:

  • basicProfile
  • cleanDescOpt
  • cleanGraphOpt
  • cleanStructContOpt
  • rtnDevIdOpt
  • rtnInstIdOpt
  • rtnLongFullDatesOpt
  • rtnLongModifDatesOpt
  • rtnPatCharsOpt
  • rtnSafePrivOpt
  • rtnUIDsOpt

You can specify multiple profiles by separating them with spaces. For example:

mede-deidentify -d basicProfile cleanDescOpt
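
When the profile list comes from a configuration file or user input, it can help to check the names before invoking the CLI. The sketch below is a hypothetical wrapper, not part of MEDE: the function `profile_args` is an assumption, and only the profile names are taken from the list above. Profile names are case-sensitive, so the sketch suggests the closest valid name for a typo.

```python
import difflib

# Profile names as listed above.
PROFILES = (
    "basicProfile",
    "cleanDescOpt",
    "cleanGraphOpt",
    "cleanStructContOpt",
    "rtnDevIdOpt",
    "rtnInstIdOpt",
    "rtnLongFullDatesOpt",
    "rtnLongModifDatesOpt",
    "rtnPatCharsOpt",
    "rtnSafePrivOpt",
    "rtnUIDsOpt",
)


def profile_args(profiles):
    """Return the `-d ...` argument list, or raise ValueError on unknown names."""
    for name in profiles:
        if name not in PROFILES:
            # Suggest the closest valid name, if one is reasonably similar.
            hint = difflib.get_close_matches(name, PROFILES, n=1)
            suggestion = f" (did you mean {hint[0]!r}?)" if hint else ""
            raise ValueError(f"unknown profile {name!r}{suggestion}")
    return ["-d", *profiles] if profiles else []


print(profile_args(["basicProfile", "cleanDescOpt"]))
```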

Example Usage

Here’s an example of how to use the CLI:

mede-deidentify -i /path/to/input -o /path/to/output -s -d basicProfile

This command will:

  1. Take input data from /path/to/input.
  2. Save the output to /path/to/output.
  3. Apply skull stripping.
  4. Use the basicProfile deidentification profile.
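
To process several datasets from a script, the same command can be built programmatically. The following is a minimal sketch that assumes `mede-deidentify` is on the PATH. The helper names `build_command` and `run` and their keyword arguments are hypothetical, while the flags themselves are those documented in the options above.

```python
import shlex
import subprocess


def build_command(input_path, output_path, *, skull_strip=False, deface=False,
                  text_removal=False, processes=None, profiles=()):
    """Assemble a mede-deidentify argument list from keyword options."""
    cmd = ["mede-deidentify", "-i", str(input_path), "-o", str(output_path)]
    if skull_strip:
        cmd.append("-s")
    if deface:
        cmd.append("-de")
    if text_removal:
        cmd.append("-t")
    if processes is not None:
        cmd += ["-p", str(processes)]
    # -d accepts several values, so it goes last to keep the list unambiguous.
    if profiles:
        cmd += ["-d", *profiles]
    return cmd


def run(cmd):
    """Run the command and raise CalledProcessError on a non-zero exit code."""
    return subprocess.run(cmd, check=True)


cmd = build_command("/path/to/input", "/path/to/output",
                    skull_strip=True, profiles=["basicProfile"])
print(shlex.join(cmd))
```

Calling `run(cmd)` would then execute the command. The sketch only prints it so that the result can be inspected before any data is processed.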

Citation

If you use our tool in your work, please cite us with the following BibTeX entry.

@article{rempe2025identification,
  title={De-identification of medical imaging data: a comprehensive tool for ensuring patient privacy},
  author={Rempe, Moritz and Heine, Lukas and Seibold, Constantin and H{\"o}rst, Fabian and Kleesiek, Jens},
  journal={European Radiology},
  pages={1--10},
  year={2025},
  publisher={Springer}
}
