
Python tool to batch-anonymize DICOM files with the Dockerized CTP Dicom Anonymizer Tool

Project description

TML-CTP

This project aims to depersonalize imaging data. It is developed by the Translational Machine Learning Lab at the Lausanne University Hospital for internal use as well as for open-source software distribution.

DISCLAIMER: The few template scripts provided by TML-CTP are only for testing it with your application. They are not intended to be used in a clinical or research setting, and should be considered incomplete test samples. DICOM files filtered through this program and associated scripts are not guaranteed to be free of Protected Health Information (PHI).

Description

TML-CTP leverages the Clinical Trial Processor (CTP) platform to depersonalize imaging data. It provides:

  • the tml-ctp-anonymizer Docker image, which encapsulates the Dicom Anonymizer Tool (DAT) of the Clinical Trial Processor (CTP);
  • the tml_ctp Python package, which batches the anonymization process via the Docker image;
  • a few template scripts for testing DAT.

For each version release, the project publishes:

  • the tml-ctp-anonymizer Docker image on quay.io
  • the tml_ctp Python package on PyPI

Pre-requisites

TML-CTP relies on two main tools that have to be installed beforehand:

  • Docker
  • Python 3.10

Installation

The installation of TML-CTP consists of the following two steps:

  1. Pull the tml-ctp-anonymizer image from quay.io:
docker pull quay.io/translationalml/tml-ctp-anonymizer:1.0.0
  2. In a Python 3.10 environment, install the Python package tml_ctp with pip:
pip install tml_ctp==1.0.0

This will install the main tml_ctp_dat_batcher script and two other utility scripts (tml_ctp_clean_series_tags and tml_ctp_delete_identifiable_dicoms) along with all Python dependencies.

You are ready to use TML-CTP!🚀

How to use tml_ctp_dat_batcher

Usage

usage: tml_ctp_dat_batcher [-h] -i INPUT_FOLDERS -o OUTPUT_FOLDER -s DAT_SCRIPT [--new-ids NEW_IDS]
                           [--day-shift DAY_SHIFT] [--image-tag IMAGE_TAG] [--version]

Run DAT.jar (CTP DicomAnonymizerTool) with Docker to anonymize DICOM files.

options:
  -h, --help            show this help message and exit
  -i INPUT_FOLDERS, --input-folders INPUT_FOLDERS
                        Parent folder including all folders of files to be anonymized.
  -o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
                        Folder where the anonymized files will be saved.
  -s DAT_SCRIPT, --dat-script DAT_SCRIPT
                        Script to be used for anonymization by the DAT.jar tool.
  --new-ids NEW_IDS     JSON file generated by pacsman-get-pseudonyms containing the mapping between the
                        old and new patient IDs. It should be in the format {'old_id1': 'new_id1',
                        'old_id2': 'new_id2', ...}. If not provided, the script will generate a new ID
                        randomly.
  --day-shift DAY_SHIFT
                        JSON file containing the day shift / increment to use for each patient ID. The
                        old patient ID is the key and the day shift is the value, e.g. {'old_id1': 5,
                        'old_id2': -3, ...}. If not provided, the script will generate a new day shift
                        randomly.
  --image-tag IMAGE_TAG
                        Tag of the Docker image to use for running DAT.jar (default: tml-ctp-
                        anonymizer:<version>).
  --version             show program's version number and exit
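The mappings expected by --new-ids and --day-shift are plain JSON objects keyed by the original patient ID. A minimal sketch of generating both files with the Python standard library (the IDs, pseudonym format, and shift range below are illustrative, not what the script itself produces):

```python
import json
import random

# Original patient IDs found in the input folder (illustrative values)
old_ids = ["1234567", "7654321"]

# Map each old ID to a random 6-digit pseudonym (hypothetical format)
new_ids = {old: f"{random.randint(0, 999999):06d}" for old in old_ids}

# Map each old ID to a random day shift between -30 and +30 days
day_shifts = {old: random.randint(-30, 30) for old in old_ids}

with open("new_ids.json", "w") as f:
    json.dump(new_ids, f, indent=2)
with open("day_shifts.json", "w") as f:
    json.dump(day_shifts, f, indent=2)
```

The resulting files can then be passed to the batcher as --new-ids new_ids.json --day-shift day_shifts.json.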

Examples

Basic

tml_ctp_dat_batcher \
  -i /path/to/input/folder \
  -o /path/of/output/folder \
  -s /path/to/dat/script

where:

  • /path/to/input/folder should be structured as follows:

    /path/to/input/folder
    |__ sub-<patientID1>
    |     |__ ses-<sessionDate1>
    |     |     |__ Series1-Description  # Can be any name
    |     |     |     |__ 001.dcm
    |     |     |     |__ 002.dcm
    |     |     |     |__ ...
    |     |     |__ Series2-Description  # Can be any name
    |     |     |     |__ 001.dcm
    |     |     |     |__ 002.dcm
    |     |     |     |__ ...
    |     |     |__ ...
    |     |__ ses-<sessionDate2>
    |     |     |__ Series1-Description  # Can be any name
    |     |     |     |__ 001.dcm
    |     |     |     |__ 002.dcm
    |     |     |     |__ ...
    |     |     |__ Series2-Description  # Can be any name
    |     |     |     |__ 001.dcm
    |     |     |     |__ 002.dcm
    |     |     |     |__ ...
    |     |     |__ ...
    |     |_ ...
    |__ sub-<patientID2>
    |     |__ ses-<sessionDate1>
    |     |     |__ Series1-Description  # Can be any name
    |     |     |     |__ 001.dcm
    |     |     |     |__ 002.dcm
    |     |     |     |__ ...
    |     |     |__ Series2-Description  # Can be any name
    |     |     |     |__ 001.dcm
    |     |     |     |__ 002.dcm
    |     |     |     |__ ...
    |     |     |__ ...
    
  • /path/of/output/folder will keep the same structure, but the patient IDs and session dates will be replaced by the new, anonymized IDs and dates

  • /path/to/dat/script should point to the anonymizer script used by DAT. Documentation for the script syntax can be found here: https://mircwiki.rsna.org/index.php?title=The_CTP_DICOM_Anonymizer.

Note: the RSNA MIRC Clinical Trial Processor (CTP) DICOM anonymizer is the core engine. The CTP documentation is available here: https://mircwiki.rsna.org/index.php?title=MIRC_CTP.
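For orientation, a DAT script is an XML file in which each rule names a DICOM element and the function used to replace it. The fragment below is only an illustrative sketch pieced together from the DAT documentation linked above, not one of the template scripts shipped with TML-CTP; consult that page for the authoritative syntax and function list:

```xml
<script>
  <!-- Parameter a caller can override (name illustrative) -->
  <p t="DATEINC">0</p>
  <!-- Blank the patient name and shift the study date -->
  <e en="T" t="00100010" n="PatientName">@empty()</e>
  <e en="T" t="00080020" n="StudyDate">@incrementdate(this,@DATEINC)</e>
  <!-- Strip private groups entirely -->
  <r en="T" t="privategroups">Remove private groups</r>
</script>
```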

How to use tml_ctp_clean_series_tags

After running tml_ctp_dat_batcher, you may still need to make sure that no PatientID or SeriesDate values remain in the DICOM tags at any level (such as within sequences). You can use tml_ctp_clean_series_tags for that.

Usage

usage: tml_ctp_clean_series_tags [-h] [--CTP_data_folder CTP_DATA_FOLDER] [--original_cohort ORIGINAL_COHORT]
                                 [--ids_file IDS_FILE]

Dangerous tags process and recursive overwrite of DICOM images.

options:
  -h, --help            show this help message and exit
  --CTP_data_folder CTP_DATA_FOLDER
                        Path to the CTP data folder.
  --original_cohort ORIGINAL_COHORT
                        Path to the original cohort folder.
  --ids_file IDS_FILE   Path to the IDs file generated by the CTP batcher script.
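The idea behind this cleaning step is a recursive walk over every tag, including those nested inside sequences. A stand-alone sketch of that idea, using nested dicts and lists in place of real DICOM datasets (reading actual files would typically go through pydicom, which is not shown here):

```python
def find_value(dataset, needle):
    """Recursively collect tag paths whose value contains `needle`.

    `dataset` is a dict of tag name -> value, where a value may itself
    be a list of nested dicts (standing in for a DICOM sequence).
    """
    hits = []
    for tag, value in dataset.items():
        if isinstance(value, list):  # sequence: recurse into each item
            for i, item in enumerate(value):
                hits += [f"{tag}[{i}].{p}" for p in find_value(item, needle)]
        elif needle in str(value):
            hits.append(tag)
    return hits

# A patient ID leaking inside a nested sequence is still found
ds = {
    "PatientID": "anon-042",
    "ReferencedSeries": [{"SeriesDate": "20230101", "PatientID": "1234567"}],
}
print(find_value(ds, "1234567"))  # → ['ReferencedSeries[0].PatientID']
```

A cleaning pass would then overwrite or blank every hit rather than just report it.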

How to use tml_ctp_delete_identifiable_dicoms

After running tml_ctp_dat_batcher, you may still need to delete files that may contain burned-in patient data (such as dose reports) or a visible face (such as T1w MPRAGE images). You can use tml_ctp_delete_identifiable_dicoms for that.

Usage

usage: tml_ctp_delete_identifiable_dicoms [-h] --in_folder IN_FOLDER [--delete_T1w] [--delete_T2w]

Delete DICOM files that could lead to identifying the patient.

options:
  -h, --help            show this help message and exit
  --in_folder IN_FOLDER, -d IN_FOLDER
                        Root directory of the DICOM files to be screened for identifiable files.
  --delete_T1w, -t1w    Delete potentially identifiable T1-weighted images such as MPRAGE
  --delete_T2w, -t2w    Delete potentially identifiable T2-weighted images such as FLAIR
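Conceptually, the --delete_T1w and --delete_T2w flags select series whose descriptions match known identifiable sequence types. A sketch of that selection logic over plain strings (the descriptions and patterns are illustrative, not the exact rules the script applies):

```python
import re

# Hypothetical series descriptions; the real script inspects DICOM files
series = ["t1_mprage_sag", "ep2d_diff_dti", "t2_flair_tra", "localizer"]

# Illustrative patterns for potentially identifiable series
T1W_PATTERN = re.compile(r"(t1|mprage)", re.IGNORECASE)
T2W_PATTERN = re.compile(r"(t2|flair)", re.IGNORECASE)

def flag_identifiable(descriptions, delete_t1w=True, delete_t2w=True):
    """Return the descriptions that would be flagged for deletion."""
    flagged = []
    for desc in descriptions:
        if delete_t1w and T1W_PATTERN.search(desc):
            flagged.append(desc)
        elif delete_t2w and T2W_PATTERN.search(desc):
            flagged.append(desc)
    return flagged

print(flag_identifiable(series))  # → ['t1_mprage_sag', 't2_flair_tra']
```

Flagged files would then be deleted from the output tree, which is why the operation is gated behind explicit opt-in flags.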

For Developers

Extra pre-requisites

You will need make (see the official docs for installation), which this project uses to ease the execution of the manual and CI/CD workflow tasks involved in the software development life cycle.

Note that make is a Linux tool. If you are on Windows, you will have to use the Windows Subsystem for Linux (WSL). See WSL Installation instructions.

List of make commands

build-docker                   Builds the Docker image
install-python                 Install the python package with pip
install-python-all             Install the python package with all dependencies used for development
build-python-wheel             Build the python wheel
clean-python-build             Clean the python build directory
install-python-wheel           Install the python wheel
tests                          Run all tests
clean-tests                    Clean the directories generated by the pytest tests
help                           List available make command for this project

Manual installation

Manual installation of TML-CTP consists of the following three steps. In a terminal with Python 3.10 available:

  1. Clone this repository locally:
cd <preferred-installation-directory>
git clone git@github.com:TranslationalML/tml-ctp.git
cd tml-ctp
  2. Build the Docker image with make:
make build-docker
  3. Install the full Python development environment with make:
make install-python-all

This will install with pip:

  • all Python dependencies needed for development (such as black for Python code formatting or pytest for running the tests)
  • the package tml_ctp including its main tml_ctp_dat_batcher script and a few other utility scripts
  • all Python dependencies of the package.

You are ready to develop TML-CTP!🚀

How to run the tests

For convenience, you can run the following command in a terminal with Python 3.10 available:

make tests

which will take care of (1) re-building the Docker image if necessary, (2) cleaning the test directories, (3) re-installing the package with pip, and (4) executing the tests with pytest.

At the end, code coverage reports in different formats are generated and saved inside the tests/report directory.

License

Copyright 2023-2024 Lausanne University and Lausanne University Hospital, Switzerland & Contributors

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

