Python tool to batch the anonymization of DICOM files with the Dockerized CTP DICOM Anonymizer Tool
TML-CTP
This project aims to depersonalize imaging data. It is developed by the Translational Machine Learning Lab at the Lausanne University Hospital for internal use as well as for open-source software distribution. The project is not affiliated with the RSNA but builds upon their imaging research tools.
DISCLAIMER: The few template scripts provided by TML-CTP are only for testing it with your application. They are not intended to be used in a clinical or research setting, and should be considered incomplete test samples. DICOM files filtered through this program and associated scripts are not guaranteed to be free of Protected Health Information (PHI).
Description
TML-CTP leverages the powerful RSNA MIRC Clinical Trial Processor (CTP) DICOM anonymizer (legacy Java version) to depersonalize imaging data by providing:
- the tml-ctp-anonymizer Docker image, which encapsulates the DICOM Anonymizer Tool (DAT) of the Clinical Trial Processor (CTP),
- the tml_ctp Python package, which batches the anonymization process via the Docker image,
- a few template scripts for testing DAT.
Compared to the legacy Java version of the RSNA DICOM Anonymizer, TML-CTP provides easier parallelisation based on Docker isolation, enables random date shifts per patient (as opposed to project-wide), and supports anonymization (as opposed to coding), whereby the link between patient identity and patient code cannot be reversed (unlike with unsalted hashes, where the link can be recovered). Please check whether the more recent Python version of the RSNA DICOM Anonymizer better suits your use case.
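Why an unsalted hash counts as reversible coding, while a random pseudonym does not, can be shown with a short standard-library sketch. It is not TML-CTP's implementation: the 8-digit ID format, the 16-character code length and the ±365-day shift range are assumptions chosen only for illustration.

```python
from __future__ import annotations

import hashlib
import secrets


def unsalted_code(patient_id: str) -> str:
    """Coding: a deterministic hash that anyone can recompute from the ID."""
    return hashlib.sha256(patient_id.encode("utf-8")).hexdigest()[:16]


def random_code() -> str:
    """Anonymization: a random pseudonym with no functional link to the ID."""
    return secrets.token_hex(8)  # 16 hex characters


def dictionary_attack(code: str, candidate_ids: list[str]) -> str | None:
    """Try to recover the patient ID by re-hashing every plausible candidate."""
    for candidate in candidate_ids:
        if unsalted_code(candidate) == code:
            return candidate
    return None


def random_day_shift(max_days: int = 365) -> int:
    """Draw an independent shift per patient, uniform over [-max_days, max_days]."""
    return secrets.randbelow(2 * max_days + 1) - max_days


if __name__ == "__main__":
    # Hypothetical ID space: 8-digit hospital IDs in a small range.
    candidates = [f"{n:08d}" for n in range(12_345_000, 12_346_000)]
    print(dictionary_attack(unsalted_code("12345678"), candidates))  # 12345678
    print(dictionary_attack(random_code(), candidates))  # almost surely None
```

Because the hash is a pure function of the ID, enumerating the (often small) ID space recovers the link; a randomly drawn code, or a randomly drawn day shift, leaves nothing to recompute.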
For each version release, the project publishes:
- a new tagged Docker image to quay.io as quay.io/translationalml/tml-ctp-anonymizer
- a new Python package to the Python Package Index as tml_ctp.
Pre-requisites
TML-CTP relies on two main tools that have to be installed beforehand:
- Docker: software containerization engine (see the installation instructions)
- Python 3.10 with pip.
Installation
The installation of TML-CTP consists of the following two tasks:
- Pull the tml-ctp-anonymizer image from quay.io:
docker pull quay.io/translationalml/tml-ctp-anonymizer:1.1.1
- In a Python 3.10 environment, install the tml_ctp Python package with pip:
pip install tml_ctp==1.1.1
This will install the main tml_ctp_dat_batcher script and two other utility scripts (tml_ctp_clean_series_tags and tml_ctp_delete_identifiable_dicoms), along with all Python dependencies.
You are ready to use TML-CTP! 🚀
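To confirm both installation tasks succeeded, a small best-effort check can be scripted with the standard library. The function names are hypothetical helpers, not part of tml_ctp; the image tag mirrors the docker pull command above, and image detection relies on `docker image inspect` returning a non-zero exit code when the image is absent.

```python
from __future__ import annotations

import importlib.metadata
import shutil
import subprocess
import sys

IMAGE = "quay.io/translationalml/tml-ctp-anonymizer:1.1.1"


def check_prerequisites(package: str = "tml_ctp") -> dict[str, object]:
    """Best-effort report on the prerequisites described above."""
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        version = None
    return {
        "python_3_10": sys.version_info[:2] == (3, 10),  # the README targets Python 3.10
        "docker_cli": shutil.which("docker"),  # path to the docker executable, or None
        "package_version": version,  # installed version string, or None
    }


def image_is_pulled(image: str = IMAGE) -> bool:
    """Return True if `docker image inspect` finds the image locally."""
    if shutil.which("docker") is None:
        return False
    result = subprocess.run(
        ["docker", "image", "inspect", image], capture_output=True, check=False
    )
    return result.returncode == 0
```

Calling `check_prerequisites()` and `image_is_pulled()` after installation should report a Docker path, the tml_ctp version and True; a None or False points to the step that still needs attention.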
How to use tml_ctp_dat_batcher
Usage
usage: tml_ctp_dat_batcher [-h] -i INPUT_FOLDERS -o OUTPUT_FOLDER -s DAT_SCRIPT [--new-ids NEW_IDS]
[--day-shift DAY_SHIFT] [--image-tag IMAGE_TAG] [--version]
Run DAT.jar (CTP DicomAnonymizerTool) with Docker to anonymize DICOM files.
options:
-h, --help show this help message and exit
-i INPUT_FOLDERS, --input-folders INPUT_FOLDERS
Parent folder including all folders of files to be anonymized.
-o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
Folder where the anonymized files will be saved.
-s DAT_SCRIPT, --dat-script DAT_SCRIPT
Script to be used for anonymization by the DAT.jar tool.
--new-ids NEW_IDS JSON file generated by pacsifier-get-pseudonyms containing the mapping between the
old and new patient IDs. It should follow the format {"old_id1": "new_id1",
"old_id2": "new_id2", ...}. If not provided, the script will generate a new ID
randomly.
--day-shift DAY_SHIFT
JSON file containing the day shift / increment to use for each patient ID. The
old patient ID is the key and the day shift is the value, e.g. {"old_id1": 5,
"old_id2": -3, ...}. If not provided, the script will generate a new day shift
randomly.
--image-tag IMAGE_TAG
Tag of the Docker image to use for running DAT.jar (default: tml-ctp-
anonymizer:<version>).
--version show program's version number and exit
Examples
Basic
tml_ctp_dat_batcher \
-i /path/to/input/folder \
-o /path/of/output/folder \
-s /path/to/dat/script
where:
- /path/to/input/folder should be structured as follows:
    /path/to/input/folder
    |__ sub-<patientID1>
    |   |__ ses-<sessionDate1>
    |   |   |__ Series1-Description  # Can be any name
    |   |   |   |__ 001.dcm
    |   |   |   |__ 002.dcm
    |   |   |   |__ ...
    |   |   |__ Series2-Description  # Can be any name
    |   |   |   |__ 001.dcm
    |   |   |   |__ 002.dcm
    |   |   |   |__ ...
    |   |   |__ ...
    |   |__ ses-<sessionDate2>
    |   |   |__ Series1-Description  # Can be any name
    |   |   |   |__ 001.dcm
    |   |   |   |__ 002.dcm
    |   |   |   |__ ...
    |   |   |__ Series2-Description  # Can be any name
    |   |   |   |__ 001.dcm
    |   |   |   |__ 002.dcm
    |   |   |   |__ ...
    |   |   |__ ...
    |   |__ ...
    |__ sub-<patientID2>
    |   |__ ses-<sessionDate1>
    |   |   |__ Series1-Description  # Can be any name
    |   |   |   |__ 001.dcm
    |   |   |   |__ 002.dcm
    |   |   |   |__ ...
    |   |   |__ Series2-Description  # Can be any name
    |   |   |   |__ 001.dcm
    |   |   |   |__ 002.dcm
    |   |   |   |__ ...
    |   |   |__ ...
- /path/of/output/folder will keep the same structure, but the patient IDs and session dates will be replaced by the new IDs and dates.
- /path/to/dat/script should point to the anonymizer script used by DAT. The documentation of the script syntax can be found here: https://mircwiki.rsna.org/index.php?title=The_CTP_DICOM_Anonymizer.
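The exact naming logic of the batcher is not documented here, so the following is only a sketch of how the input layout could be walked and mapped to output folders. It assumes that session folders are named ses-YYYYMMDD, that the new folder names use the new ID and the session date shifted by that patient's day shift, and that series folders keep their names; `anonymized_relpath` and `plan_output_paths` are hypothetical helpers.

```python
from __future__ import annotations

from datetime import datetime, timedelta
from pathlib import Path


def anonymized_relpath(sub: str, ses: str, series: str,
                       new_ids: dict[str, str], day_shifts: dict[str, int]) -> Path:
    """Hypothetical mapping of sub-<ID>/ses-<date>/<series> to its anonymized counterpart."""
    old_id = sub[len("sub-"):] if sub.startswith("sub-") else sub
    date_str = ses[len("ses-"):] if ses.startswith("ses-") else ses
    # Assumption: session dates are encoded as YYYYMMDD.
    shifted = datetime.strptime(date_str, "%Y%m%d") + timedelta(days=day_shifts[old_id])
    # Series folder names are kept unchanged (they "can be any name").
    return Path(f"sub-{new_ids[old_id]}", f"ses-{shifted:%Y%m%d}", series)


def plan_output_paths(input_root: Path, output_root: Path,
                      new_ids: dict[str, str], day_shifts: dict[str, int]) -> dict[Path, Path]:
    """Walk input_root/sub-*/ses-*/<series> and map each series folder to an output folder."""
    plan: dict[Path, Path] = {}
    for series_dir in sorted(input_root.glob("sub-*/ses-*/*")):
        if not series_dir.is_dir():
            continue  # skip stray files at the series level
        ses_dir = series_dir.parent
        sub_dir = ses_dir.parent
        plan[series_dir] = output_root / anonymized_relpath(
            sub_dir.name, ses_dir.name, series_dir.name, new_ids, day_shifts
        )
    return plan
```

For example, with new ID ANON01 and a shift of -20 days, sub-P001/ses-20200115/T1w-MPRAGE would map to sub-ANON01/ses-20191226/T1w-MPRAGE under the output folder.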
Advanced Usage with --new-ids and --day-shift
You can ensure consistency in the depersonalization process by providing JSON files generated by pacsifier-get-pseudonyms that set specific new patient IDs and day shifts for each patient, rather than relying on randomly generated values.
tml_ctp_dat_batcher \
-i /path/to/input/folder \
-o /path/of/output/folder \
-s /path/to/dat/script \
--new-ids /path/to/new_ids.json \
--day-shift /path/to/day_shift.json
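When pacsifier-get-pseudonyms is not part of your pipeline, JSON files in the two formats given in the usage text can also be produced by hand. The sketch below is a hypothetical helper, not part of tml_ctp: it assumes the keys are the old patient IDs without the sub- prefix, draws 8-character hexadecimal codes and shifts within ±365 days (both arbitrary choices), and assembles the command shown above as an argument list.

```python
from __future__ import annotations

import json
import secrets
from pathlib import Path


def make_mappings(patient_ids: list[str], max_shift: int = 365) -> tuple[dict[str, str], dict[str, int]]:
    """Draw a unique random code and an independent day shift for every patient ID."""
    new_ids: dict[str, str] = {}
    used: set[str] = set()
    for pid in patient_ids:
        code = secrets.token_hex(4).upper()
        while code in used:  # re-draw on the (unlikely) event of a collision
            code = secrets.token_hex(4).upper()
        used.add(code)
        new_ids[pid] = code
    day_shifts = {pid: secrets.randbelow(2 * max_shift + 1) - max_shift for pid in patient_ids}
    return new_ids, day_shifts


def write_json(path: Path, mapping: dict) -> None:
    """Write {"old_id": value, ...} as used by --new-ids and --day-shift."""
    path.write_text(json.dumps(mapping, indent=2), encoding="utf-8")


def build_command(input_dir: str | Path, output_dir: str | Path, dat_script: str | Path,
                  new_ids_json: str | Path, day_shift_json: str | Path) -> list[str]:
    """Assemble the tml_ctp_dat_batcher call shown above as an argument list."""
    return [
        "tml_ctp_dat_batcher",
        "-i", str(input_dir),
        "-o", str(output_dir),
        "-s", str(dat_script),
        "--new-ids", str(new_ids_json),
        "--day-shift", str(day_shift_json),
    ]
```

A typical run would call `make_mappings(["P001", "P002"])`, write both dictionaries with `write_json`, and pass `build_command(...)` to `subprocess.run(..., check=True)`. Keeping the JSON files lets a later batch reuse the same codes and shifts.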
How to use tml_ctp_clean_series_tags
After running tml_ctp_dat_batcher, you may still need to make sure that no original PatientID or SeriesDate value remains in the DICOM tags at any level (such as inside sequences). You can use tml_ctp_clean_series_tags for that.
Usage
usage: tml_ctp_clean_series_tags [-h] [--CTP_data_folder CTP_DATA_FOLDER] [--original_cohort ORIGINAL_COHORT]
[--ids_file IDS_FILE]
Dangerous tags process and recursive overwrite of DICOM images.
options:
-h, --help show this help message and exit
--CTP_data_folder CTP_DATA_FOLDER
Path to the CTP data folder.
--original_cohort ORIGINAL_COHORT
Path to the original cohort folder.
--ids_file IDS_FILE Path to the IDs file generated by the CTP batcher file.
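The point of checking "at any level" is that sequence (SQ) elements contain nested datasets, so a flat pass over top-level tags can miss values. The recursion such a check needs can be sketched on the DICOM JSON model (DICOM PS3.18), where an SQ element's Value is a list of nested datasets. This is not how tml_ctp_clean_series_tags is implemented (it operates on DICOM files); `find_values` and its substring-matching rule are assumptions for illustration.

```python
from __future__ import annotations

from typing import Any, Iterator


def find_values(dataset: dict[str, Any], needles: set[str],
                path: str = "") -> Iterator[tuple[str, str]]:
    """Yield (tag path, value) for each value containing a needle, recursing into sequences."""
    for tag, element in dataset.items():
        here = f"{path}/{tag}" if path else tag
        for index, value in enumerate(element.get("Value", [])):
            if element.get("vr") == "SQ":
                # Each item of a sequence is itself a dataset: recurse into it.
                yield from find_values(value, needles, f"{here}[{index}]")
                continue
            # Person names are objects such as {"Alphabetic": "Doe^John"}.
            texts = value.values() if isinstance(value, dict) else [value]
            for text in map(str, texts):
                if any(needle in text for needle in needles):
                    yield here, text
```

For instance, an original PatientID (0010,0020) or SeriesDate (0008,0021) hidden inside the Request Attributes Sequence (0040,0275) is reported with a path such as 00400275[0]/00100020, even when the top-level PatientID has already been replaced.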
How to use tml_ctp_delete_identifiable_dicoms
After running tml_ctp_dat_batcher, you may still need to delete some files that may contain burned-in patient data, such as dose reports, or a visible face, such as T1w MPRAGEs. You can use tml_ctp_delete_identifiable_dicoms for that.
Usage
usage: tml_ctp_delete_identifiable_dicoms [-h] --in_folder IN_FOLDER [--delete_T1w] [--delete_T2w]
Delete DICOM files that could lead to identifying the patient.
options:
-h, --help show this help message and exit
--in_folder IN_FOLDER, -d IN_FOLDER
Root dir to the dicom files to be screened for identifiables files.
--delete_T1w, -t1w Delete potentially identifiable T1-weighted images such as MPRAGE
--delete_T2w, -t2w Delete potentially identifiable T2-weighted images such as FLAIR
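The rules the script uses to recognise such series are not described here. As a purely hypothetical illustration of the kind of screening involved, the sketch below flags series from their SeriesDescription with keyword patterns. The patterns, the choice to always flag dose reports, and `flag_series` itself are assumptions; a real screen could also rely on other attributes such as Modality or ImageType.

```python
from __future__ import annotations

import re

# Hypothetical keyword rules applied to SeriesDescription (0008,103E).
DOSE_REPORT = re.compile(r"dose\s*report", re.IGNORECASE)
T1W = re.compile(r"mprage|t1w|t1_", re.IGNORECASE)
T2W = re.compile(r"flair|t2w|t2_", re.IGNORECASE)


def flag_series(description: str, delete_t1w: bool = False, delete_t2w: bool = False) -> str | None:
    """Return why a series looks identifiable, or None to keep it."""
    if DOSE_REPORT.search(description):
        return "dose report"  # may carry burned-in patient data
    if delete_t1w and T1W.search(description):
        return "T1w"  # may show a visible face
    if delete_t2w and T2W.search(description):
        return "T2w"
    return None
```

Mirroring the --delete_T1w and --delete_T2w options, T1w and T2w series are only flagged when the corresponding switch is set.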
For Developers
Extra pre-requisites
You will need make (see the official docs for installation), which this project uses to simplify running the manual and CI/CD workflow tasks throughout the project's software development life cycle.
Note that make is a Unix tool. If you are on Windows, you will have to use the Windows Subsystem for Linux (WSL). See the WSL installation instructions.
List of make commands
build-docker Builds the Docker image
install-python Install the python package with pip
install-python-all Install the python package with all dependencies used for development
build-python-wheel Build the python wheel
clean-python-build Clean the python build directory
install-python-wheel Install the python wheel
tests Run all tests
clean-tests Clean the directories generated by the pytest tests
help List available make commands for this project
Manual installation
Manual installation of TML-CTP consists of the following three steps, run in a terminal with Python 3.10 available:
- Clone this repository locally:
cd <preferred-installation-directory>
git clone git@github.com:TranslationalML/tml-ctp.git
cd tml-ctp
- Build the Docker image with make:
make build-docker
- Install the full Python development environment with make:
make install-python-all
This will install with pip:
- all Python dependencies needed for development (such as black for Python code formatting or pytest for the tests),
- the tml_ctp package, including its main tml_ctp_dat_batcher script and a few other utility scripts,
- all Python dependencies of the package.
You are ready to develop TML-CTP! 🚀
How to run the tests
For convenience, you can run the following command in a terminal with Python 3.10 available:
make tests
which will take care of (1) re-building the Docker image if necessary, (2) cleaning the test directories, (3) re-installing the package with pip, and (4) executing the tests with pytest.
At the end, code coverage reports in different formats are generated and saved inside the tests/report directory.
Funding
This project is partially funded by the Lundin Family Brain Tumour Research Center.
Acknowledgments
We would like to thank Alexandre Wetzel and Augustin Augier for their valuable contributions to this project.
Authors
This project was developed and maintained by the following contributors:
- Sébastien Tourbier
- Jonathan Rafael Patino-Lopez
- Elodie Savary
- Jonas Richiardi
License
Copyright 2023-2024 Lausanne University and Lausanne University Hospital, Switzerland & Contributors
This Source Code Form is subject to the terms of the Mozilla Public License, v. 2.0. If a copy of the MPL was not distributed with this file, You can obtain one at http://mozilla.org/MPL/2.0/.
Test Data License
The DICOM series files used for testing in this project are generated from the PACSMAN_data library.
These files are distributed under the Creative Commons Attribution 4.0 International (CC BY 4.0) License. For more information, please refer to the test data README and the CC BY 4.0 License.
Download files
File details
Details for the file tml_ctp-1.1.1.tar.gz.
File metadata
- Download URL: tml_ctp-1.1.1.tar.gz
- Size: 28.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | fa51eb9485833cd2334ecc3c31eaab0ac591305d66937d7095bab4fe327c50e0
MD5 | 3a4605fa702f06092306b6d3d1673945
BLAKE2b-256 | 09098727a2be94a6782de44fbfee74e53ae43e3c9f7aa8a6772f4d73507e9775
File details
Details for the file tml_ctp-1.1.1-py3-none-any.whl.
File metadata
- Download URL: tml_ctp-1.1.1-py3-none-any.whl
- Size: 25.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.14
File hashes
Algorithm | Hash digest
---|---
SHA256 | fdd574d87d6deae1977f74d6ee6b1b9cca4f8c76b274a729371a72fb7657749f
MD5 | 67f8d8804f3b7500881282615862f495
BLAKE2b-256 | 1a40d01a32e1983e98438a4c498f906b66b79689e085e4d0a978af01cae388f6
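A downloaded release file can be checked against the SHA256 digests listed above before installing it. The sketch below uses only hashlib; the helper names are illustrative, and pip's own hash-checking mode (--require-hashes with a pinned --hash=sha256:... entry in a requirements file) is an alternative that performs the same comparison at install time.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "tml_ctp-1.1.1.tar.gz": "fa51eb9485833cd2334ecc3c31eaab0ac591305d66937d7095bab4fe327c50e0",
    "tml_ctp-1.1.1-py3-none-any.whl": "fdd574d87d6deae1977f74d6ee6b1b9cca4f8c76b274a729371a72fb7657749f",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """True only if the file name is a known release file and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, `verify(Path("tml_ctp-1.1.1-py3-none-any.whl"))` should return True for an intact download of the wheel.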