Deidentifies a file in place.
Anonymized/De-identified In Place
Overview
Summary
Profile-based anonymization of a file in Flywheel. Files are anonymized according to a de-identification YAML profile, and the result either overwrites the source file or is saved as a new version of it.
Currently supported files are:
- DICOM
- JPG
- PNG
- TIFF
- XML
- JSON
- Text files defining key/value pairs (e.g., MHD)
- CSV
- TSV
Currently supported field transformations are:
- remove: Removes the field from the metadata.
- replace-with: Replaces the contents of the field with the value provided.
- increment-date: Offsets the date by the configured number of days (date-increment).
- increment-datetime: Offsets the datetime by the configured number of days (date-increment).
- hash: Replaces the contents of the field with a one-way cryptographic hash.
- hashuid: Replaces a UID field with a hashed version of that field.
- jitter: Shifts the value by a random number.
- encrypt (non-DICOM): Encrypts the field in place with AES-EAX encryption.
- encrypt (DICOM): Removes the field from the DICOM file and stores the original value in EncryptedAttributesSequence with CMS encryption.
- decrypt (non-DICOM): Decrypts the field in place with AES-EAX decryption.
- decrypt (DICOM): Replaces the contents of the field with the value stored in EncryptedAttributesSequence, using CMS decryption.
- regex-sub: Replaces the contents of the field with a value built from other fields and/or groups extracted from the field value.
- keep: Does nothing.
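The increment-date and increment-datetime actions shift values by the profile's date-increment, in days. The following is a minimal sketch of that arithmetic for DICOM DA (YYYYMMDD) and DT (YYYYMMDDHHMMSS.FFFFFF) strings. It assumes the offset applies to the calendar date only and that the time-of-day suffix is passed through unchanged; the function names are hypothetical, and this is not the flywheel-migration-toolkit implementation.

```python
from datetime import date, timedelta


def increment_date(da_value: str, days: int) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by `days` calendar days (hypothetical helper)."""
    year, month, day = int(da_value[0:4]), int(da_value[4:6]), int(da_value[6:8])
    shifted = date(year, month, day) + timedelta(days=days)
    return shifted.strftime("%Y%m%d")


def increment_datetime(dt_value: str, days: int) -> str:
    """Shift the date part of a DICOM DT value, keeping the time suffix as-is.

    Assumption: a whole-day offset never changes the time of day, so only the
    leading YYYYMMDD is rewritten; any time and time zone suffix is copied over.
    """
    return increment_date(dt_value[:8], days) + dt_value[8:]


if __name__ == "__main__":
    # date-increment: -17, as in the example profile for this gear
    print(increment_date("20200115", -17))                   # 20191229
    print(increment_datetime("20200115103000.000000", -17))  # 20191229103000.000000
```

Using `datetime.date` arithmetic rather than manual day counting handles month lengths and leap years automatically.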
Additionally, for DICOM, pixel data masking is supported based on pre-defined pixel coordinates (doc).
The YAML profile extends the flywheel-migration-toolkit de-id profile to Flywheel metadata containers. Documentation on how to write the YAML configuration for each supported file type can be found in the flywheel-migration documentation.
NOTE: Metadata extraction must be rerun on the output file, because the gear itself does not propagate or modify DICOM metadata.
License
MIT
Classification
utility
Gear Level:
- Project
- Subject
- Session
- Acquisition
- Analysis
Inputs
deid-profile
- Name: deid-profile
- Type: file
- Optional: false
- Description: A Flywheel de-identification profile specifying the de-identification actions to perform.
subject-csv
- Name: subject-csv
- Type: file
- Optional: true
- Description: A CSV file that contains mapping values to apply for subjects during de-identification.
input-file
- Name: input-file
- Type: file
- Optional: false
- Description: An input file to be de-identified.
deid_profile (required)
This is a YAML file that describes the protocol for de-identifying input-file. It supports the same functionality as Flywheel CLI de-identification.
NOTE: By default, Flywheel metadata will be removed from the file. If you want
the file's metadata to be passed along to the new de-identified version of the
file, you MUST include the flywheel section:
flywheel:
  file:
    all: true
A simple example deid_profile.yaml looks like this:
# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17
  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true
  # filenames block to manipulate the output filename based on the input filename
  filenames:
    # input regular expression that matches the source filename
    - input-regex: '.*'
      # formatter of the output filename
      output: '{SOPInstanceUID}.dcm'
  fields:
    # Remove a DICOM field value (e.g. remove "StationName")
    - name: StationName
      remove: true
    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true
    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true
    # One-way hash a DICOM field to a unique string
    - name: AccessionNumber
      hash: true
    # One-way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true
# Zip profile to handle e.g. .dcm.zip archives. All member files will be
# de-identified according to that same profile.
zip:
  fields:
    - name: comment
      replace-with: FLYWHEEL
  filenames:
    - input-regex: (?P<used>.*).dicom.zip$
      output: '{used}.dcm.zip'
  hash-subdirectories: true
  validate-zip-members: true
# The flywheel configuration to handle Flywheel metadata de-id.
flywheel:
  # subject container
  subject:
    # If set to true, export all source container metadata to the destination container.
    all: true
  # session container
  session:
    # If set to false, only export to the destination container the metadata defined
    # in the fields key.
    all: false
    date-increment: -17
    fields:
      - name: operator
        replace-with: REDACTED
      - name: info.sessiondate
        increment-date: true
      - name: tags
        replace-with:
          - deid-exported
  acquisition:
    all: true
  file:
    all: true
    # If set to true, export the file info header to the destination container.
    # If set to false or missing, the file info header will be removed from the
    # destination container.
    include-info-header: true
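The filenames blocks pair an input-regex with an output formatter whose placeholders are filled from the regex's named groups (for example, used in the zip profile). A minimal sketch of that renaming step, assuming `re.match` semantics and Python `str.format` substitution; the optional `fields` argument stands in, hypothetically, for header values such as SOPInstanceUID, and this is not the toolkit's actual code.

```python
import re
from typing import Dict, Optional


def output_filename(input_regex: str, output: str, filename: str,
                    fields: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the renamed output filename, or None if the regex does not match.

    Named groups captured by `input_regex`, plus any extra `fields`
    (hypothetical stand-ins for header values), fill the `output` template.
    """
    match = re.match(input_regex, filename)
    if match is None:
        return None
    values = dict(fields or {})
    values.update(match.groupdict())  # regex groups win over extra fields
    return output.format(**values)


if __name__ == "__main__":
    print(output_filename(r"(?P<used>.*).dicom.zip$", "{used}.dcm.zip", "scan01.dicom.zip"))
    print(output_filename(".*", "{SOPInstanceUID}.dcm", "IM0001",
                          {"SOPInstanceUID": "1.2.3"}))
```

Note that the unescaped dots in `(?P<used>.*).dicom.zip$` match any character; escaping them as `\.` restricts the pattern to literal dots.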
subject-csv (optional)
The subject_csv facilitates subject-specific configuration of
de-identification profiles. It is a CSV file that contains a subject.label
column whose unique values correspond to the subject.label values in the
project to be exported. If a subject in the project to be exported is not
listed in the subject.label column of the provided subject_csv, that subject
will not be exported.
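The requirements on subject_csv (a subject.label column with unique values, and unlisted subjects being skipped) can be sketched as a small loader. This is a hypothetical illustration using only the standard `csv` module; the function names and error messages are assumptions, not the gear's code.

```python
import csv
import io
from typing import Dict, Optional


def load_subject_mapping(csv_text: str) -> Dict[str, Dict[str, str]]:
    """Index subject_csv rows by their subject.label value (hypothetical helper).

    Raises ValueError if the subject.label column is missing or contains
    duplicate labels, since labels are documented as unique.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    if reader.fieldnames is None or "subject.label" not in reader.fieldnames:
        raise ValueError("subject_csv must contain a 'subject.label' column")
    mapping: Dict[str, Dict[str, str]] = {}
    for row in reader:
        label = row.pop("subject.label")
        if label in mapping:
            raise ValueError(f"duplicate subject.label: {label!r}")
        mapping[label] = row
    return mapping


def values_for_subject(mapping: Dict[str, Dict[str, str]],
                       label: str) -> Optional[Dict[str, str]]:
    """Return the column values for a subject, or None if it is not listed."""
    return mapping.get(label)
```

Reading every cell as a string keeps labels such as 001 intact instead of turning them into the integer 1.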
Subject-level customization with subject-csv and deid-profile
Requirements:
- To update subject fields, the fields must be represented both in the subject_csv as a column header and in the deid_profile as a Jinja variable (i.e., "{{ var_name }}").
- If a field is represented in both the deid_profile and the subject_csv, the value in the deid_profile will be replaced with the value listed in the corresponding column of the subject_csv for each subject that has a label listed in the subject.label column.
- Fields represented in the deid_profile but not in the subject_csv will be the same for all subjects.
Let's walk through an example pairing of subject_csv and deid_profile to illustrate.
The following table represents subject_csv (../tests/data/example-csv-mapping.csv):
| subject.label | DATE_INCREMENT | SUBJECT_ID | PATIENT_BD_BOOL |
|---|---|---|---|
| 001 | -15 | Patient_IDA | false |
| 002 | -20 | Patient_IDB | true |
| 003 | -30 | Patient_IDC | true |
The deid_profile:
dicom:
  # date-increment is filled in per subject from the DATE_INCREMENT column
  # of example-csv-mapping.csv
  date-increment: "{{ DATE_INCREMENT }}"
  # since remove-private-tags is a literal value rather than a variable,
  # all subjects will have private tags removed
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      # remove is filled in per subject from the PATIENT_BD_BOOL column
      # of example-csv-mapping.csv
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      # replace-with is filled in per subject from the SUBJECT_ID column
      # of example-csv-mapping.csv
      replace-with: "{{ SUBJECT_ID }}"
The resulting profile for subject 003 given the above would be:
dicom:
  # value taken from the DATE_INCREMENT column for subject 003
  date-increment: -30
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: true
    - name: PatientID
      replace-with: Patient_IDC
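The substitution in this walk-through can be sketched as a text-level rendering step. The gear uses Jinja-style variables; this hypothetical sketch swaps in a plain regular-expression substitution from the standard library instead of Jinja. It drops the quotes around each placeholder so that YAML reads -30 as an integer and true/false as booleans, as in the resolved profile for subject 003. Values containing YAML special characters (such as `:` or `#`) would need quoting, which this sketch does not handle.

```python
import re
from typing import Dict

# Matches {{ NAME }}, optionally wrapped in a matching pair of quotes.
PLACEHOLDER = re.compile(r"""(["']?)\{\{\s*(\w+)\s*\}\}\1""")


def render_profile(template: str, values: Dict[str, str]) -> str:
    """Substitute one subject's CSV values into a deid profile template.

    Hypothetical helper: unknown variables raise KeyError rather than being
    silently left in the profile.
    """
    def substitute(match: "re.Match[str]") -> str:
        name = match.group(2)
        if name not in values:
            raise KeyError(f"no subject_csv column named {name!r}")
        return values[name]

    return PLACEHOLDER.sub(substitute, template)


TEMPLATE = '''dicom:
  date-increment: "{{ DATE_INCREMENT }}"
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      replace-with: "{{ SUBJECT_ID }}"
'''

if __name__ == "__main__":
    row_003 = {"DATE_INCREMENT": "-30", "SUBJECT_ID": "Patient_IDC",
               "PATIENT_BD_BOOL": "true"}
    print(render_profile(TEMPLATE, row_003))
```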
Config
debug
- Name: debug
- Type: boolean
- Default: false
- Description: If true, the gear will print debug information to the log.
tag
- Name: tag
- Type: string
- Default: "deid-inplace"
- Description: The prefix of the tag added to the file after the gear runs. The tag will be <prefix>-PASS or <prefix>-FAIL, depending on the gear run status.
delete-original
- Name: delete-original
- Type: boolean
- Default: true
- Description: If True, the original file is deleted and replaced with the de-identified file, rendering the original file unrecoverable. If False, the de-identified file overwrites the original, resulting in a file version increment that can be reversed.
private_key
- Name: private_key
- Type: string
- Description: Asymmetric decryption: the resolver path and filename of the private key PEM file, formatted as <group>/<project>/files/<filename> (e.g., flywheel/test/files/private_key.pem) if the key is saved at the project level, <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
public_key
- Name: public_key
- Type: string
- Description: Asymmetric encryption: the resolver path and filename(s) of the public key PEM file(s), formatted as <group>/<project>/files/<filename>, with multiple key files separated by ', ' (e.g., flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem) if the key is saved at the project level, <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
secret_key
- Name: secret_key
- Type: string
- Description: Symmetric encryption: the resolver path and filename(s) of the secret key txt file(s), formatted as <group>/<project>/files/<filename> (e.g., flywheel/test/files/secret_key.txt) if the key is saved at the project level, <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
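The key options above accept resolver paths at three container levels, and public_key/secret_key may list several paths separated by ', '. A hypothetical sketch of how such a value could be split and checked against the documented layouts; the helper names are assumptions, and the gear's own parsing may differ (for instance, this version also tolerates a bare ',' separator).

```python
from typing import List


def split_key_paths(config_value: str) -> List[str]:
    """Split a public_key/secret_key value into individual resolver paths."""
    return [path.strip() for path in config_value.split(",") if path.strip()]


def key_container_level(path: str) -> str:
    """Infer the container level from the documented resolver-path layouts."""
    parts = path.split("/")
    if len(parts) == 4 and parts[2] == "files":
        return "project"      # <group>/<project>/files/<filename>
    if len(parts) == 5 and parts[3] == "files":
        return "subject"      # <group>/<project>/<subject>/files/<filename>
    if len(parts) == 6:
        return "acquisition"  # <group>/<project>/<subject>/<session>/<acquisition>/<filename>
    raise ValueError(f"unrecognized key path layout: {path!r}")


if __name__ == "__main__":
    value = "flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem"
    for key_path in split_key_paths(value):
        print(key_path, key_container_level(key_path))
```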
Usage
- User uploads or identifies a file in Flywheel to de-identify.
- User runs deid-inplace (this utility gear) at the project, subject, or session level and provides the following:
  - Files:
    - A de-identification profile specifying how to de-identify/anonymize each file type
    - An optional CSV that contains a column that maps to a Flywheel session or subject metadata field, and columns that specify values with which to replace DICOM header tags
    - The desired input file to de-identify
  - Configuration options:
    - delete-original: True/False
- The gear de-identifies the file.
- The gear deletes or overwrites the original file, depending on the delete-original config option.
Environment
This gear uses Poetry as its virtual environment and dependency manager. You can
interact with the gear as follows:
- Install Poetry.
- Install dependencies (from within the gear directory): poetry install
- Enter the virtual environment: poetry shell
File details
Details for the file fw_gear_deid_inplace-1.3.0-py3-none-any.whl.
File metadata
- Download URL: fw_gear_deid_inplace-1.3.0-py3-none-any.whl
- Upload date:
- Size: 17.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.12.10 Linux/5.15.154+
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cbbfd0050151958af40beda0a7309e1fe17defd2d9349be654e8b270b3705c6c
|
|
| MD5 |
e5f450afa313676092b98fedf0ef52bf
|
|
| BLAKE2b-256 |
b2b4fff072cf39d0380b37e860a6555c10c642abf68d0b0d90b35190d97d77df
|
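A downloaded wheel can be checked against the SHA256 digest listed above. The following minimal sketch uses only the standard `hashlib` module and reads the file in chunks so large files need not fit in memory. The function names are illustrative, and the wheel is assumed to be in the current directory when you run the check yourself.

```python
import hashlib

EXPECTED_SHA256 = "cbbfd0050151958af40beda0a7309e1fe17defd2d9349be654e8b270b3705c6c"


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `verify_download("fw_gear_deid_inplace-1.3.0-py3-none-any.whl", EXPECTED_SHA256)` should return True for an intact download of this release.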