Deidentifies a file in place.
Anonymized/De-identified In Place
Overview
Summary
Profile-based anonymization of a file in Flywheel. Files are anonymized according to a de-id YAML profile, and the result either overwrites the source file or is saved as a new version of it.
Currently supported files are:
- DICOM
- JPG
- PNG
- TIFF
- XML
- JSON
- Text files defining key/value pairs (e.g., MHD)
- CSV
- TSV
Currently supported field transformations are:
- remove: Removes the field from the metadata.
- replace-with: Replaces the contents of the field with the value provided.
- increment-date: Offsets the date by the number of days set in date-increment.
- increment-datetime: Offsets the datetime by the number of days set in date-increment.
- hash: Replaces the contents of the field with a one-way cryptographic hash.
- hashuid: Replaces a UID field with a hashed version of that field.
- jitter: Shifts the value by a random amount.
- encrypt (non-DICOM): Encrypts the field in place with AES-EAX encryption.
- encrypt (DICOM): Removes the field from the DICOM file and stores the original value in EncryptedAttributesSequence with CMS encryption.
- decrypt (non-DICOM): Decrypts the field in place with AES-EAX decryption.
- decrypt (DICOM): Replaces the contents of the field with the value stored in EncryptedAttributesSequence, using CMS decryption.
- regex-sub: Replaces the contents of the field with a value built from other fields and/or groups extracted from the field value.
- keep: Does nothing.
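To make the two date transformations concrete, the sketch below shows what increment-date and increment-datetime amount to for DICOM DA (YYYYMMDD) and DT values. The helper names are hypothetical and this is not the flywheel-migration-toolkit implementation; it assumes plain values without fractional seconds or UTC offsets.

```python
from datetime import datetime, timedelta

# Hypothetical helpers illustrating the idea behind increment-date and
# increment-datetime. They are NOT the flywheel-migration-toolkit code and
# assume plain values without fractional seconds or UTC offsets.


def increment_dicom_date(value: str, days: int) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by `days` (negative = earlier)."""
    shifted = datetime.strptime(value, "%Y%m%d") + timedelta(days=days)
    return shifted.strftime("%Y%m%d")


def increment_dicom_datetime(value: str, days: int) -> str:
    """Shift a DICOM DT value (YYYYMMDDHHMMSS) by `days`, keeping the time of day."""
    shifted = datetime.strptime(value, "%Y%m%d%H%M%S") + timedelta(days=days)
    return shifted.strftime("%Y%m%d%H%M%S")


if __name__ == "__main__":
    # With date-increment: -17, as in the example profile further down
    print(increment_dicom_date("20230115", -17))            # 20221229
    print(increment_dicom_datetime("20230115083000", -17))  # 20221229083000
```

Using a real calendar type rather than string arithmetic means month and year boundaries (and leap days) are handled correctly.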
Additionally, for DICOM, pixel data masking is supported based on pre-defined pixel coordinates (doc).
The YAML profile extends the flywheel-migration-toolkit de-id profile to Flywheel metadata containers. Documentation on how to write the YAML configuration for each supported file type can be found in the flywheel-migration doc.
NOTE: Metadata extraction must be rerun on the output file, as the gear itself does not propagate or modify DICOM metadata.
License
MIT
Classification
utility
Gear Level:
- Project
- Subject
- Session
- Acquisition
- Analysis
Inputs
- deid-profile
  - Name: deid-profile
  - Type: file
  - Optional: false
  - Description: A Flywheel de-identification profile specifying the de-identification actions to perform.
- subject-csv
  - Name: subject-csv
  - Type: file
  - Optional: true
  - Description: A CSV file that contains mapping values to apply to subjects during de-identification.
- input-file
  - Name: input-file
  - Type: file
  - Optional: false
  - Description: The input file to be de-identified.
deid_profile (required)
This is a YAML file that describes the protocol for de-identifying input-file. It supports the same functionality as Flywheel CLI de-identification.
NOTE: By default, Flywheel metadata will be removed from the file. If you want the file's metadata to be passed along to the new de-identified version of the file, you MUST include the flywheel section:
```yaml
flywheel:
  file:
    all: true
```
A simple example deid_profile.yaml looks like this:
```yaml
# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17
  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true
  # filenames block to manipulate the output filename based on the input filename
  filenames:
    # input regular expression that matches the source filename
    - input-regex: '.*'
      # formatter of the output filename
      output: '{SOPInstanceUID}.dcm'
  fields:
    # Remove a DICOM field value (e.g. remove "StationName")
    - name: StationName
      remove: true
    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true
    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true
    # One-way hash a DICOM field to a unique string
    - name: AccessionNumber
      hash: true
    # One-way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true

# Zip profile to handle e.g. a .dcm.zip archive. All member files will be
# de-identified according to that same profile.
zip:
  fields:
    - name: comment
      replace-with: FLYWHEEL
  filenames:
    - input-regex: (?P<used>.*).dicom.zip$
      output: '{used}.dcm.zip'
  hash-subdirectories: true
  validate-zip-members: true

# The flywheel configuration to handle Flywheel metadata de-id.
flywheel:
  # subject container
  subject:
    # If set to true, export all source container metadata to the destination container.
    all: true
  # session container
  session:
    # If set to false, only export to the destination container the metadata
    # defined in the fields key.
    all: false
    date-increment: -17
    fields:
      - name: operator
        replace-with: REDACTED
      - name: info.sessiondate
        increment-date: true
      - name: tags
        replace-with:
          - deid-exported
  acquisition:
    all: true
  file:
    all: true
    # If set to true, export the file info header to the destination container.
    # If set to false or missing, the file info header will be removed from the
    # destination container.
    include-info-header: true
```
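The filenames rules pair an input-regex with an output template. The following is a minimal sketch of how such a rule could be applied, assuming that named groups captured by input-regex fill the placeholders in output. The rename helper is hypothetical, and the DICOM rule's {SOPInstanceUID} placeholder, which presumably comes from the file's header rather than from the regex, is not covered.

```python
import re
from typing import Optional

# Hypothetical helper: apply one `filenames` rule (input-regex + output).
# Assumes named groups captured by input-regex fill the {placeholders}
# in the output template; the toolkit's actual logic may differ.


def rename(filename: str, input_regex: str, output: str) -> Optional[str]:
    match = re.match(input_regex, filename)
    if match is None:
        return None  # rule does not apply to this filename
    return output.format(**match.groupdict())


if __name__ == "__main__":
    # The zip rule from the example profile
    print(rename("scan01.dicom.zip", r"(?P<used>.*).dicom.zip$", "{used}.dcm.zip"))
    # scan01.dcm.zip
```

Note that the unescaped dots in the example pattern match any character; `\.dicom\.zip$` would be the stricter form.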
subject-csv (optional)
The subject-csv enables subject-specific or subject/session-specific configuration of de-identification profiles.
When session-level-profile is False and a subject-csv is provided, the CSV file must contain a subject.label column with unique values corresponding to the subject.label of the file to be de-identified.
When session-level-profile is True and a subject-csv is provided, the CSV file must contain subject.label and session.label columns whose combined values are unique and correspond to the subject/session of the file to be de-identified.
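The column and uniqueness rules above can be checked before running the gear. The hypothetical validate_subject_csv function below is a standard-library sketch of such a check; it is not part of the gear, and the gear's own error handling may differ.

```python
import csv
import io
from typing import Dict, List

# Hypothetical pre-flight check for a subject-csv; not part of the gear.


def validate_subject_csv(text: str, session_level_profile: bool) -> List[Dict[str, str]]:
    reader = csv.DictReader(io.StringIO(text))
    key_columns = ["subject.label"]
    if session_level_profile:
        key_columns.append("session.label")

    # Required key columns must be present in the header row
    missing = [c for c in key_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {', '.join(missing)}")

    # Each subject (or subject/session pair) may appear only once
    seen = set()
    rows = []
    for row in reader:
        key = tuple(row[c] for c in key_columns)
        if key in seen:
            raise ValueError(f"duplicate entry for {key}")
        seen.add(key)
        rows.append(row)
    return rows
```

With session_level_profile=False, a CSV that lists subject 001 on several rows is rejected as a duplicate, while the same file passes with session_level_profile=True as long as each subject/session pair is unique.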
Subject-level customization with subject-csv and deid-profile
Requirements:
- To update subject fields, each field must be represented both in the subject_csv as a column header and in the deid_profile as a Jinja variable (i.e., "{{ var_name }}").
- If a field is represented in both the deid_profile and the subject_csv, the value in the deid_profile will be replaced with the value listed in the corresponding column of the subject_csv for each subject whose label is listed in the subject.label column.
- Fields represented in the deid_profile but not in the subject_csv will be the same for all subjects.
Let's walk through an example pairing of subject_csv and deid_profile to illustrate.
The following table represents subject_csv (../tests/data/example-csv-mapping.csv):
| subject.label | DATE_INCREMENT | SUBJECT_ID | PATIENT_BD_BOOL |
|---|---|---|---|
| 001 | -15 | Patient_IDA | false |
| 002 | -20 | Patient_IDB | true |
| 003 | -30 | Patient_IDC | true |
The deid_profile:
```yaml
dicom:
  # filled in from the DATE_INCREMENT column of example-csv-mapping.csv
  date-increment: "{{ DATE_INCREMENT }}"
  # since example-csv-mapping.csv doesn't define dicom.remove-private-tags,
  # all subjects will have private tags removed
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      # filled in from the PATIENT_BD_BOOL column of example-csv-mapping.csv
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      # filled in from the SUBJECT_ID column of example-csv-mapping.csv
      replace-with: "{{ SUBJECT_ID }}"
```
The resulting profile for subject 003 given the above would be:
```yaml
dicom:
  date-increment: -30
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: true
    - name: PatientID
      replace-with: Patient_IDC
```
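The substitution that produces this resolved profile can be approximated as follows. This is a stand-in for the gear's Jinja rendering, not Jinja itself: the hypothetical render_profile replaces each quoted "{{ NAME }}" placeholder with the raw CSV value, so that a YAML parser would read -30 as an integer and true as a boolean. How the gear actually handles the quotes and value types is an assumption here.

```python
import csv
import io
import re
from typing import Dict

# Stand-in for the gear's Jinja rendering (NOT Jinja itself): each quoted
# "{{ NAME }}" placeholder is replaced by the raw CSV value, so that a YAML
# parser would read -30 as an integer and true as a boolean. How the gear
# handles the quotes and types internally is an assumption here.
PLACEHOLDER = re.compile(r'"\{\{\s*(\w+)\s*\}\}"')

CSV_TEXT = """subject.label,DATE_INCREMENT,SUBJECT_ID,PATIENT_BD_BOOL
001,-15,Patient_IDA,false
002,-20,Patient_IDB,true
003,-30,Patient_IDC,true
"""

TEMPLATE = """dicom:
  date-increment: "{{ DATE_INCREMENT }}"
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      replace-with: "{{ SUBJECT_ID }}"
"""


def render_profile(template: str, row: Dict[str, str]) -> str:
    # A placeholder with no matching CSV column raises KeyError
    return PLACEHOLDER.sub(lambda m: row[m.group(1)], template)


# Index CSV rows by subject label; labels stay strings ("003", not 3)
rows = {r["subject.label"]: r for r in csv.DictReader(io.StringIO(CSV_TEXT))}

if __name__ == "__main__":
    print(render_profile(TEMPLATE, rows["003"]))
```

Lines without a placeholder, such as remove-private-tags, pass through unchanged, which is why they are identical for every subject.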
Migrating to Session-level customization
To create customized deid profiles at the Session level instead of at the Subject level, note that the following must be true:
- The subject-csv file must have both a subject.label and a session.label column.
- session-level-profile must be set to True.
So, for example, the subject-csv represented in the previous Subject-level
customization would need to be updated to include a session.label column:
| subject.label | session.label | DATE_INCREMENT | SUBJECT_ID | PATIENT_BD_BOOL |
|---|---|---|---|---|
| 001 | SCREEN | -15 | Patient_IDA | false |
| 001 | WK1 | -20 | Patient_IDA | false |
| 001 | WK2 | -25 | Patient_IDA | false |
| 002 | SCREEN | -20 | Patient_IDB | true |
| 003 | SCREEN | -30 | Patient_IDC | true |
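With session-level profiles, the lookup key becomes the (subject.label, session.label) pair rather than subject.label alone, so each session of subject 001 can receive its own date offset. A minimal sketch of that keying, using a hypothetical build_session_lookup helper (not the gear's code):

```python
import csv
import io
from typing import Dict, Tuple

# Hypothetical session-level lookup: rows keyed by (subject.label, session.label)
# instead of subject.label alone. Values are kept as strings, so "001" keeps
# its leading zeros.
CSV_TEXT = """subject.label,session.label,DATE_INCREMENT,SUBJECT_ID,PATIENT_BD_BOOL
001,SCREEN,-15,Patient_IDA,false
001,WK1,-20,Patient_IDA,false
001,WK2,-25,Patient_IDA,false
002,SCREEN,-20,Patient_IDB,true
003,SCREEN,-30,Patient_IDC,true
"""


def build_session_lookup(text: str) -> Dict[Tuple[str, str], Dict[str, str]]:
    lookup: Dict[Tuple[str, str], Dict[str, str]] = {}
    for row in csv.DictReader(io.StringIO(text)):
        lookup[(row["subject.label"], row["session.label"])] = row
    return lookup


lookup = build_session_lookup(CSV_TEXT)

if __name__ == "__main__":
    # Same subject, different sessions, different date offsets
    print(lookup[("001", "SCREEN")]["DATE_INCREMENT"])  # -15
    print(lookup[("001", "WK1")]["DATE_INCREMENT"])     # -20
```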
Config
- debug
  - Name: debug
  - Type: boolean
  - Default: false
  - Description: If true, the gear will print debug information to the log.
- tag
  - Name: tag
  - Type: string
  - Default: "deid-inplace"
  - Description: The tag prefix to add to the file after the gear runs. The tag will be <prefix>-PASS or <prefix>-FAIL, depending on the gear run status.
- delete-original
  - Name: delete-original
  - Type: boolean
  - Default: true
  - Description: If True, the original file is deleted and replaced with the de-identified file, rendering the original file unrecoverable. If False, the de-identified file overwrites the original, resulting in a file version increment that can be reversed.
- private_key
  - Name: private_key
  - Type: string
  - Description: Asymmetric decryption: the resolver path and filename of the private key PEM file, formatted as <group>/<project>/files/<filename> (e.g., flywheel/test/files/private_key.pem) if the key is saved at the project level, <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
- public_key
  - Name: public_key
  - Type: string
  - Description: Asymmetric encryption: the resolver path and filename(s) of the public key PEM file(s), formatted as <group>/<project>/files/<filename>, with multiple key files separated by ', ' (e.g., flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem) if the key is saved at the project level, <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
- secret_key
  - Name: secret_key
  - Type: string
  - Description: Symmetric encryption: the resolver path and filename(s) of the secret key text file(s), formatted as <group>/<project>/files/<filename> (e.g., flywheel/test/files/secret_key.txt) if the key is saved at the project level, <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
- session-level-profile
  - Name: session-level-profile
  - Type: boolean
  - Description: If enabled, de-identification profiles are created per session instead of per subject, based on the CSV lookup.
  - Note: If session-level-profile is enabled, both subject.label and session.label are required columns.
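The key options take resolver paths whose shape depends on where the key file is stored. The sketch below splits the ', '-separated list documented for public_key and infers the container level from the three path shapes listed above. Both helpers are hypothetical; the gear's own parsing may differ.

```python
from typing import List

# Hypothetical helpers for the key options above; the gear's own parsing
# may differ. split_key_paths accepts the ", "-separated form documented
# for public_key; key_level infers the container level from the documented
# path shapes.


def split_key_paths(value: str) -> List[str]:
    return [part.strip() for part in value.split(",") if part.strip()]


def key_level(path: str) -> str:
    parts = path.split("/")
    if len(parts) == 4 and parts[2] == "files":
        return "project"      # <group>/<project>/files/<filename>
    if len(parts) == 5 and parts[3] == "files":
        return "subject"      # <group>/<project>/<subject>/files/<filename>
    if len(parts) == 6:
        return "acquisition"  # <group>/<project>/<subject>/<session>/<acquisition>/<filename>
    raise ValueError(f"unrecognized key path: {path}")


if __name__ == "__main__":
    value = "flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem"
    for path in split_key_paths(value):
        print(path, "->", key_level(path))
```

Splitting on "," and stripping whitespace is slightly more lenient than requiring exactly ', ', which tolerates a missing space between entries.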
Usage
- User uploads or identifies a file in Flywheel to de-identify.
- User runs deid-inplace (this utility gear) at the project, subject, or session level and provides the following:
  - Files:
    - A de-identification profile specifying how to de-identify/anonymize each file type
    - An optional CSV that contains a column that maps to a Flywheel session or subject metadata field, and columns that specify values with which to replace DICOM header tags
    - The desired input file to de-identify
  - Configuration options:
    - delete-original: True/False
- The gear de-identifies the file.
- The gear deletes or overwrites the original file, depending on the delete-original config option.
Environment
This gear uses Poetry as its virtual environment and dependency manager. You can interact with the gear using the following:
- Install Poetry
- Install dependencies (from within the gear directory): poetry install
- Enter the virtual environment: poetry shell
File details
Details for the file fw_gear_deid_inplace-1.4.0-py3-none-any.whl.
File metadata
- Download URL: fw_gear_deid_inplace-1.4.0-py3-none-any.whl
- Upload date:
- Size: 20.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.10.9 {"installer":{"name":"uv","version":"0.10.9","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Alpine Linux","version":"3.24.0_alpha20260127","id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12bfb62f2f6049a33f319da692b8bab5a08202402add67118e8fc8da60e1a6f2 |
| MD5 | 26478300161086ae2651889110972af9 |
| BLAKE2b-256 | a1f420ea30662a2d352c234e9b6e22c205831e5bc173701fea997447c40dc60c |