Deidentifies a file in place.

Anonymized/De-identified In Place

Overview

Summary

Profile-based anonymization of a file in Flywheel. The file is anonymized according to a de-id YAML profile, and the result either replaces the source file or is saved as a new version of it.

Currently supported files are:

  • DICOM
  • JPG
  • PNG
  • TIFF
  • XML
  • JSON
  • Text files defining key/value pairs (e.g. MHD)
  • CSV
  • TSV

Currently supported field transformations are:

  • remove: Removes the field from the metadata.
  • replace-with: Replaces the contents of the field with the value provided.
  • increment-date: Offsets the date by the number of days.
  • increment-datetime: Offsets the datetime by the number of days.
  • hash: Replaces the contents of the field with a one-way cryptographic hash.
  • hashuid: Replaces a UID field with a hashed version of that field.
  • jitter: Shifts value by a random number.
  • encrypt (non-DICOM): Encrypts the field in place with AES-EAX encryption.
  • encrypt (DICOM): Removes the field from the DICOM and stores the original value in EncryptedAttributesSequence with CMS encryption.
  • decrypt (non-DICOM): Decrypts the field in place with AES-EAX decryption.
  • decrypt (DICOM): Replaces the contents of the field with the value stored in EncryptedAttributesSequence with CMS decryption.
  • regex-sub: Replaces the contents of the field with a value built from other fields and/or groups extracted from the field value.
  • keep: Do nothing.
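
To make a few of the transformations above concrete, here is a minimal Python sketch of remove, replace-with, increment-date, hash, and keep applied to a plain dictionary. It is an illustration only, not the gear's implementation: the helper names (`increment_date`, `one_way_hash`, `apply_fields`) are hypothetical, SHA-256 is an assumed stand-in for whatever hash the toolkit uses, and dates are assumed to be in the DICOM `YYYYMMDD` form.

```python
import hashlib
from datetime import datetime, timedelta


def increment_date(value: str, days: int, fmt: str = "%Y%m%d") -> str:
    """Shift a date string by a number of days (a negative value moves it earlier)."""
    shifted = datetime.strptime(value, fmt) + timedelta(days=days)
    return shifted.strftime(fmt)


def one_way_hash(value: str) -> str:
    """Return a SHA-256 hex digest (assumed stand-in, not the gear's actual hash)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def apply_fields(record: dict, fields: list, date_increment: int) -> dict:
    """Apply the configured action for each field to a copy of the record."""
    result = dict(record)
    for field in fields:
        name = field["name"]
        if name not in result:
            continue
        if field.get("remove"):
            del result[name]
        elif "replace-with" in field:
            result[name] = field["replace-with"]
        elif field.get("increment-date"):
            result[name] = increment_date(result[name], date_increment)
        elif field.get("hash"):
            result[name] = one_way_hash(result[name])
        # "keep" (or no recognized action) leaves the value untouched.
    return result


record = {"StationName": "SCANNER01", "StudyDate": "20200118", "AccessionNumber": "A123"}
fields = [
    {"name": "StationName", "remove": True},
    {"name": "StudyDate", "increment-date": True},
    {"name": "AccessionNumber", "hash": True},
]
deidentified = apply_fields(record, fields, date_increment=-17)
```

With an increment of -17 days, the `StudyDate` of `20200118` becomes `20200101`, `StationName` is dropped, and `AccessionNumber` is replaced by a digest. The input `record` is left unmodified.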

Additionally, for DICOM, pixel data masking is supported based on pre-defined pixel coordinates (doc).

The YAML profile extends the flywheel-migration-toolkit de-id profile to cover Flywheel container metadata. Documentation on how to write the YAML configuration for each supported file type can be found in the flywheel-migration doc.

NOTE: Metadata extraction must be rerun on the output file, as the gear itself does not propagate/modify DICOM metadata.

License

MIT

Classification

utility

  • Gear Level:

  • Project

  • Subject

  • Session

  • Acquisition

  • Analysis



Inputs

  • deid-profile

    • Name: deid-profile
    • Type: file
    • Optional: false
    • Description: A Flywheel de-identification profile specifying the de-identification actions to perform
  • subject-csv

    • Name: subject-csv
    • Type: file
    • Optional: true
    • Description: A CSV file that contains mapping values to apply for subjects during de-identification.
  • input-file

    • Name: input-file
    • Type: file
    • Optional: false
    • Description: An input file to be de-identified

deid_profile (required)

This is a YAML file that describes the protocol for de-identifying input-file. It supports the same functionality as Flywheel CLI de-identification.

NOTE: By default, Flywheel metadata is removed from the file. To pass the file's metadata along to the new de-identified version of the file, you MUST include the flywheel section:

flywheel:
  file:
    all: true

A simple example deid_profile.yaml looks like this:

# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17

  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true

  # filenames block to manipulate output filename based on input filename
  filenames:
      # input regular expression that match source filename
    - input-regex: '.*'
      # formatter of the output filename
      output: '{SOPInstanceUID}.dcm'

  fields:
    # Remove a DICOM field value (e.g. remove "StationName")
    - name: StationName
      remove: true

    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true

    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true

    # One-Way hash a dicom field to a unique string
    - name: AccessionNumber
      hash: true

    # One-Way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true

# Zip profile to handle archives such as .dcm.zip. All member files are
# de-identified according to this same profile.
zip:
  fields:
  - name: comment
    replace-with: FLYWHEEL
  filenames:
  - input-regex: (?P<used>.*).dicom.zip$
    output: '{used}.dcm.zip'
  hash-subdirectories: true
  validate-zip-members: true

# The flywheel configuration to handle flywheel metadata de-id.
flywheel:
  # subject container
  subject:
    # If set to true, export all source container metadata to destination container.
    all: true

  # session container
  session:
    # If set to false, only export to destination container the metadata defined
    # in the fields key.
    all: false
    date-increment: -17
    fields:
      - name: operator
        replace-with: REDACTED
      - name: info.sessiondate
        increment-date: true
      - name: tags
        replace-with: 
          - deid-exported

  acquisition:
    all: true

  file:
    all: true
    # If set to true, export the file info header to the destination container.
    # If set to false or missing, the file info header will be removed from the 
    # destination container.
    include-info-header: true
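
The `filenames` blocks in the profile above pair an `input-regex` with an `output` format string. The Python sketch below shows one plausible reading of that pairing: named regex groups, plus any header attributes, fill the braces in `output`. The `rename` helper and its `attributes` argument are hypothetical, and the use of `re.match` with first-match-wins ordering is an assumption. The toolkit's actual matching rules may differ.

```python
import re
from typing import Dict, List, Optional


def rename(filename: str, rules: List[dict],
           attributes: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the output name from the first rule whose input-regex matches.

    Named groups from the regex and values from `attributes` (for example,
    DICOM header fields) are both available to the output format string.
    """
    for rule in rules:
        match = re.match(rule["input-regex"], filename)
        if match:
            values = {**(attributes or {}), **match.groupdict()}
            return rule["output"].format(**values)
    return None  # no rule matched; the caller keeps the original name


# Rules copied from the example profile.
zip_rules = [{"input-regex": r"(?P<used>.*).dicom.zip$", "output": "{used}.dcm.zip"}]
dicom_rules = [{"input-regex": ".*", "output": "{SOPInstanceUID}.dcm"}]

zip_name = rename("series_01.dicom.zip", zip_rules)
dicom_name = rename("IM0001", dicom_rules, attributes={"SOPInstanceUID": "1.2.3"})
```

Here `zip_name` is `series_01.dcm.zip` and `dicom_name` is `1.2.3.dcm`. A file name that matches no rule, such as `notes.txt` against `zip_rules`, returns `None`.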

subject-csv (optional)

The subject-csv facilitates subject or subject/session-specific configuration of de-identification profiles.

When session_level_profile is False and a subject-csv is provided, the CSV file must contain the column subject.label, with unique values corresponding to the subject.label value of the file to be de-identified.

When session_level_profile is True and a subject-csv is provided, the CSV file must contain the columns subject.label and session.label, with unique combinations corresponding to the subject/session of the file to be de-identified.

Subject-level customization with subject-csv and deid-profile

Requirements:

  • To update subject fields, each field must be represented both in the subject_csv as a column header and in the deid_profile as a Jinja variable (i.e. "{{ var_name }}").
  • If a field is represented in both the deid_profile and the subject_csv, the value in the deid_profile will be replaced with the value listed in the corresponding column of the subject_csv for each subject that has a label listed in the subject.label column.
  • Fields represented in the deid_profile but not in the subject_csv will be the same for all subjects.

Let's walk through an example pairing of subject_csv and deid_profile to illustrate.

The following table represents subject_csv (../tests/data/example-csv-mapping.csv):

subject.label  DATE_INCREMENT  SUBJECT_ID   PATIENT_BD_BOOL
001            -15             Patient_IDA  false
002            -20             Patient_IDB  true
003            -30             Patient_IDC  true

The deid_profile:

dicom:
  # date-increment can be any integer value since dicom.date-increment is
  # defined in example-csv-mapping.csv
  date-increment: "{{ DATE_INCREMENT }}"
  # since example-csv-mapping.csv doesn't define dicom.remove-private-tags,
  # all subjects will have private tags removed
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      # remove can be any boolean since dicom.fields.PatientBirthDate.remove is defined
      # in example-csv-mapping.csv
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      # replace-with can be any string value since dicom.fields.PatientID.replace-with
      # is defined in example-csv-mapping.csv
      replace-with: "{{ SUBJECT_ID }}"

The resulting profile for subject 003 given the above would be:

dicom:
  # date-increment can be any integer value since dicom.date-increment is
  # defined in example-csv-mapping.csv
  date-increment: -30
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: true
    - name: PatientID
      replace-with: Patient_IDC 
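
The substitution walked through above can be approximated in a few lines of Python. This is a simplified sketch rather than the gear's code: a regular expression stands in for Jinja, the CSV is assumed to be comma-separated, and the quoted placeholder is replaced together with its quotes so that a YAML parser would read `-30` as an integer and `true` as a boolean. The names `render_profile` and `PLACEHOLDER` are hypothetical.

```python
import csv
import io
import re

CSV_TEXT = """subject.label,DATE_INCREMENT,SUBJECT_ID,PATIENT_BD_BOOL
001,-15,Patient_IDA,false
002,-20,Patient_IDB,true
003,-30,Patient_IDC,true
"""

PROFILE_TEMPLATE = """dicom:
  date-increment: "{{ DATE_INCREMENT }}"
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      replace-with: "{{ SUBJECT_ID }}"
"""

# Matches a quoted placeholder such as "{{ SUBJECT_ID }}", quotes included.
PLACEHOLDER = re.compile(r'"\{\{\s*(\w+)\s*\}\}"')


def render_profile(template: str, row: dict) -> str:
    """Replace each quoted placeholder with the row's value for that column.

    Raises KeyError if the template names a column the row does not have.
    """
    return PLACEHOLDER.sub(lambda m: row[m.group(1)], template)


# Index the CSV rows by subject label, then render the profile for subject 003.
rows = {row["subject.label"]: row for row in csv.DictReader(io.StringIO(CSV_TEXT))}
profile_003 = render_profile(PROFILE_TEMPLATE, rows["003"])
```

The rendered `profile_003` text contains `date-increment: -30`, `remove: true`, and `replace-with: Patient_IDC`, while `remove-private-tags: true` passes through unchanged because it has no placeholder.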

Migrating to Session-level customization

To create customized deid profiles at the Session level instead of at the Subject level, note that the following must be true:

  • The subject-csv file must have both a subject.label and session.label column
  • session_level_profile must be set to True

So, for example, the subject-csv represented in the previous Subject-level customization would need to be updated to include a session.label column:

subject.label  session.label  DATE_INCREMENT  SUBJECT_ID   PATIENT_BD_BOOL
001            SCREEN         -15             Patient_IDA  false
001            WK1            -20             Patient_IDA  false
001            WK2            -25             Patient_IDA  false
002            SCREEN         -20             Patient_IDB  true
003            SCREEN         -30             Patient_IDC  true
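
The difference between the two lookup modes comes down to which columns form the row key. The Python sketch below indexes the session-level table above by `(subject.label, session.label)` and shows why the same file fails a subject-only lookup: subject 001 appears three times. The `index_rows` helper is hypothetical, the CSV is assumed to be comma-separated, and raising `ValueError` on missing columns or duplicate keys is an assumption about reasonable behavior, not a description of what the gear does.

```python
import csv
import io

CSV_TEXT = """subject.label,session.label,DATE_INCREMENT,SUBJECT_ID,PATIENT_BD_BOOL
001,SCREEN,-15,Patient_IDA,false
001,WK1,-20,Patient_IDA,false
001,WK2,-25,Patient_IDA,false
002,SCREEN,-20,Patient_IDB,true
003,SCREEN,-30,Patient_IDC,true
"""


def index_rows(csv_text: str, session_level: bool) -> dict:
    """Index CSV rows by (subject,) or by (subject, session) label tuples."""
    key_columns = ["subject.label"] + (["session.label"] if session_level else [])
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in key_columns if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required column(s): {missing}")
    index = {}
    for row in reader:
        key = tuple(row[c] for c in key_columns)
        if key in index:
            # Keys must be unique, otherwise the lookup would be ambiguous.
            raise ValueError(f"duplicate key in subject-csv: {key}")
        index[key] = row
    return index


by_session = index_rows(CSV_TEXT, session_level=True)
wk1 = by_session[("001", "WK1")]
```

In this sketch `wk1["DATE_INCREMENT"]` is the string `-20`. Calling `index_rows(CSV_TEXT, session_level=False)` on the same text raises `ValueError`, because the subject label alone no longer identifies a single row.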

Config

  • debug

    • Name: debug
    • Type: boolean
    • Default: false
    • Description: If true, the gear will print debug information to the log.
  • tag

    • Name: tag
    • Type: string
    • Default: "deid-inplace"
    • Description: The tag prefix to append to the file after the gear runs. The tag will be <prefix>-PASS or <prefix>-FAIL, depending on the gear run status.
  • delete-original

    • Name: delete-original
    • Type: boolean
    • Default: true
    • Description: If True, the original file is deleted and replaced with the de-identified file, rendering the original file unrecoverable. If False, the de-identified file overwrites the original, resulting in a file version increment that can be reversed.
  • private_key

    • Name: private_key
    • Type: string
    • Description: Asymmetric decryption: the resolver path and filename of the private key PEM file. Format it as <group>/<project>/files/<filename> (e.g. flywheel/test/files/private_key.pem) if the key is saved at the project level, as <group>/<project>/<subject>/files/<filename> if it is stored at the subject level, or as <group>/<project>/<subject>/<session>/<acquisition>/<filename> if it is stored within an acquisition container.
  • public_key

    • Name: public_key
    • Type: string
    • Description: Asymmetric encryption: the resolver path and filename(s) of the public key PEM file(s), with multiple key files separated by ', ' (e.g. flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem). Format each path as <group>/<project>/files/<filename> if the key is saved at the project level, as <group>/<project>/<subject>/files/<filename> if it is stored at the subject level, or as <group>/<project>/<subject>/<session>/<acquisition>/<filename> if it is stored within an acquisition container.
  • secret_key

    • Name: secret_key
    • Type: string
    • Description: Symmetric encryption: the resolver path and filename(s) of the secret key txt file(s). Format each path as <group>/<project>/files/<filename> (e.g. flywheel/test/files/secret_key.txt) if the key is saved at the project level, as <group>/<project>/<subject>/files/<filename> if it is stored at the subject level, or as <group>/<project>/<subject>/<session>/<acquisition>/<filename> if it is stored within an acquisition container.
  • session-level-profile

    • Name: session-level-profile
    • Type: boolean
    • Description: If enabled, session-level deid profiles will be created, instead of subject-level, according to csv lookup.
    • Note: If session_level_profile is enabled, both subject.label and session.label are required columns.
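
As a rough illustration of two of the string-valued options above, the Python sketch below splits a `public_key` value into individual resolver paths, breaks one path into its parts, and builds the `<prefix>-PASS` / `<prefix>-FAIL` tag. All three helper names are hypothetical, and the parsing is an assumption based only on the formats described here. In particular, treating the literal segment `files` as a separator would misread a container that is itself labeled `files`.

```python
def split_key_paths(config_value: str) -> list:
    """Split a comma-separated config string into individual resolver paths."""
    return [part.strip() for part in config_value.split(",") if part.strip()]


def describe_path(path: str) -> dict:
    """Break a resolver path into group, project, nested containers, and filename."""
    parts = path.split("/")
    if len(parts) < 4:
        raise ValueError(f"expected at least <group>/<project>/files/<filename>: {path}")
    group, project, *middle, filename = parts
    # Drop the literal "files" segment; what remains are container labels.
    containers = [segment for segment in middle if segment != "files"]
    return {"group": group, "project": project,
            "containers": containers, "filename": filename}


def status_tag(prefix: str, succeeded: bool) -> str:
    """Build the tag applied after a run: <prefix>-PASS or <prefix>-FAIL."""
    return f"{prefix}-{'PASS' if succeeded else 'FAIL'}"


paths = split_key_paths(
    "flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem"
)
first = describe_path(paths[0])
tag = status_tag("deid-inplace", succeeded=True)
```

For the project-level example, `first["containers"]` is empty and `first["filename"]` is `public_key1.pem`. A subject-level path such as `grp/proj/subj/files/key.pem` yields `["subj"]` as its containers.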

Usage

  1. User uploads or identifies a file in Flywheel to deidentify

  2. User runs deid-inplace (this utility gear) at the project, subject, or session level and provides the following:

    • Files:
      • A de-identification profile specifying how to de-identify/anonymize each file type
      • An optional CSV that contains a column that maps to a Flywheel session or subject metadata field and columns that specify values with which to replace DICOM header tags
      • The desired input file to de-identify
    • Configuration options:
      • delete-original: True/False
  3. The gear will deidentify the file

  4. The gear will erase or overwrite the original file depending on the config option.

Environment

This gear uses poetry as its virtual environment and dependency manager. You can interact with the gear using the following steps:

  1. Install poetry
  2. Install dependencies (from within gear directory): poetry install
  3. Enter virtual environment: poetry shell
