Deidentifies a file in place.

Anonymized/De-identified In Place

Overview

Summary

Profile-based anonymization of a file in Flywheel. The file is anonymized according to a de-id YAML profile, and the de-identified result either overwrites the source file or is stored as a new version of it.

Currently supported files are:

  • DICOM
  • JPG
  • PNG
  • TIFF
  • XML
  • JSON
  • Text files defining key/value pairs (e.g. MHD)
  • CSV
  • TSV

Currently supported field transformations are:

  • remove: Removes the field from the metadata.
  • replace-with: Replaces the contents of the field with the value provided.
  • increment-date: Offsets the date by the number of days.
  • increment-datetime: Offsets the datetime by the number of days.
  • hash: Replaces the contents of the field with a one-way cryptographic hash.
  • hashuid: Replaces a UID field with a hashed version of that field.
  • jitter: Shifts value by a random number.
  • encrypt (non-DICOM): Encrypts the field in place with AES-EAX encryption.
  • encrypt (DICOM): Removes the field from the DICOM and stores the original value in EncryptedAttributesSequence with CMS encryption.
  • decrypt (non-DICOM): Decrypts the field in place with AES-EAX decryption.
  • decrypt (DICOM): Replaces the contents of the field with the value stored in EncryptedAttributesSequence with CMS decryption.
  • regex-sub: Replaces the contents of the field with a value built from other fields and/or groups extracted from the field value.
  • keep: Do nothing.

Additionally, for DICOM, pixel data masking is supported based on pre-defined pixel coordinates (doc).
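
To make two of the transformations listed above concrete, here is a minimal Python sketch of what increment-date and hash do to a value. It is not the toolkit's implementation: the function names, the SHA-256 choice, and the 16-character truncation are assumptions made for illustration only.

```python
import hashlib
from datetime import datetime, timedelta


def increment_date(value: str, days: int, fmt: str = "%Y%m%d") -> str:
    """Shift a date string (DICOM DA format by default) by a number of days."""
    shifted = datetime.strptime(value, fmt) + timedelta(days=days)
    return shifted.strftime(fmt)


def one_way_hash(value: str, length: int = 16) -> str:
    """Return a truncated SHA-256 hex digest (algorithm and length are assumed)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:length]


shifted = increment_date("20200115", -17)
digest = one_way_hash("ACC12345")
print(shifted)  # 20191229
print(digest)
```

The same input always produces the same digest, which is what lets a hashed field still be used to group records after de-identification.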

The YAML profile extends the flywheel-migration-toolkit de-id profile to Flywheel metadata containers. Documentation on how to write the YAML configuration for each supported file type can be found in the flywheel-migration doc.

NOTE: Metadata extraction must be rerun on the output file, as the gear itself does not propagate or modify DICOM metadata.

License

MIT

Classification

utility

  • Gear Level:

  • Project

  • Subject

  • Session

  • Acquisition

  • Analysis



Inputs

  • deid-profile

    • Name: deid-profile
    • Type: file
    • Optional: false
    • Description: A Flywheel de-identification profile specifying the de-identification actions to perform
  • subject-csv

    • Name: subject-csv
    • Type: file
    • Optional: true
    • Description: A CSV file that contains mapping values to apply for subjects during de-identification.
  • input-file

    • Name: input-file
    • Type: file
    • Optional: false
    • Description: An input file to be de-identified

deid-profile (required)

This is a YAML file that describes the protocol for de-identifying input-file. It covers the same functionality as Flywheel CLI de-identification.

NOTE: By default, Flywheel metadata will be removed from the file. If you want the file's metadata to be passed along to the new de-identified version of the file, you MUST include the flywheel section:

flywheel:
  file:
    all: true

A simple example deid_profile.yaml looks like this:

# Configuration for DICOM de-identification
dicom:
  # What date offset to use, in number of days
  date-increment: -17

  # Set patient age from date of birth
  patient-age-from-birthdate: true
  # Set patient age units as Years
  patient-age-units: Y
  # Remove private tags
  remove-private-tags: true

  # filenames block to manipulate output filename based on input filename
  filenames:
      # input regular expression that matches the source filename
    - input-regex: '.*'
      # formatter of the output filename
      output: '{SOPInstanceUID}.dcm'

  fields:
    # Remove a DICOM field value (e.g. remove "StationName")
    - name: StationName
      remove: true

    # Increment a date field by -17 days
    - name: StudyDate
      increment-date: true

    # Increment a datetime field by -17 days
    - name: AcquisitionDateTime
      increment-datetime: true

    # One-Way hash a dicom field to a unique string
    - name: AccessionNumber
      hash: true

    # One-Way hash the ConcatenationUID,
    # keeping the prefix (4 nodes) and suffix (2 nodes)
    - name: ConcatenationUID
      hashuid: true

# Zip profile to handle e.g. .dcm.zip archives. All member files will be de-identified
# according to that same profile.
zip:
  fields:
  - name: comment
    replace-with: FLYWHEEL
  filenames:
  - input-regex: (?P<used>.*).dicom.zip$
    output: '{used}.dcm.zip'
  hash-subdirectories: true
  validate-zip-members: true

# The flywheel configuration to handle flywheel metadata de-id.
flywheel:
  # subject container
  subject:
    # If set to true, export all source container metadata to destination container.
    all: true

  # session container
  session:
    # If set to false, only export to destination container the metadata defined
    # in the fields key.
    all: false
    date-increment: -17
    fields:
      - name: operator
        replace-with: REDACTED
      - name: info.sessiondate
        increment-date: true
      - name: tags
        replace-with: 
          - deid-exported

  acquisition:
    all: true

  file:
    all: true
    # If set to true, export the file info header to the destination container.
    # If set to false or missing, the file info header will be removed from the 
    # destination container.
    include-info-header: true
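
The filenames blocks in the profile above pair an input-regex with an output formatter. The Python sketch below shows how such a pairing can be read: named groups captured by the regex, plus any file attributes, fill the placeholders in output. The rename helper, the first-match-wins behaviour, and the fallback to the original name are assumptions for illustration, not the toolkit's documented logic.

```python
import re
from typing import Optional

# Rule copied from the `filenames` entry of the zip profile above.
FILENAME_RULES = [
    {"input-regex": r"(?P<used>.*).dicom.zip$", "output": "{used}.dcm.zip"},
]


def rename(filename: str, rules: list, attributes: Optional[dict] = None) -> str:
    """Return the output name from the first rule whose regex matches (assumed)."""
    for rule in rules:
        match = re.match(rule["input-regex"], filename)
        if match:
            # Named groups and any extra attributes are available to the formatter.
            values = {**(attributes or {}), **match.groupdict()}
            return rule["output"].format(**values)
    return filename  # no rule matched: keep the original name


new_name = rename("scan_01.dicom.zip", FILENAME_RULES)
print(new_name)  # scan_01.dcm.zip
```

The DICOM rule from the same profile works the same way, except that its placeholder is filled from a header attribute rather than a regex group, e.g. `rename("a.dcm", [{"input-regex": ".*", "output": "{SOPInstanceUID}.dcm"}], {"SOPInstanceUID": "1.2.3"})`.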

subject-csv (optional)

The subject_csv facilitates subject-specific configuration of de-identification profiles. It is a CSV file that contains the column subject.label, with unique values corresponding to the subject.label values in the project to be exported. If a subject in the project is not listed under subject.label in the provided subject_csv, that subject will not be exported.

Subject-level customization with subject-csv and deid-profile

Requirements:

  • To update subject fields, each field must be represented both in the subject_csv as a column header and in the deid_profile as a Jinja variable (i.e., "{{ var_name }}").
  • If a field is represented in both the deid_profile and the subject_csv, the value in the deid_profile will be replaced with the value listed in the corresponding column of the subject_csv for each subject that has a label listed in the subject.label column.
  • Fields represented in the deid_profile but not in the subject_csv will be the same for all subjects.

Let's walk through an example pairing of subject_csv and deid_profile to illustrate.

The following table represents subject_csv (../tests/data/example-csv-mapping.csv):

subject.label  DATE_INCREMENT  SUBJECT_ID   PATIENT_BD_BOOL
001            -15             Patient_IDA  false
002            -20             Patient_IDB  true
003            -30             Patient_IDC  true

The deid_profile:

dicom:
  # date-increment can be any integer value since dicom.date-increment is
  # defined in example-csv-mapping.csv
  date-increment: "{{ DATE_INCREMENT }}"
  # since example-csv-mapping.csv doesn't define dicom.remove-private-tags,
  # all subjects will have private tags removed
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      # remove can be any boolean since dicom.fields.PatientBirthDate.remove is defined
      # in example-csv-mapping.csv
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      # replace-with can be any string value since dicom.fields.PatientID.replace-with
      # is defined in example-csv-mapping.csv
      replace-with: "{{ SUBJECT_ID }}"

The resulting profile for subject 003 given the above would be:

dicom:
  # date-increment can be any integer value since dicom.date-increment is
  # defined in example-csv-mapping.csv
  date-increment: -30
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: true
    - name: PatientID
      replace-with: Patient_IDC 
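
The substitution walked through above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the CSV is assumed to be comma-separated with the columns from the table, a plain regex substitution stands in for Jinja rendering, and render_profile is a hypothetical helper rather than a function of the gear.

```python
import csv
import io
import re

SUBJECT_CSV = """subject.label,DATE_INCREMENT,SUBJECT_ID,PATIENT_BD_BOOL
001,-15,Patient_IDA,false
002,-20,Patient_IDB,true
003,-30,Patient_IDC,true
"""

PROFILE_TEMPLATE = """dicom:
  date-increment: "{{ DATE_INCREMENT }}"
  remove-private-tags: true
  fields:
    - name: PatientBirthDate
      remove: "{{ PATIENT_BD_BOOL }}"
    - name: PatientID
      replace-with: "{{ SUBJECT_ID }}"
"""

# Matches a quoted placeholder such as "{{ SUBJECT_ID }}".
PLACEHOLDER = re.compile(r'"\{\{\s*(\w+)\s*\}\}"')


def render_profile(template: str, subject_label: str, csv_text: str) -> str:
    """Fill the template with the CSV row whose subject.label matches."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["subject.label"] == subject_label:
            # Placeholders without a matching column are left untouched.
            return PLACEHOLDER.sub(
                lambda m: row.get(m.group(1), m.group(0)), template
            )
    raise KeyError(f"subject {subject_label!r} is not listed in the CSV")


rendered = render_profile(PROFILE_TEMPLATE, "003", SUBJECT_CSV)
print(rendered)
```

Fields without a placeholder, such as remove-private-tags, come out identical for every subject, and a label missing from the CSV raises an error here, mirroring the rule that unlisted subjects are not processed.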

Config

  • debug

    • Name: debug
    • Type: boolean
    • Default: false
    • Description: If true, the gear will print debug information to the log.
  • tag

    • Name: tag
    • Type: string
    • Default: "deid-inplace"
    • Description: The tag prefix to append to the file after the gear runs. The tag will be <prefix>-PASS or <prefix>-FAIL, depending on the gear run status.
  • delete-original

    • Name: delete-original
    • Type: boolean
    • Default: true
    • Description: If True, the original file is deleted and replaced with the de-identified file, rendering the original file unrecoverable. If False, the de-identified file overwrites the original, resulting in a file version increment that can be reversed.
  • private_key

    • Name: private_key
    • Type: string
    • Description: Asymmetric decryption: the resolver path and filename of the private key pem file, formatted as <group>/<project>/files/<filename> (E.g., flywheel/test/files/private_key.pem) if the key is saved at the project level, or <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
  • public_key

    • Name: public_key
    • Type: string
    • Description: Asymmetric encryption: the resolver path and filename(s) of the public key pem file(s), formatted as <group>/<project>/files/<filename>, with multiple key files separated by ', ' (E.g. flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem) if the key is saved at the project level, or <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
  • secret_key

    • Name: secret_key
    • Type: string
    • Description: Symmetric encryption: the resolver path and filename(s) of the secret key txt file(s), formatted as <group>/<project>/files/<filename> (E.g. flywheel/test/files/secret_key.txt) if the key is saved at the project level, or <group>/<project>/<subject>/files/<filename> if stored at the subject level, or <group>/<project>/<subject>/<session>/<acquisition>/<filename> if stored within an acquisition container.
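
As an illustration of the key path format described above, the following Python sketch splits a public_key value into its individual resolver paths and separates the container path from the filename. KeyLocation and parse_key_paths are hypothetical names, and treating a trailing "files" segment as a literal separator is an assumption based on the examples in the descriptions; the gear's own parsing may differ.

```python
from typing import List, NamedTuple


class KeyLocation(NamedTuple):
    container_path: str  # e.g. "flywheel/test"
    filename: str        # e.g. "public_key1.pem"


def parse_key_paths(value: str) -> List[KeyLocation]:
    """Split a comma-separated config value into container path and filename."""
    locations = []
    for raw in value.split(","):
        parts = [p for p in raw.strip().split("/") if p]
        if len(parts) < 3:
            raise ValueError(f"not a resolver path: {raw!r}")
        filename = parts[-1]
        containers = parts[:-1]
        # Drop the literal "files" segment that precedes the filename, if present.
        if containers[-1] == "files":
            containers = containers[:-1]
        locations.append(KeyLocation("/".join(containers), filename))
    return locations


keys = parse_key_paths(
    "flywheel/test/files/public_key1.pem, flywheel/test/files/public_key2.pem"
)
print(keys)
```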

Usage

  1. User uploads or identifies a file in Flywheel to deidentify

  2. User runs deid-inplace (this utility gear) at the project, subject, or session level and provides the following:

    • Files:
      • A de-identification profile specifying how to de-identify/anonymize each file type
      • An optional CSV that contains a column that maps to a Flywheel session or subject metadata field and columns that specify values with which to replace DICOM header tags
      • The desired input file to de-identify
    • Configuration options:
      • delete-original: True/False
  3. The gear will deidentify the file

  4. The gear will erase or overwrite the original file depending on the config option.

Environment

This gear uses Poetry as its virtual environment and dependency manager. You can interact with the gear using the following:

  1. Install poetry
  2. Install dependencies (from within gear directory): poetry install
  3. Enter virtual environment: poetry shell
