Processes DICOM tags and performs substitutions -- part of the pf* family.

Quick Overview

  • pfdicom_tagSub reads/edits/saves DICOM meta information. It can be used to anonymize DICOM header data.

Overview

pfdicom_tagSub replaces a set of <tag, value> pairs in a DICOM header with values passed in a JSON structure. Individual DICOM tags can be referenced explicitly in the JSON structure, or a regular expression construct can be used to capture all tags satisfying that expression (allowing for idiomatic bulk substitution of <tag, value> pairs).

Tag regular expression constructs are Python regular expressions prefixed by "re:", i.e. "re:<pythonRegex>". For example, "re:.*hysician" will perform a substitution on all tags that contain the letters hysician. The value substitution has access to a special lookup, #tag, which is the current tag hit. It is possible to apply built-in functions to the tag hit, for example md5 hashing, using "%_md5|4_#tag". For example,

{
    "re:.*hysician":                "%_md5|4_#tag"
}

will be expanded to

{
    "PerformingPhysiciansName" :    "%_md5|4_PerformingPhysiciansName",
    "PhysicianofRecord"        :    "%_md5|4_PhysicianofRecord",
    "ReferringPhysiciansName"  :    "%_md5|4_ReferringPhysiciansName",
    "RequestingPhysician"      :    "%_md5|4_RequestingPhysician"
}

The tag regular expression construct allows for simple and powerful bulk substitution of <tag, value> pairs.
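
The expansion above can be sketched in a few lines of Python. This is only an illustration of the idea, not the package's own code: the helper name expand_tag_spec, the list of available tag names, and the choice of re.search for matching are all assumptions.

```python
import re


def expand_tag_spec(tag_spec, available_tags):
    """Expand "re:<regex>" keys into one entry per matching tag name.

    Hypothetical helper for illustration; pfdicom_tagSub's internals may differ.
    """
    expanded = {}
    for key, value in tag_spec.items():
        if key.startswith("re:"):
            pattern = re.compile(key[3:])
            for tag in available_tags:
                if pattern.search(tag):
                    # '#tag' is replaced by the name of the tag that was hit
                    expanded[tag] = value.replace("#tag", tag)
        else:
            # Explicitly named tags pass through unchanged
            expanded[key] = value
    return expanded


tags = [
    "PatientName",
    "PerformingPhysiciansName",
    "PhysicianofRecord",
    "ReferringPhysiciansName",
    "RequestingPhysician",
]
result = expand_tag_spec({"re:.*hysician": "%_md5|4_#tag"}, tags)
```

Here result holds the four physician-related entries shown in the expanded structure, and PatientName is left out because it does not match the expression.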

The script accepts an <inputDir> and performs an os.walk() from that point to extract all the subdirs. Each subdir is examined for DICOM files (in the simplest sense by a file extension mapping), which are passed to a processing method that reads and replaces the specified DICOM tags, saving the result in a corresponding directory and filename in the output tree.
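
The walk-and-mirror behaviour described above could look roughly like the following. It is a sketch under stated assumptions: map_io_tree is a hypothetical name, files are selected purely by extension, and the actual reading and writing of DICOM data (handled by pfdicom in the real tool) is left out.

```python
import os
import tempfile


def map_io_tree(input_dir, output_dir, extension="dcm"):
    """Yield (input_path, output_path) pairs for files ending in `extension`.

    Hypothetical helper: mirrors the input tree layout under output_dir.
    """
    suffix = "." + extension.lstrip(".").lower()
    for root, _dirs, files in os.walk(input_dir):
        # Path of the current subdir relative to the input root
        rel = os.path.relpath(root, input_dir)
        for name in sorted(files):
            if name.lower().endswith(suffix):
                src = os.path.join(root, name)
                dst = os.path.normpath(os.path.join(output_dir, rel, name))
                yield src, dst


# Small demonstration on a throwaway directory tree
with tempfile.TemporaryDirectory() as src_root:
    os.makedirs(os.path.join(src_root, "series1"))
    open(os.path.join(src_root, "series1", "slice1.dcm"), "w").close()
    open(os.path.join(src_root, "series1", "notes.txt"), "w").close()
    pairs = list(map_io_tree(src_root, "out"))
```

Only slice1.dcm is selected, and its output path keeps the series1 subdirectory under the output root.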

Installation

Dependencies

The following dependencies must be installed on your host system or python3 virtual env (they are installed automatically if the package is pulled from PyPI):

  • pfmisc (various misc modules and classes for the pf* family of objects)
  • pftree (create a dictionary representation of a filesystem hierarchy)
  • pfdicom (handle underlying DICOM file reading)

Using PyPI

The best method of installing this script and all of its dependencies is to fetch it from PyPI:

pip3 install pfdicom_tagSub

Command line arguments

-I|--inputDir <inputDir>
Input DICOM directory to examine. By default, the first file in this
directory is examined for its tag information. There is an implicit
assumption that each <inputDir> contains a single DICOM series.

[-i|--inputFile <inputFile>]
An optional <inputFile> specified relative to the <inputDir>. If
specified, then do not perform a directory walk, but convert only
this file.

[-e|--extension <DICOMextension>]
An optional extension to filter the DICOM files of interest from the
<inputDir>.

-O|--outputDir <outputDir>
The output root directory that will contain a tree structure identical
to the input directory, and each "leaf" node will contain the analysis
results.

[--outputLeafDir <outputLeafDirFormat>]
If specified, will apply the <outputLeafDirFormat> to the output
directories containing data. This is useful to blanket describe
final output directories with some descriptive text, such as
'anon' or 'preview'.

This is a formatting spec, so

    --outputLeafDir 'preview-%s'

where %s is the original leaf directory node, will prefix each
final directory containing output with the text 'preview-' which
can be useful in describing some features of the output set.

[-F|--tagFile <JSONtagFile>]
Parse the tags and their "subs" from a JSON formatted <JSONtagFile>.

[-T|--tagStruct <JSONtagStructure>]
Parse the tags and their "subs" from a JSON formatted <JSONtagStructure>
string passed directly in the command line. Note that sometimes protecting
a JSON string can be tricky, especially when used in scripts or as variable
expansions. If the JSON string is problematic, use the [--tagInfo <string>]
instead.

[--tagInfo <delimited_parameters>]
A token delimited string that is reconstructed into a JSON structure by the
script. This is often useful if the [--tagStruct] JSON string is hard to
parse in scripts and variable passing within scripts. The format of this
string is:

        "<tag1><splitKeyValue><value1><split_token><tag2><splitKeyValue><value2>"

for example:

        --splitToken ","
        --splitKeyValue ':'
        --tagInfo "PatientName:anon,PatientID:%_md5|7_PatientID"

or more complexly (esp if the ':' is part of the key):

        --splitToken "++"
        --splitKeyValue "="
        --tagInfo "PatientBirthDate = %_strmsk|******01_PatientBirthDate ++
                   re:.*hysician    = %_md5|4_#tag"


[-s|--splitToken <split_token>]
The token on which to split the <delimited_parameters> string.
Default is '++'.

[-k|--splitKeyValue <keyValueSplit>]
The token on which to split the <key> <value> pair. Default is ':'
but this can be problematic if the <key> itself has a ':' (for example
in the regular expression expansion).

[-o|--outputFileStem <outputFileStem>]
The output file stem to store data. This should *not* have a file
extension, or rather, any "." chars. Dots in the name are considered
part of the stem and are *not* considered extensions.

[--threads <numThreads>]
If specified, break the innermost analysis loop into <numThreads>
threads.

[-x|--man]
Show full help.

[-y|--synopsis]
Show brief help.

[--json]
If specified, output a JSON dump of final return.

[--followLinks]
If specified, follow symbolic links.

[-v|--verbosity <level>]
Set the app verbosity level.

    0: No internal output;
    1: Run start / stop output notification;
    2: As with level '1' but with simpleProgress bar in 'pftree';
    3: As with level '2' but with list of input dirs/files in 'pftree';
    5: As with level '3' but with explicit file logging for
            - read
            - analyze
            - write
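
As an illustration of how --tagInfo, --splitToken and --splitKeyValue fit together, the delimited string can be rebuilt into a tag dictionary along these lines. This is a sketch, not the script's actual parser: parse_tag_info is a hypothetical name, and stripping whitespace around keys and values is an assumption.

```python
def parse_tag_info(tag_info, split_token="++", split_key_value=":"):
    """Turn a delimited string into a {tag: substitution} dictionary.

    Hypothetical helper for illustration only.
    """
    tag_spec = {}
    for pair in tag_info.split(split_token):
        if not pair.strip():
            continue
        # Split on the first key/value separator only, so the value
        # may itself contain the separator
        key, value = pair.split(split_key_value, 1)
        tag_spec[key.strip()] = value.strip()
    return tag_spec


simple = parse_tag_info(
    "PatientName:anon,PatientID:%_md5|7_PatientID",
    split_token=",",
    split_key_value=":",
)

# With ':' inside a key (the "re:" prefix), a different key/value
# separator such as '=' avoids splitting the key itself
with_regex = parse_tag_info(
    "PatientBirthDate = %_strmsk|******01_PatientBirthDate ++ "
    "re:.*hysician = %_md5|4_#tag",
    split_token="++",
    split_key_value="=",
)
```

The second call shows why the default ':' separator is awkward for regular expression keys: splitting "re:.*hysician" on ':' would cut the key in two.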

Examples

Perform a DICOM anonymization by processing specific tags:

pfdicom_tagSub                                      \
    -e dcm                                          \
    -I /var/www/html/normsmall                      \
    -O /var/www/html/anon                           \
    --tagStruct '
    {
        "PatientName":              "%_name|patientID_PatientName",
        "PatientID":                "%_md5|7_PatientID",
        "AccessionNumber":          "%_md5|8_AccessionNumber",
        "PatientBirthDate":         "%_strmsk|******01_PatientBirthDate",
        "re:.*hysician":            "%_md5|4_#tag",
        "re:.*stitution":           "#tag",
        "re:.*ddress":              "#tag"
    }
    ' --threads 0 --printElapsedTime

– OR equivalently –

pfdicom_tagSub                                      \
    -e dcm                                          \
    -I /var/www/html/normsmall                      \
    -O /var/www/html/anon                           \
    --splitToken ","                                \
    --splitKeyValue "="                             \
    --tagInfo '
        PatientName         =  %_name|patientID_PatientName,
        PatientID           =  %_md5|7_PatientID,
        AccessionNumber     =  %_md5|8_AccessionNumber,
        PatientBirthDate    =  %_strmsk|******01_PatientBirthDate,
        re:.*hysician       =  %_md5|4_#tag,
        re:.*stitution      =  #tag,
        re:.*ddress         =  #tag
    ' --threads 0 --printElapsedTime

will replace the explicitly named tags as shown:

  • the PatientName value will be replaced with a Fake Name, seeded on the PatientID;
  • the PatientID value will be replaced with the first 7 characters of an md5 hash of the PatientID;
  • the AccessionNumber value will be replaced with the first 8 characters of an md5 hash of the AccessionNumber;
  • the PatientBirthDate value will set the final two characters, i.e. the day of birth, to 01 and preserve the other birthdate values;
  • any tags with the substring hysician will have their values replaced with the first 4 characters of the corresponding tag value md5 hash;
  • any tags with the substrings stitution or ddress in the tag name will have the corresponding value simply set to the tag name.
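
The md5 and strmsk substitutions listed above can be approximated as follows. These are minimal sketches based only on the behaviour described here: the function names are hypothetical, the mask convention ('*' keeps the original character, any other character overwrites it) is inferred from the birth date example, and the package's own functions may handle edge cases differently.

```python
import hashlib


def md5_prefix(value, length):
    """Return the first `length` hex characters of the md5 hash of `value`."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:length]


def str_mask(value, mask):
    """Apply a character mask: '*' keeps the original character,
    any other mask character replaces it.

    Assumes value and mask have the same length; zip() stops at the shorter.
    """
    return "".join(v if m == "*" else m for v, m in zip(value, mask))


hashed_id = md5_prefix("1234567", 7)         # 7-character hex string
masked_dob = str_mask("19800523", "******01")  # day of birth forced to 01
```

With the sample birth date 19800523, the mask ******01 yields 19800501, keeping the year and month and resetting only the day.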

NOTE:

Spelling matters! Especially with the substring bulk replace, please make sure that the substring has no typos, otherwise the target tags will most probably not be processed.

_-30-_

Files for pfdicom-tagSub, version 2.0.16