A Python module that compares an original DICOM file with its de-identified counterpart and returns the differences between the two.

Dicomdiff

A comprehensive Python module for analyzing DICOM file pseudonymization and de-identification processes. Dicomdiff enables researchers and healthcare professionals to evaluate, compare, and validate different pseudonymization methods by providing detailed analysis of how DICOM tags are transformed during de-identification.

What Dicomdiff Does

Core Functionality:

  • Individual File Comparison: Compare original DICOM files with their de-identified versions to detect specific changes
  • Pseudonymization Method Analysis: Evaluate and compare different pseudonymization approaches across multiple files
  • Cross-Method Comparison: Analyze how different pseudonymization tools handle the same source data
  • Consistency Validation: Check whether pseudonymization methods behave consistently across datasets
  • Tag Transformation Tracking: Monitor how specific DICOM tags are modified, removed, or preserved during pseudonymization
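To make the idea of consistency validation concrete, here is a minimal sketch that groups per-file outcomes by tag and reports the tags whose outcome differs between files. The function name `find_inconsistent_tags`, the input shape, and the outcome labels are assumptions made for illustration; they are not part of the dicomdiff API.

```python
from collections import defaultdict


def find_inconsistent_tags(per_file_outcomes):
    """Return tags whose outcome differs between files.

    per_file_outcomes maps a file name to a {tag: outcome} dict, where the
    outcome is a label such as "changed", "removed" or "unchanged".
    This input shape is hypothetical, not dicomdiff's own data model.
    """
    outcomes_by_tag = defaultdict(set)
    for outcomes in per_file_outcomes.values():
        for tag, outcome in outcomes.items():
            outcomes_by_tag[tag].add(outcome)
    # A tag is inconsistent when more than one distinct outcome was observed.
    return {
        tag: sorted(seen)
        for tag, seen in outcomes_by_tag.items()
        if len(seen) > 1
    }


# Made-up outcomes for two files; the tag strings are illustrative only.
example = {
    "scan1.dcm": {"0010,0010": "changed", "0008,0020": "unchanged"},
    "scan2.dcm": {"0010,0010": "changed", "0008,0020": "removed"},
}
inconsistent = find_inconsistent_tags(example)
print(inconsistent)
```

In this example only tag 0008,0020 is reported, because it is left unchanged in one file and removed in the other.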

Installation

Install the module with pip:

  pip install dicomdiff

Usage

Compare two DICOM files in Python

from dicomdiff.main import compare_dicom_files, print_differences

original_file = "path to original dcm file"
deidentified_file = "path to de-identified dcm file"

result = compare_dicom_files(original_file, deidentified_file) # compare the files
print_differences(result) # print the results

Compare two pseudonymization methods

import os
import glob
import pandas as pd

# Import the pseudonymizers you'll be comparing
from dicomdiff.pseudonymizer import IDISPseudonymizer, DicomRakePseudonymizer, InferredPseudonymizer
from dicomdiff.summary import generate_pseudonymization_summary

# Define the paths
input_dir = "path/to/input/data"
mapping_csv = "path/to/mapping"
idisoutput_dir = "/path/to/output/data/"  # Only needed if you use InferredPseudonymizer

# Find all .dcm files under input_dir, including subdirectories
dicom_files = glob.glob(os.path.join(input_dir, "**", "*.dcm"), recursive=True)

# Define the pseudonymizers you want to use
dicomrake_pseudonymizer = DicomRakePseudonymizer()
idis_pseudonymizer = InferredPseudonymizer.from_csv(mapping_csv, idisoutput_dir)

# Generate comparison summary between the two pseudonymization methods
summary_df = generate_pseudonymization_summary(
    file_paths=dicom_files,
    pseudonymizer_a=dicomrake_pseudonymizer,
    pseudonymizer_b=idis_pseudonymizer,
    pseudonymizer_a_name="DICOMRake",  # Optional: name shown for each pseudonymizer
    pseudonymizer_b_name="IDISPseudonymizer",
)

# Define helper function to identify private DICOM tags (odd group numbers)
def is_private_tag(tag_str):
    try:
        group = int(tag_str.split(",")[0], 16)
        return group % 2 != 0
    except (ValueError, IndexError):
        return False

# Filter results to show only public tags (even group numbers)
public_tags_df = summary_df[~summary_df["tag"].apply(is_private_tag)]

# Display overall comparison statistics for public tags
print("\nComparison results (public tags only):")
print(public_tags_df["comparison"].value_counts().to_string())

# Configure pandas display options for better output formatting
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 1000)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("display.colheader_justify", "center")

# Show public tags that were not left unchanged by both pseudonymizers
different_public_tags = public_tags_df[public_tags_df["comparison"] != "Both Unchanged"]
print(f"\nDifferences in public tags ({len(different_public_tags)}):")
print(different_public_tags)

# OPTIONAL: If private tags exist, you can show differences in private tags as well
if len(summary_df) > len(public_tags_df):
    private_tags_df = summary_df[summary_df["tag"].apply(is_private_tag)]
    different_private_tags = private_tags_df[
        private_tags_df["comparison"] != "Both Unchanged"
    ]
    print(f"\nDifferences in private tags ({len(different_private_tags)}):")
    print(different_private_tags)
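The `is_private_tag` helper above only works if the tag string starts directly with the hexadecimal group number, as in "0009,0010". If the "tag" column instead holds strings wrapped in parentheses, such as "(0009,0010)", parsing fails and every tag is silently treated as public. The exact format of the "tag" column is not documented here, so the variant below is a sketch that accepts both forms rather than a description of what dicomdiff produces.

```python
def is_private_tag(tag_str):
    """Return True when the tag's group number is odd, i.e. a private tag."""
    try:
        # Drop surrounding whitespace and optional parentheses,
        # so both "0009,0010" and "(0009,0010)" are accepted.
        group_hex = str(tag_str).strip().strip("()").split(",")[0]
        return int(group_hex, 16) % 2 != 0
    except ValueError:
        # Anything that is not a hexadecimal group number counts as public.
        return False


for sample in ["0009,0010", "(0009,1001)", "0010,0010", "not a tag"]:
    print(sample, is_private_tag(sample))
```

Unparseable input still returns False, as in the original helper, so the filtering code above works unchanged with this version.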

Compare two DICOM files using the CLI

# Compare two DICOM files
dicomdiff compare file1.dcm file2.dcm

# Filter results, for example to show only changed tags
dicomdiff compare file1.dcm file2.dcm --changed

CLI Flags

Flag         Description
--changed    Show only tags that have different values between files
--removed    Show only tags that exist in original but not in de-identified file
--added      Show only tags that exist in de-identified but not in original file
--unchanged  Show only tags that have identical values in both files
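The four flags correspond to four mutually exclusive categories of tags. The sketch below shows one way to sort tags into those categories, using plain `{tag: value}` dictionaries in place of parsed DICOM datasets. The function `classify_tags` and the sample values are hypothetical and are not taken from dicomdiff's implementation.

```python
def classify_tags(original, deidentified):
    """Sort tags into the four categories named by the CLI flags.

    Both arguments are plain {tag: value} dicts that stand in for parsed
    DICOM datasets (an assumption made to keep the sketch self-contained).
    """
    result = {"changed": [], "removed": [], "added": [], "unchanged": []}
    for tag in sorted(set(original) | set(deidentified)):
        if tag not in deidentified:
            result["removed"].append(tag)    # only in the original file
        elif tag not in original:
            result["added"].append(tag)      # only in the de-identified file
        elif original[tag] != deidentified[tag]:
            result["changed"].append(tag)    # in both, values differ
        else:
            result["unchanged"].append(tag)  # in both, values identical
    return result


# Made-up tag values for illustration.
original = {"0010,0010": "DOE^JOHN", "0010,0030": "19700101", "0008,0060": "CT"}
deidentified = {"0010,0010": "ANON", "0008,0060": "CT", "0012,0062": "YES"}
categories = classify_tags(original, deidentified)
print(categories)
```

Every tag lands in exactly one category, which is why each flag can act as a simple filter over the full result.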
