A Python module that compares an original DICOM file with its deidentified counterpart and returns the differences between the two.

Dicomdiff

A comprehensive Python module for analyzing DICOM file pseudonymization and de-identification processes. Dicomdiff enables researchers and healthcare professionals to evaluate, compare, and validate different pseudonymization methods by providing detailed analysis of how DICOM tags are transformed during de-identification.

What Dicomdiff Does

Core Functionality:

  • Individual File Comparison: Compare original DICOM files with their de-identified versions to detect specific changes
  • Pseudonymization Method Analysis: Evaluate and compare different pseudonymization approaches across multiple files
  • Cross-Method Comparison: Analyze how different pseudonymization tools handle the same source data
  • Consistency Validation: Check whether pseudonymization methods behave consistently across datasets
  • Tag Transformation Tracking: Monitor how specific DICOM tags are modified, removed, or preserved during pseudonymization
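To make the consistency check concrete, here is a minimal plain-Python sketch. It is not dicomdiff's implementation. It assumes you have already recorded, for each processed file, which action (for example "changed", "removed" or "unchanged") a pseudonymizer took on each tag. The helper name `find_inconsistent_tags` is hypothetical.

```python
from collections import defaultdict


def find_inconsistent_tags(per_file_actions):
    """Return {tag: set of actions} for tags not handled the same way in every file.

    Hypothetical helper: `per_file_actions` is a list with one dict per file,
    mapping a tag string such as "0010,0010" to the action taken on it.
    A tag is only compared across the files in which it appears.
    """
    seen = defaultdict(set)
    for actions in per_file_actions:
        for tag, action in actions.items():
            seen[tag].add(action)
    # More than one distinct action means the method was inconsistent for that tag
    return {tag: acts for tag, acts in seen.items() if len(acts) > 1}


if __name__ == "__main__":
    runs = [
        {"0010,0010": "changed", "0008,0020": "unchanged"},
        {"0010,0010": "changed", "0008,0020": "removed"},
    ]
    for tag, actions in sorted(find_inconsistent_tags(runs).items()):
        print(tag, sorted(actions))  # prints: 0008,0020 ['removed', 'unchanged']
```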

Installation

Install the module using pip

  pip install dicomdiff

Usage

Compare two DICOM files in Python

from dicomdiff.main import compare_dicom_files, print_differences

original_file = "path/to/original.dcm"
deidentified_file = "path/to/deidentified.dcm"

result = compare_dicom_files(original_file, deidentified_file) # compare the files
print_differences(result) # print the results

Compare two pseudonymization methods

import os
import glob
import pandas as pd

# Import the pseudonymizers you'll be comparing
from dicomdiff.pseudonymizer import IDISPseudonymizer, DicomRakePseudonymizer, InferredPseudonymizer
from dicomdiff.summary import generate_pseudonymization_summary

# Define the paths
input_dir = "path/to/input/data"
mapping_csv = "path/to/mapping"
idisoutput_dir = "/path/to/output/data/"  # Only needed if you use InferredPseudonymizer

# Find all .dcm files under input_dir, including subdirectories
dicom_files = glob.glob(os.path.join(input_dir, "**", "*.dcm"), recursive=True)

# Define the pseudonymizers you want to use
dicomrake_pseudonymizer = DicomRakePseudonymizer()
idis_pseudonymizer = InferredPseudonymizer.from_csv(mapping_csv, idisoutput_dir)

# Generate comparison summary between the two pseudonymization methods
summary_df = generate_pseudonymization_summary(
    file_paths=dicom_files,
    pseudonymizer_a=dicomrake_pseudonymizer,
    pseudonymizer_b=idis_pseudonymizer,
    pseudonymizer_a_name="DICOMRake",  # Optional: display names used in the summary
    pseudonymizer_b_name="IDISPseudonymizer",
)

# Define helper function to identify private DICOM tags (odd group numbers)
def is_private_tag(tag_str):
    try:
        # Accept "0009,0010" as well as "(0009,0010)" or "(0009, 0010)"
        group = int(str(tag_str).strip("() ").split(",")[0], 16)
        return group % 2 != 0
    except (ValueError, IndexError):
        return False

# Filter results to show only public tags (even group numbers)
public_tags_df = summary_df[~summary_df["tag"].apply(is_private_tag)]

# Display overall comparison statistics for public tags
print("\nComparison results (public tags only):")
print(public_tags_df["comparison"].value_counts().to_string())

# Configure pandas display options for better output formatting
pd.set_option("display.max_rows", None)
pd.set_option("display.max_columns", None)
pd.set_option("display.width", 1000)
pd.set_option("display.expand_frame_repr", False)
pd.set_option("display.colheader_justify", "center")

# Show detailed differences in public tags where pseudonymizers behaved differently
different_public_tags = public_tags_df[public_tags_df["comparison"] != "Both Unchanged"]
print(f"\nDifferences in public tags ({len(different_public_tags)}):")
print(different_public_tags)

# OPTIONAL: If private tags exist, you can show differences in private tags as well
if len(summary_df) > len(public_tags_df):
    private_tags_df = summary_df[summary_df["tag"].apply(is_private_tag)]
    different_private_tags = private_tags_df[
        private_tags_df["comparison"] != "Both Unchanged"
    ]
    print(f"\nDifferences in private tags ({len(different_private_tags)}):")
    print(different_private_tags)
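The `comparison` column summarizes how the two methods treated each tag. The sketch below illustrates one way such a label could be derived from per-tag actions; it is an assumption, not dicomdiff's code. Only the "Both Unchanged" label appears in this README; the other label strings and the helper names `comparison_label` and `compare_methods` are hypothetical.

```python
from collections import Counter


def comparison_label(action_a, action_b):
    """Hypothetical label describing how methods A and B treated one tag."""
    if action_a == action_b:
        return f"Both {action_a.title()}"  # e.g. "Both Unchanged"
    return f"A {action_a.title()}, B {action_b.title()}"


def compare_methods(actions_a, actions_b):
    """Label every tag seen by either method; tags a method never reported count as "missing"."""
    labels = {}
    for tag in sorted(actions_a.keys() | actions_b.keys()):
        labels[tag] = comparison_label(
            actions_a.get(tag, "missing"), actions_b.get(tag, "missing")
        )
    return labels


if __name__ == "__main__":
    dicomrake = {"0010,0010": "changed", "0010,0030": "removed", "0008,0060": "unchanged"}
    idis = {"0010,0010": "changed", "0010,0030": "changed", "0008,0060": "unchanged"}
    labels = compare_methods(dicomrake, idis)
    # Counter plays the role of value_counts() in the pandas example
    for label, count in Counter(labels.values()).most_common():
        print(f"{label}: {count}")
```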

Compare two DICOM files using CLI

# Compare two DICOM files
dicomdiff compare file1.dcm file2.dcm

# Show only tags whose values differ
dicomdiff compare file1.dcm file2.dcm --changed

CLI Flags

Flag          Description
--changed     Show only tags that have different values between files
--removed     Show only tags that exist in original but not in de-identified file
--added       Show only tags that exist in de-identified but not in original file
--unchanged   Show only tags that have identical values in both files
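The four flags correspond to a simple set comparison of the tags in the two files. The following sketch shows that logic on plain dictionaries mapping a tag string to its value; it is an illustration, not dicomdiff's implementation, and `diff_tags` is a hypothetical name. With pydicom, such a mapping could be built with something like `{str(elem.tag): elem.value for elem in ds}`, where `ds` is a dataset returned by `pydicom.dcmread`.

```python
def diff_tags(original, deidentified):
    """Split tags into the categories filtered by --changed, --removed, --added, --unchanged.

    Hypothetical helper: both arguments map a tag string to its value.
    """
    orig_tags = original.keys()
    deid_tags = deidentified.keys()
    common = orig_tags & deid_tags  # tags present in both files
    return {
        "changed": sorted(t for t in common if original[t] != deidentified[t]),
        "removed": sorted(orig_tags - deid_tags),  # only in the original
        "added": sorted(deid_tags - orig_tags),  # only in the de-identified file
        "unchanged": sorted(t for t in common if original[t] == deidentified[t]),
    }


if __name__ == "__main__":
    original = {"0010,0010": "Doe^Jane", "0010,0030": "19800101", "0008,0060": "MR"}
    deidentified = {"0010,0010": "ANON001", "0008,0060": "MR", "0012,0062": "YES"}
    for category, tags in diff_tags(original, deidentified).items():
        print(f"{category:<10} {tags}")
```

This flat comparison does not descend into nested sequence items; a real tool would need to handle those separately.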

Download files


Source Distribution

dicomdiff-0.4.0.tar.gz (28.5 kB)

Uploaded Source

Built Distribution


dicomdiff-0.4.0-py3-none-any.whl (30.2 kB)

Uploaded Python 3

File details

Details for the file dicomdiff-0.4.0.tar.gz.

File metadata

  • Download URL: dicomdiff-0.4.0.tar.gz
  • Upload date:
  • Size: 28.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.11 Linux/6.11.0-1015-azure

File hashes

Hashes for dicomdiff-0.4.0.tar.gz
Algorithm Hash digest
SHA256 680338eb635982ad492a23f60a6e9c06c35d606496604a0b2e51ea129eb0760d
MD5 9c78929f9eeb7a707d9b0f8cf1a7b9bc
BLAKE2b-256 eabe7965147079594624ec19ee6b9b35d7b79d0cf9029e821de2ec199ff9f99c
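To check a downloaded archive against the published SHA256 digest, a small standard-library script is enough. This is a generic sketch using `hashlib`; the helper names `sha256_of` and `verify` are illustrative, not part of dicomdiff.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(Path(path), "rb") as fh:
        # iter() with a sentinel stops once read() returns b"" at end of file
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("dicomdiff-0.4.0.tar.gz", "680338eb635982ad492a23f60a6e9c06c35d606496604a0b2e51ea129eb0760d")` should return `True` for an intact download. pip can perform the same check at install time in hash-checking mode, using a requirements line such as `dicomdiff==0.4.0 --hash=sha256:<digest>`.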


File details

Details for the file dicomdiff-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: dicomdiff-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 30.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.12.11 Linux/6.11.0-1015-azure

File hashes

Hashes for dicomdiff-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 d23b05c17764d0506b8b82243fce4de56f576ee713c014ca737981dee358aff3
MD5 25c59d3fedf589665689cebc5425594b
BLAKE2b-256 6103cd5fcb542097ab8d91bb22e973b92849e1dadb961372a28dabaa91ccd11f
