Open-source, self-hosted DICOM file-security scanner + Content Disarm & Reconstruction (CDR)

DicomLock

Open-source, self-hosted security for DICOM medical-image files. It scans a file for the ways it can be weaponized, then disarms it by rebuilding a clean, clinically identical copy.

A DICOM file is not just a picture. It is input for a parser, and hospital software has to read and decode all of it before anyone sees an image. That step is the attack surface. DicomLock checks the file for polyglot malware, parser-exploit constructions, and pixel data that routes through vulnerable image or video codecs, then rebuilds a clean version before the file reaches a PACS, viewer, or model. It runs inside your network, so no patient data leaves the building.

This is Content Disarm and Reconstruction (CDR) for DICOM, built to be open and auditable.

Install

pip install dicomlock

The core install pulls in the decoder backends (gdcm and pylibjpeg) so disarm works out of the box. Two optional extras:

pip install "dicomlock[server]"   # web UI and REST API
pip install "dicomlock[full]"     # PHI / de-identification audit and legacy forensics

Requires Python 3.10 or newer.

Usage

dicomlock file.dcm                 # scan one file
dicomlock folder/                  # scan every .dcm in a folder
dicomlock folder/ --disarm         # scan, then disarm or quarantine each file
dicomlock file.dcm --deid          # add the PHI / de-identification audit

As a library:

from scanner.pipeline import run_security_scan, disarm_or_quarantine, is_dangerous

report = run_security_scan("file.dcm")
if is_dangerous(report):
    result = disarm_or_quarantine("file.dcm")   # {"action": "disarmed" | "quarantined", ...}
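For a whole folder, the same three calls can be wrapped in a loop. The sketch below is illustrative only: `triage_folder` and the `"clean"` label are hypothetical names, and the library functions are passed in as arguments so the example assumes nothing about them beyond the calls and the `"action"` key shown above.

```python
from pathlib import Path
from typing import Callable


def triage_folder(folder: str, scan: Callable, dangerous: Callable, disarm: Callable) -> dict:
    """Scan every .dcm file in `folder` and record what happened to each one."""
    outcome = {}
    for path in sorted(Path(folder).glob("*.dcm")):
        report = scan(str(path))
        if dangerous(report):
            # Per the snippet above, the result carries an "action" key.
            outcome[str(path)] = disarm(str(path))["action"]
        else:
            outcome[str(path)] = "clean"  # hypothetical label for files left untouched
    return outcome
```

It would be called as `triage_folder("folder/", run_security_scan, disarm=disarm_or_quarantine, dangerous=is_dangerous)`.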

Web UI and API:

python server.py    # http://localhost:8899

Uploads are scanned in a temp directory and deleted right after, so PHI is never persisted.
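The scan-then-delete pattern can be sketched with the standard library. This is not the server's actual code: `scan_upload`, the `upload.dcm` file name, and the `dicomlock-` prefix are assumptions, and the scan function is passed in as a callable.

```python
import tempfile
from pathlib import Path
from typing import Callable


def scan_upload(payload: bytes, scan: Callable[[str], dict]) -> dict:
    """Write an upload into a private temp directory, scan it, then delete it."""
    # TemporaryDirectory removes the directory and its contents when the
    # with-block exits, including when the scan raises an exception.
    with tempfile.TemporaryDirectory(prefix="dicomlock-") as tmp:
        path = Path(tmp) / "upload.dcm"
        path.write_bytes(payload)
        return scan(str(path))
```

Only the report leaves the function; the uploaded bytes exist on disk for the duration of the scan and no longer.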

How it works

  1. Scan. Deterministic, rule-based checks, no ML: preamble and polyglot signatures, length amplification, sequence-nesting depth, pixel-dimension and decompression bombs, private-tag payloads, codec-CVE exposure, and metadata integrity.
  2. Disarm. For files that are dangerous but recoverable, DicomLock zeroes the preamble, transcodes compressed pixel data to native (uncompressed) form so that it no longer passes through the vulnerable codec (the decode runs in a sandboxed subprocess), and filters private tags against a vendor allowlist. Lossless sources come out bit-exact. Lossy sources are decoded once with no new compression.
  3. Quarantine. Anything that cannot be safely rebuilt, such as length bombs and files no backend can decode, is held back. DicomLock re-scans its own output, so it never emits a file that still fails a check.
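Two of the scan checks in step 1 come down to arithmetic on header fields, before any pixel data is decoded. The sketch below shows the idea; it is not DicomLock's implementation, and the function names and the 2 GiB budget are assumptions.

```python
MAX_DECODED_BYTES = 2 * 1024**3   # hypothetical budget: 2 GiB
UNDEFINED_LENGTH = 0xFFFFFFFF     # DICOM marker for delimiter-terminated values


def length_overruns_file(declared_length: int, value_offset: int, file_size: int) -> bool:
    """True if an element claims more bytes than the file has left."""
    if declared_length == UNDEFINED_LENGTH:
        return False  # undefined length is closed by a delimiter, not a byte count
    return value_offset + declared_length > file_size


def decoded_size(rows: int, columns: int, samples_per_pixel: int = 1,
                 bits_allocated: int = 16, frames: int = 1) -> int:
    """Bytes the pixel data would occupy once decoded, computed from the header."""
    bytes_per_sample = (bits_allocated + 7) // 8
    return rows * columns * samples_per_pixel * bytes_per_sample * frames


def is_pixel_bomb(**dims: int) -> bool:
    """True if the declared dimensions exceed the decode budget."""
    return decoded_size(**dims) > MAX_DECODED_BYTES
```

A 512 x 512 16-bit slice decodes to 524,288 bytes and passes. A header declaring 65,535 x 65,535 at 16 bits asks for about 8.6 GB and would be flagged from the header alone.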

The point of CDR is that it rebuilds from a validated canonical form instead of matching a known signature, so it neutralizes attacks it has never seen. That defense still holds when vulnerabilities turn up faster than anyone can patch them, which is the normal state of affairs for systems that patches reach slowly: legacy and embedded medical devices, and software locked behind FDA recertification.
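Preamble zeroing is the simplest instance of that idea: the rebuilt file gets a canonical all-zero preamble whatever the original contained, so no signature list is involved. A minimal sketch, assuming a DICOM Part 10 file held in memory (`zero_preamble` is a hypothetical name, not the library's API):

```python
PREAMBLE_LEN = 128
DICM = b"DICM"


def zero_preamble(data: bytes) -> bytes:
    """Return a copy of a Part 10 file with its 128-byte preamble zeroed."""
    if data[PREAMBLE_LEN:PREAMBLE_LEN + 4] != DICM:
        raise ValueError("missing DICM marker at offset 128")
    # Never inspect the preamble: replace it unconditionally and keep
    # everything from the DICM marker onward byte for byte.
    return bytes(PREAMBLE_LEN) + data[PREAMBLE_LEN:]
```

Because the function never looks at what the preamble holds, a PE header, an ELF header, and a format nobody has seen yet are all handled the same way.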

Why this is a real problem

The 128-byte preamble can hold an executable header, so one file can be a valid scan and working malware at the same time (CVE-2019-11687, extended to Linux devices by ELFDICOM). Attacker-controlled length fields turn a 140-byte file into a multi-gigabyte allocation request. Encapsulated pixel and video data is decoded by libjpeg, OpenJPEG, CharLS, and FFmpeg-class libraries that carry long CVE histories. Live examples in clinical software include the Orthanc auth bypass (CVE-2025-0896, CVSS 9.8) and MicroDicom remote code execution (CVE-2025-5943).
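The polyglot case is easy to see in code. A Part 10 file is 128 preamble bytes followed by the marker `DICM`, and a DICOM reader skips the preamble, while an operating system loader looks at the first bytes of the file. The sketch below is an assumption-laden illustration, not DicomLock's check: `preamble_findings` is a hypothetical name and the magic table is deliberately short.

```python
EXECUTABLE_MAGIC = {
    b"MZ": "Windows PE/DOS",
    b"\x7fELF": "ELF",
}


def preamble_findings(data: bytes) -> list[str]:
    """List what is suspicious about the first 132 bytes of a DICOM file."""
    if len(data) < 132 or data[128:132] != b"DICM":
        return ["missing DICM marker at offset 128"]
    preamble = data[:128]
    findings = [
        f"preamble starts with {label} executable magic"
        for magic, label in EXECUTABLE_MAGIC.items()
        if preamble.startswith(magic)
    ]
    if not findings and any(preamble):
        # Not a known executable, but still not the usual all-zero preamble.
        findings.append("non-zero preamble")
    return findings
```

A file that starts with `MZ` and still has `DICM` at offset 128 satisfies both a DICOM reader and the Windows loader, which is the dual validity described above.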

DicomLock works on the file. It does not break or weaken encryption and makes no claim to.

Results

All results are reproducible from the scripts in _attack_test/:

  • Zero false positives across 575 real clinical CT files, and zero on a separate mixed-compression corpus spanning 12 transfer syntaxes.
  • 20 of 20 crafted attack fixtures flagged by the expected check.
  • pydicom, GDCM, and dcmtk accept the weaponized files without complaint; DicomLock flags every one.
  • Disarmed pixels are bit-exact against two independent decoders (GDCM and pylibjpeg) on every lossless sample.
  • The codec decode is sandboxed, so a crashing or hanging decoder is contained: the file is quarantined and the tool itself keeps running.
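The containment claim in the last bullet can be illustrated with a child process and a timeout. This sketch shows process isolation only, which is the minimum a sandbox needs, and the project's actual sandbox may do more. `run_sandboxed` and its return labels are hypothetical.

```python
import subprocess
import sys


def run_sandboxed(code: str, timeout_s: float = 5.0) -> str:
    """Run `code` in a child Python process and classify how it ended."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,   # keep decoder noise out of the parent's streams
            timeout=timeout_s,     # subprocess.run kills the child on expiry
        )
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if proc.returncode == 0 else "crashed"
```

A caller would treat anything other than `"ok"` as a reason to quarantine the file. A segfault or infinite loop in a native codec then ends the child process, not the scanner.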

The attack fixtures in this repo are inert. Polyglots carry only magic bytes, and payload tags carry a header plus zero padding. No working malware ships here.

Where it fits

Commercial DICOM CDR already exists (OPSWAT, Votiro), and there is academic prior art, so "nobody does this" is not the pitch. DicomLock's reason to exist is that it is open, self-hosted, auditable, and PACS-depth. The transcoding, the vendor allowlist, and the parser-bomb rejection are all readable in source and run inside your network. The natural first users are research-imaging and data-engineering teams that already ingest untrusted external DICOM.

It is a security and sanitization tool, not a medical device. It carries no diagnostic claim and no FDA clearance. See THREAT_MODEL.md for what it does and does not defend.

License

Apache-2.0, © 2026 Vijay Thakore. Provided as is, without warranty. DicomLock is a sanitization tool, not a medical device, and makes no diagnostic claim. Validate disarmed files in your own environment before any clinical use, and run on de-identified data where you can.
