Open-source, self-hosted DICOM file-security scanner + Content Disarm & Reconstruction (CDR)
DicomLock
Open-source, self-hosted security for DICOM medical-image files. It scans a file for the ways it can be weaponized, then disarms it by rebuilding a clean, clinically identical copy.
A DICOM file is not just a picture. It is input for a parser, and hospital software has to read and decode all of it before anyone sees an image. That step is the attack surface. DicomLock checks the file for polyglot malware, parser-exploit constructions, and pixel data that routes through vulnerable image or video codecs, then rebuilds a clean version before the file reaches a PACS, viewer, or model. It runs inside your network, so no patient data leaves the building.
This is Content Disarm and Reconstruction (CDR) for DICOM, built to be open and auditable.
Install
pip install dicomlock
The core install pulls in the decoder backends (gdcm and pylibjpeg) so disarm works out of the box. Two optional extras:
pip install "dicomlock[server]" # web UI and REST API
pip install "dicomlock[full]" # PHI / de-identification audit and legacy forensics
Requires Python 3.10 or newer.
Usage
dicomlock file.dcm # scan one file
dicomlock folder/ # scan every .dcm in a folder
dicomlock folder/ --disarm # scan, then disarm or quarantine each file
dicomlock file.dcm --deid # add the PHI / de-identification audit
As a library:
from scanner.pipeline import run_security_scan, disarm_or_quarantine, is_dangerous

# Scan first; only files flagged as dangerous go through disarm/quarantine.
report = run_security_scan("file.dcm")
if is_dangerous(report):
    result = disarm_or_quarantine("file.dcm")  # {"action": "disarmed" | "quarantined", ...}
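The same three calls can be applied to a whole folder. The following is a minimal sketch, not part of the library: `triage_folder` is a hypothetical helper, the callables are passed in so the loop can be exercised without real DICOM files, and the non-recursive `*.dcm` glob is an assumption about what "every .dcm in a folder" means.

```python
from pathlib import Path
from typing import Callable, Dict


def triage_folder(
    folder: str,
    scan: Callable[[str], dict],
    dangerous: Callable[[dict], bool],
    disarm: Callable[[str], dict],
) -> Dict[str, str]:
    """Scan each .dcm file in `folder` and record what happened to it.

    Hypothetical helper: returns {file name: "clean" | "disarmed" | "quarantined"}.
    Assumes `disarm` returns a dict with an "action" key, as in the snippet above.
    """
    outcomes: Dict[str, str] = {}
    for path in sorted(Path(folder).glob("*.dcm")):  # top level only, not recursive
        report = scan(str(path))
        if dangerous(report):
            outcomes[path.name] = disarm(str(path))["action"]
        else:
            outcomes[path.name] = "clean"
    return outcomes
```

With the library installed, the three callables would be `run_security_scan`, `is_dangerous`, and `disarm_or_quarantine` from the import shown above.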
Web UI and API:
python server.py # http://localhost:8899
Uploads are scanned in a temp directory and deleted right after, so PHI is never persisted.
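The scan-then-delete handling of uploads can be sketched with the standard library. This is an illustration only, not the server's actual code: `scan_upload` and its `scan` callback are hypothetical names, and the real server may stage uploads differently.

```python
import tempfile
from pathlib import Path
from typing import Callable


def scan_upload(payload: bytes, scan: Callable[[str], dict]) -> dict:
    """Write an upload to a private temp directory, scan it, then delete it.

    `scan` is a hypothetical callback that takes a file path and returns a
    report dict.
    """
    # TemporaryDirectory removes the directory and everything in it when the
    # block exits, including when the scan raises.
    with tempfile.TemporaryDirectory(prefix="dicomlock-") as tmp:
        path = Path(tmp) / "upload.dcm"
        path.write_bytes(payload)
        return scan(str(path))
```

The report is the only thing that leaves the function; the uploaded bytes are gone once the `with` block exits.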
How it works
- Scan. Deterministic, rule-based checks, no ML: preamble and polyglot signatures, length amplification, sequence-nesting depth, pixel-dimension and decompression bombs, private-tag payloads, codec-CVE exposure, and metadata integrity.
- Disarm. For files that are dangerous but recoverable, it zeroes the preamble, decodes compressed pixel data to native (uncompressed) form in a sandboxed subprocess so the output no longer depends on the vulnerable codec, and filters private tags against a vendor allowlist. Lossless sources come out bit-exact. Lossy sources are decoded once and stored uncompressed, so no second round of compression loss is added.
- Quarantine. Anything it cannot safely rebuild, such as length bombs and files no backend can decode, is held back. DicomLock re-scans its own output, so it never emits a file that still fails a check.
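One of the disarm steps, zeroing the preamble, together with the re-check before anything is emitted, can be sketched on raw bytes. This is an illustration under stated assumptions, not DicomLock's implementation: the function names are hypothetical, and the code relies only on the DICOM Part 10 layout of a 128-byte preamble followed by the `DICM` marker.

```python
from typing import Optional, Tuple

PREAMBLE_LEN = 128
MAGIC = b"DICM"


def zero_preamble(data: bytes) -> bytes:
    """Return a copy of a Part 10 file with the 128-byte preamble zeroed."""
    if data[PREAMBLE_LEN:PREAMBLE_LEN + 4] != MAGIC:
        raise ValueError("no DICM marker at offset 128")
    return bytes(PREAMBLE_LEN) + data[PREAMBLE_LEN:]


def preamble_is_clean(data: bytes) -> bool:
    """Re-check used on the rebuilt output: the preamble must be all zeros."""
    return data[:PREAMBLE_LEN] == bytes(PREAMBLE_LEN)


def rebuild_or_hold(data: bytes) -> Tuple[str, Optional[bytes]]:
    """Hypothetical wrapper: disarm if possible, otherwise quarantine."""
    try:
        rebuilt = zero_preamble(data)
    except ValueError:
        return "quarantined", None  # cannot be rebuilt safely
    if not preamble_is_clean(rebuilt):
        return "quarantined", None  # output still fails the check
    return "disarmed", rebuilt
```

Only the preamble changes; every byte from the `DICM` marker onward is copied through untouched.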
The point of CDR is that it rebuilds from a validated canonical form instead of matching a known signature, so it can neutralize attacks it has never seen. That matters most where vulnerabilities surface faster than patches arrive, which is the norm for systems a patch cycle reaches slowly: legacy and embedded medical devices, and software locked behind FDA recertification.
Why this is a real problem
The 128-byte preamble can hold an executable header, so one file can be a valid scan and working malware at the same time (CVE-2019-11687, extended to Linux devices by ELFDICOM). Attacker-controlled length fields can turn a 140-byte file into a multi-gigabyte allocation request. Encapsulated pixel and video data is decoded by libjpeg, OpenJPEG, CharLS, and FFmpeg-class libraries that carry long CVE histories. Live examples in clinical software include the Orthanc auth bypass (CVE-2025-0896, CVSS 9.8) and MicroDicom remote code execution (CVE-2025-5943).
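The first two attack classes are simple enough to illustrate on raw bytes. The sketch below is not DicomLock's scanner: the function names are hypothetical, only two executable magic numbers are checked, and the element walk assumes explicit-VR little-endian encoding throughout, which holds for the file meta group but not for every dataset.

```python
import struct
from typing import Optional, Tuple

PREAMBLE_LEN = 128
# Magic numbers of executable formats that can sit at offset 0.
EXECUTABLE_MAGICS = {b"MZ": "PE/DOS executable", b"\x7fELF": "ELF executable"}
# Explicit-VR value representations that carry a 4-byte length field.
LONG_VRS = {b"OB", b"OD", b"OF", b"OL", b"OV", b"OW", b"SQ",
            b"SV", b"UC", b"UN", b"UR", b"UT", b"UV"}
UNDEFINED_LENGTH = 0xFFFFFFFF


def preamble_executable_type(data: bytes) -> Optional[str]:
    """Name the executable format found in the preamble, or None."""
    if data[PREAMBLE_LEN:PREAMBLE_LEN + 4] != b"DICM":
        raise ValueError("no DICM marker at offset 128")
    for magic, name in EXECUTABLE_MAGICS.items():
        if data.startswith(magic):
            return name
    return None


def first_oversized_element(
    data: bytes, offset: int = PREAMBLE_LEN + 4
) -> Optional[Tuple[int, int, int]]:
    """Walk top-level elements and return (group, element, declared_length)
    for the first one whose declared length exceeds the bytes left in the file.
    """
    while offset + 8 <= len(data):
        group, element = struct.unpack_from("<HH", data, offset)
        vr = data[offset + 4:offset + 6]
        if vr in LONG_VRS:
            if offset + 12 > len(data):
                break  # truncated header; nothing more to read
            (length,) = struct.unpack_from("<I", data, offset + 8)
            header = 12
        else:
            (length,) = struct.unpack_from("<H", data, offset + 6)
            header = 8
        if length == UNDEFINED_LENGTH:
            break  # delimiter-based parsing is out of scope for this sketch
        remaining = len(data) - (offset + header)
        if length > remaining:
            return group, element, length
        offset += header + length
    return None
```

The key design point is that the declared length is compared against the bytes actually present before anything is allocated, which is what defuses a length-amplification file.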
DicomLock works on the file. It does not break or weaken encryption and makes no claim to.
Results
All reproducible from the scripts in _attack_test/:
- Zero false positives across 575 real clinical CT files, and zero on a separate mixed-compression corpus spanning 12 transfer syntaxes.
- 20 of 20 crafted attack fixtures flagged by the expected check.
- pydicom, GDCM, and dcmtk accept the weaponized files without complaint; DicomLock flags every one.
- Disarmed pixels are bit-exact against two independent decoders (GDCM and pylibjpeg) on every lossless sample.
- The codec decode is sandboxed, so a crashing or hanging decoder is contained: the file is quarantined and the tool itself keeps running.
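The containment in the last point can be sketched with a child process and a timeout. This is a minimal illustration, not DicomLock's sandbox: `decode_in_subprocess` is a hypothetical name, the decoder command is whatever the caller supplies, and process isolation alone contains crashes and hangs but is not a full security sandbox.

```python
import subprocess
import sys
from typing import List


def decode_in_subprocess(argv: List[str], timeout_s: float = 30.0) -> str:
    """Run a decoder command in a child process.

    Returns "ok" if it exits cleanly, "quarantine" if it crashes, exits
    with an error, or runs past the timeout.
    """
    try:
        proc = subprocess.run(argv, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return "quarantine"
    # A non-zero (or negative, signal-terminated) return code means failure.
    return "ok" if proc.returncode == 0 else "quarantine"
```

For example, `decode_in_subprocess([sys.executable, "-c", "pass"])` exercises the success path without any real decoder.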
The attack fixtures in this repo are inert. Polyglots carry only magic bytes, and payload tags carry a header plus zero padding. No working malware ships here.
Where it fits
Commercial DICOM CDR already exists (OPSWAT, Votiro), and there is academic prior art, so "nobody does this" is not the pitch. DicomLock's reason to exist is that it is open, self-hosted, auditable, and PACS-depth. The transcoding, the vendor allowlist, and the parser-bomb rejection are all readable in source and run inside your network. The natural first users are research-imaging and data-engineering teams that already ingest untrusted external DICOM.
It is a security and sanitization tool, not a medical device. It carries no diagnostic claim and no FDA clearance. See THREAT_MODEL.md for what it does and does not defend.
Documentation
- THREAT_MODEL.md, attacks defended and explicit non-claims
- ARCHITECTURE.md, module specs
- CONTRIBUTING.md, how to help with the CVE map, vendor allowlist, and codecs
- SECURITY.md, vulnerability disclosure
License
Apache-2.0, © 2026 Vijay Thakore. Provided as is, without warranty. DicomLock is a sanitization tool, not a medical device, and makes no diagnostic claim. Validate disarmed files in your own environment before any clinical use, and run on de-identified data where you can.
File details
Details for the file dicomlock-0.7.0.tar.gz.
File metadata
- Download URL: dicomlock-0.7.0.tar.gz
- Upload date:
- Size: 52.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a2bcb47a01fa04e3ecfa5fa7bfcb941c42aad05db99882eb712537a351d4526d |
| MD5 | 088c76e6da5c775fd29e6ad2aebda990 |
| BLAKE2b-256 | ab7868d9621bd43c2a69af913c6f9b19a393c519632a2ecd29207f419ee7c1dc |
File details
Details for the file dicomlock-0.7.0-py3-none-any.whl.
File metadata
- Download URL: dicomlock-0.7.0-py3-none-any.whl
- Upload date:
- Size: 59.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9d6358d352d231f6ca5e99fb935e0bfd24e4f14dfaf85df02b36e4f2c4378751 |
| MD5 | 538ca3e3b2af4bda6e55b9de4610c7f4 |
| BLAKE2b-256 | 6ec68bb487fdb207edeb4e2f0ccc45fbf18c308de08df2389407e6b9498c0803 |