Secure DICOM Anonymizer & Batch Processor
Dicomaster — Secure DICOM anonymizer & batch processor
Dicomaster is a compact, production-minded CLI tool for extracting metadata from DICOM files, securely anonymizing PHI, and producing researcher- and clinician-friendly outputs (JSON, CSV, FHIR ImagingStudy, thumbnails, HTML reports and more).
This repository contains the CLI and library logic needed to batch-process DICOM datasets for ML/AI research or clinical data pipelines while keeping an auditable anonymization map.
Highlights
- Secure pseudonymization (PBKDF2 when cryptography is available; HMAC fallback otherwise)
- Streaming aggregation for large datasets (low memory footprint)
- Threaded batch processing with configurable worker count
- Multiple output formats: json, csv, html, image, thumbnail, report, fhir, agg-csv, agg-json
- Optional extras for thumbnails, progress bars and faster aggregation (Pillow, tqdm, pandas)
- Auditable anonymization maps (JSON) and reproducible pseudonyms via salt
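As a rough illustration of how threaded batch processing and streaming aggregation can fit together, the sketch below submits one task per file to a thread pool and writes each CSV row as soon as its worker finishes, so extracted rows are never collected in memory. This is not Dicomaster's actual code: extract_row, aggregate_to_csv and the column names are hypothetical stand-ins, and the "metadata" is derived from the path string only so the example runs without any DICOM files.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor, as_completed


def extract_row(path: str) -> dict:
    # Hypothetical stand-in for per-file DICOM metadata extraction.
    return {"path": path, "name_length": len(path)}


def aggregate_to_csv(paths, out, threads: int = 4) -> int:
    """Write one CSV row per input as each worker finishes; return the row count."""
    writer = csv.DictWriter(out, fieldnames=["path", "name_length"])
    writer.writeheader()
    count = 0
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [pool.submit(extract_row, p) for p in paths]
        for future in as_completed(futures):
            # Only the main thread touches the writer, so no lock is needed.
            writer.writerow(future.result())
            count += 1
    return count


buffer = io.StringIO()
rows_written = aggregate_to_csv(["a.dcm", "bb.dcm", "ccc.dcm"], buffer, threads=2)
```

Rows arrive in completion order, not input order, so a consumer that needs a stable order should sort on a key column afterwards.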
Is this ready for GitHub release?
Short answer: yes. Core functionality, tests, and packaging are in place. I ran the test suite and the packaging build locally. Before a public PyPI release, consider these small polish items:
- Finalize pyproject.toml metadata (long description, homepage/URLs, author contact)
- Decide on final module layout (top-level module vs package directory). Current packaging uses dicomaster.py as the shipped module.
- Add a few more integration tests or fixtures (optional) to improve coverage
- Optionally convert the repo to a package layout (dicomaster/ package) for future extensibility
If you want, I can make all of the above and prepare a TestPyPI release workflow.
Quick install
Recommended: create an environment and install editable with extras for full features.
Windows (PowerShell):
python -m venv .dicomaster
.\.dicomaster\Scripts\Activate.ps1
python -m pip install -U pip
pip install -e .[full]
Or for minimal/core features:
pip install -e .
Quick examples
Single file — colorful STAT (default):
dicomaster .\path\to\file.dcm
Single file metadata + thumbnail:
dicomaster sample_data/mri_1.dcm -o json,thumbnail -v
Batch anonymize and stream combined CSV:
dicomaster --batch C:\data\dicoms -o agg-csv --anonymize --anonymize-salt mysecret --threads 8
Generate FHIR ImagingStudy outputs:
dicomaster /studies -o fhir --batch --remove-private-tags
Use dicomaster --help for the full set of options and flags.
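For readers unfamiliar with the fhir output, the sketch below shows roughly what a minimal FHIR R4 ImagingStudy resource looks like when built from a few DICOM attributes. It is an illustration under assumptions, not the structure Dicomaster necessarily emits: to_imaging_study, the flat meta dictionary, the placeholder UIDs and the Patient/anon-001 reference are all hypothetical.

```python
import json


def to_imaging_study(meta: dict, patient_ref: str) -> dict:
    """Build a minimal FHIR R4 ImagingStudy resource from flat DICOM metadata."""
    return {
        "resourceType": "ImagingStudy",
        "status": "available",
        "subject": {"reference": patient_ref},
        # The Study Instance UID is carried as an identifier in the urn:dicom:uid system.
        "identifier": [
            {"system": "urn:dicom:uid", "value": f"urn:oid:{meta['StudyInstanceUID']}"}
        ],
        "series": [
            {
                "uid": meta["SeriesInstanceUID"],
                # In R4, series.modality is a Coding from the DICOM ontology.
                "modality": {
                    "system": "http://dicom.nema.org/resources/ontology/DCM",
                    "code": meta["Modality"],
                },
            }
        ],
    }


# Placeholder values for illustration only.
sample_meta = {
    "StudyInstanceUID": "1.2.3.4",
    "SeriesInstanceUID": "1.2.3.4.5",
    "Modality": "MR",
}
study = to_imaging_study(sample_meta, "Patient/anon-001")
study_json = json.dumps(study, indent=2)
```

When anonymizing, the subject reference should point at the pseudonymized patient identifier, never the original one.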
Changelog
See CHANGELOG.md for release highlights.
Developer notes
- Tests: run pytest -q. The repo includes unit and small integration tests that create a minimal DICOM at runtime.
- Build: python -m build (I validated the sdist and wheel locally).
- Editable install with extras: pip install -e .[full] (installs pydicom, Pillow, pandas, cryptography, etc.)
Security & best practices
- Prefer supplying --anonymize-salt for reproducible pseudonyms across runs. If you omit it, the tool will generate salts and store them in the anonymize map.
- Install cryptography to use PBKDF2HMAC (stronger) instead of the HMAC fallback.
- Always validate your anonymized outputs and mapping before publishing datasets.
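To make the salt and PBKDF2-versus-HMAC points above concrete, here is a minimal sketch of salted pseudonymization. It is not Dicomaster's implementation: it uses the standard library's hashlib.pbkdf2_hmac in place of cryptography's PBKDF2HMAC so that it runs without extra dependencies, and the iteration count, pseudonym length, function names and map layout are all assumptions chosen for illustration.

```python
import hashlib
import hmac

# Assumed parameters, not taken from Dicomaster's source.
ITERATIONS = 100_000
PSEUDONYM_LENGTH = 16  # hex characters kept from the digest


def pseudonymize_pbkdf2(value: str, salt: str) -> str:
    """Derive a pseudonym with PBKDF2-HMAC-SHA256 (deliberately slow to brute-force)."""
    digest = hashlib.pbkdf2_hmac(
        "sha256", value.encode("utf-8"), salt.encode("utf-8"), ITERATIONS
    )
    return digest.hex()[:PSEUDONYM_LENGTH]


def pseudonymize_hmac(value: str, salt: str) -> str:
    """Cheaper keyed-hash variant: a single HMAC-SHA256 with the salt as key."""
    digest = hmac.new(salt.encode("utf-8"), value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:PSEUDONYM_LENGTH]


# Hypothetical audit map: original value -> pseudonym.
anonymization_map = {
    pid: pseudonymize_pbkdf2(pid, "mysecret") for pid in ["PAT-001", "PAT-002"]
}
```

The same salt and input always yield the same pseudonym, which is what makes runs reproducible; changing the salt changes every pseudonym. Because a map like this links original identifiers to pseudonyms, it should be stored separately from any published dataset.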
Files to include when you push for release
- dicomaster.py (module)
Contributing
- Fork and open a PR.
- Add tests for new features and bug fixes.
- Format code with black (project style). Run python -m pytest to confirm tests.
License
MIT — see LICENSE.txt.
Acknowledgements
Built on pydicom, inspired by deid and the DICOM community. Created by Santo Paul to make DICOM preprocessing safer and faster for research and clinical workflows.
If you'd like, I will (choose one):
- add basic pytest tests and a small CI workflow, or
- patch the thread-safety and metadata-key issues I found, or
- prepare a pyproject.toml + setup.cfg that declares extras_require and a console_scripts entry point.
Tell me which and I'll implement it next.
File details
Details for the file dicomaster-0.9.2.tar.gz.
File metadata
- Download URL: dicomaster-0.9.2.tar.gz
- Upload date:
- Size: 31.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ee97e7315ae5958cf54d791aa849f4d75f4ef235d180de3a9131ebad51d43656 |
| MD5 | b050fe2c740e31e7dc113d5dff67569e |
| BLAKE2b-256 | 437beda0bfc41a3d4a231792c76ec6b9d603be94ffcc85c15edaec016d315475 |
File details
Details for the file dicomaster-0.9.2-py3-none-any.whl.
File metadata
- Download URL: dicomaster-0.9.2-py3-none-any.whl
- Upload date:
- Size: 28.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fabef9a8e968ac5444e1eefc8286278caf461a2b8bad8b52da938151caf29ded |
| MD5 | 711da1e12a02c5569a4568c07ce6e231 |
| BLAKE2b-256 | e89687d16636404d8616536f933ba8a09c8f84ed8f364ef9d955225dd30762ec |