cfmm2tar
Download a tarballed DICOM dataset from the CFMM DICOM server
Overview
cfmm2tar is a command-line tool for querying and downloading DICOM studies from the CFMM (Centre for Functional and Metabolic Mapping) DICOM server. It provides three flexible deployment options to suit different environments and use cases.
Installation & Usage
There are three ways to run cfmm2tar, each with different requirements:
Option 1: Docker Container (Recommended - All-in-One)
This is the easiest method as it includes all dependencies (Python, dcm4che tools, and DicomRaw utilities) in a single container.
Requirements: Docker or Podman
Installation:
# Pull from GitHub Container Registry
docker pull ghcr.io/khanlab/cfmm2tar:latest
# Or build locally
git clone https://github.com/khanlab/cfmm2tar
cd cfmm2tar
docker build -t cfmm2tar .
Usage:
OUTPUT_DIR=/path/to/dir
mkdir -p ${OUTPUT_DIR}
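# The commands below use the locally built image name (cfmm2tar).
# If you pulled the image instead, replace cfmm2tar with ghcr.io/khanlab/cfmm2tar:latest
# Show help
docker run --rm cfmm2tar --help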
# Download studies
docker run -i -t --rm --volume ${OUTPUT_DIR}:/data cfmm2tar -p 'Everling^Marmoset' -d '20180803' /data
You will be prompted for your UWO username and password. You can only download datasets to which you have read permissions.
Option 2: Apptainer/Singularity Container (For HPC Environments)
This option works like the Docker container but targets HPC clusters, where Docker may not be available.
Requirements: Apptainer (formerly Singularity)
Installation:
# Build from Docker image
apptainer build cfmm2tar.sif docker://ghcr.io/khanlab/cfmm2tar:latest
# Or build from definition file
apptainer build cfmm2tar.sif Singularity
Usage:
OUTPUT_DIR=/path/to/dir
mkdir -p ${OUTPUT_DIR}
# Show help
apptainer run cfmm2tar.sif --help
# Download studies
apptainer run --bind ${OUTPUT_DIR}:/data cfmm2tar.sif -p 'Khan^Project' -d '20240101' /data
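When the container commands above need to be scripted, it can help to build the argument list in one place. The following is a minimal sketch, not part of cfmm2tar: `container_argv` is a hypothetical helper, and it assumes Podman accepts the same `run` flags as Docker. It only assembles the command; it does not run it.

```python
import os
import shlex


def container_argv(runtime, image, output_dir, cfmm2tar_args):
    """Build the argv for running cfmm2tar in a container.

    The host output directory is mounted at /data, as in the examples above.
    """
    host_dir = os.path.abspath(output_dir)
    mount = f"{host_dir}:/data"
    if runtime in ("docker", "podman"):
        prefix = [runtime, "run", "-i", "-t", "--rm", "--volume", mount, image]
    elif runtime == "apptainer":
        prefix = ["apptainer", "run", "--bind", mount, image]
    else:
        raise ValueError(f"unsupported runtime: {runtime}")
    # cfmm2tar receives the container-side path, not the host path.
    return prefix + list(cfmm2tar_args) + ["/data"]


if __name__ == "__main__":
    argv = container_argv(
        "docker", "cfmm2tar", "/path/to/dir",
        ["-p", "Everling^Marmoset", "-d", "20180803"],
    )
    # shlex.join quotes each argument for safe copy-paste into a shell.
    print(shlex.join(argv))
```

The list can be passed to `subprocess.run(argv, check=True)` from an interactive terminal, since the tool still prompts for credentials.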
Option 3: PyPI Installation (For Python Environments)
Install cfmm2tar as a Python package. Note: This method requires additional setup for dcm4che tools.
Requirements:
- Python 3.11+
- Either dcm4che tools installed locally OR a container with dcm4che tools
Installation:
# From PyPI
pip install cfmm2tar
# Or install from source
git clone https://github.com/khanlab/cfmm2tar
cd cfmm2tar
pip install -e .
Setup dcm4che tools:
You have two options:
Option 3a: Install dcm4che locally
export DCM4CHE_VERSION=5.24.1
sudo bash install_dcm4che_ubuntu.sh /opt
export PATH=/opt/dcm4che-${DCM4CHE_VERSION}/bin:$PATH
Option 3b: Use a dcm4che container
# Pull a container with dcm4che tools, saving it as cfmm2tar.sif
apptainer pull cfmm2tar.sif docker://ghcr.io/khanlab/cfmm2tar:latest
# Set environment variable
export DCM4CHE_CONTAINER=/path/to/cfmm2tar.sif
Usage:
OUTPUT_DIR=/path/to/dir
mkdir -p ${OUTPUT_DIR}
# If dcm4che tools are in PATH (Option 3a)
cfmm2tar -p 'Khan^Project' -d '20240101' ${OUTPUT_DIR}
# If using a dcm4che container (Option 3b)
cfmm2tar --dcm4che-container /path/to/cfmm2tar.sif -p 'Khan^Project' -d '20240101' ${OUTPUT_DIR}
# Or set environment variable
export DCM4CHE_CONTAINER=/path/to/cfmm2tar.sif
cfmm2tar -p 'Khan^Project' -d '20240101' ${OUTPUT_DIR}
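Before starting a long download with the PyPI install, a quick check of which dcm4che setup is visible can save a failed run. The sketch below is illustrative only: `dcm4che_source` is a hypothetical helper, the tool names in `ASSUMED_TOOLS` are an assumption about which dcm4che commands are needed, and the order of the checks is this sketch's choice rather than cfmm2tar's documented precedence.

```python
import os
import shutil

# Assumption: illustrative dcm4che command names; check which tools
# your cfmm2tar version actually calls.
ASSUMED_TOOLS = ("findscu", "getscu")


def dcm4che_source(env=None, which=shutil.which, tools=ASSUMED_TOOLS):
    """Return 'path', 'container', or None depending on what is configured."""
    env = os.environ if env is None else env
    # Option 3a: every assumed tool resolves on PATH.
    if all(which(tool) for tool in tools):
        return "path"
    # Option 3b: a container image is named in DCM4CHE_CONTAINER.
    if env.get("DCM4CHE_CONTAINER"):
        return "container"
    return None


if __name__ == "__main__":
    source = dcm4che_source()
    print(source or "no dcm4che setup found; see Option 3a or 3b")
```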
Comparison of Methods
| Method | Pros | Cons | Best For |
|---|---|---|---|
| Docker Container | ✅ All dependencies included ✅ Consistent environment ✅ Easy to use | ❌ Requires Docker | End users, workstations |
| Apptainer Container | ✅ All dependencies included ✅ Works on HPC clusters ✅ No root required | ❌ Need to build/pull container | HPC environments |
| PyPI Install | ✅ Integrates with Python environment ✅ Easy to script | ❌ Requires separate dcm4che setup ❌ More complex setup | Python developers, scripting |
Usage
Basic Search and Download
Search and download DICOM studies based on search criteria:
# Download all studies for a specific Principal^Project on a specific date
cfmm2tar -p 'Khan^NeuroAnalytics' -d '20240101' output_dir
# Download studies for a specific patient
cfmm2tar -n '*subj01*' output_dir
# Download a specific study by StudyInstanceUID
cfmm2tar -u '1.2.840.113619.2.55.3.1234567890.123' output_dir
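The search flags above contain characters such as `^` and `*` that a shell may interpret, which is why the examples quote them. When calling cfmm2tar from Python, passing an argument list avoids shell quoting altogether. This is a minimal sketch: `build_search_argv` is a hypothetical helper that uses only the `-p`, `-d`, `-n` and `-u` flags shown above, and its refusal to run without any criterion is the sketch's own safety choice, not documented cfmm2tar behaviour.

```python
def build_search_argv(output_dir, project=None, date=None,
                      patient_name=None, study_uid=None):
    """Assemble a cfmm2tar command line from optional search criteria."""
    argv = ["cfmm2tar"]
    criteria = (
        ("-p", project),        # Principal^Project
        ("-d", date),           # YYYYMMDD or YYYYMMDD-YYYYMMDD
        ("-n", patient_name),   # patient name, wildcards allowed
        ("-u", study_uid),      # StudyInstanceUID
    )
    for flag, value in criteria:
        if value is not None:
            argv += [flag, value]
    if len(argv) == 1:
        # Sketch-level guard against an unrestricted query.
        raise ValueError("at least one search criterion is required")
    return argv + [output_dir]


if __name__ == "__main__":
    print(build_search_argv("output_dir", patient_name="*subj01*"))
```

The result can be handed to `subprocess.run(argv, check=True)`; no quoting of `*subj01*` is needed because no shell is involved.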
Query Metadata Without Downloading
You can query and export study metadata to a TSV file without downloading the actual DICOM files:
# Export metadata for all studies on a specific date
cfmm2tar -M study_metadata.tsv -d '20240101'
# Export metadata for a specific Principal^Project
cfmm2tar -M study_metadata.tsv -p 'Khan^NeuroAnalytics' -d '20240101-20240131'
This creates a TSV file with columns:
- StudyInstanceUID: Unique identifier for the study
- PatientName: Patient name
- PatientID: Patient ID
- StudyDate: Date of the study
- StudyDescription: Study description (typically Principal^Project)
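A quick way to get an overview of an exported metadata file is to count studies per StudyDescription. The sketch below assumes the TSV starts with a header row that uses the column names listed above; `count_by_description` is a hypothetical helper and the sample rows are made up for illustration.

```python
import csv
import io
from collections import Counter


def count_by_description(tsv_file):
    """Count studies per StudyDescription in a metadata TSV (open file object)."""
    reader = csv.DictReader(tsv_file, delimiter="\t")
    return Counter(row["StudyDescription"] for row in reader)


# Made-up sample rows, for illustration only.
SAMPLE = (
    "StudyInstanceUID\tPatientName\tPatientID\tStudyDate\tStudyDescription\n"
    "1.2.3.1\tsubj01\tid01\t20240105\tKhan^NeuroAnalytics\n"
    "1.2.3.2\tsubj02\tid02\t20240112\tKhan^NeuroAnalytics\n"
    "1.2.3.3\tphantom\tid03\t20240112\tKhan^Project\n"
)

if __name__ == "__main__":
    counts = count_by_description(io.StringIO(SAMPLE))
    for description, n in counts.most_common():
        print(f"{description}\t{n}")
```

For a real file, open it with `open("study_metadata.tsv", newline="")` and pass the file object in place of the `io.StringIO` sample.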
Download from UID List
After reviewing the metadata file, you can download specific studies:
# Download all studies from the metadata file
cfmm2tar --uid-from-file study_metadata.tsv output_dir
# Or create a filtered version of the metadata file and download only those
# (e.g., filter in Excel, grep, awk, or Python)
cfmm2tar --uid-from-file study_metadata_filtered.tsv output_dir
# You can also use a simple text file with one UID per line
cfmm2tar --uid-from-file uid_list.txt output_dir
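The filtering step can also be done in a few lines of Python. This sketch keeps only rows whose PatientName matches a shell-style pattern and writes them out with the original header, so the result has the same layout as the exported file. `filter_metadata` is a hypothetical helper, and it assumes the TSV has a header row with a PatientName column.

```python
import csv
import fnmatch
import io


def filter_metadata(src, dst, patient_pattern):
    """Copy rows whose PatientName matches the pattern; return the number kept."""
    reader = csv.DictReader(src, delimiter="\t")
    writer = csv.DictWriter(
        dst, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    kept = 0
    for row in reader:
        # fnmatchcase gives case-sensitive matching on every platform.
        if fnmatch.fnmatchcase(row["PatientName"], patient_pattern):
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    src = io.StringIO(
        "StudyInstanceUID\tPatientName\tPatientID\tStudyDate\tStudyDescription\n"
        "1.2.3.1\tsubj01\tid01\t20240105\tKhan^NeuroAnalytics\n"
        "1.2.3.2\tphantom\tid02\t20240112\tKhan^NeuroAnalytics\n"
    )
    dst = io.StringIO()
    print(filter_metadata(src, dst, "*subj*"), "row(s) kept")
    print(dst.getvalue(), end="")
```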
Track Downloaded Studies
Track which studies have already been downloaded:
# Use a tracking file to avoid re-downloading
cfmm2tar -U ~/downloaded_uid_list.txt -p 'Khan^NeuroAnalytics' output_dir
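To preview what a tracked run would still fetch, the metadata export can be compared against the tracking file. The sketch below is an illustration under a stated assumption: the tracking file is treated as plain text with one StudyInstanceUID per line, which this document does not confirm. `pending_uids` is a hypothetical helper, not a cfmm2tar function.

```python
import csv
import io


def pending_uids(metadata_tsv, tracked_lines):
    """Return UIDs from the metadata TSV that are absent from the tracking list.

    Order follows the metadata file. Assumes one UID per tracking line.
    """
    tracked = {line.strip() for line in tracked_lines if line.strip()}
    reader = csv.DictReader(metadata_tsv, delimiter="\t")
    return [
        row["StudyInstanceUID"]
        for row in reader
        if row["StudyInstanceUID"] not in tracked
    ]


if __name__ == "__main__":
    # Made-up data, for illustration only.
    metadata = io.StringIO("StudyInstanceUID\tPatientName\n1.2.3.1\ta\n1.2.3.2\tb\n")
    print(pending_uids(metadata, ["1.2.3.1\n"]))
```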
Workflow Example
1. Query and export metadata for review:
cfmm2tar -M all_studies.tsv -p 'Khan^NeuroAnalytics' -d '20240101-20240131'
2. Review and filter the all_studies.tsv file (e.g., in Excel or with command-line tools), saving the result as all_studies_filtered.tsv
3. Download the filtered studies:
cfmm2tar --uid-from-file all_studies_filtered.tsv output_dir
This workflow is especially useful when:
- You want to review available studies before downloading
- Storage is limited and you need to select specific studies
- You're sharing the metadata with collaborators to decide what to download
- You need to filter studies based on multiple criteria
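For a recurring monthly review, the date range and the two cfmm2tar invocations can be generated rather than typed. This is a sketch under assumptions: `month_range` and `workflow_commands` are hypothetical helpers, the file names are the ones used in the workflow example, and the commands are only built, not executed.

```python
import calendar


def month_range(year, month):
    """Return a YYYYMMDD-YYYYMMDD range covering one calendar month."""
    last_day = calendar.monthrange(year, month)[1]
    return f"{year:04d}{month:02d}01-{year:04d}{month:02d}{last_day:02d}"


def workflow_commands(project, year, month, output_dir):
    """Return the query and download commands that bracket the manual review."""
    query = [
        "cfmm2tar", "-M", "all_studies.tsv",
        "-p", project, "-d", month_range(year, month),
    ]
    download = [
        "cfmm2tar", "--uid-from-file", "all_studies_filtered.tsv", output_dir,
    ]
    return query, download


if __name__ == "__main__":
    query, download = workflow_commands("Khan^NeuroAnalytics", 2024, 1, "output_dir")
    print(query)
    print(download)
```

Run the query command first, produce `all_studies_filtered.tsv` during review, then run the download command.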
Development and Testing
Development Setup
For contributors and developers:
# Clone the repository
git clone https://github.com/khanlab/cfmm2tar
cd cfmm2tar
# Install in development mode with dev dependencies
pip install -e .
pip install ruff pre-commit pytest pydicom numpy
# Set up pre-commit hooks (runs quality checks before each commit)
pre-commit install
# Install dcm4che tools (required for integration tests)
export DCM4CHE_VERSION=5.24.1
sudo bash install_dcm4che_ubuntu.sh /opt
Code Quality and Formatting
This project uses ruff for linting and formatting:
# Format code
ruff format .
# Check for lint issues
ruff check .
# Fix auto-fixable issues
ruff check --fix .
# Run pre-commit hooks manually
pre-commit run --all-files
Running Tests
The test suite has unit tests, which need no server, and integration tests, which run against a containerized dcm4chee PACS instance.
# Install development dependencies
pip install -e .
pip install pytest pydicom numpy
# Install dcm4che tools (required for integration tests)
export DCM4CHE_VERSION=5.24.1
sudo bash install_dcm4che_ubuntu.sh /opt
# Run unit tests (no PACS server required)
pytest tests/test_dcm4che_utils.py::TestDcm4cheUtilsUnit -v
# Run integration tests (requires Docker)
cd tests
docker compose up -d
sleep 60 # Wait for PACS to be ready
cd ..
pytest tests/test_dcm4che_utils.py::TestDcm4cheUtilsIntegration -v
# Clean up
cd tests
docker compose down -v
See tests/README.md for detailed testing documentation.
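A fixed `sleep 60` can be too short on a slow machine and wasteful on a fast one. One alternative is to poll the server's port until it accepts connections. The sketch below is an assumption-laden illustration: `wait_for_port` is a hypothetical helper, the host and port in the usage example are placeholders to be replaced with the values from the compose file, and an open port does not prove the PACS has finished initializing.

```python
import socket
import time


def wait_for_port(host, port, timeout=120.0, interval=2.0):
    """Poll until a TCP connection to host:port succeeds.

    Returns True once connected, False if the timeout elapses first.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Not accepting connections yet; wait and retry.
            time.sleep(interval)
    return False


if __name__ == "__main__":
    # Placeholder host and port: use the values published by tests/docker-compose.
    ready = wait_for_port("localhost", 11112, timeout=120.0)
    print("PACS port is open" if ready else "timed out waiting for PACS")
```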
Continuous Integration
The project uses GitHub Actions for automated testing. The workflow:
- Runs unit tests on every push and pull request
- Starts a containerized dcm4chee PACS server
- Runs integration tests against the PACS server
- Reports results
See .github/workflows/test.yml for the complete workflow.