Skip to main content

Download a tarballed DICOM dataset from the CFMM DICOM server

Project description

cfmm2tar

Download a tarballed DICOM dataset from the CFMM DICOM server

Overview

cfmm2tar is a command-line tool for querying and downloading DICOM studies from the CFMM (Centre for Functional and Metabolic Mapping) DICOM server. It provides three flexible deployment options to suit different environments and use cases.

Installation & Usage

There are three ways to run cfmm2tar, each with different requirements:

Option 1: Docker Container (Recommended - All-in-One)

This is the easiest method as it includes all dependencies (Python, dcm4che tools, and DicomRaw utilities) in a single container.

Requirements: Docker or Podman

Installation:

# Pull from GitHub Container Registry
docker pull ghcr.io/khanlab/cfmm2tar:latest

# Or build locally
git clone https://github.com/khanlab/cfmm2tar
cd cfmm2tar
docker build -t cfmm2tar .

Usage:

OUTPUT_DIR=/path/to/dir
mkdir -p ${OUTPUT_DIR}

# Show help
docker run --rm cfmm2tar --help

# Download studies
docker run -i -t --rm --volume ${OUTPUT_DIR}:/data cfmm2tar -p 'Everling^Marmoset' -d '20180803' /data

You will be prompted for your UWO username and password. You can only download datasets to which you have read permissions.

Option 2: Apptainer/Singularity Container (For HPC Environments)

Similar to Docker but designed for HPC clusters where Docker may not be available.

Requirements: Apptainer (formerly Singularity)

Installation:

# Build from Docker image
apptainer build cfmm2tar.sif docker://ghcr.io/khanlab/cfmm2tar:latest

# Or build from definition file
apptainer build cfmm2tar.sif Singularity

Usage:

OUTPUT_DIR=/path/to/dir
mkdir -p ${OUTPUT_DIR}

# Show help
apptainer run cfmm2tar.sif --help

# Download studies
apptainer run --bind ${OUTPUT_DIR}:/data cfmm2tar.sif -p 'Khan^Project' -d '20240101' /data

Option 3: PyPI Installation (For Python Environments)

Install cfmm2tar as a Python package. Note: This method requires additional setup for dcm4che tools.

Requirements:

  • Python 3.11+
  • Either dcm4che tools installed locally OR a container with dcm4che tools

Installation:

# From PyPI (when published)
pip install cfmm2tar

# Or install from source
git clone https://github.com/khanlab/cfmm2tar
cd cfmm2tar
pip install -e .

Setup dcm4che tools:

You have two options:

Option 3a: Install dcm4che locally

export DCM4CHE_VERSION=5.24.1
sudo bash install_dcm4che_ubuntu.sh /opt
export PATH=/opt/dcm4che-${DCM4CHE_VERSION}/bin:$PATH

Option 3b: Use a dcm4che container

# Pull a container with dcm4che tools
apptainer pull docker://ghcr.io/khanlab/cfmm2tar:latest

# Set environment variable
export DCM4CHE_CONTAINER=/path/to/cfmm2tar.sif

Usage:

OUTPUT_DIR=/path/to/dir
mkdir -p ${OUTPUT_DIR}

# If dcm4che tools are in PATH (Option 3a)
cfmm2tar -p 'Khan^Project' -d '20240101' ${OUTPUT_DIR}

# If using a dcm4che container (Option 3b)
cfmm2tar --dcm4che-container /path/to/cfmm2tar.sif -p 'Khan^Project' -d '20240101' ${OUTPUT_DIR}

# Or set environment variable
export DCM4CHE_CONTAINER=/path/to/cfmm2tar.sif
cfmm2tar -p 'Khan^Project' -d '20240101' ${OUTPUT_DIR}

Comparison of Methods

Method Pros Cons Best For
Docker Container ✅ All dependencies included
✅ Consistent environment
✅ Easy to use
❌ Requires Docker End users, workstations
Apptainer Container ✅ All dependencies included
✅ Works on HPC clusters
✅ No root required
❌ Need to build/pull container HPC environments
PyPI Install ✅ Integrates with Python environment
✅ Easy to script
❌ Requires separate dcm4che setup
❌ More complex setup
Python developers, scripting

Usage

Basic Search and Download

Search and download DICOM studies based on search criteria:

# Download all studies for a specific Principal^Project on a specific date
cfmm2tar -p 'Khan^NeuroAnalytics' -d '20240101' output_dir

# Download studies for a specific patient
cfmm2tar -n '*subj01*' output_dir

# Download a specific study by StudyInstanceUID
cfmm2tar -u '1.2.840.113619.2.55.3.1234567890.123' output_dir

Query Metadata Without Downloading

You can query and export study metadata to a TSV file without downloading the actual DICOM files:

# Export metadata for all studies on a specific date
cfmm2tar -M study_metadata.tsv -d '20240101'

# Export metadata for a specific Principal^Project
cfmm2tar -M study_metadata.tsv -p 'Khan^NeuroAnalytics' -d '20240101-20240131'

This creates a TSV file with columns:

  • StudyInstanceUID: Unique identifier for the study
  • PatientName: Patient name
  • PatientID: Patient ID
  • StudyDate: Date of the study
  • StudyDescription: Study description (typically Principal^Project)

Download from UID List

After reviewing the metadata file, you can download specific studies:

# Download all studies from the metadata file
cfmm2tar --uid-from-file study_metadata.tsv output_dir

# Or create a filtered version of the metadata file and download only those
# (e.g., filter in Excel, grep, awk, or Python)
cfmm2tar --uid-from-file study_metadata_filtered.tsv output_dir

# You can also use a simple text file with one UID per line
cfmm2tar --uid-from-file uid_list.txt output_dir

Track Downloaded Studies

Track which studies have already been downloaded:

# Use a tracking file to avoid re-downloading
cfmm2tar -U ~/downloaded_uid_list.txt -p 'Khan^NeuroAnalytics' output_dir

Workflow Example

  1. Query and export metadata for review:

    cfmm2tar -M all_studies.tsv -p 'Khan^NeuroAnalytics' -d '20240101-20240131'
    
  2. Review and filter the all_studies.tsv file (e.g., in Excel or with command-line tools)

  3. Download filtered studies:

    cfmm2tar --uid-from-file all_studies_filtered.tsv output_dir
    

This workflow is especially useful when:

  • You want to review available studies before downloading
  • Storage is limited and you need to select specific studies
  • You're sharing the metadata with collaborators to decide what to download
  • You need to filter studies based on multiple criteria

Development and Testing

Development Setup

For contributors and developers:

# Clone the repository
git clone https://github.com/khanlab/cfmm2tar
cd cfmm2tar

# Install in development mode with dev dependencies
pip install -e .
pip install ruff pre-commit pytest pydicom numpy

# Set up pre-commit hooks (runs quality checks before each commit)
pre-commit install

# Install dcm4che tools (required for integration tests)
export DCM4CHE_VERSION=5.24.1
sudo bash install_dcm4che_ubuntu.sh /opt

Code Quality and Formatting

This project uses ruff for linting and formatting:

# Format code
ruff format .

# Check for lint issues
ruff check .

# Fix auto-fixable issues
ruff check --fix .

# Run pre-commit hooks manually
pre-commit run --all-files

Running Tests

This project includes a comprehensive testing framework using a containerized dcm4che PACS instance.

# Install development dependencies
pip install -e .
pip install pytest pydicom numpy

# Install dcm4che tools (required for integration tests)
export DCM4CHE_VERSION=5.24.1
sudo bash install_dcm4che_ubuntu.sh /opt

# Run unit tests (no PACS server required)
pytest tests/test_dcm4che_utils.py::TestDcm4cheUtilsUnit -v

# Run integration tests (requires Docker)
cd tests
docker compose up -d
sleep 60  # Wait for PACS to be ready
cd ..
pytest tests/test_dcm4che_utils.py::TestDcm4cheUtilsIntegration -v

# Clean up
cd tests
docker compose down -v

See tests/README.md for detailed testing documentation.

Continuous Integration

The project uses GitHub Actions for automated testing. The workflow:

  1. Runs unit tests on every push and pull request
  2. Starts a containerized dcm4chee PACS server
  3. Runs integration tests against the PACS server
  4. Reports results

See .github/workflows/test.yml for the complete workflow.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cfmm2tar-2.0.2.tar.gz (48.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

cfmm2tar-2.0.2-py3-none-any.whl (34.8 kB view details)

Uploaded Python 3

File details

Details for the file cfmm2tar-2.0.2.tar.gz.

File metadata

  • Download URL: cfmm2tar-2.0.2.tar.gz
  • Upload date:
  • Size: 48.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cfmm2tar-2.0.2.tar.gz
Algorithm Hash digest
SHA256 a8c56d0bbf49d0dc5e862cd121e4c9b23d56fc01e2e3e7c9499a249762ebe15c
MD5 7446e94da450439264da208a66f53c66
BLAKE2b-256 e9e4c1ac850ceedd49e4b0d65b4e596c32ecc728a17fc1be627d9cc3b1493106

See more details on using hashes here.

Provenance

The following attestation bundles were made for cfmm2tar-2.0.2.tar.gz:

Publisher: publish-pypi.yml on khanlab/cfmm2tar

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file cfmm2tar-2.0.2-py3-none-any.whl.

File metadata

  • Download URL: cfmm2tar-2.0.2-py3-none-any.whl
  • Upload date:
  • Size: 34.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cfmm2tar-2.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 bdb7a64640e080cd16f90a69bf8c9f36274f322ee10356b99d6d1a6035740f0a
MD5 4bb0ccabb7283582aace89f36e05ad1b
BLAKE2b-256 b855717c44c01b2aa2f07d437a87c265ca85dde484f80987f9f358b5df661ef5

See more details on using hashes here.

Provenance

The following attestation bundles were made for cfmm2tar-2.0.2-py3-none-any.whl:

Publisher: publish-pypi.yml on khanlab/cfmm2tar

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page