Medical Imaging Topological Data Analysis - Extract TDA features from medical images for machine learning

Med-TDA: Medical Imaging Topological Data Analysis Tool

License: MIT | Python 3.10+

Med-TDA is a Python library for extracting Topological Data Analysis (TDA) features from medical images. It provides a complete pipeline from image preprocessing to persistence barcode computation and feature vectorization, designed specifically for medical imaging applications in machine learning and radiomics research.

Installation

pip install medtda

Requirements: Python ≥ 3.10

Core Dependencies: NumPy ≥1.21, SciPy ≥1.7, GUDHI ≥3.5, cripser ≥0.0.32, SimpleITK ≥2.1, Pillow ≥9.0, scikit-image ≥0.19, scikit-learn ≥1.0, pandas ≥1.3, matplotlib ≥3.5, seaborn ≥0.11, PyYAML ≥6.0, tqdm ≥4.60

Supported Image Formats

  • 2D Images: PNG, JPG, JPEG, TIFF, TIF, BMP
  • 3D/4D Medical Images: NIfTI (.nii, .nii.gz), NRRD (.nrrd), MetaImage (.mha, .mhd)
  • Mask Support: Single-label and multi-label segmentation masks in any supported format
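
As a rough illustration of how the format list above could be used to validate inputs before a batch run, the sketch below classifies a path by its extension. The helper classify_image_path and the two extension sets are hypothetical and are not part of medtda; they only mirror the formats listed above.

```python
from pathlib import Path

# Extension sets copied from the format list above (hypothetical helper, not medtda API)
EXT_2D = {".png", ".jpg", ".jpeg", ".tiff", ".tif", ".bmp"}
EXT_VOLUME = {".nii", ".nii.gz", ".nrrd", ".mha", ".mhd"}


def classify_image_path(path: str) -> str:
    """Return '2d', 'volume', or 'unsupported' based on the file extension."""
    name = Path(path).name.lower()
    # Try longer extensions first so that .nii.gz is matched as a whole
    for ext in sorted(EXT_2D | EXT_VOLUME, key=len, reverse=True):
        if name.endswith(ext):
            return "2d" if ext in EXT_2D else "volume"
    return "unsupported"


for candidate in ["scan.nii.gz", "slides/biopsy.PNG", "notes.txt"]:
    print(candidate, "->", classify_image_path(candidate))
```

Matching on the lowercased file name rather than Path.suffix keeps compound extensions such as .nii.gz intact.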

Feature Extraction

The FeatureExtractor class provides an end-to-end solution that performs preprocessing, persistent homology computation, and feature vectorization in one step. This is the recommended way to extract TDA features.

from medtda import FeatureExtractor

# Initialize with desired settings
extractor = FeatureExtractor(
    normalize=True,
    normalize_method='minmax',
    vectorization_method='persistence_stats'
)

# Extract features from image and mask
features = extractor.execute(
    image='path/to/image.nii.gz',
    mask='path/to/mask.nii.gz'
)

# features is a dictionary of TDA feature vectors ready for ML

Available Vectorization Methods:

  • persistence_stats: Statistical summaries (mean, std, min, max, percentiles)
  • betti_curve: Betti number curves over filtration values
  • persistence_image: 2D histogram representation of persistence diagrams
  • persistence_landscape: Persistence landscape functions
  • entropy_summary: Entropy-based statistical features
  • persistence_silhouette: Silhouette representation
  • persistence_lifespan: Lifespan distribution features
  • persistence_tropical_coordinates: Tropical algebra coordinates
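
To give a feel for what two of these vectorizations compute, here is a minimal NumPy sketch of lifespan statistics and a Betti curve for a single homology dimension. It is not the medtda or GUDHI implementation: the function names, the choice of statistics, and the convention that a bar which never dies is stored with death = inf are all assumptions made for illustration, and it assumes birth <= death as in a sublevel filtration.

```python
import numpy as np


def persistence_stats(bars: np.ndarray) -> dict:
    """Summarize finite lifespans (death - birth) of one homology dimension."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    finite = bars[np.isfinite(bars).all(axis=1)]  # drop bars that never die
    life = finite[:, 1] - finite[:, 0]
    if life.size == 0:
        return {"count": 0, "mean": 0.0, "std": 0.0, "max": 0.0}
    return {
        "count": int(life.size),
        "mean": float(life.mean()),
        "std": float(life.std()),
        "max": float(life.max()),
    }


def betti_curve(bars: np.ndarray, grid: np.ndarray) -> np.ndarray:
    """Count the bars alive at each grid value t, i.e. birth <= t < death."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    born = bars[:, 0][None, :] <= grid[:, None]
    not_dead = grid[:, None] < bars[:, 1][None, :]
    return (born & not_dead).sum(axis=1)


# Toy H0 barcode: two finite bars and one essential (infinite) bar
h0 = np.array([[0.0, 0.5], [0.1, 0.3], [0.2, np.inf]])
stats = persistence_stats(h0)
curve = betti_curve(h0, np.linspace(0.0, 1.0, 5))
print(stats)
print(curve)
```

Both functions return fixed-length outputs regardless of how many bars the image produces, which is what makes such summaries usable as machine learning features.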

You can use multiple vectorization methods simultaneously:

from medtda import FeatureExtractor

extractor = FeatureExtractor(
    normalize=True,
    vectorization_method=['persistence_stats', 'betti_curve', 'entropy_summary']
)

# Returns combined features from all methods
features = extractor.execute(
    image='path/to/image.nii.gz',
    mask='path/to/mask.nii.gz'
)

Image Preprocessing

The Preprocessor class handles medical image preprocessing independently. Use this when you need standalone preprocessing or want to inspect preprocessed images before computing persistence.

Available Operations:

  • Resampling: Resample 3D/4D images to target voxel spacing (e.g., isotropic resolution)
  • Windowing: Apply intensity windowing (center/width) for CT images
  • Normalization: Normalize intensity values (minmax, z-score, or robust scaling)
  • Masking: Apply binary or multi-label masks with configurable background values
  • ROI Cropping: Automatically crop to region of interest with padding to reduce computation

from medtda import Preprocessor

preprocessor = Preprocessor(
    spacing=(1.0, 1.0, 1.0),  # Resample to 1mm isotropic
    normalize=True,
    normalize_method='minmax',
    crop_to_roi=True
)

preprocessed_image, metadata = preprocessor.preprocess(
    image='ct_scan.nii.gz',
    mask='roi_mask.nii.gz'
)

The metadata dictionary contains information about applied transformations, original and final image ranges, shapes, and cropping details.
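
For readers unfamiliar with the operations listed above, the following NumPy sketch shows what windowing, min-max normalization, and ROI cropping typically do to an array. It is a simplified stand-in rather than the Preprocessor implementation: the function names, the padding default, and the order in which the steps are applied are assumptions for illustration.

```python
import numpy as np


def apply_window(image, center, width):
    """Clip intensities to [center - width/2, center + width/2]."""
    lo, hi = center - width / 2.0, center + width / 2.0
    return np.clip(image, lo, hi)


def minmax_normalize(image):
    """Rescale intensities to [0, 1]; a constant image maps to zeros."""
    image = image.astype(float)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)
    return (image - lo) / (hi - lo)


def crop_to_mask(image, mask, padding=1):
    """Crop to the bounding box of the nonzero mask, padded and clamped to the image."""
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        raise ValueError("mask is empty")
    start = np.maximum(coords.min(axis=0) - padding, 0)
    stop = np.minimum(coords.max(axis=0) + 1 + padding, image.shape)
    box = tuple(slice(int(a), int(b)) for a, b in zip(start, stop))
    return image[box], box


# Toy CT-like volume with a small cubic ROI
volume = np.linspace(-1000, 1000, 6 * 6 * 6).reshape(6, 6, 6)
roi = np.zeros(volume.shape, dtype=np.uint8)
roi[2:4, 2:4, 2:4] = 1

windowed = apply_window(volume, center=40, width=400)
normalized = minmax_normalize(windowed)
cropped, box = crop_to_mask(normalized, roi, padding=1)
print(cropped.shape, box)
```

Cropping before computing persistence matters because the cost of persistent homology grows with the number of voxels, so a tight bounding box around the ROI can shrink the problem considerably.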

Barcode Computation

The BarcodeExtractor class computes raw persistence barcodes (birth-death pairs) from medical images without vectorization. Use this when you need barcodes for custom analysis or visualization.

Persistent Homology Parameters:

  • Filtration Type: sublevel (default) or superlevel filtration
  • Construction: T (default, pixels/voxels as top-cells, 8-neighborhood in 2D) or V (pixels/voxels as 0-cells, 4-neighborhood in 2D)
  • Max Dimension: Maximum homology dimension to compute (auto-detected from image dimensionality)

from medtda import BarcodeExtractor

extractor = BarcodeExtractor(
    normalize=True,
    filtration_type='sublevel',
    max_dimension=2  # Compute H0, H1, H2
)

barcodes = extractor.execute(
    image='image.nii.gz',
    mask='mask.nii.gz'
)

# barcodes is a dict: {'H0': array, 'H1': array, 'H2': array}
# Each array has shape (n_features, 2) for (birth, death) pairs
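
A common first step in custom analysis is to rank bars by lifespan and keep only the most persistent ones. The sketch below does this on a hand-made dictionary shaped like the one described above. The helper top_k_persistent is hypothetical, and the assumption that essential bars are stored with death = inf may not match how medtda encodes them.

```python
import numpy as np


def top_k_persistent(bars, k=3):
    """Return the k finite bars with the longest lifespan, longest first."""
    bars = np.asarray(bars, dtype=float).reshape(-1, 2)
    finite = bars[np.isfinite(bars).all(axis=1)]
    lifespan = np.abs(finite[:, 1] - finite[:, 0])
    order = np.argsort(-lifespan, kind="stable")[:k]
    return finite[order]


# Hand-made stand-in for the dictionary described above (not real output)
barcodes = {
    "H0": np.array([[0.0, np.inf], [0.1, 0.15], [0.2, 0.9]]),
    "H1": np.array([[0.3, 0.35], [0.4, 0.8]]),
    "H2": np.empty((0, 2)),
}

for name, bars in barcodes.items():
    strongest = top_k_persistent(bars, k=1)
    print(name, bars.shape[0], "bars; most persistent finite bar:", strongest.tolist())
```

Using the absolute lifespan keeps the ranking valid for superlevel filtrations as well, where birth values can exceed death values.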

Homology Dimensions:

  • H0: Connected components (distinct regions)
  • H1: Loops and tunnels (1-dimensional holes)
  • H2: Voids and cavities (2-dimensional holes; requires 3D or 4D images)
  • H3: 3-dimensional voids (4D images only)
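
To make the meaning of an H0 barcode concrete, the following pure-Python sketch computes H0 persistence for the sublevel filtration of a 1D signal with a union-find structure. It is a one-dimensional analogue for intuition only, not the cubical-complex algorithm used for 2D and 3D images; the function name is hypothetical. Each local minimum gives birth to a component, and when two components meet the younger one dies (the elder rule).

```python
def sublevel_h0_1d(values):
    """H0 barcode of the sublevel filtration of a 1D signal (elder rule)."""
    n = len(values)
    if n == 0:
        return []
    parent = [None] * n  # None means "not yet in the filtration"
    birth = {}           # index -> value at which its component was born

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    bars = []
    # Add samples in order of increasing value
    for i in sorted(range(n), key=lambda j: values[j]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # Elder rule: the component with the larger birth value dies
                young, old = (a, b) if birth[a] > birth[b] else (b, a)
                if birth[young] < values[i]:  # skip zero-length bars
                    bars.append((birth[young], values[i]))
                parent[young] = old
    root = find(min(range(n), key=lambda j: values[j]))
    bars.append((birth[root], float("inf")))  # oldest component never dies
    return sorted(bars)


signal = [3, 1, 4, 0, 5, 2, 6]
print(sublevel_h0_1d(signal))  # → [(0, inf), (1, 4), (2, 5)]
```

In this signal the global minimum 0 yields the infinite bar, while the local minima 1 and 2 die at 4 and 5, the heights at which their basins merge into a deeper one. A superlevel version can be obtained by running the same function on the negated signal and negating the resulting births and deaths.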

Visualization

Med-TDA provides eight plotting functions for visualizing persistence barcodes:

from medtda.plotting import plot_persistence_diagram, plot_barcode, plot_betti_curve

# Visualize persistence diagram
plot_persistence_diagram(barcodes)

# Visualize barcode representation
plot_barcode(barcodes)

# Visualize Betti curves
plot_betti_curve(barcodes)

Available Plot Types:

  • plot_persistence_diagram: Birth-death diagram with diagonal
  • plot_barcode: Horizontal bars showing feature lifespans
  • plot_betti_curve: Betti number evolution across filtration values
  • plot_landscape: Persistence landscape functions
  • plot_entropy_summary: Entropy summary curves
  • plot_lifespan: Lifespan distribution curves
  • plot_silhouette: Persistence silhouette visualization
  • plot_tropical_coordinates: Tropical coordinate bar charts

All plots support multiple homology dimensions, custom color palettes, and seaborn styling.

Example Workflow

from medtda import FeatureExtractor

# 1. Initialize feature extractor with preprocessing and vectorization settings
extractor = FeatureExtractor(
    spacing=(1.0, 1.0, 1.0),          # Resample to 1mm isotropic
    normalize=True,                    # Apply normalization
    normalize_method='minmax',         # Use min-max normalization
    crop_to_roi=True,                  # Crop to ROI for efficiency
    filtration_type='sublevel',        # Sublevel filtration
    max_dimension=2,                   # Compute H0, H1, H2
    vectorization_method='persistence_stats',  # Statistical features
    return_barcodes=True               # Also return raw barcodes
)

# 2. Extract features (all preprocessing, PH computation, and vectorization in one call)
features, barcodes = extractor.execute(
    image='medical_image.nii.gz',
    mask='segmentation_mask.nii.gz'
)

# 3. Use features for machine learning
print(list(features.keys()))  # Names of the extracted features
# Example output: ['PersStats_H0_mean', 'PersStats_H0_std', ...]

# 4. Optional: Visualize barcodes
from medtda.plotting import plot_persistence_diagram
plot_persistence_diagram(barcodes)
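
For step 3, feature dictionaries from several cases usually need to be stacked into a single matrix before they can be passed to a model. The sketch below shows one way to do that with NumPy. The helper features_to_matrix is hypothetical, the two dictionaries are hand-written stand-ins rather than real extractor output, and the code assumes every feature value is a scalar, which may not hold for all vectorization methods.

```python
import numpy as np


def features_to_matrix(feature_dicts):
    """Stack per-case feature dicts into (X, column_names), filling gaps with NaN."""
    columns = sorted({key for d in feature_dicts for key in d})
    X = np.full((len(feature_dicts), len(columns)), np.nan)
    for row, d in enumerate(feature_dicts):
        for col, key in enumerate(columns):
            if key in d:
                X[row, col] = float(d[key])
    return X, columns


# Hand-written stand-ins for the features returned for two cases
case_a = {"PersStats_H0_mean": 0.12, "PersStats_H0_std": 0.05}
case_b = {"PersStats_H0_mean": 0.20, "PersStats_H0_std": 0.07}

X, names = features_to_matrix([case_a, case_b])
print(names)
print(X.shape)
```

Sorting the column names gives a stable feature order across runs, and the NaN fill makes missing features visible instead of silently shifting columns.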

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Med-TDA in your research, please cite:

@software{medtda2026,
  title = {Med-TDA: Medical Imaging Topological Data Analysis Tool},
  author = {Dashti A. Ali and Amber L. Simpson},
  year = {2026},
  url = {https://github.com/dashtiali/medtda}
}

Acknowledgments

  • Persistent homology computation powered by cripser
  • Vectorization methods based on GUDHI
