
Medical Imaging Topological Data Analysis - Extract TDA features from medical images for machine learning


Med-TDA: Medical Imaging Topological Data Analysis

License: MIT Python 3.8+

Med-TDA is a Python library for extracting Topological Data Analysis (TDA) features from medical images. It provides a complete pipeline from image preprocessing to persistence barcode computation and feature vectorization, designed specifically for medical imaging applications in machine learning and radiomics research.

Installation

pip install medtda

Requirements: Python ≥ 3.8

Core Dependencies: NumPy ≥1.21, SciPy ≥1.7, GUDHI ≥3.5, cripser ≥0.0.32, SimpleITK ≥2.1, Pillow ≥9.0, scikit-image ≥0.19, scikit-learn ≥1.0, pandas ≥1.3, matplotlib ≥3.5, seaborn ≥0.11, PyYAML ≥6.0, tqdm ≥4.60

Supported Image Formats

  • 2D Images: PNG, JPG, JPEG, TIFF, TIF, BMP
  • 3D/4D Medical Images: NIfTI (.nii, .nii.gz), NRRD (.nrrd), MetaImage (.mha, .mhd)
  • Mask Support: Single-label and multi-label segmentation masks in any supported format
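Because `.nii.gz` is a compound extension, a check based on Python's `Path.suffix` alone would only see `.gz`. The sketch below maps a file name to the format families listed above by testing longer suffixes first. The `detect_format` helper and its labels are hypothetical illustrations, not part of Med-TDA's API.

```python
from pathlib import Path

# Hypothetical suffix-to-family table for illustration only;
# Med-TDA's own loader may dispatch differently.
_FORMATS = {
    ".nii.gz": "nifti", ".nii": "nifti",
    ".nrrd": "nrrd",
    ".mha": "metaimage", ".mhd": "metaimage",
    ".png": "2d", ".jpg": "2d", ".jpeg": "2d",
    ".tiff": "2d", ".tif": "2d", ".bmp": "2d",
}


def detect_format(path: str) -> str:
    """Return a format family for `path` based on its (case-insensitive) suffix."""
    name = Path(path).name.lower()
    # Check longer suffixes first so compound extensions like '.nii.gz' take precedence.
    for ext in sorted(_FORMATS, key=len, reverse=True):
        if name.endswith(ext):
            return _FORMATS[ext]
    raise ValueError(f"Unsupported image format: {path}")
```

For example, `detect_format('scan.nii.gz')` returns `'nifti'` even though `Path('scan.nii.gz').suffix` is `'.gz'`.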

Feature Extraction

The FeatureExtractor class provides an end-to-end solution that performs preprocessing, persistent homology computation, and feature vectorization in one step. This is the recommended way to extract TDA features.

from medtda import FeatureExtractor

# Initialize with desired settings
extractor = FeatureExtractor(
    normalize=True,
    normalize_method='minmax',
    vectorization_method='persistence_stats'
)

# Extract features from image and mask
features = extractor.execute(
    image='path/to/image.nii.gz',
    mask='path/to/mask.nii.gz'
)

# features is a dictionary of named TDA features, ready for ML

Available Vectorization Methods:

  • persistence_stats: Statistical summaries (mean, std, min, max, percentiles)
  • betti_curve: Betti number curves over filtration values
  • persistence_image: Weighted, Gaussian-smoothed 2D grid (image) representation of persistence diagrams
  • persistence_landscape: Persistence landscape functions
  • entropy_summary: Entropy-based statistical features
  • persistence_silhouette: Silhouette representation
  • persistence_lifespan: Lifespan distribution features
  • persistence_tropical_coordinates: Tropical algebra coordinates
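The exact formulas behind each method (which percentiles `persistence_stats` reports, how infinite bars are handled, the threshold grid used by `betti_curve`) are not spelled out here. As a rough guide, the sketch below implements common textbook definitions of three of them for a single homology dimension given as `(birth, death)` pairs: lifespan statistics, a Betti curve, and persistent entropy. All function names are hypothetical, and Med-TDA's outputs may differ in detail.

```python
import math
import statistics


def lifespans(bars):
    """Death minus birth for every finite (birth, death) pair; infinite bars are skipped."""
    return [d - b for b, d in bars if math.isfinite(d)]


def persistence_stats(bars, percentiles=(25, 50, 75)):
    """Mean, population std, min, max and percentiles of the finite lifespans."""
    ls = sorted(lifespans(bars))
    keys = ["mean", "std", "min", "max"] + [f"p{p}" for p in percentiles]
    if not ls:  # no finite bars: report zeros rather than fail
        return {key: 0.0 for key in keys}

    def pct(p):
        # Linear interpolation between closest ranks (NumPy's default percentile rule).
        k = (len(ls) - 1) * p / 100
        lo, hi = math.floor(k), math.ceil(k)
        return ls[lo] + (ls[hi] - ls[lo]) * (k - lo)

    stats = {"mean": statistics.fmean(ls), "std": statistics.pstdev(ls),
             "min": ls[0], "max": ls[-1]}
    stats.update({f"p{p}": pct(p) for p in percentiles})
    return stats


def betti_curve(bars, thresholds):
    """Number of bars alive at each threshold t, i.e. with birth <= t < death."""
    return [sum(1 for b, d in bars if b <= t < d) for t in thresholds]


def persistent_entropy(bars):
    """Shannon entropy (natural log) of the normalised finite, positive lifespans."""
    ls = [l for l in lifespans(bars) if l > 0]
    total = sum(ls)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in ls)
```

With `bars = [(0.0, 1.0), (0.2, 0.5), (0.1, math.inf)]`, `betti_curve(bars, [0.0, 0.3, 0.6, 2.0])` counts the bars alive at each threshold and returns `[1, 3, 2, 1]`.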

You can use multiple vectorization methods simultaneously:

extractor = FeatureExtractor(
    normalize=True,
    vectorization_method=['persistence_stats', 'betti_curve', 'entropy_summary']
)
features = extractor.execute(
    image='path/to/image.nii.gz',
    mask='path/to/mask.nii.gz'
)  # Returns combined features from all methods
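How Med-TDA names and merges the outputs of several methods is not documented beyond the example keys in the Example Workflow section. One plausible scheme, sketched here with a hypothetical `combine_features` helper, prefixes every feature name with its method so identically named features from different methods cannot overwrite each other.

```python
def combine_features(per_method):
    """Flatten {method: {feature_name: value}} into a single dict.

    Each key is prefixed with its method name, so two methods that both
    produce e.g. 'H0_mean' keep separate entries.
    """
    combined = {}
    for method, feats in per_method.items():
        for name, value in feats.items():
            combined[f"{method}_{name}"] = value
    return combined
```

For example, `combine_features({'persistence_stats': {'H0_mean': 0.5}, 'betti_curve': {'H0_0': 3}})` yields `{'persistence_stats_H0_mean': 0.5, 'betti_curve_H0_0': 3}`.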

Image Preprocessing

The Preprocessor class handles medical image preprocessing independently. Use this when you need standalone preprocessing or want to inspect preprocessed images before computing persistence.

Available Operations:

  • Resampling: Resample 3D/4D images to target voxel spacing (e.g., isotropic resolution)
  • Windowing: Apply intensity windowing (center/width) for CT images
  • Normalization: Normalize intensity values (minmax, z-score, or robust scaling)
  • Masking: Apply binary or multi-label masks with configurable background values
  • ROI Cropping: Automatically crop to region of interest with padding to reduce computation
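The windowing and normalization operations above can be sketched on a flat list of intensities. The sketch uses common definitions: windowing clips to center ± width/2, min-max maps to [0, 1], z-score subtracts the mean and divides by the standard deviation, and robust scaling uses the median and interquartile range. Whether Med-TDA uses exactly these formulas, rescales after windowing, or uses a different quartile rule is an assumption; `window` and `normalize` are hypothetical helpers, not the `Preprocessor` API.

```python
import statistics


def window(values, center, width):
    """Clip intensities to [center - width/2, center + width/2] (CT-style windowing)."""
    lo, hi = center - width / 2, center + width / 2
    return [min(max(v, lo), hi) for v in values]


def normalize(values, method="minmax"):
    """Rescale a flat list of intensities; a zero spread maps everything to 0.0."""
    if method == "minmax":
        lo, hi = min(values), max(values)
        span = hi - lo
        return [(v - lo) / span if span else 0.0 for v in values]
    if method == "zscore":
        mu, sd = statistics.fmean(values), statistics.pstdev(values)
        return [(v - mu) / sd if sd else 0.0 for v in values]
    if method == "robust":
        # Quartiles with linear interpolation (matches NumPy's default percentile rule).
        q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
        iqr = q3 - q1
        return [(v - med) / iqr if iqr else 0.0 for v in values]
    raise ValueError(f"Unknown normalization method: {method}")
```

A soft-tissue CT window is often given as center 40 HU and width 400 HU, so `window(values, 40, 400)` clips intensities to [-160, 240] before normalization.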

from medtda import Preprocessor

preprocessor = Preprocessor(
    spacing=(1.0, 1.0, 1.0),  # Resample to 1mm isotropic
    normalize=True,
    normalize_method='minmax',
    crop_to_roi=True
)

preprocessed_image, metadata = preprocessor.preprocess(
    image='ct_scan.nii.gz',
    mask='roi_mask.nii.gz'
)

The metadata dictionary contains information about applied transformations, original and final image ranges, shapes, and cropping details.
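ROI cropping amounts to finding the bounding box of the mask's foreground, growing it by a padding margin, and clipping at the image border. The 2D sketch below does this on nested lists and returns the box in a small metadata dict. The helper names, the key names (`bbox`, `original_shape`), and the padding semantics are hypothetical and need not match the keys in Med-TDA's `metadata`.

```python
def roi_bbox(mask, pad=0):
    """Inclusive (row_min, row_max, col_min, col_max) box around nonzero mask
    entries, grown by `pad` on every side and clipped to the mask bounds."""
    n_rows, n_cols = len(mask), len(mask[0])
    rows = [r for r in range(n_rows) if any(mask[r])]
    cols = [c for c in range(n_cols) if any(mask[r][c] for r in range(n_rows))]
    if not rows:
        raise ValueError("mask has no foreground pixels")
    return (max(rows[0] - pad, 0), min(rows[-1] + pad, n_rows - 1),
            max(cols[0] - pad, 0), min(cols[-1] + pad, n_cols - 1))


def crop_to_roi(image, mask, pad=0):
    """Crop image and mask to the padded ROI and report the box as metadata."""
    r0, r1, c0, c1 = roi_bbox(mask, pad)

    def crop(array):
        return [row[c0:c1 + 1] for row in array[r0:r1 + 1]]

    metadata = {"original_shape": (len(image), len(image[0])),
                "bbox": (r0, r1, c0, c1)}
    return crop(image), crop(mask), metadata
```

Keeping the box in the metadata makes it possible to map features or visualizations back to the original image coordinates.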

Barcode Computation

The BarcodeExtractor class computes raw persistence barcodes (birth-death pairs) from medical images without vectorization. Use this when you need barcodes for custom analysis or visualization.

Persistent Homology Parameters:

  • Filtration Type: sublevel (default) or superlevel filtration
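  • Construction: T (default; each pixel/voxel is a top-dimensional cell of the cubical complex) or V (vertex construction; each pixel/voxel is a vertex of the cubical complex)
  • Max Dimension: Maximum homology dimension to compute (auto-detected from image dimensionality)

from medtda import BarcodeExtractor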

extractor = BarcodeExtractor(
    normalize=True,
    filtration_type='sublevel',
    max_dimension=2  # Compute H0, H1, H2
)

barcodes = extractor.execute(
    image='image.nii.gz',
    mask='mask.nii.gz'
)

# barcodes is a dict: {'H0': array, 'H1': array, 'H2': array}
# Each array has shape (n_features, 2) for (birth, death) pairs
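Essential classes, such as the component that survives the whole filtration, never die. Depending on the backend, their death may be reported as `inf` or as a large sentinel value, so it is worth checking before computing lifespans. The sketch below, with hypothetical helper names, separates essential bars from finite ones and drops short-lived bars, which are often treated as noise. It iterates over rows, so it accepts the `(n_features, 2)` arrays described above as well as plain lists of pairs.

```python
import math


def split_essential(bars, cap=math.inf):
    """Split (birth, death) pairs into (finite, essential) lists.

    A bar counts as essential if its death is not finite or is >= cap,
    which covers backends that use a large sentinel instead of inf.
    """
    finite, essential = [], []
    for birth, death in bars:
        if not math.isfinite(death) or death >= cap:
            essential.append((birth, death))
        else:
            finite.append((birth, death))
    return finite, essential


def drop_short_bars(barcodes, min_lifespan):
    """Keep bars with death - birth >= min_lifespan in every dimension.

    Essential bars (infinite death, or a very large sentinel) pass the test and are kept.
    """
    return {
        dim: [(b, d) for b, d in bars if d - b >= min_lifespan]
        for dim, bars in barcodes.items()
    }
```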

Homology Dimensions:

  • H0: Connected components (distinct regions)
  • H1: Loops and tunnels (1-dimensional holes)
  • H2: Voids and cavities (2-dimensional holes; 3D and 4D images)
  • H3: 3-dimensional voids (4D images only)
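To make the H0 entry concrete, the sketch below computes 0-dimensional persistence of a small 2D image with a union-find structure and the elder rule: pixels enter in order of intensity, and when two components meet, the younger one (later birth) dies. This is a swapped-in textbook algorithm for illustration only; Med-TDA delegates persistent homology to cripser, and H1 and higher dimensions need a full cubical-complex computation. In 2D, 4-connectivity corresponds to the V-construction (pixels as vertices) and 8-connectivity to the T-construction (pixels as top-dimensional cells). A superlevel filtration is handled by negating the image and negating the resulting pairs back. The function name `h0_persistence` is hypothetical.

```python
import math


def h0_persistence(image, connectivity=4, superlevel=False):
    """0-dimensional (birth, death) pairs of a 2D image given as a list of rows."""
    sign = -1 if superlevel else 1
    n_rows, n_cols = len(image), len(image[0])
    f = {(r, c): sign * image[r][c] for r in range(n_rows) for c in range(n_cols)}
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1) if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")

    parent, birth = {}, {}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    bars = []
    for p in sorted(f, key=f.get):  # add pixels in increasing filtration value
        parent[p], birth[p] = p, f[p]
        for dr, dc in offsets:
            q = (p[0] + dr, p[1] + dc)
            if q not in parent:  # outside the image or not yet in the filtration
                continue
            rp, rq = find(p), find(q)
            if rp == rq:
                continue
            # Elder rule: the component born later dies at the merge value.
            young, old = (rp, rq) if birth[rp] > birth[rq] else (rq, rp)
            if birth[young] < f[p]:  # skip zero-length bars
                bars.append((birth[young], f[p]))
            parent[young] = old
    # Each surviving root is an essential class that never dies.
    roots = {find(p) for p in parent}
    bars += [(birth[r], math.inf) for r in roots]
    return sorted((sign * b, sign * d) for b, d in bars)
```

For the single-row image `[[0, 2, 1]]`, two components are born at 0 and 1 and merge at 2, so the result is `[(0, inf), (1, 2)]`: the younger component yields the finite bar and the older one the essential bar.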

Visualization

MedTDA provides 8 plotting functions for visualizing persistence barcodes:

from medtda.plotting import plot_persistence_diagram, plot_barcode, plot_betti_curve

# Visualize persistence diagram
plot_persistence_diagram(barcodes)

# Visualize barcode representation
plot_barcode(barcodes)

# Visualize Betti curves
plot_betti_curve(barcodes)

Available Plot Types:

  • plot_persistence_diagram: Birth-death diagram with diagonal
  • plot_barcode: Horizontal bars showing feature lifespans
  • plot_betti_curve: Betti number evolution across filtration values
  • plot_landscape: Persistence landscape functions
  • plot_entropy_summary: Entropy summary curves
  • plot_lifespan: Lifespan distribution curves
  • plot_silhouette: Persistence silhouette visualization
  • plot_tropical_coordinates: Tropical coordinate bar charts

All plots support multiple homology dimensions, custom color palettes, and seaborn styling.

Example Workflow

from medtda import FeatureExtractor

# 1. Initialize feature extractor with preprocessing and vectorization settings
extractor = FeatureExtractor(
    spacing=(1.0, 1.0, 1.0),          # Resample to 1mm isotropic
    normalize=True,                    # Apply normalization
    normalize_method='minmax',         # Use min-max normalization
    crop_to_roi=True,                  # Crop to ROI for efficiency
    filtration_type='sublevel',        # Sublevel filtration
    max_dimension=2,                   # Compute H0, H1, H2
    vectorization_method='persistence_stats',  # Statistical features
    return_barcodes=True               # Also return raw barcodes
)

# 2. Extract features (all preprocessing, PH computation, and vectorization in one call)
features, barcodes = extractor.execute(
    image='medical_image.nii.gz',
    mask='segmentation_mask.nii.gz'
)

# 3. Use features for machine learning
print(list(features.keys()))  # Names of the extracted features
# Example output: ['PersStats_H0_mean', 'PersStats_H0_std', ...]

# 4. Optional: Visualize barcodes
from medtda.plotting import plot_persistence_diagram
plot_persistence_diagram(barcodes)
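When features are extracted for a whole cohort, the per-image dictionaries need a consistent column order before they can be fed to a model. The sketch below, using a hypothetical `to_feature_matrix` helper, builds a shared sorted column list and fills any feature missing from an image with NaN. It assumes each value is a scalar, as the example keys above suggest.

```python
import math


def to_feature_matrix(feature_dicts):
    """Stack per-image feature dicts into rows that share one sorted column order.

    Features missing from an image become NaN so the matrix stays rectangular.
    """
    columns = sorted({name for feats in feature_dicts for name in feats})
    rows = [[feats.get(name, math.nan) for name in columns] for feats in feature_dicts]
    return columns, rows
```

The resulting `rows` can be converted with `numpy.asarray(rows)` and passed to a scikit-learn estimator (scikit-learn is one of Med-TDA's dependencies). Many estimators reject NaN, so impute or drop incomplete columns first.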

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use MedTDA in your research, please cite:

@software{medtda2026,
  title = {Med-TDA: Medical Imaging Topological Data Analysis},
  author = {Med-TDA Contributors},
  year = {2026},
  url = {https://github.com/dashtiali/medtda}
}

Acknowledgments

  • Persistent homology computation powered by cripser
  • Vectorization methods based on GUDHI


Download files


Source Distribution

medtda-0.1.0a1.tar.gz (1.2 MB)

Uploaded Source

Built Distribution


medtda-0.1.0a1-py3-none-any.whl (48.5 kB)

Uploaded Python 3

File details

Details for the file medtda-0.1.0a1.tar.gz.

File metadata

  • Download URL: medtda-0.1.0a1.tar.gz
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for medtda-0.1.0a1.tar.gz
Algorithm Hash digest
SHA256 e23b474795ccba0060359b97c158efd34ab48aed6f2f534bc8ee0beafb01bdb4
MD5 8f3c59c3baaf8a6c697195ee805ee408
BLAKE2b-256 8fe0ca06d2e1a14b26fef76be2a7032cb39774f7e75313b3468d0f09a116ec94


File details

Details for the file medtda-0.1.0a1-py3-none-any.whl.

File metadata

  • Download URL: medtda-0.1.0a1-py3-none-any.whl
  • Size: 48.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for medtda-0.1.0a1-py3-none-any.whl
Algorithm Hash digest
SHA256 d107d1e046252c12b0d7d0d754ca812e3a3385a0204cede09947af4b54a14fdd
MD5 77bfe406aa2f57baa8e3e542da7f2c99
BLAKE2b-256 a5cb311b1968d2b62a82d00f5b5898fb52ad5885fa650024f4e14ed892c292ef

