Medical Imaging Topological Data Analysis - Extract TDA features from medical images for machine learning

Med-TDA: Medical Imaging Topological Data Analysis

License: MIT Python 3.10+

Med-TDA is a Python library for extracting Topological Data Analysis (TDA) features from medical images. It provides a complete pipeline from image preprocessing to persistence barcode computation and feature vectorization, designed specifically for medical imaging applications in machine learning and radiomics research.

Installation

pip install medtda

Requirements: Python ≥ 3.10

Core Dependencies: NumPy ≥1.21, SciPy ≥1.7, GUDHI ≥3.5, cripser ≥0.0.32, SimpleITK ≥2.1, Pillow ≥9.0, scikit-image ≥0.19, scikit-learn ≥1.0, pandas ≥1.3, matplotlib ≥3.5, seaborn ≥0.11, PyYAML ≥6.0, tqdm ≥4.60

Supported Image Formats

  • 2D Images: PNG, JPG, JPEG, TIFF, TIF, BMP
  • 3D/4D Medical Images: NIfTI (.nii, .nii.gz), NRRD (.nrrd), MetaImage (.mha, .mhd)
  • Mask Support: Single-label and multi-label segmentation masks in any supported format
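
As an illustration of what a multi-label mask is, the sketch below splits a label array into one boolean mask per label with plain NumPy. It is not Med-TDA's own code; the helper name `split_labels` and the convention that label 0 is background are assumptions made for this example.

```python
import numpy as np

def split_labels(mask, background=0):
    """Return {label: boolean mask} for every non-background label (hypothetical helper)."""
    return {int(label): mask == label
            for label in np.unique(mask) if label != background}

# Toy 2D mask with two labelled regions
mask = np.zeros((4, 4), dtype=np.uint8)
mask[0:2, 0:2] = 1
mask[2:4, 2:4] = 2

binary_masks = split_labels(mask)
print(sorted(binary_masks))        # labels found in the mask
print(int(binary_masks[1].sum()))  # number of pixels carrying label 1
```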

Feature Extraction

The FeatureExtractor class provides an end-to-end solution that performs preprocessing, persistent homology computation, and feature vectorization in one step. This is the recommended way to extract TDA features.

from medtda import FeatureExtractor

# Initialize with desired settings
extractor = FeatureExtractor(
    normalize=True,
    normalize_method='minmax',
    vectorization_method='persistence_stats'
)

# Extract features from image and mask
features = extractor.execute(
    image='path/to/image.nii.gz',
    mask='path/to/mask.nii.gz'
)

# features is a dictionary of TDA feature vectors ready for ML

Available Vectorization Methods:

  • persistence_stats: Statistical summaries (mean, std, min, max, percentiles)
  • betti_curve: Betti number curves over filtration values
  • persistence_image: Persistence diagram rasterized onto a 2D grid (typically with Gaussian smoothing)
  • persistence_landscape: Persistence landscape functions
  • entropy_summary: Entropy-based statistical features
  • persistence_silhouette: Silhouette representation
  • persistence_lifespan: Lifespan distribution features
  • persistence_tropical_coordinates: Tropical algebra coordinates

You can use multiple vectorization methods simultaneously:

extractor = FeatureExtractor(
    normalize=True,
    vectorization_method=['persistence_stats', 'betti_curve', 'entropy_summary']
)
# Returns combined features from all methods
features = extractor.execute(
    image='path/to/image.nii.gz',
    mask='path/to/mask.nii.gz'
)

Image Preprocessing

The Preprocessor class handles medical image preprocessing independently. Use this when you need standalone preprocessing or want to inspect preprocessed images before computing persistence.

Available Operations:

  • Resampling: Resample 3D/4D images to target voxel spacing (e.g., isotropic resolution)
  • Windowing: Apply intensity windowing (center/width) for CT images
  • Normalization: Normalize intensity values (minmax, z-score, or robust scaling)
  • Masking: Apply binary or multi-label masks with configurable background values
  • ROI Cropping: Automatically crop to region of interest with padding to reduce computation

from medtda import Preprocessor

preprocessor = Preprocessor(
    spacing=(1.0, 1.0, 1.0),  # Resample to 1mm isotropic
    normalize=True,
    normalize_method='minmax',
    crop_to_roi=True
)

preprocessed_image, metadata = preprocessor.preprocess(
    image='ct_scan.nii.gz',
    mask='roi_mask.nii.gz'
)

The metadata dictionary contains information about applied transformations, original and final image ranges, shapes, and cropping details.
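
For readers who want to see what windowing, min-max normalization, and ROI cropping amount to numerically, the following NumPy sketch applies them to a synthetic volume. It is a simplified illustration under stated assumptions, not the Preprocessor's actual code: the function names, the `padding` parameter, and the order of operations are all hypothetical.

```python
import numpy as np

def apply_window(image, center, width):
    """Clip intensities to [center - width/2, center + width/2]."""
    lo, hi = center - width / 2.0, center + width / 2.0
    return np.clip(image, lo, hi)

def minmax_normalize(image):
    """Rescale intensities linearly to [0, 1]; constant images map to 0."""
    lo, hi = float(image.min()), float(image.max())
    if hi == lo:
        return np.zeros_like(image, dtype=float)
    return (image - lo) / (hi - lo)

def crop_to_roi(image, mask, padding=1):
    """Crop to the bounding box of the non-zero mask, padded and clamped to the image."""
    coords = np.argwhere(mask > 0)
    if coords.size == 0:
        raise ValueError("mask is empty")
    start = np.maximum(coords.min(axis=0) - padding, 0)
    stop = np.minimum(coords.max(axis=0) + 1 + padding, image.shape)
    slices = tuple(slice(int(a), int(b)) for a, b in zip(start, stop))
    return image[slices], slices

# Synthetic "CT" volume and a small box-shaped ROI
ct = np.linspace(-1000.0, 1000.0, 8 * 8 * 8).reshape(8, 8, 8)
roi = np.zeros(ct.shape, dtype=np.uint8)
roi[2:5, 3:6, 1:4] = 1

windowed = apply_window(ct, center=40, width=400)   # keeps [-160, 240]
normalized = minmax_normalize(windowed)
cropped, slices = crop_to_roi(normalized, roi, padding=1)
print(cropped.shape)  # (5, 5, 5)
```

Cropping before computing persistence matters because the cost of cubical persistent homology grows with the number of voxels.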

Barcode Computation

The BarcodeExtractor class computes raw persistence barcodes (birth-death pairs) from medical images without vectorization. Use this when you need barcodes for custom analysis or visualization.

Persistent Homology Parameters:

  • Filtration Type: sublevel (default) or superlevel filtration
  • Construction: T (default, pixels/voxels as top-cells, 8-neighborhood in 2D) or V (pixels/voxels as 0-cells, 4-neighborhood in 2D)
  • Max Dimension: Maximum homology dimension to compute (auto-detected from image dimensionality)

from medtda import BarcodeExtractor

extractor = BarcodeExtractor(
    normalize=True,
    filtration_type='sublevel',
    max_dimension=2  # Compute H0, H1, H2
)

barcodes = extractor.execute(
    image='image.nii.gz',
    mask='mask.nii.gz'
)

# barcodes is a dict: {'H0': array, 'H1': array, 'H2': array}
# Each array has shape (n_features, 2) for (birth, death) pairs
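
A common first step in custom analysis is to drop short-lived bars, which often correspond to noise. The sketch below does this for a dictionary laid out as described above. The helper `filter_by_lifespan` is hypothetical, the toy values are made up, and whether Med-TDA reports never-dying features with an infinite death value is an assumption; the code simply discards any non-finite lifespan.

```python
import numpy as np

def filter_by_lifespan(barcodes, min_lifespan):
    """Keep only bars whose finite lifespan (death - birth) is at least min_lifespan."""
    filtered = {}
    for dim, bars in barcodes.items():
        bars = np.asarray(bars, dtype=float).reshape(-1, 2)
        life = bars[:, 1] - bars[:, 0]
        keep = np.isfinite(life) & (life >= min_lifespan)
        filtered[dim] = bars[keep]
    return filtered

# Toy barcodes in the {'H0': array, 'H1': array} layout
toy = {
    "H0": np.array([[0.0, np.inf], [0.1, 0.15], [0.2, 0.9]]),
    "H1": np.array([[0.3, 0.35], [0.4, 0.8]]),
}
filtered = filter_by_lifespan(toy, min_lifespan=0.1)
for dim, bars in filtered.items():
    print(dim, bars.tolist())
```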

Homology Dimensions:

  • H0: Connected components (distinct regions of the image)
  • H1: Loops and tunnels (1-dimensional holes)
  • H2: Voids and cavities (2-dimensional holes; 3D and 4D images only)
  • H3: 3-dimensional voids (4D images only)
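
To show where H0 birth-death pairs come from, here is a self-contained sketch of sublevel-set H0 persistence on a 1D signal using a union-find structure. It is a teaching example, not the algorithm used by cripser or Med-TDA. Samples enter in order of increasing value; a new local minimum starts a component, and when two components meet, the one born later dies (the elder rule). Reporting the oldest component with an infinite death is a convention chosen here; libraries differ on this point.

```python
import math

def h0_sublevel_1d(values):
    """H0 persistence pairs of a 1D signal under the sublevel filtration."""
    n = len(values)
    parent = [None] * n   # None means the sample has not entered the filtration yet
    birth = {}            # index -> value at which that sample entered

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    pairs = []
    for i in sorted(range(n), key=lambda k: values[k]):
        parent[i] = i
        birth[i] = values[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                a, b = find(i), find(j)
                if a == b:
                    continue
                # Elder rule: the component born later dies at the merge value.
                young, old = (a, b) if birth[a] > birth[b] else (b, a)
                if birth[young] < values[i]:    # skip zero-length bars
                    pairs.append((birth[young], values[i]))
                parent[young] = old
    roots = {find(i) for i in range(n)}
    pairs.extend((birth[r], math.inf) for r in roots)
    return sorted(pairs)

def h0_superlevel_1d(values):
    """Superlevel pairs via the sublevel filtration of the negated signal."""
    return [(-b, -d) for b, d in h0_sublevel_1d([-v for v in values])]

signal = [3, 1, 4, 0, 5, 2, 6]
sub = h0_sublevel_1d(signal)
sup = sorted(h0_superlevel_1d(signal))
print(sub)  # [(0, inf), (1, 4), (2, 5)]
print(sup)  # [(3, 1), (4, 0), (5, 2), (6, -inf)]
```

The sublevel pairs track the signal's valleys and the superlevel pairs track its peaks, which is why the choice of filtration type depends on whether dark or bright structures are of interest.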

Visualization

Med-TDA provides eight plotting functions for visualizing persistence barcodes and their vectorizations:

from medtda.plotting import plot_persistence_diagram, plot_barcode, plot_betti_curve

# Visualize persistence diagram
plot_persistence_diagram(barcodes)

# Visualize barcode representation
plot_barcode(barcodes)

# Visualize Betti curves
plot_betti_curve(barcodes)

Available Plot Types:

  • plot_persistence_diagram: Birth-death diagram with diagonal
  • plot_barcode: Horizontal bars showing feature lifespans
  • plot_betti_curve: Betti number evolution across filtration values
  • plot_landscape: Persistence landscape functions
  • plot_entropy_summary: Entropy summary curves
  • plot_lifespan: Lifespan distribution curves
  • plot_silhouette: Persistence silhouette visualization
  • plot_tropical_coordinates: Tropical coordinate bar charts

All plots support multiple homology dimensions, custom color palettes, and seaborn styling.

Example Workflow

from medtda import FeatureExtractor

# 1. Initialize feature extractor with preprocessing and vectorization settings
extractor = FeatureExtractor(
    spacing=(1.0, 1.0, 1.0),          # Resample to 1mm isotropic
    normalize=True,                    # Apply normalization
    normalize_method='minmax',         # Use min-max normalization
    crop_to_roi=True,                  # Crop to ROI for efficiency
    filtration_type='sublevel',        # Sublevel filtration
    max_dimension=2,                   # Compute H0, H1, H2
    vectorization_method='persistence_stats',  # Statistical features
    return_barcodes=True               # Also return raw barcodes
)

# 2. Extract features (all preprocessing, PH computation, and vectorization in one call)
features, barcodes = extractor.execute(
    image='medical_image.nii.gz',
    mask='segmentation_mask.nii.gz'
)

# 3. Use features for machine learning
print(list(features.keys()))  # Names of the extracted features
# Example output: ['PersStats_H0_mean', 'PersStats_H0_std', ...]

# 4. Optional: Visualize barcodes
from medtda.plotting import plot_persistence_diagram
plot_persistence_diagram(barcodes)
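
For step 3, features from several cases usually need to be stacked into one matrix with a fixed column order before they are passed to a model. The sketch below shows one way to do that with NumPy. It assumes each dictionary value is a single number, as the key names in the example output suggest; the helper `stack_features` and the toy values are hypothetical and were not produced by Med-TDA.

```python
import numpy as np

def stack_features(feature_dicts):
    """Stack per-case feature dicts into an (n_cases, n_features) matrix.

    Columns follow the sorted union of keys; a feature missing from a case becomes NaN.
    """
    columns = sorted(set().union(*feature_dicts))
    matrix = np.array(
        [[float(case.get(name, np.nan)) for name in columns] for case in feature_dicts]
    )
    return matrix, columns

# Toy feature dicts standing in for the output of two extraction runs
cases = [
    {"PersStats_H0_mean": 0.2, "PersStats_H0_std": 0.1},
    {"PersStats_H0_mean": 0.4, "PersStats_H0_std": 0.3},
]
X, columns = stack_features(cases)
print(columns)
print(X.shape)  # (2, 2)
```

Sorting the keys keeps the column order stable across runs, so a model trained on one batch of cases can be applied to another.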

License

This project is licensed under the MIT License - see the LICENSE file for details.

Citation

If you use Med-TDA in your research, please cite:

@software{medtda2026,
  title = {Med-TDA: Medical Imaging Topological Data Analysis},
  author = {Med-TDA Contributors},
  year = {2026},
  url = {https://github.com/dashtiali/medtda}
}

Acknowledgments

  • Persistent homology computation powered by cripser
  • Vectorization methods based on GUDHI
