
Multimodal Epigenetic Sequencing Analysis (MESA) is a flexible and sensitive method of capturing and integrating multimodal epigenetic information of cfDNA using a single experimental assay.

Multimodal Epigenetic Sequencing Analysis (MESA)

MESA is a Python package for sample-level multimodal cfDNA biomarker modeling. It provides a scikit-learn-style API for preprocessing, feature selection, optional redundancy pruning, modality-specific model fitting, stacked multimodal prediction, and cross-validation.

The package supports both classification and regression.

MESA pipeline overview

Installation

pip install mesa-cfdna

For local development:

pip install -e .
pytest -q tests
python scripts/run_smoke_checks.py

What MESA Does

  • Handles missing-value filtering and imputation
  • Applies variance filtering and univariate feature selection
  • Optionally prunes redundant correlated features after the first selector
  • Uses Boruta for secondary feature selection
  • Trains single-modality predictors and stacked multimodal models
  • Evaluates models with built-in cross-validation helpers
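
To make the first two steps concrete, here is a minimal NumPy sketch of missing-value filtering, imputation, and variance filtering. It is an illustration, not MESA's implementation: the function name filter_and_impute is hypothetical, and both mean imputation and the "less than or equal to" cutoff for the missing fraction are assumptions.

```python
import numpy as np

def filter_and_impute(X, missing=0.2, variance_threshold=0.0):
    """Drop sparse features, mean-impute the rest, then drop low-variance ones.

    Hypothetical helper for illustration. Returns the cleaned matrix and the
    original column indices of the retained features.
    """
    X = np.asarray(X, dtype=float)
    # Fraction of missing (NaN) entries per feature (column).
    missing_frac = np.isnan(X).mean(axis=0)
    keep = np.flatnonzero(missing_frac <= missing)  # assumed cutoff rule
    X_kept = X[:, keep]
    # Mean imputation per column, ignoring NaNs (assumed imputation strategy).
    col_means = np.nanmean(X_kept, axis=0)
    X_imputed = np.where(np.isnan(X_kept), col_means, X_kept)
    # Variance filter, applied after imputation.
    informative = X_imputed.var(axis=0) > variance_threshold
    return X_imputed[:, informative], keep[informative]

X_demo = np.array([
    [0.1, np.nan, 1.0, 5.0],
    [0.4, np.nan, 1.0, np.nan],
    [0.9, 0.3,    1.0, 7.0],
    [0.2, np.nan, 1.0, 9.0],
])
X_clean, kept_idx = filter_and_impute(X_demo, missing=0.25)
print(kept_idx)  # column 1 is mostly missing, column 2 is constant
```

Returning the retained column indices alongside the matrix keeps it possible to map surviving features back to their original names.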

Core API

  • MESA_modality: single-modality pipeline
  • MESA: multimodal stacking ensemble
  • MESA_CV: cross-validation wrapper

Default task-aware estimators:

  • classification: RandomForestClassifier
  • regression: RandomForestRegressor

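predict_proba() and transform_predict_proba() are available only in classification mode. In regression mode, use predict() and transform_predict() instead, as in the regression example under Quick Start.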

Pipeline Figures

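Regenerate both figures with the commands below. The conda.sh path and the py313 environment name are specific to the author's machine; substitute your own: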

source /data/homezvol0/chaoronc/miniconda3/etc/profile.d/conda.sh
conda activate py313
python scripts/generate_pipeline_figures.py

Quick Start

Classification

from mesa import MESA_modality, MESA, MESA_CV

modality_1 = MESA_modality(
    top_n=50,
    missing=0.2,
    normalization=True,
    redundancy_pruning="score",
    redundancy_threshold=0.95,
    random_state=42,
)

modality_2 = MESA_modality(
    top_n=80,
    missing=0.1,
    redundancy_pruning="model",
    redundancy_threshold=0.95,
    random_state=42,
)

modality_1.fit(X1_train, y_train)
proba_1 = modality_1.transform_predict_proba(X1_test)

mesa = MESA([modality_1, modality_2], random_state=42)
mesa.fit([X1_train, X2_train], y_train)
ensemble_proba = mesa.predict_proba([X1_test, X2_test])

cv_eval = MESA_CV(MESA_modality(top_n=50, random_state=42))
cv_eval.fit(X1_train, y_train)
auc = cv_eval.get_performance()
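
For readers unfamiliar with stacking, the idea behind combining modalities can be sketched with plain scikit-learn. This is a generic illustration under stated assumptions, not MESA's internals: the synthetic data, the helper out_of_fold_proba, and the choice of LogisticRegression as the meta-model are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(42)
n = 120
y = rng.integers(0, 2, size=n)

# Two synthetic "modalities" measured on the same samples.
X1 = rng.normal(size=(n, 10))
X1[:, 0] += 2.0 * y
X2 = rng.normal(size=(n, 15))
X2[:, 3] += 2.0 * y

def out_of_fold_proba(X, y):
    """Positive-class probabilities predicted only on held-out folds."""
    model = RandomForestClassifier(n_estimators=100, random_state=42)
    return cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]

# One column of out-of-fold predictions per modality becomes the
# input of the second-level (meta) model.
meta_features = np.column_stack([
    out_of_fold_proba(X1, y),
    out_of_fold_proba(X2, y),
])
meta_model = LogisticRegression().fit(meta_features, y)  # assumed meta-model
ensemble_proba = meta_model.predict_proba(meta_features)[:, 1]
```

Using out-of-fold predictions as meta-features keeps the meta-model from seeing probabilities that a base model produced for its own training samples.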

Regression

from mesa import MESA_modality, MESA, MESA_CV

reg_modality_1 = MESA_modality(
    task="regression",
    top_n=50,
    redundancy_pruning="score",
    redundancy_threshold=0.95,
    random_state=42,
)

reg_modality_2 = MESA_modality(
    task="regression",
    top_n=80,
    random_state=42,
)

reg_modality_1.fit(X1_train, y_train_continuous)
pred_1 = reg_modality_1.transform_predict(X1_test)

reg_mesa = MESA(
    [reg_modality_1, reg_modality_2],
    task="regression",
    random_state=42,
)
reg_mesa.fit([X1_train, X2_train], y_train_continuous)
ensemble_pred = reg_mesa.predict([X1_test, X2_test])

cv_eval = MESA_CV(
    MESA_modality(task="regression", top_n=50, random_state=42),
    task="regression",
)
cv_eval.fit(X1_train, y_train_continuous)
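r2 = cv_eval.get_performance()  # default regression metric: R²
# The metric name follows scikit-learn's negated-error convention (higher is better).
neg_rmse = cv_eval.get_performance("neg_root_mean_squared_error")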

Redundancy Pruning

MESA can prune correlated CpG-like features after the first univariate selector and before Boruta.

  • redundancy_pruning="score": keep the best feature in each correlated block using task-aware univariate ranking
  • redundancy_pruning="model": keep the best feature in each correlated block using model-based cross-validated ranking
  • redundancy_threshold: absolute correlation threshold used to define redundant blocks
  • redundancy_method: correlation method, e.g. "pearson"

This step is useful when neighboring or highly correlated features carry redundant signal and would otherwise crowd out other informative loci.
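
The score-based variant can be approximated by a greedy pass over the correlation matrix. The sketch below is an assumption about how such pruning can work, not MESA's actual code: prune_redundant is a hypothetical helper, it uses Pearson correlation only, and it treats a feature as redundant when its absolute correlation with an already kept feature reaches the threshold.

```python
import numpy as np

def prune_redundant(X, scores, threshold=0.95):
    """Greedy pruning (hypothetical helper).

    Visit features from best to worst score and keep a feature only if its
    absolute Pearson correlation with every kept feature is below threshold.
    """
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    order = np.argsort(scores)[::-1]  # best-scoring feature first
    kept = []
    for j in order:
        if all(corr[j, k] < threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)

rng = np.random.default_rng(0)
base = rng.normal(size=100)
X_demo = np.column_stack([
    base,
    base + 0.01 * rng.normal(size=100),  # near-duplicate of feature 0
    rng.normal(size=100),                # independent feature
])
scores = np.array([2.0, 3.0, 1.0])       # e.g. univariate test statistics
kept = prune_redundant(X_demo, scores)
print(kept)  # feature 0 is dropped in favor of its higher-scoring duplicate
```

Because the best-scoring member of each correlated group is visited first, it is the one that survives, which leaves room for features from other loci.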

Key Parameters

For MESA_modality:

  • task: "classification" or "regression"
  • top_n: number of Boruta-selected features to keep
  • missing: allowed missing fraction per feature
  • variance_threshold: minimum variance after imputation
  • normalization: whether to apply Normalizer()
  • selector: integer or sklearn-compatible univariate selector
  • predictor: final estimator
  • boruta_estimator: estimator used inside Boruta
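
To show what the estimator inside Boruta is used for, here is a single Boruta-style round written with scikit-learn. It is a simplified sketch: shadow_feature_hits is a hypothetical helper, and the full Boruta procedure repeats this comparison over many rounds and applies a statistical test before accepting or rejecting features.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def shadow_feature_hits(X, y, estimator, rng):
    """One Boruta-style round (hypothetical helper).

    Returns a boolean array that is True where a real feature's importance
    exceeds the highest importance among the shuffled "shadow" features.
    """
    X = np.asarray(X, dtype=float)
    # Shadow features: every column permuted independently, which destroys
    # any relationship with y while preserving the marginal distribution.
    shadows = np.column_stack([rng.permutation(col) for col in X.T])
    estimator.fit(np.hstack([X, shadows]), y)
    importances = estimator.feature_importances_
    n_real = X.shape[1]
    return importances[:n_real] > importances[n_real:].max()

rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = (X_demo[:, 0] > 0).astype(int)  # only feature 0 carries signal
forest = RandomForestClassifier(n_estimators=200, random_state=0)
hits = shadow_feature_hits(X_demo, y_demo, forest, rng)
```

Any estimator that exposes feature_importances_ after fitting can play this role in the sketch.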

For MESA_CV:

  • classification default metric: ROC AUC
  • regression default metric: R²
  • supported regression metrics: r2, neg_mean_squared_error, neg_root_mean_squared_error, pearson, spearman
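
The regression metric names can be read as follows. This NumPy sketch computes each quantity directly; it assumes the neg_ prefixed names follow scikit-learn's convention of negating errors so that higher is always better, and the helper names rankdata_average and regression_scores are hypothetical rather than part of MESA.

```python
import numpy as np

def rankdata_average(a):
    """Ranks starting at 1, with tied values given their average rank."""
    a = np.asarray(a, dtype=float)
    order = np.argsort(a, kind="mergesort")
    ranks = np.empty(len(a), dtype=float)
    ranks[order] = np.arange(1, len(a) + 1)
    for value in np.unique(a):
        tied = a == value
        ranks[tied] = ranks[tied].mean()
    return ranks

def regression_scores(y_true, y_pred):
    """Compute the five listed metrics for one set of predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "neg_mean_squared_error": -mse,
        "neg_root_mean_squared_error": -np.sqrt(mse),
        "pearson": np.corrcoef(y_true, y_pred)[0, 1],
        # Spearman is the Pearson correlation of the ranks.
        "spearman": np.corrcoef(
            rankdata_average(y_true), rankdata_average(y_pred)
        )[0, 1],
    }

scores = regression_scores([1.0, 2.0, 3.0, 4.0], [1.5, 1.5, 3.5, 3.5])
rmse = -scores["neg_root_mean_squared_error"]  # flip the sign for a positive RMSE
```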

Validation Assets

Development Notes

  • Use pandas DataFrame inputs when possible so selected feature indices can be mapped back to columns cleanly.
  • For biological interpretation, validate any pruning or selector change on a subset before large runs; these changes can alter feature rankings and downstream performance.
  • Human contributor guidance lives in CONTRIBUTING.md.

Citation

If you use MESA in research, cite:

Li, Y., Xu, J., Chen, C. et al. Multimodal epigenetic sequencing analysis (MESA) of cell-free DNA for non-invasive colorectal cancer detection. Genome Medicine 16, 9 (2024). https://doi.org/10.1186/s13073-023-01280-6

License

This repository is distributed under the terms in LICENSE.
