Multimodal Epigenetic Sequencing Analysis (MESA) is a flexible and sensitive method of capturing and integrating multimodal epigenetic information of cfDNA using a single experimental assay.
Multimodal Epigenetic Sequencing Analysis (MESA)
MESA is a Python package for sample-level multimodal cfDNA biomarker modeling. It provides a scikit-learn-style API for preprocessing, feature selection, optional redundancy pruning, modality-specific model fitting, stacked multimodal prediction, and cross-validation.
The package supports both classification and regression.
Installation
pip install mesa-cfdna
For local development:
pip install -e .
pytest -q tests
python scripts/run_smoke_checks.py
What MESA Does
- Handles missing-value filtering and imputation
- Applies variance filtering and univariate feature selection
- Optionally prunes redundant correlated features after the first selector
- Uses Boruta for secondary feature selection
- Trains single-modality predictors and stacked multimodal models
- Evaluates models with built-in cross-validation helpers
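The first preprocessing steps in the list above can be sketched with plain NumPy. This is a minimal illustration, not MESA's internal code: the helper name `filter_and_impute`, the use of mean imputation, and the thresholds are assumptions chosen for the example.

```python
import numpy as np


def filter_and_impute(X, missing=0.2, variance_threshold=0.0):
    """Hypothetical helper: drop sparse features, mean-impute, drop flat features.

    Returns the cleaned matrix and the original indices of the kept columns.
    """
    X = np.asarray(X, dtype=float)
    # Fraction of missing (NaN) values per feature column.
    missing_fraction = np.isnan(X).mean(axis=0)
    keep = np.flatnonzero(missing_fraction <= missing)
    X_kept = X[:, keep]
    # Mean imputation (an assumption; the package may impute differently).
    column_means = np.nanmean(X_kept, axis=0)
    X_imputed = np.where(np.isnan(X_kept), column_means, X_kept)
    # Variance filter, applied after imputation.
    variable = X_imputed.var(axis=0) > variance_threshold
    return X_imputed[:, variable], keep[variable]


X_demo = np.array(
    [
        [0.1, np.nan, 1.0, 5.0],
        [0.4, np.nan, 1.0, 6.0],
        [np.nan, np.nan, 1.0, 7.0],
        [0.7, 0.2, 1.0, 8.0],
    ]
)
X_clean, kept_columns = filter_and_impute(X_demo, missing=0.3)
print(kept_columns)  # [0 3]
```

Column 1 is dropped for exceeding the missing-value limit and column 2 for having zero variance, so only columns 0 and 3 survive.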
Core API
- MESA_modality: single-modality pipeline
- MESA: multimodal stacking ensemble
- MESA_CV: cross-validation wrapper
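To make "multimodal stacking ensemble" concrete, here is a generic stacking sketch written directly against scikit-learn. It illustrates the general technique only and is not taken from MESA's source: the logistic-regression meta-learner, the out-of-fold scheme, and the synthetic data are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
# Two synthetic "modalities"; only the first column of each carries signal.
X1 = rng.normal(size=(60, 20))
X1[:, 0] += 1.5 * y
X2 = rng.normal(size=(60, 30))
X2[:, 0] += 1.5 * y
modalities = (X1, X2)

# Level 0: one model per modality, scored out-of-fold so the meta-learner
# never sees predictions a model made on its own training samples.
base_models = [
    RandomForestClassifier(n_estimators=100, random_state=0) for _ in modalities
]
oof = np.column_stack(
    [
        cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
        for model, X in zip(base_models, modalities)
    ]
)

# Level 1: a meta-learner combines the per-modality probabilities.
meta = LogisticRegression().fit(oof, y)

# For prediction, refit each base model on all of its training data.
for model, X in zip(base_models, modalities):
    model.fit(X, y)

new_samples = (X1[:5], X2[:5])  # stand-ins for unseen samples
level0 = np.column_stack(
    [m.predict_proba(X)[:, 1] for m, X in zip(base_models, new_samples)]
)
ensemble_proba = meta.predict_proba(level0)[:, 1]
```

The key design point is that every modality keeps its own feature space and model, and only the per-modality predictions are merged at the second level.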
Default task-aware estimators:
- classification: RandomForestClassifier
- regression: RandomForestRegressor
predict_proba() and transform_predict_proba() are available only in classification mode.
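This restriction is consistent with scikit-learn itself, where the default regression estimator has no probability output. A quick check on the two default estimator classes (standard scikit-learn, independent of MESA):

```python
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classifiers expose class probabilities; regressors only expose predict().
clf_has_proba = hasattr(RandomForestClassifier(), "predict_proba")
reg_has_proba = hasattr(RandomForestRegressor(), "predict_proba")
print(clf_has_proba)  # True
print(reg_has_proba)  # False
```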
Pipeline Figures
- Overview figure: compact pipeline summary for README or slides
- Detailed method figure: expanded schematic with task-aware branches
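Regenerate both figures with the commands below. The conda.sh path and the py313 environment name are specific to the maintainer's machine; substitute your own: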
source /data/homezvol0/chaoronc/miniconda3/etc/profile.d/conda.sh
conda activate py313
python scripts/generate_pipeline_figures.py
Quick Start
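The snippets in this section reference X1_train, X2_train, X1_test, X2_test, y_train, and y_train_continuous without defining them. One way to create small synthetic stand-ins for a dry run is sketched here, assuming pandas DataFrames with samples in rows and features in columns. The shapes, column names, missing-value rate, and split are arbitrary choices for illustration, not values from the package.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_samples = 60

# Two modalities measured on the same samples (rows), different features (columns).
X1 = pd.DataFrame(
    rng.normal(size=(n_samples, 200)),
    columns=[f"mod1_feature_{i}" for i in range(200)],
)
X2 = pd.DataFrame(
    rng.normal(size=(n_samples, 300)),
    columns=[f"mod2_feature_{i}" for i in range(300)],
)

# Binary labels for classification and a continuous target for regression.
y = rng.integers(0, 2, size=n_samples)
y_continuous = X1.iloc[:, 0].to_numpy() + rng.normal(scale=0.1, size=n_samples)

# Blank out about 5% of entries so the missing-value filter has something to do.
X1 = X1.mask(rng.random(X1.shape) < 0.05)

# Shuffled split that keeps both modalities aligned on the same samples.
order = rng.permutation(n_samples)
train_idx, test_idx = order[:45], order[45:]
X1_train, X1_test = X1.iloc[train_idx], X1.iloc[test_idx]
X2_train, X2_test = X2.iloc[train_idx], X2.iloc[test_idx]
y_train, y_test = y[train_idx], y[test_idx]
y_train_continuous = y_continuous[train_idx]
```

Splitting by a shared index matters for the multimodal case: every modality matrix passed to the ensemble must describe the same samples in the same order.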
Classification
from mesa import MESA_modality, MESA, MESA_CV

# One pipeline per modality; each can use its own selection settings.
modality_1 = MESA_modality(
    top_n=50,
    missing=0.2,
    normalization=True,
    redundancy_pruning="score",
    redundancy_threshold=0.95,
    random_state=42,
)
modality_2 = MESA_modality(
    top_n=80,
    missing=0.1,
    redundancy_pruning="model",
    redundancy_threshold=0.95,
    random_state=42,
)

# Single-modality fit and class-probability prediction.
modality_1.fit(X1_train, y_train)
proba_1 = modality_1.transform_predict_proba(X1_test)

# Stacked multimodal model: pass one feature matrix per modality.
mesa = MESA([modality_1, modality_2], random_state=42)
mesa.fit([X1_train, X2_train], y_train)
ensemble_proba = mesa.predict_proba([X1_test, X2_test])

# Cross-validated performance (ROC AUC is the classification default).
cv_eval = MESA_CV(MESA_modality(top_n=50, random_state=42))
cv_eval.fit(X1_train, y_train)
auc = cv_eval.get_performance()
Regression
from mesa import MESA_modality, MESA, MESA_CV

# Set task="regression" on every component that should predict a continuous target.
reg_modality_1 = MESA_modality(
    task="regression",
    top_n=50,
    redundancy_pruning="score",
    redundancy_threshold=0.95,
    random_state=42,
)
reg_modality_2 = MESA_modality(
    task="regression",
    top_n=80,
    random_state=42,
)

# Single-modality fit and prediction.
reg_modality_1.fit(X1_train, y_train_continuous)
pred_1 = reg_modality_1.transform_predict(X1_test)

# Stacked multimodal regression.
reg_mesa = MESA(
    [reg_modality_1, reg_modality_2],
    task="regression",
    random_state=42,
)
reg_mesa.fit([X1_train, X2_train], y_train_continuous)
ensemble_pred = reg_mesa.predict([X1_test, X2_test])

# Cross-validated performance (R² is the regression default).
cv_eval = MESA_CV(
    MESA_modality(task="regression", top_n=50, random_state=42),
    task="regression",
)
cv_eval.fit(X1_train, y_train_continuous)
r2 = cv_eval.get_performance()
# The metric name follows scikit-learn's negated-error naming; check the sign of the result.
rmse = cv_eval.get_performance("neg_root_mean_squared_error")
Redundancy Pruning
MESA can prune correlated CpG-like features after the first univariate selector and before Boruta.
- redundancy_pruning="score": keep the best feature in each correlated block using task-aware univariate ranking
- redundancy_pruning="model": keep the best feature in each correlated block using model-based cross-validated ranking
- redundancy_threshold: absolute correlation threshold used to define redundant blocks
- redundancy_method: correlation method, e.g. "pearson"
This step is useful when neighboring or highly correlated features carry redundant signal and would otherwise crowd out other informative loci.
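The idea behind pruning can be sketched in a few lines of NumPy. This is a simplified greedy variant for illustration, not MESA's implementation: the helper name `prune_correlated`, the greedy visiting order, and the made-up scores are assumptions, and the package may define correlated blocks differently.

```python
import numpy as np


def prune_correlated(X, scores, threshold=0.95):
    """Hypothetical helper: greedy pruning of highly correlated features.

    Features are visited from best to worst score. A feature is kept only if
    its absolute Pearson correlation with every already-kept feature is at
    most `threshold`.
    """
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for j in np.argsort(scores)[::-1]:
        if all(corr[j, k] <= threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)


rng = np.random.default_rng(0)
base = rng.normal(size=(100, 3))
# Columns 0 and 1 are near-duplicates (like neighboring CpGs); 2 and 3 are independent.
X_demo = np.column_stack(
    [
        base[:, 0],
        base[:, 0] + 0.01 * rng.normal(size=100),
        base[:, 1],
        base[:, 2],
    ]
)
scores = np.array([0.9, 0.8, 0.5, 0.7])  # higher = more informative (invented values)
print(prune_correlated(X_demo, scores))  # [0, 2, 3]
```

Column 1 is dropped because it nearly duplicates the higher-scoring column 0, which frees a slot for an independent feature in the later Boruta step.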
Key Parameters
For MESA_modality:
- task: "classification" or "regression"
- top_n: number of Boruta-selected features to keep
- missing: allowed missing fraction per feature
- variance_threshold: minimum variance after imputation
- normalization: whether to apply Normalizer()
- selector: integer or sklearn-compatible univariate selector
- predictor: final estimator
- boruta_estimator: estimator used inside Boruta
For MESA_CV:
- classification default metric: ROC AUC
- regression default metric: R²
- supported regression metrics: r2, neg_mean_squared_error, neg_root_mean_squared_error, pearson, spearman
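For readers unfamiliar with these metric names, the sketch below computes each one from scratch with NumPy. It shows the standard definitions only; how MESA_CV computes and aggregates them across folds is not shown here, and the helper names are hypothetical.

```python
import numpy as np


def rank(values):
    """Ranks starting at 1. Ties are not averaged, so use tie-free data."""
    order = np.argsort(values)
    ranks = np.empty(len(values), dtype=float)
    ranks[order] = np.arange(1, len(values) + 1)
    return ranks


def regression_scores(y_true, y_pred):
    """Hypothetical helper returning the five metrics listed above."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    mse = np.mean((y_true - y_pred) ** 2)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        # "neg_" metrics are negated errors, so higher is always better.
        "neg_mean_squared_error": -mse,
        "neg_root_mean_squared_error": -np.sqrt(mse),
        "pearson": np.corrcoef(y_true, y_pred)[0, 1],
        # Spearman is the Pearson correlation of the ranks.
        "spearman": np.corrcoef(rank(y_true), rank(y_pred))[0, 1],
    }


scores = regression_scores([1.0, 2.0, 3.0, 4.0], [1.2, 1.8, 3.3, 3.7])
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

Because the predictions preserve the ordering of the targets exactly, the Spearman value is 1 even though R² and Pearson are below 1.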
Validation Assets
- demo.ipynb: original example notebook
- pruning_validation_demo.ipynb: pruning-focused synthetic validation
- regression_validation_demo.ipynb: regression validation notebook
- scripts/run_smoke_checks.py: notebook-free smoke test runner
Development Notes
- Use pandas DataFrame inputs when possible so selected feature indices can be mapped back to columns cleanly.
- For biological interpretation, validate any pruning or selector change on a subset before large runs; these changes can alter feature rankings and downstream performance.
- Human contributor guidance lives in CONTRIBUTING.md.
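The DataFrame recommendation above comes down to being able to turn positional indices into feature names. A small pandas-only sketch follows; the column labels and the index values are invented for the example, and how a fitted MESA object exposes its selected indices is not shown here.

```python
import numpy as np
import pandas as pd

X = pd.DataFrame(
    np.arange(12, dtype=float).reshape(3, 4),
    columns=["chr1:100", "chr1:250", "chr2:75", "chr7:900"],
)

# Positional indices as a selector might report them (values invented here).
selected_idx = np.array([3, 0])

# Map positions back to feature names, and subset the matrix the same way.
selected_names = X.columns[selected_idx].tolist()
X_selected = X.iloc[:, selected_idx]
print(selected_names)  # ['chr7:900', 'chr1:100']
```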
Citation
If you use MESA in research, cite:
Li, Y., Xu, J., Chen, C. et al. Multimodal epigenetic sequencing analysis (MESA) of cell-free DNA for non-invasive colorectal cancer detection. Genome Medicine 16, 9 (2024). https://doi.org/10.1186/s13073-023-01280-6
License
This repository is distributed under the terms in LICENSE.