Multimodal Epigenetic Sequencing Analysis (MESA) is a flexible and sensitive method of capturing and integrating multimodal epigenetic information of cfDNA using a single experimental assay.

Maintainer: crchen

Project description

Multimodal Epigenetic Sequencing Analysis (MESA)

A flexible and sensitive method for capturing and integrating multimodal epigenetic information from cell-free DNA (cfDNA) using a single experimental assay.

Overview

MESA (Multimodal Epigenetic Sequencing Analysis) provides a framework for analyzing multimodal epigenetic data from cfDNA. The package exposes a scikit-learn-compatible API that chains preprocessing, scaling, feature selection, model training, and cross-validation into a single workflow.

Key Features

Multimodal Integration: Combine multiple epigenetic data modalities through ensemble stacking
Advanced Feature Selection: Boruta combined with univariate selection, balancing computation time against biomarker discovery
Robust Cross-Validation: Built-in evaluation framework with performance metrics for fine-tuning
Flexible Pipeline: Customizable preprocessing and classification components
Missing Value Handling: Filtering of features with too many missing values, plus imputation
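Boruta's core idea is to compare each real feature's importance with that of randomly permuted "shadow" copies, which carry no signal. The sketch below runs a cheap univariate prefilter and then a single shadow-feature round with scikit-learn. It is a simplified illustration, not MESA's implementation: full Boruta repeats this test over many iterations with a statistical decision rule, and the helper name `shadow_feature_screen` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif


def shadow_feature_screen(X, y, k=20, random_state=0):
    """Hypothetical helper: univariate prefilter, then one Boruta-style shadow round.

    Returns column indices (into the original X) whose random-forest importance
    exceeds the largest importance among shuffled shadow copies.
    """
    rng = np.random.default_rng(random_state)
    # Step 1: cheap univariate filter keeps the k highest-scoring features (ANOVA F-test)
    prefilter = SelectKBest(score_func=f_classif, k=k).fit(X, y)
    kept = prefilter.get_support(indices=True)
    X_kept = X[:, kept]
    # Step 2: shadow features are each kept column independently permuted
    shadows = np.column_stack([rng.permutation(col) for col in X_kept.T])
    X_aug = np.hstack([X_kept, shadows])
    forest = RandomForestClassifier(n_estimators=300, random_state=random_state)
    forest.fit(X_aug, y)
    importances = forest.feature_importances_
    real, shadow = importances[:k], importances[k:]
    # Step 3: a real feature scores a "hit" if it beats the best shadow feature
    hits = real > shadow.max()
    return kept[hits]


# shuffle=False places the 5 informative features in columns 0-4
X, y = make_classification(n_samples=200, n_features=100, n_informative=5,
                           n_redundant=0, shuffle=False, random_state=0)
selected = shadow_feature_screen(X, y, k=20)
print(selected)
```

The univariate step is what keeps the forest small; Boruta on its own must fit on twice the full feature count every iteration.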

Installation

# Install package with pip
pip install mesa-cfdna

Quick Start

from mesa import MESA_modality, MESA
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Load your data: load_data() is a placeholder for your own loading function.
# X1_* and X2_* are two modalities measured on the same samples, in the same row order.
X1_train, X1_test, X2_train, X2_test, y_train = load_data()

# Single modality analysis
modality_1 = MESA_modality(top_n=50, classifier=RandomForestClassifier(random_state=0), variance_threshold=0, normalization=True)
modality_1.fit(X1_train, y_train)
predictions_1 = modality_1.transform_predict_proba(X1_test)

modality_2 = MESA_modality(top_n=100, classifier=LogisticRegression(random_state=0), variance_threshold=0, normalization=False, missing=0)
modality_2.fit(X2_train, y_train)
predictions_2 = modality_2.transform_predict_proba(X2_test)

# Multi-modality ensemble (MESA fits each modality and the meta-estimator)
modalities = [modality_1, modality_2]
mesa = MESA(modalities)
mesa.fit([X1_train, X2_train], y_train)
mesa_predictions = mesa.predict_proba([X1_test, X2_test])
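`load_data()` in the Quick Start is a placeholder. To try the calls without real cfDNA data, a hypothetical stand-in can generate two synthetic modalities that share the same samples and labels, as pandas DataFrames indexed by sample ID (matching the `index_col=0` CSVs in the examples below). The helper name, feature counts, split ratio, and injected missing values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def make_synthetic_modalities(n_samples=120, n_features=(200, 150), missing_rate=0.05, random_state=0):
    """Hypothetical stand-in for load_data(): two modalities, one shared label vector."""
    rng = np.random.default_rng(random_state)
    X, y = make_classification(n_samples=n_samples, n_features=sum(n_features),
                               n_informative=10, random_state=random_state)
    samples = [f"sample_{i}" for i in range(n_samples)]
    # Split the columns into two "modalities" that share the same sample index
    X1 = pd.DataFrame(X[:, :n_features[0]], index=samples).add_prefix("m1_")
    X2 = pd.DataFrame(X[:, n_features[0]:], index=samples).add_prefix("m2_")
    # Blank out a small fraction of values to mimic incomplete coverage in modality 1
    X1 = X1.mask(rng.random(X1.shape) < missing_rate)
    y = pd.Series(y, index=samples)
    # One stratified split of sample IDs, applied to both modalities so rows stay aligned
    train_ids, test_ids = train_test_split(samples, test_size=0.25, stratify=y,
                                           random_state=random_state)
    return (X1.loc[train_ids], X1.loc[test_ids], X2.loc[train_ids], X2.loc[test_ids],
            y.loc[train_ids].to_numpy(), y.loc[test_ids].to_numpy())


X1_train, X1_test, X2_train, X2_test, y_train, y_test = make_synthetic_modalities()
print(X1_train.shape, X2_train.shape, y_train.shape)
```

Splitting sample IDs once and indexing both modalities with them is the important design point: stacking assumes row i means the same patient in every modality.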

API Reference

MESA_modality

Single modality analysis with comprehensive preprocessing and feature selection pipeline.

Parameters

Parameter	Type	Default	Description
`top_n`	int	100	Number of features to select using Boruta algorithm
`variance_threshold`	float	0	Minimum variance threshold for feature filtering
`normalization`	bool	False	Whether to apply L2 normalization
`missing`	float	0.1	Maximum proportion of missing values allowed per feature
`classifier`	estimator	RandomForestClassifier()	Final classifier for predictions
`selector`	int/estimator	GenericUnivariateSelect()	Univariate feature selector
`boruta_estimator`	estimator	RandomForestClassifier()	Base estimator for Boruta selection
`random_state`	int	0	Random seed for reproducibility
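The `missing`, `variance_threshold`, and `normalization` parameters describe filtering steps that run before feature selection. The sketch below shows what each one means using pandas and scikit-learn. The step order, the mean imputation, and the row-wise (per-sample) L2 normalization are assumptions for illustration, not a description of MESA's internals; the helper name is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import normalize


def preprocess_sketch(X, missing=0.1, variance_threshold=0.0, normalization=False):
    """Hypothetical filter chain mirroring the missing / variance_threshold /
    normalization parameters; not MESA's actual implementation."""
    # Keep features whose fraction of missing values is at most `missing`
    keep = X.isna().mean(axis=0) <= missing
    X = X.loc[:, keep]
    # Fill the remaining gaps (mean imputation is an assumption here)
    imputed = SimpleImputer(strategy="mean").fit_transform(X)
    # Drop features whose variance does not exceed the threshold
    vt = VarianceThreshold(threshold=variance_threshold).fit(imputed)
    X = pd.DataFrame(vt.transform(imputed), index=X.index,
                     columns=X.columns[vt.get_support()])
    if normalization:
        # Scale each sample (row) to unit L2 norm
        X = pd.DataFrame(normalize(X.to_numpy(), norm="l2"),
                         index=X.index, columns=X.columns)
    return X


X = pd.DataFrame(
    {
        "a": [1.0, 2.0, np.nan, 4.0],  # 25% missing: dropped when missing=0.1
        "b": [5.0, 5.0, 5.0, 5.0],     # zero variance: dropped
        "c": [3.0, 0.0, 4.0, 1.0],
        "d": [4.0, 1.0, 3.0, 0.0],
    },
    index=["s1", "s2", "s3", "s4"],
)
out = preprocess_sketch(X, missing=0.1, variance_threshold=0.0, normalization=True)
print(out.columns.tolist())  # ['c', 'd']
print(out.loc["s1"].tolist())
```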

Methods

fit(X, y): Fit the preprocessing pipeline and classifier
transform(X): Apply preprocessing pipeline only
predict(X): Predict class labels for preprocessed data
predict_proba(X): Predict class probabilities for preprocessed data
transform_predict(X): Apply pipeline and predict in one step
transform_predict_proba(X): Apply pipeline and predict probabilities
get_support(step=None): Get indices of selected features
get_params(deep=True): Get model parameters

MESA

Multi-modality ensemble with stacking architecture for integrating multiple data types.

Parameters

Parameter	Type	Default	Description
`modalities`	list	Required	List of MESA_modality objects
`meta_estimator`	estimator	LogisticRegression()	Meta-learner for ensemble combination
`random_state`	int	0	Random seed for reproducibility
`cv`	cv generator	RepeatedStratifiedKFold()	Cross-validation strategy for meta-features
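The `cv` parameter controls how meta-features are produced for the meta-learner. A minimal sketch of the general stacking pattern, assuming each modality's classifier contributes one column of out-of-fold positive-class probabilities: the meta-learner is trained on predictions for samples the base models did not see during fitting. Because `cross_val_predict` requires a splitter that partitions the data, this sketch uses `StratifiedKFold` rather than the `RepeatedStratifiedKFold` default listed above; how MESA aggregates repeated folds is not shown. Helper names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def fit_stacked_sketch(X_list, y, base_models, meta_model, cv):
    """Hypothetical stacking helper: one base model per modality, one meta-learner."""
    # Out-of-fold probabilities: each sample is scored by a model that never saw it
    meta_features = np.column_stack([
        cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
        for model, X in zip(base_models, X_list)
    ])
    meta_model.fit(meta_features, y)
    # Refit each base model on all training data for use at prediction time
    fitted = [model.fit(X, y) for model, X in zip(base_models, X_list)]
    return fitted, meta_model


def predict_stacked_sketch(fitted, meta_model, X_list):
    meta_features = np.column_stack([
        model.predict_proba(X)[:, 1] for model, X in zip(fitted, X_list)
    ])
    return meta_model.predict_proba(meta_features)


X, y = make_classification(n_samples=150, n_features=40, n_informative=8, random_state=0)
X_list = [X[:, :20], X[:, 20:]]  # pretend the columns come from two modalities
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fitted, meta = fit_stacked_sketch(
    X_list, y,
    base_models=[RandomForestClassifier(random_state=0), LogisticRegression(max_iter=1000)],
    meta_model=LogisticRegression(),
    cv=cv,
)
proba = predict_stacked_sketch(fitted, meta, X_list)
print(proba.shape)  # (150, 2)
```

Training the meta-learner on in-sample base predictions instead would reward overfit base models, which is why the out-of-fold step matters.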

Methods

fit(X_list, y): Fit all modalities and meta-estimator
predict(X_list_test): Predict class labels using ensemble
predict_proba(X_list_test): Predict class probabilities using ensemble
get_support(step=None): Get feature support from all modalities

MESA_CV

Cross-validation wrapper for performance evaluation of MESA models.

Parameters

Parameter	Type	Default	Description
`modality`	estimator	Required	MESA_modality or MESA object to evaluate
`random_state`	int	0	Random seed for reproducibility
`cv`	cv generator	StratifiedKFold(n_splits=5)	Cross-validation strategy

Methods

fit(X, y): Perform cross-validation on provided data
get_performance(): Calculate mean ROC AUC score across CV folds

Attributes

cv_result: List of (y_pred, y_true) tuples from each CV fold
modality: The fitted modality estimator being evaluated
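`get_performance()` is described as the mean ROC AUC across folds, computed from the `(y_pred, y_true)` pairs in `cv_result`. A sketch of that calculation, assuming a binary problem where `y_pred` is an (n_samples, 2) probability array (as the `y_pred[:, 1]` indexing in Example 3 implies). The function name is hypothetical and the two folds are toy inputs.

```python
import numpy as np
from sklearn.metrics import roc_auc_score


def mean_cv_auc(cv_result):
    """Hypothetical re-implementation: average ROC AUC over (y_pred, y_true) folds."""
    fold_aucs = [roc_auc_score(y_true, y_pred[:, 1]) for y_pred, y_true in cv_result]
    return float(np.mean(fold_aucs)), fold_aucs


# Toy folds: predicted probabilities for classes 0 and 1, then true labels
fold_1 = (np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]]), np.array([0, 1, 0, 1]))
fold_2 = (np.array([[0.4, 0.6], [0.7, 0.3], [0.5, 0.5], [0.1, 0.9]]), np.array([0, 0, 1, 1]))
mean_auc, per_fold = mean_cv_auc([fold_1, fold_2])
print(per_fold, mean_auc)  # [1.0, 0.75] 0.875
```

In fold 2, one positive (0.5) ranks below one negative (0.6), so 3 of 4 positive/negative pairs are ordered correctly, giving an AUC of 0.75.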

Usage Examples

Example 1: Single Modality Analysis

import pandas as pd
import numpy as np
from mesa import MESA_modality, MESA_CV
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

# Load single modality data
X = pd.read_csv('methylation_data.csv', index_col=0)
y = pd.read_csv('labels.csv', index_col=0).values.ravel()

# Create and configure modality
modality = MESA_modality(
    top_n=50,
    missing=0.2,
    normalization=True,
    classifier=RandomForestClassifier(n_estimators=100, random_state=42)
)

# Fit the modality
modality.fit(X, y)

# Make predictions on new data
X_test = pd.read_csv('test_data.csv', index_col=0)
predictions = modality.transform_predict_proba(X_test)
print(f"Prediction probabilities shape: {predictions.shape}")

# Get selected features
selected_features = modality.get_support()
print(f"Number of selected features: {len(selected_features)}")

# Cross-validation evaluation
cv_eval = MESA_CV(
    modality=MESA_modality(top_n=50, missing=0.2),
    cv=StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
)
cv_eval.fit(X, y)
auc_score = cv_eval.get_performance()
print(f"Cross-validation AUC: {auc_score:.3f}")

Example 2: Multi-Modality Ensemble

import pandas as pd
from mesa import MESA_modality, MESA, MESA_CV
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# Load multi-modal data
methylation_data = pd.read_csv('methylation.csv', index_col=0)
histone_data = pd.read_csv('histone_marks.csv', index_col=0)
chromatin_data = pd.read_csv('chromatin_accessibility.csv', index_col=0)
y = pd.read_csv('labels.csv', index_col=0).values.ravel()

# Define modality-specific configurations
modalities = [
    MESA_modality(
        top_n=100,
        missing=0.1,
        classifier=RandomForestClassifier(n_estimators=200, random_state=42),
        normalization=True
    ),
    MESA_modality(
        top_n=80,
        missing=0.15,
        classifier=SVC(probability=True, random_state=42),
        normalization=False
    ),
    MESA_modality(
        top_n=60,
        missing=0.2,
        classifier=LogisticRegression(random_state=42),
        normalization=True
    )
]

# Create MESA ensemble
mesa = MESA(
    modalities=modalities,
    meta_estimator=LogisticRegression(random_state=42),
    random_state=42
)

# Fit the ensemble
X_list = [methylation_data, histone_data, chromatin_data]
mesa.fit(X_list, y)

# Make ensemble predictions
X_test_list = [
    pd.read_csv('methylation_test.csv', index_col=0),
    pd.read_csv('histone_test.csv', index_col=0),
    pd.read_csv('chromatin_test.csv', index_col=0)
]
ensemble_predictions = mesa.predict_proba(X_test_list)
print(f"Ensemble predictions shape: {ensemble_predictions.shape}")

# Get feature support from all modalities
feature_supports = mesa.get_support()
for i, support in enumerate(feature_supports):
    print(f"Modality {i+1}: {len(support)} features selected")

Example 3: Cross-Validation for Multi-Modality

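from mesa import MESA, MESA_modality, MESA_CV
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

# Assumes methylation_data, histone_data, chromatin_data, and y are loaded as in Example 2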

# Define ensemble for CV evaluation
modalities = [
    MESA_modality(top_n=50, missing=0.1),
    MESA_modality(top_n=40, missing=0.15),
    MESA_modality(top_n=60, missing=0.2)
]

mesa_ensemble = MESA(
    modalities=modalities,
    cv=RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=42)
)

# Cross-validation evaluation
cv_eval = MESA_CV(
    modality=mesa_ensemble,
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
)

# Perform CV on multi-modal data
X_list = [methylation_data, histone_data, chromatin_data]
cv_eval.fit(X_list, y)

# Get performance metrics
mean_auc = cv_eval.get_performance()
print(f"Multi-modal ensemble CV AUC: {mean_auc:.3f}")

# Access individual fold results
for i, (y_pred, y_true) in enumerate(cv_eval.cv_result):
    fold_auc = roc_auc_score(y_true, y_pred[:, 1])
    print(f"Fold {i+1} AUC: {fold_auc:.3f}")

Example 4: Custom Feature Selection Pipeline

# Assumes X, y, and X_test are loaded as in Example 1
from mesa import MESA_modality, MESA_CV
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.ensemble import GradientBoostingClassifier

# Custom modality with different feature selection
custom_modality = MESA_modality(
    top_n=30,
    variance_threshold=0.01,
    missing=0.05,
    normalization=True,
    selector=SelectKBest(score_func=f_classif, k=1000),
    classifier=GradientBoostingClassifier(n_estimators=100, random_state=42),
    boruta_estimator=GradientBoostingClassifier(n_estimators=50, random_state=42)
)

# Fit and evaluate
custom_modality.fit(X, y)
custom_predictions = custom_modality.transform_predict_proba(X_test)

# Compare with default configuration
default_modality = MESA_modality()
cv_custom = MESA_CV(custom_modality)
cv_default = MESA_CV(default_modality)

cv_custom.fit(X, y)
cv_default.fit(X, y)

print(f"Custom configuration AUC: {cv_custom.get_performance():.3f}")
print(f"Default configuration AUC: {cv_default.get_performance():.3f}")

Example 5: Feature Importance Analysis

from mesa import MESA_modality

# Track how many features survive each pipeline step (assumes X and y from Example 1)
modality = MESA_modality(top_n=100)
modality.fit(X, y)

# Get feature support at different pipeline steps
missing_support = modality.get_support(step=0)  # After missing value filtering
variance_support = modality.get_support(step=1)  # After variance filtering
univariate_support = modality.get_support(step=2)  # After univariate selection
final_support = modality.get_support()  # Final selected features

print(f"Features after missing value filter: {len(missing_support)}")
print(f"Features after variance filter: {len(variance_support)}")
print(f"Features after univariate selection: {len(univariate_support)}")
print(f"Final selected features: {len(final_support)}")

# Get feature names if using DataFrame
if hasattr(X, 'columns'):
    selected_feature_names = X.columns[final_support]
    print(f"Selected features: {selected_feature_names.tolist()}")

Performance Tips

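Memory Management: For large datasets, consider reducing top_n and limiting parallelism in the Boruta base estimator, e.g. boruta_estimator=RandomForestClassifier(n_jobs=1)
Feature Selection: Tune the missing threshold to data quality; lower values drop more features with incomplete coverage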
Cross-Validation: Use fewer repeats for initial exploration, more for final evaluation
Ensemble Size: Start with 2-3 modalities, add more based on performance gains
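The cost behind the cross-validation tip is easy to count: each train/test split fits the pipeline once. The check below uses scikit-learn's splitters; the 5x5 configuration matches Example 3's `RepeatedStratifiedKFold(n_splits=5, n_repeats=5)`. The nested product assumes the ensemble's inner meta-feature CV runs once per outer `MESA_CV` fold, which is an assumption about how the two layers interact.

```python
from sklearn.model_selection import RepeatedStratifiedKFold, StratifiedKFold

exploration = RepeatedStratifiedKFold(n_splits=5, n_repeats=1, random_state=0)
final = RepeatedStratifiedKFold(n_splits=5, n_repeats=5, random_state=0)
outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# get_n_splits() returns how many train/test splits (model fits) each splitter yields
print(exploration.get_n_splits())  # 5
print(final.get_n_splits())        # 25
# Nested: inner repeated CV inside every outer evaluation fold (assumption)
print(outer.get_n_splits() * final.get_n_splits())  # 125
```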

Citation

If you use MESA in your research, please cite:

Li, Y., Xu, J., Chen, C. et al. Multimodal epigenetic sequencing analysis (MESA) of cell-free DNA for non-invasive colorectal cancer detection. Genome Med 16, 9 (2024). https://doi.org/10.1186/s13073-023-01280-6

Authors

Chaorong Chen - Lead Developer - c.chen@uci.edu
Wei Li - Principal Investigator - wei.li@uci.edu

License

This project is licensed under the MIT License - see the LICENSE file for details.

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Support

For questions and support:

Open an issue on GitHub
Email: c.chen@uci.edu
Documentation: [Link to full documentation]

Keywords: cfDNA, epigenetics, multimodal analysis, machine learning, feature selection, ensemble learning, stacking, bioinformatics, biomarker discovery, methylation, computational biology, early detection


Release history

0.7.1 (Mar 11, 2026)
0.6.0 (Jun 6, 2025)
0.5.0 (May 24, 2025)
0.2.0 (May 23, 2025), this version
0.1.2 (Mar 19, 2025)
0.1.1 (Mar 19, 2025)
0.1.0 (Mar 19, 2025)

Download files

Source Distribution

mesa_cfdna-0.2.0.tar.gz (18.5 kB), uploaded May 23, 2025, Source

Built Distribution

mesa_cfdna-0.2.0-py3-none-any.whl (15.8 kB), uploaded May 23, 2025, Python 3

File details

Details for the file mesa_cfdna-0.2.0.tar.gz.

File metadata

Download URL: mesa_cfdna-0.2.0.tar.gz
Upload date: May 23, 2025
Size: 18.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for mesa_cfdna-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`443d8970ab7bc591d5c3765070d98fc8273035aea93a4e98d9de4ac7161e511b`
MD5	`02a0fd7c6252bd7d637031f322224cf3`
BLAKE2b-256	`72dbb223160d6d2e9a88065c71f301a9cd8537b7f8d2fd4f196cb56f2e80c983`
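To check a downloaded source distribution against the SHA256 digest listed above, a standard-library sketch is enough. The helper name `sha256_of` is hypothetical, and the file path assumes the download was saved under its original name in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "443d8970ab7bc591d5c3765070d98fc8273035aea93a4e98d9de4ac7161e511b"

path = Path("mesa_cfdna-0.2.0.tar.gz")
if path.exists():
    actual = sha256_of(path)
    print("OK" if actual == EXPECTED else f"Mismatch: {actual}")
else:
    print(f"{path} not found; download it first")
```

pip can enforce the same check at install time through hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).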


Provenance

The following attestation bundles were made for mesa_cfdna-0.2.0.tar.gz:

Publisher: python-publish.yml on ChaorongC/mesa_cfdna

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mesa_cfdna-0.2.0.tar.gz
- Subject digest: 443d8970ab7bc591d5c3765070d98fc8273035aea93a4e98d9de4ac7161e511b
- Sigstore transparency entry: 219053222
- Sigstore integration time: May 23, 2025
Source repository:
- Permalink: ChaorongC/mesa_cfdna@94f3bde0bc5d1d58574d07418c3f6cae51d9f6a0
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/ChaorongC
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@94f3bde0bc5d1d58574d07418c3f6cae51d9f6a0
- Trigger Event: release

File details

Details for the file mesa_cfdna-0.2.0-py3-none-any.whl.

File metadata

Download URL: mesa_cfdna-0.2.0-py3-none-any.whl
Upload date: May 23, 2025
Size: 15.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for mesa_cfdna-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ca26093cbbb72b4ada736bd56adbdbbe01775042fcd6fb567a068a711c1fcdd5`
MD5	`fad57e02760983f64abd799ddfe358ad`
BLAKE2b-256	`b217f91110bc3e18c387d9144fe38a23097b74232fbd0c2ecd68e6e22890bd92`


Provenance

The following attestation bundles were made for mesa_cfdna-0.2.0-py3-none-any.whl:

Publisher: python-publish.yml on ChaorongC/mesa_cfdna

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: mesa_cfdna-0.2.0-py3-none-any.whl
- Subject digest: ca26093cbbb72b4ada736bd56adbdbbe01775042fcd6fb567a068a711c1fcdd5
- Sigstore transparency entry: 219053227
- Sigstore integration time: May 23, 2025
Source repository:
- Permalink: ChaorongC/mesa_cfdna@94f3bde0bc5d1d58574d07418c3f6cae51d9f6a0
- Branch / Tag: refs/tags/0.2.1
- Owner: https://github.com/ChaorongC
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@94f3bde0bc5d1d58574d07418c3f6cae51d9f6a0
- Trigger Event: release
