dmeth: A toolkit for comprehensive, transparent, and reproducible DNA methylation analysis
Project description
dmeth: Differential Methylation Analysis Toolkit
A fast, statistically rigorous Python framework providing a toolkit for DNA methylation analysis - from raw beta matrices to biomarkers and functional interpretation. dmeth implements the full modern differential methylation pipeline used in high-impact epigenome-wide association studies (EWAS), with performance and correctness on par with established R/bioconductor tools, all in pure Python.
Key Features
| Feature | Implementation | Performance |
|---|---|---|
| Empirical Bayes moderated t-tests | limma-style (Smyth 2004) with exact replication | Numba-accelerated (10–100× faster) |
| Memory-efficient chunked analysis | Automatic fallback for >1M probes | <4 GB RAM typical |
| Cell-type deconvolution | Reference-based NNLS (Houseman/Horvath-style) | Parallel joblib |
| DMR discovery | Sliding-window clustering + gap merging | Vectorized |
| Gene annotation & pathway enrichment | IntervalTree + Fisher’s exact (FDR) | Sub-second on 450k/EPIC |
| Coordinate liftover (hg19 ↔ hg38) | pyliftover integration | Per-region tracking |
| Biomarker panel discovery & validation | RF / Elastic Net + stratified CV | Built-in |
| Robust preprocessing & QC | Missingness, group representation, imputation | Production-safe |
Fully supports Illumina 450K, EPIC (850K), and any custom CpG × sample matrix.
Quick Start
pip install "dmeth[full]"
import pandas as pd
from dmeth.io.readers import load_methylation_data
from dmeth.core.analysis.preparation import filter_cpgs_by_missingness, impute_missing_values
from dmeth.core.analysis.validation import build_design, validate_contrast
from dmeth.core.analysis.core_analysis import fit_differential
from dmeth.core.downstream.annotation import find_dmrs_by_sliding_window
# 1. Load your data
# beta: CpG x samples matrix
# pheno: sample metadata with a 'group' column
beta = pd.read_csv("beta_matrix.csv", index_col=0)
pheno = pd.read_csv("phenotype.csv", index_col=0)
# 2. Preprocessing
# Drop CpGs with too much missingness
beta_clean, _, _ = filter_cpgs_by_missingness(beta, max_missing_rate=0.2)
# Impute remaining missing values (kNN)
beta_imp = impute_missing_values(beta_clean, method="knn", k=10)
# 3. Differential analysis (case vs control)
# Build design matrix from phenotype
design = validate_design(pheno["group"])
contrast = validate_contrast(design, "case-control")
# Fit
res = fit_differential(
M=beta_imp,
design=pd.DataFrame(design, index=beta_imp.columns),
contrast=contrast,
shrink="smyth",
robust=True,
)
# 4. Discover DMRs
annotation = pd.read_csv("cpg_annotation.csv", index_col=0) # must include chr, pos columns
dmrs = find_dmrs_by_sliding_window(
dms=res[res["padj"] < 0.05],
annotation=annotation,
max_gap=500,
min_cpgs=3,
)
print(f"Found {len(dmrs)} DMRs")
print(dmrs.head())
Installation
# Minimal (no speed, annotation, and other extras)
pip install dmeth
# Recommended: full scientific environment
pip install "dmeth[full]"
# Development
pip install "dmeth[full,dev]"
Optional extras (dmeth[full]):
- speed: numba, combat (highly recommended)
- annotation: intervaltree, pyliftover
- parallel: joblib
- format: PyYAML, toml, h5py, xlrd
- plotting: plotly, umap-learn
- io: pyarrow, tables, openpyxl, xlsxwriter
Optional dev extras (dmeth[dev]):
pytest, pytest-cov, black, isort, flake8, flake8-pyproject, flake8-bugbear, bandit, mkdocs, mkdocs-material
Documentation
Full documentation with tutorials, API reference, and reproducibility examples: User Guide
Citation
If you use dmeth in your research, please cite:
@software{dmeth2025,
author = {Afolabi, Dare},
title = {dmeth: A comprehensive Python toolkit for differential DNA methylation analysis with empirical Bayes moderation and biomarker discovery},
version = {0.2.0},
year = {2025},
publisher = {GitHub},
doi = {10.5281/zenodo.17777501},
url = {https://doi.org/10.5281/zenodo.17777501},
}
References
- Smyth, G. K. (2004). Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(1).
- Liu, P., & Hwang, J.T.G. (2007). Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics, 23(6), 739–746.
- Du, P., Zhang, X., Huang, C.-C., Jafari, N., Kibbe, W.A., Hou, L., & Lin, S. (2010). Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 11:587.
- Jung, S.H., Young, S.S. (2012). Power and sample size calculation for microarray studies. Journal of Biopharmaceutical Statistics, 22(1):30-42.
- Phipson, B. et al. (2016). missMethyl: an R package for analyzing data from Illumina’s HumanMethylation450 platform. Bioinformatics, 32(2), 286-288.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Email: dare.afolabi@outlook.com ook.com)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file dmeth-0.2.0.tar.gz.
File metadata
- Download URL: dmeth-0.2.0.tar.gz
- Upload date:
- Size: 99.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
0abe42b97ae66c46c4d80ecfbfc2f22c66414b2c3b68d6c0132228f9eea5e2a5
|
|
| MD5 |
02dfe69e12386499b8ee2ffb3b857593
|
|
| BLAKE2b-256 |
b5fdd70a01c1dfdc26d73d6ad6dba840a25a4e2df7630b540fb2d13c9b753b0b
|
File details
Details for the file dmeth-0.2.0-py3-none-any.whl.
File metadata
- Download URL: dmeth-0.2.0-py3-none-any.whl
- Upload date:
- Size: 88.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d32ee1b1a88f05b4eaeeeb949e12a00d1a6bf0edc504fed4e3fae42dd1d46339
|
|
| MD5 |
e8197b57d0f6cd38e73291a9c3b0aef1
|
|
| BLAKE2b-256 |
bab2e67235e562037078af63780d4affa27057eb24162d24591aecfb5b920cec
|