
dmeth: A toolkit for comprehensive, transparent, and reproducible DNA methylation analysis


dmeth: Differential Methylation Analysis Toolkit

A fast, statistically rigorous Python toolkit for DNA methylation analysis, from raw beta matrices to biomarkers and functional interpretation. dmeth implements the full modern differential methylation pipeline used in high-impact epigenome-wide association studies (EWAS), with performance and correctness on par with established R/Bioconductor tools, all in pure Python.

Key Features

| Feature | Implementation | Performance |
|---|---|---|
| Empirical Bayes moderated t-tests | limma-style (Smyth 2004) with exact replication | Numba-accelerated (10–100× faster) |
| Memory-efficient chunked analysis | Automatic fallback for >1M probes | <4 GB RAM typical |
| Cell-type deconvolution | Reference-based NNLS (Houseman/Horvath-style) | Parallelized with joblib |
| DMR discovery | Sliding-window clustering + gap merging | Vectorized |
| Gene annotation & pathway enrichment | IntervalTree + Fisher's exact test (FDR-corrected) | Sub-second on 450K/EPIC |
| Coordinate liftover (hg19 ↔ hg38) | pyliftover integration | Per-region tracking |
| Biomarker panel discovery & validation | RF / Elastic Net + stratified CV | Built-in |
| Robust preprocessing & QC | Missingness, group representation, imputation | Production-safe |
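The "Empirical Bayes moderated t-tests" row refers to limma-style variance shrinkage (Smyth 2004). As a rough illustration of that technique, and not dmeth's own implementation, the sketch below estimates a prior degrees of freedom d0 and prior variance s0² from the per-CpG residual variances by the method of moments, shrinks each variance to s̃² = (d0·s0² + d·s²) / (d0 + d), and computes moderated t-statistics with d + d0 degrees of freedom for a two-group comparison. It assumes NumPy and SciPy, a complete matrix (no missing values), and an ordinary least-squares fit per CpG; all function names here are hypothetical.

```python
import numpy as np
from scipy import special, stats


def trigamma_inverse(x: float, tol: float = 1e-8, max_iter: int = 50) -> float:
    """Solve trigamma(y) = x for y (Newton iteration on 1/trigamma, as in limma)."""
    if x > 1e7:
        return 1.0 / np.sqrt(x)
    if x < 1e-6:
        return 1.0 / x
    y = 0.5 + 1.0 / x
    for _ in range(max_iter):
        tri = special.polygamma(1, y)
        dif = tri * (1.0 - tri / x) / special.polygamma(2, y)
        y += dif
        if abs(dif) / y < tol:
            break
    return float(y)


def fit_f_prior(s2: np.ndarray, df: float) -> tuple[float, float]:
    """Method-of-moments estimates of prior df d0 and prior variance s0^2."""
    e = np.log(s2) - special.digamma(df / 2) + np.log(df / 2)
    emean = e.mean()
    evar = e.var(ddof=1) - special.polygamma(1, df / 2)
    if evar <= 0:
        # No excess variability across CpGs: infinite prior df (full shrinkage)
        return np.inf, float(np.exp(emean))
    d0 = 2.0 * trigamma_inverse(evar)
    s0_sq = float(np.exp(emean + special.digamma(d0 / 2) - np.log(d0 / 2)))
    return d0, s0_sq


def moderated_t(M: np.ndarray, group: np.ndarray) -> dict:
    """Two-group moderated t-test (group 1 vs group 0) for a CpG x sample matrix."""
    X = np.column_stack([np.ones(group.size), group.astype(float)])
    n, p = X.shape
    df = n - p
    # Ordinary least squares for all CpGs at once: coef has shape (p, n_cpgs)
    coef, *_ = np.linalg.lstsq(X, M.T, rcond=None)
    resid = M - (X @ coef).T
    s2 = (resid ** 2).sum(axis=1) / df
    # Unscaled SE of the group coefficient: sqrt of [(X'X)^-1]_{11}
    stdev_unscaled = np.sqrt(np.linalg.inv(X.T @ X)[1, 1])
    d0, s0_sq = fit_f_prior(s2, df)
    if np.isinf(d0):
        s2_post = np.full_like(s2, s0_sq)
    else:
        # Posterior variance: weighted average of prior and per-CpG variances
        s2_post = (d0 * s0_sq + df * s2) / (d0 + df)
    t = coef[1] / (np.sqrt(s2_post) * stdev_unscaled)
    if np.isinf(d0):
        pval = 2 * stats.norm.sf(np.abs(t))
    else:
        pval = 2 * stats.t.sf(np.abs(t), df + d0)
    return {"coef": coef[1], "t": t, "p": pval, "s2": s2,
            "s2_post": s2_post, "d0": d0, "s0_sq": s0_sq}


# Usage on simulated data (treated as M-values, CpGs in rows, samples in columns)
rng = np.random.default_rng(0)
group = np.array([0] * 6 + [1] * 6)
M = rng.normal(size=(2000, 12)) * rng.uniform(0.5, 1.5, size=(2000, 1))
M[:50, group == 1] += 2.0  # first 50 CpGs are truly differential
res = moderated_t(M, group)
print(f"d0 = {res['d0']:.2f}, s0^2 = {res['s0_sq']:.3f}")
```

Borrowing strength across CpGs this way stabilizes the variance estimate when each group has only a handful of samples, which is the usual situation in methylation array studies.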

Fully supports Illumina 450K, EPIC (850K), and any custom CpG × sample matrix.
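The input is a CpG × sample matrix with CpGs in rows and samples in columns. A common convention, discussed by Du et al. (2010) in the references below, is to run statistical tests on M-values, M = log2(β / (1 − β)), rather than on beta values, because M-values have more nearly constant variance across the methylation range. The sketch below assumes only pandas and NumPy and uses hypothetical helper names (they are not part of dmeth's documented API); it builds a small beta matrix and converts it, clipping β away from exactly 0 or 1 so that M stays finite.

```python
import numpy as np
import pandas as pd


def beta_to_m(beta: pd.DataFrame, eps: float = 1e-6) -> pd.DataFrame:
    """Convert beta values in [0, 1] to M-values, M = log2(beta / (1 - beta))."""
    # Clip so that beta = 0 or 1 gives a large finite M instead of +/-inf
    b = beta.clip(lower=eps, upper=1 - eps)
    return np.log2(b / (1 - b))


def m_to_beta(m: pd.DataFrame) -> pd.DataFrame:
    """Inverse transform, beta = 2**M / (2**M + 1)."""
    return 2.0 ** m / (2.0 ** m + 1)


# A tiny CpG x sample beta matrix (hypothetical probe IDs and sample names)
beta = pd.DataFrame(
    [[0.10, 0.12, 0.80],
     [0.50, 0.55, 0.48],
     [0.00, 0.02, 1.00]],
    index=["cg0001", "cg0002", "cg0003"],
    columns=["S1", "S2", "S3"],
)
M = beta_to_m(beta)
print(M.round(2))
```

The Quick Start passes its imputed matrix as `M=`; whether dmeth converts beta values internally is not stated on this page, so check the API reference before relying on either form.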

Quick Start

```bash
pip install "dmeth[full]"
```

```python
import pandas as pd

from dmeth.io.readers import load_methylation_data
from dmeth.core.analysis.preparation import filter_cpgs_by_missingness, impute_missing_values
from dmeth.core.analysis.validation import validate_design, validate_contrast
from dmeth.core.analysis.core_analysis import fit_differential
from dmeth.core.downstream.annotation import find_dmrs_by_sliding_window

# 1. Load your data
# beta: CpG × sample matrix (rows = CpGs, columns = samples)
# pheno: sample metadata with a 'group' column
beta = pd.read_csv("beta_matrix.csv", index_col=0)
pheno = pd.read_csv("phenotype.csv", index_col=0)

# 2. Preprocessing
# Drop CpGs with too much missingness
beta_clean, _, _ = filter_cpgs_by_missingness(beta, max_missing_rate=0.2)

# Impute remaining missing values (kNN)
beta_imp = impute_missing_values(beta_clean, method="knn", k=10)

# 3. Differential analysis (case vs control)
# Build design matrix from phenotype
design = validate_design(pheno["group"])
contrast = validate_contrast(design, "case-control")

# Fit
res = fit_differential(
    M=beta_imp,
    design=pd.DataFrame(design, index=beta_imp.columns),
    contrast=contrast,
    shrink="smyth",
    robust=True,
)

# 4. Discover DMRs
annotation = pd.read_csv("cpg_annotation.csv", index_col=0)  # must include chr, pos columns
dmrs = find_dmrs_by_sliding_window(
    dms=res[res["padj"] < 0.05],
    annotation=annotation,
    max_gap=500,
    min_cpgs=3,
)

print(f"Found {len(dmrs)} DMRs")
print(dmrs.head())
```
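Step 4 groups significant CpGs into DMRs using `max_gap` and `min_cpgs`. The exact algorithm behind `find_dmrs_by_sliding_window` is not documented on this page, so the following is only a minimal sketch of the gap-merging idea under stated assumptions: significant CpGs are sorted by chromosome and position, a new region starts whenever the chromosome changes or the distance to the previous CpG exceeds `max_gap`, and regions with fewer than `min_cpgs` CpGs are dropped. The `chr`/`pos` column names follow the Quick Start comment; the `logFC` column and the function name `cluster_by_gap` are hypothetical.

```python
import pandas as pd


def cluster_by_gap(dms: pd.DataFrame, annotation: pd.DataFrame,
                   max_gap: int = 500, min_cpgs: int = 3) -> pd.DataFrame:
    """Group significant CpGs into candidate regions by genomic proximity."""
    # Attach coordinates and sort along the genome
    # (lexicographic chromosome order is fine for grouping purposes)
    df = dms.join(annotation[["chr", "pos"]], how="inner").sort_values(["chr", "pos"])
    # Start a new region when the chromosome changes or the gap is too large
    new_chr = df["chr"] != df["chr"].shift()
    big_gap = df["pos"].diff() > max_gap
    df["region_id"] = (new_chr | big_gap).cumsum()
    regions = df.groupby("region_id").agg(
        chr=("chr", "first"),
        start=("pos", "min"),
        end=("pos", "max"),
        n_cpgs=("pos", "size"),
        mean_logFC=("logFC", "mean"),
    )
    # Keep only regions supported by enough CpGs
    return regions[regions["n_cpgs"] >= min_cpgs].reset_index(drop=True)


# Toy example: hypothetical CpG IDs, coordinates, and effect sizes
ann = pd.DataFrame(
    {"chr": ["chr1"] * 5 + ["chr2"] * 3,
     "pos": [100, 250, 400, 2000, 2100, 50, 300, 600]},
    index=[f"cg{i:02d}" for i in range(8)],
)
dms = pd.DataFrame({"logFC": [1.2, 0.9, 1.1, -0.5, -0.7, 0.8, 0.6, 0.9]}, index=ann.index)
print(cluster_by_gap(dms, ann))
```

Here the two chr1 CpGs at 2000 and 2100 form a cluster of only two and are discarded, leaving one region on each chromosome. Many DMR callers additionally split clusters where the direction of the effect changes; this sketch does not.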

Installation

```bash
# Minimal (without the speed, annotation, and other extras)
pip install dmeth

# Recommended: full scientific environment
pip install "dmeth[full]"

# Development
pip install "dmeth[full,dev]"
```

Optional extras (dmeth[full]):

  • speed: numba, combat (highly recommended)
  • annotation: intervaltree, pyliftover
  • parallel: joblib
  • format: PyYAML, toml, h5py, xlrd
  • plotting: plotly, umap-learn
  • io: pyarrow, tables, openpyxl, xlsxwriter
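Because several features depend on these optional extras, it can help to check which backends are importable in the current environment. The snippet below uses only the standard library (`importlib.util.find_spec`) and does not call dmeth itself. The mapping from each extra to an import name is an assumption based on those packages' usual module names (for example, `umap-learn` imports as `umap`, `PyYAML` as `yaml`, and PyTables as `tables`), and the helper name is hypothetical.

```python
from importlib.util import find_spec

# Extras as listed above, mapped to the module name each package installs
# (assumed import names, e.g. umap-learn -> umap, PyYAML -> yaml)
EXTRAS = {
    "speed": ["numba", "combat"],
    "annotation": ["intervaltree", "pyliftover"],
    "parallel": ["joblib"],
    "format": ["yaml", "toml", "h5py", "xlrd"],
    "plotting": ["plotly", "umap"],
    "io": ["pyarrow", "tables", "openpyxl", "xlsxwriter"],
}


def available_extras() -> dict[str, dict[str, bool]]:
    """Report, per extra, whether each optional module can be found (not imported)."""
    return {
        extra: {mod: find_spec(mod) is not None for mod in modules}
        for extra, modules in EXTRAS.items()
    }


for extra, mods in available_extras().items():
    missing = [m for m, ok in mods.items() if not ok]
    print(f"{extra:<11} {'ok' if not missing else 'missing: ' + ', '.join(missing)}")
```

`find_spec` only locates a module without executing it, so the check is cheap even for heavy packages such as numba.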

Optional dev extras (dmeth[dev]):

pytest, pytest-cov, black, isort, flake8, flake8-pyproject, flake8-bugbear, bandit, mypy, mkdocs, mkdocs-material

Documentation

Full documentation with tutorials, API reference, and reproducibility examples: User Guide

Citation

If you use dmeth in your research, please cite:

```bibtex
@software{dmeth2025,
  author = {Afolabi, Dare},
  title = {dmeth: A comprehensive Python toolkit for differential DNA methylation analysis with empirical Bayes moderation and biomarker discovery},
  version = {0.1.0},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/dare-afolabi/dmeth},
  doi = {10.5281/zenodo.XXXXXXX},
}
```

The Zenodo DOI above is a placeholder and will be assigned upon release.

References

  • Smyth, G. K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(1), Article 3.
  • Liu, P., & Hwang, J. T. G. (2007). Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics, 23(6), 739–746.
  • Du, P., Zhang, X., Huang, C.-C., Jafari, N., Kibbe, W. A., Hou, L., & Lin, S. (2010). Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 11, 587.
  • Jung, S.-H., & Young, S. S. (2012). Power and sample size calculation for microarray studies. Journal of Biopharmaceutical Statistics, 22(1), 30–42.
  • Phipson, B., Maksimovic, J., & Oshlack, A. (2016). missMethyl: an R package for analyzing data from Illumina's HumanMethylation450 platform. Bioinformatics, 32(2), 286–288.

