dmeth: A toolkit for comprehensive, transparent, and reproducible DNA methylation analysis

dmeth: Differential Methylation Analysis Toolkit

A fast, statistically rigorous Python toolkit for DNA methylation analysis, from raw beta matrices to biomarkers and functional interpretation. dmeth implements the full differential methylation pipeline used in epigenome-wide association studies (EWAS), with performance and correctness on par with established R/Bioconductor tools, all in pure Python.

Key Features

| Feature | Implementation | Performance |
|---|---|---|
| Empirical Bayes moderated t-tests | limma-style (Smyth 2004) with exact replication | Numba-accelerated (10–100× faster) |
| Memory-efficient chunked analysis | Automatic fallback for >1M probes | <4 GB RAM typical |
| Cell-type deconvolution | Reference-based NNLS (Houseman/Horvath-style) | Parallel joblib |
| DMR discovery | Sliding-window clustering + gap merging | Vectorized |
| Gene annotation & pathway enrichment | IntervalTree + Fisher’s exact (FDR) | Sub-second on 450k/EPIC |
| Coordinate liftover (hg19 ↔ hg38) | pyliftover integration | Per-region tracking |
| Biomarker panel discovery & validation | RF / Elastic Net + stratified CV | Built-in |
| Robust preprocessing & QC | Missingness, group representation, imputation | Production-safe |
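
To make the "empirical Bayes moderated t-test" entry concrete, here is a minimal sketch of the variance shrinkage described by Smyth (2004), for one CpG in a two-group comparison. It is not dmeth's implementation: the function name `moderated_t` is hypothetical, and the prior degrees of freedom `d0` and prior variance `s0_sq` are passed in directly, whereas limma-style methods estimate them from all probes.

```python
import math
from statistics import mean, variance


def moderated_t(case, control, d0, s0_sq):
    """Two-group moderated t-statistic for a single CpG (illustrative only).

    d0 and s0_sq are the prior degrees of freedom and prior variance.
    They are assumed to be known here; estimating them is not shown.
    """
    n1, n2 = len(case), len(control)
    d = n1 + n2 - 2  # residual degrees of freedom
    # Pooled residual variance for this CpG
    s_sq = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / d
    # Shrink the per-CpG variance toward the prior variance
    s_tilde_sq = (d0 * s0_sq + d * s_sq) / (d0 + d)
    effect = mean(case) - mean(control)
    t_mod = effect / math.sqrt(s_tilde_sq * (1 / n1 + 1 / n2))
    # The moderated statistic is referred to a t-distribution with d0 + d df
    return t_mod, d0 + d


t_mod, df = moderated_t([1.0, 2.0, 3.0], [0.0, 1.0, 2.0], d0=4, s0_sq=1.0)
print(t_mod, df)
```

With `d0=0` the formula reduces to the ordinary pooled-variance t-statistic; a larger `d0` pulls each CpG's variance more strongly toward `s0_sq`, which stabilizes the test when there are few samples per group.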

Fully supports Illumina 450K, EPIC (850K), and any custom CpG × sample matrix.
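
As an illustration of the expected input shape, the sketch below builds a toy CpG × sample matrix of beta values and converts it to M-values using the standard relationship M = log2(beta / (1 - beta)) discussed by Du et al. (2010). The helper `beta_to_m`, the clipping offset `eps`, and the CpG identifiers are assumptions made for this example; whether and where dmeth performs this conversion is not stated here.

```python
import math


def beta_to_m(beta, eps=1e-6):
    """Convert one beta value in [0, 1] to an M-value (hypothetical helper)."""
    # Clip away from 0 and 1 so the logit stays finite; eps is an arbitrary choice
    b = min(max(beta, eps), 1.0 - eps)
    return math.log2(b / (1.0 - b))


# Toy CpG x sample matrix: keys are CpG IDs (rows), list entries are samples (columns)
beta_matrix = {
    "cg_example_1": [0.5, 0.8, 0.2],
    "cg_example_2": [0.0, 1.0, 0.9],
}

m_matrix = {cpg: [beta_to_m(b) for b in row] for cpg, row in beta_matrix.items()}
print(m_matrix)
```

M-values are unbounded and closer to homoscedastic than beta values, which is why they are often preferred for linear-model testing, while beta values remain easier to interpret biologically.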

Quick Start

pip install "dmeth[full]"

import pandas as pd

from dmeth.io.readers import load_methylation_data
from dmeth.core.analysis.preparation import filter_cpgs_by_missingness, impute_missing_values
from dmeth.core.analysis.validation import build_design, validate_contrast
from dmeth.core.analysis.core_analysis import fit_differential
from dmeth.core.downstream.annotation import find_dmrs_by_sliding_window

# 1. Load your data
# beta: CpG x samples matrix
# pheno: sample metadata with a 'group' column
beta = pd.read_csv("beta_matrix.csv", index_col=0)
pheno = pd.read_csv("phenotype.csv", index_col=0)

# 2. Preprocessing
# Drop CpGs with too much missingness
beta_clean, _, _ = filter_cpgs_by_missingness(beta, max_missing_rate=0.2)

# Impute remaining missing values (kNN)
beta_imp = impute_missing_values(beta_clean, method="knn", k=10)

# 3. Differential analysis (case vs control)
# Build design matrix from phenotype
design = build_design(pheno["group"])
contrast = validate_contrast(design, "case-control")

# Fit
res = fit_differential(
    M=beta_imp,
    design=pd.DataFrame(design, index=beta_imp.columns),
    contrast=contrast,
    shrink="smyth",
    robust=True,
)

# 4. Discover DMRs
annotation = pd.read_csv("cpg_annotation.csv", index_col=0)  # must include chr, pos columns
dmrs = find_dmrs_by_sliding_window(
    dms=res[res["padj"] < 0.05],
    annotation=annotation,
    max_gap=500,
    min_cpgs=3,
)

print(f"Found {len(dmrs)} DMRs")
print(dmrs.head())
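
The `max_gap` and `min_cpgs` parameters above can be understood through a simple gap-merging rule: walk significant CpGs in genomic order, extend the current cluster while consecutive positions are within `max_gap` bases, and keep clusters with at least `min_cpgs` sites. The following is a standalone sketch of that idea, not the code behind `find_dmrs_by_sliding_window`; the function `cluster_cpgs` and its input format are hypothetical.

```python
def cluster_cpgs(sites, max_gap=500, min_cpgs=3):
    """Group significant CpGs into candidate regions (illustrative only).

    sites: iterable of (chrom, pos) tuples for significant CpGs.
    Returns a list of (chrom, start, end, n_cpgs) tuples.
    """
    regions = []
    current = []  # CpGs in the cluster currently being built

    def flush():
        # Keep the cluster only if it contains enough CpGs
        if len(current) >= min_cpgs:
            regions.append((current[0][0], current[0][1], current[-1][1], len(current)))

    for chrom, pos in sorted(sites):
        # Close the cluster on a chromosome change or a gap larger than max_gap
        if current and (chrom != current[-1][0] or pos - current[-1][1] > max_gap):
            flush()
            current = []
        current.append((chrom, pos))
    flush()  # handle the final cluster
    return regions


sites = [
    ("chr1", 1000), ("chr1", 1200), ("chr1", 1600),  # three CpGs within 500 bp steps
    ("chr1", 5000),                                  # isolated CpG
    ("chr2", 100), ("chr2", 300),                    # only two CpGs, below min_cpgs
]
print(cluster_cpgs(sites))
```

Only the first three chr1 sites form a region under the default settings; loosening `min_cpgs` to 2 would also report the chr2 pair.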

Installation

# Minimal (without the speed, annotation, and other extras)
pip install dmeth

# Recommended: full scientific environment
pip install "dmeth[full]"

# Development
pip install "dmeth[full,dev]"

Optional extras (dmeth[full]):

  • speed: numba, combat (highly recommended)
  • annotation: intervaltree, pyliftover
  • parallel: joblib
  • format: PyYAML, toml, h5py, xlrd
  • plotting: plotly, umap-learn
  • io: pyarrow, tables, openpyxl, xlsxwriter
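
After a minimal install it can be useful to check which optional packages are actually importable. The sketch below uses only the standard library; the `OPTIONAL` mapping is a hand-written assumption based on the list above (a subset, using import names such as `umap` for umap-learn), not something dmeth exposes.

```python
from importlib.util import find_spec

# Import names for some of the optional packages listed above. The import
# name can differ from the PyPI name (for example, umap-learn imports as umap).
OPTIONAL = {
    "speed": ["numba"],
    "annotation": ["intervaltree", "pyliftover"],
    "parallel": ["joblib"],
    "plotting": ["plotly", "umap"],
}


def missing_modules(groups=OPTIONAL):
    """Return {extra: [modules that cannot be imported]} for incomplete extras."""
    report = {}
    for extra, modules in groups.items():
        # find_spec returns None when a top-level module is not installed
        missing = [name for name in modules if find_spec(name) is None]
        if missing:
            report[extra] = missing
    return report


print(missing_modules())
```

An empty dictionary means every listed module was found; otherwise the keys show which extras are incomplete.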

Optional dev extras (dmeth[dev]):

pytest, pytest-cov, black, isort, flake8, flake8-pyproject, flake8-bugbear, bandit, mkdocs, mkdocs-material

Documentation

Full documentation with tutorials, API reference, and reproducibility examples: User Guide

Citation

If you use dmeth in your research, please cite:

@software{dmeth2025,
  author = {Afolabi, Dare},
  title = {dmeth: A comprehensive Python toolkit for differential DNA methylation analysis with empirical Bayes moderation and biomarker discovery},
  version = {0.2.0},
  year = {2025},
  publisher = {GitHub},
  doi = {10.5281/zenodo.17777501},
  url = {https://doi.org/10.5281/zenodo.17777501},
}

References

  • Smyth, G.K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(1).
  • Liu, P., & Hwang, J.T.G. (2007). Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics, 23(6), 739–746.
  • Du, P., Zhang, X., Huang, C.-C., Jafari, N., Kibbe, W.A., Hou, L., & Lin, S. (2010). Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 11, 587.
  • Jung, S.H., & Young, S.S. (2012). Power and sample size calculation for microarray studies. Journal of Biopharmaceutical Statistics, 22(1), 30–42.
  • Phipson, B. et al. (2016). missMethyl: an R package for analyzing data from Illumina’s HumanMethylation450 platform. Bioinformatics, 32(2), 286–288.
