dmeth: A toolkit for comprehensive, transparent, and reproducible DNA methylation analysis

dmeth: Differential Methylation Analysis Toolkit

dmeth is a fast, statistically rigorous Python toolkit for DNA methylation analysis, covering every step from raw beta matrices to biomarkers and functional interpretation. It implements the full modern differential methylation pipeline used in high-impact epigenome-wide association studies (EWAS), with performance and correctness on par with established R/Bioconductor tools, all in pure Python.

Key Features

| Feature | Implementation | Performance |
|---|---|---|
| Empirical Bayes moderated t-tests | limma-style (Smyth 2004) with exact replication | Numba-accelerated (10–100× faster) |
| Memory-efficient chunked analysis | Automatic fallback for >1M probes | <4 GB RAM typical |
| Cell-type deconvolution | Reference-based NNLS (Houseman/Horvath-style) | Parallel joblib |
| DMR discovery | Sliding-window clustering + gap merging | Vectorized |
| Gene annotation & pathway enrichment | IntervalTree + Fisher’s exact (FDR) | Sub-second on 450k/EPIC |
| Coordinate liftover (hg19 ↔ hg38) | pyliftover integration | Per-region tracking |
| Biomarker panel discovery & validation | RF / Elastic Net + stratified CV | Built-in |
| Robust preprocessing & QC | Missingness, group representation, imputation | Production-safe |
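
To make the empirical Bayes row concrete, here is a minimal NumPy sketch of the moderated t-statistic from Smyth (2004), in which each probe's residual variance is shrunk toward a prior variance. It is not dmeth's implementation: the function name `moderated_t` and its arguments are hypothetical, and the prior parameters `d0` and `s0_sq` are taken as given rather than estimated from the data as limma does.

```python
import numpy as np


def moderated_t(coef, stdev_unscaled, s_sq, df_resid, d0, s0_sq):
    """Moderated t-statistics in the style of Smyth (2004).

    coef           : per-probe estimated contrast (effect size)
    stdev_unscaled : per-probe unscaled standard error, sqrt(v)
    s_sq           : per-probe residual variance
    df_resid       : residual degrees of freedom of the linear model
    d0, s0_sq      : prior degrees of freedom and prior variance
                     (assumed known here; limma estimates them from all probes)
    """
    coef = np.asarray(coef, dtype=float)
    stdev_unscaled = np.asarray(stdev_unscaled, dtype=float)
    s_sq = np.asarray(s_sq, dtype=float)

    # Posterior variance: weighted average of the prior and observed variances.
    s_sq_post = (d0 * s0_sq + df_resid * s_sq) / (d0 + df_resid)

    t_mod = coef / (stdev_unscaled * np.sqrt(s_sq_post))
    df_total = d0 + df_resid  # degrees of freedom of the moderated t
    return t_mod, df_total


# Two probes with the same effect size but very different raw variances.
t_mod, df_total = moderated_t(
    coef=[0.8, 0.8],
    stdev_unscaled=[0.5, 0.5],
    s_sq=[0.01, 1.0],
    df_resid=6,
    d0=4,
    s0_sq=0.25,
)
print(t_mod, df_total)
```

Shrinkage pulls the very small variance of the first probe up and the large variance of the second probe down, so the first statistic is smaller than its ordinary t and the second is larger.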

Fully supports Illumina 450K, EPIC (850K), and any custom CpG × sample matrix.
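
Beta values in such a matrix are bounded in [0, 1]. Du et al. (2010) compare them with logit-scaled M-values, M = log2(beta / (1 - beta)). The following is a small sketch of that conversion with pandas and NumPy; `beta_to_m`, its `eps` clipping offset, and the CpG and sample labels are illustrative assumptions, not part of dmeth's documented API.

```python
import numpy as np
import pandas as pd


def beta_to_m(beta, eps=1e-6):
    """Convert a CpG x sample DataFrame of beta values to M-values.

    Betas are clipped to [eps, 1 - eps] so that exact 0 or 1 values
    do not produce infinite M-values.
    """
    clipped = beta.clip(lower=eps, upper=1 - eps)
    return np.log2(clipped / (1 - clipped))


# Toy matrix with hypothetical probe and sample names.
beta = pd.DataFrame(
    {"s1": [0.5, 0.8], "s2": [0.2, 1.0]},
    index=["cg0001", "cg0002"],
)
m_values = beta_to_m(beta)
print(m_values)
```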

Quick Start

pip install "dmeth[full]"

import pandas as pd

from dmeth.io.readers import load_methylation_data
from dmeth.core.analysis.preparation import filter_cpgs_by_missingness, impute_missing_values
from dmeth.core.analysis.validation import validate_design, validate_contrast
from dmeth.core.analysis.core_analysis import fit_differential
from dmeth.core.downstream.annotation import find_dmrs_by_sliding_window

# 1. Load your data
# beta: CpG x samples matrix
# pheno: sample metadata with a 'group' column
beta = pd.read_csv("beta_matrix.csv", index_col=0)
pheno = pd.read_csv("phenotype.csv", index_col=0)

# 2. Preprocessing
# Drop CpGs with too much missingness
beta_clean, _, _ = filter_cpgs_by_missingness(beta, max_missing_rate=0.2)

# Impute remaining missing values (kNN)
beta_imp = impute_missing_values(beta_clean, method="knn", k=10)

# 3. Differential analysis (case vs control)
# Build design matrix from phenotype
design = validate_design(pheno["group"])
contrast = validate_contrast(design, "case-control")

# Fit
res = fit_differential(
    M=beta_imp,
    design=pd.DataFrame(design, index=beta_imp.columns),
    contrast=contrast,
    shrink="smyth",
    robust=True,
)

# 4. Discover DMRs
annotation = pd.read_csv("cpg_annotation.csv", index_col=0)  # must include chr, pos columns
dmrs = find_dmrs_by_sliding_window(
    dms=res[res["padj"] < 0.05],
    annotation=annotation,
    max_gap=500,
    min_cpgs=3,
)

print(f"Found {len(dmrs)} DMRs")
print(dmrs.head())
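
The `max_gap` and `min_cpgs` arguments in step 4 suggest a gap-based clustering of significant CpGs. A minimal standard-library sketch of that idea follows, under the assumption that a region is a run of significant CpGs on one chromosome whose neighbours lie at most `max_gap` bp apart. It is a simplified illustration, not the logic of `find_dmrs_by_sliding_window`; `cluster_by_gap` is a hypothetical helper.

```python
from itertools import groupby


def cluster_by_gap(sites, max_gap=500, min_cpgs=3):
    """Group (chrom, pos) pairs into candidate regions.

    Consecutive positions on the same chromosome are merged while the
    distance to the previous CpG is at most max_gap; runs with fewer
    than min_cpgs CpGs are discarded.
    Returns a list of (chrom, start, end, n_cpgs) tuples.
    """
    regions = []
    for chrom, group in groupby(sorted(sites), key=lambda s: s[0]):
        run = []
        for _, pos in group:
            # A gap larger than max_gap closes the current run.
            if run and pos - run[-1] > max_gap:
                if len(run) >= min_cpgs:
                    regions.append((chrom, run[0], run[-1], len(run)))
                run = []
            run.append(pos)
        # Flush the last run on this chromosome.
        if len(run) >= min_cpgs:
            regions.append((chrom, run[0], run[-1], len(run)))
    return regions


sites = [
    ("chr1", 1000), ("chr1", 1200), ("chr1", 1600),  # one dense run
    ("chr1", 5000),                                  # isolated CpG
    ("chr2", 300), ("chr2", 400),                    # too few CpGs
]
regions = cluster_by_gap(sites, max_gap=500, min_cpgs=3)
print(regions)
```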

Installation

# Minimal (without the speed, annotation, or other extras)
pip install dmeth

# Recommended: full scientific environment
pip install "dmeth[full]"

# Development
pip install "dmeth[full,dev]"

Optional extras (dmeth[full]):

  • speed: numba, combat (highly recommended)
  • annotation: intervaltree, pyliftover
  • parallel: joblib
  • format: PyYAML, toml, h5py, xlrd
  • plotting: plotly, umap-learn
  • io: pyarrow, tables, openpyxl, xlsxwriter
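
To see which optional dependencies are importable in the current environment, a standard-library check along these lines may help. The import names are assumptions based on the package names listed above (for example, umap-learn is imported as `umap`), and only a few of the extras are shown.

```python
from importlib.util import find_spec

# Import names assumed from the package names in the extras list.
OPTIONAL_MODULES = {
    "speed": ["numba"],
    "annotation": ["intervaltree", "pyliftover"],
    "parallel": ["joblib"],
    "plotting": ["plotly", "umap"],
}


def available(modules):
    """Map each top-level module name to whether it can be imported."""
    return {name: find_spec(name) is not None for name in modules}


for extra, modules in OPTIONAL_MODULES.items():
    print(extra, available(modules))
```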

Optional dev extras (dmeth[dev]):

pytest, pytest-cov, black, isort, flake8, flake8-pyproject, flake8-bugbear, bandit, mypy, mkdocs, mkdocs-material

Documentation

Full documentation with tutorials, API reference, and reproducibility examples: User Guide

Citation

If you use dmeth in your research, please cite:

@software{dmeth2025,
  author = {Afolabi, Dare},
  title = {dmeth: A comprehensive Python toolkit for differential DNA methylation analysis with empirical Bayes moderation and biomarker discovery},
  version = {0.1.1},
  year = {2025},
  publisher = {GitHub},
  url = {https://github.com/dare-afolabi/dmeth},
  doi = {10.5281/zenodo.17684038},
}

References

  • Smyth, G.K. (2004). Linear models and empirical Bayes methods for assessing differential expression in microarray experiments. Statistical Applications in Genetics and Molecular Biology, 3(1).
  • Liu, P., & Hwang, J.T.G. (2007). Quick calculation for sample size while controlling false discovery rate with application to microarray analysis. Bioinformatics, 23(6), 739–746.
  • Du, P., Zhang, X., Huang, C.-C., Jafari, N., Kibbe, W.A., Hou, L., & Lin, S. (2010). Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis. BMC Bioinformatics, 11, 587.
  • Jung, S.H., & Young, S.S. (2012). Power and sample size calculation for microarray studies. Journal of Biopharmaceutical Statistics, 22(1), 30–42.
  • Phipson, B. et al. (2016). missMethyl: an R package for analyzing data from Illumina’s HumanMethylation450 platform. Bioinformatics, 32(2), 286–288.
