
A High-Performance Unified Downstream DNA Methylation & Biological Aging Analysis Suite in Python


EpiChronos


EpiChronos is a high-performance, unified downstream DNA methylation and biological aging analysis suite. Written in memory-efficient, multi-threaded Python (built on Polars and NumPy/SciPy), EpiChronos overcomes the memory barriers and platform fragmentation of traditional R-Bioconductor pipelines, providing a scalable solution that integrates microarray, sequencing (WGBS, RRBS, EM-seq), and long-read methylation data in a single tool.


🚀 Key Features

  • Unified Multi-Platform Support: Read and align standard Bismark .cov sequencing files, array-based beta matrices, and coordinate datasets seamlessly into a common coordinate-centric format.
  • Order-of-Magnitude Performance Gains: Leverage a fully multi-threaded Polars data engine to run DML/DMR calling on millions of cytosines in seconds, bypassing the R-Bioconductor memory wall.
  • Vectorized Welch's t-test with Welch–Satterthwaite df: Statistical comparisons between phenotypic cohorts using parallelized matrix algebra, with Welch–Satterthwaite degrees of freedom so that unequal group variances are never pooled.
  • Assembly-Aware Epigenetic Aging Clocks: Built-in Horvath, Hannum, and Pacemaker clock calculations with dynamic coordinate liftover mapping (GRCh37/GRCh38) powered by pyliftover to prevent silent coordinate mismatches.
  • Robust Missing-Value Imputation: High-fidelity cohort-mean and standard public reference-mean imputation for missing CpGs in sparse sequencing samples, resolving a major bottleneck where missing sites cause clock calculations to crash.
  • Hypergeometric Pathway Enrichment (MSigDB Hallmarks): High-speed overrepresentation analysis using the MSigDB Hallmark gene sets, which are licensed under CC BY 4.0.
  • Interactive Standalone HTML Reports: Compile quality control, global PCA projections, Volcano plots of differentially methylated loci (DMLs), and epigenetic age acceleration graphs into a single shareable interactive dashboard.
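
The vectorized Welch's t-test listed above can be illustrated with a minimal NumPy/SciPy sketch. This is not EpiChronos's internal code: the function name `welch_ttest_rows` and the (sites x samples) matrix layout are assumptions made for illustration only.

```python
import numpy as np
from scipy import stats


def welch_ttest_rows(group_a: np.ndarray, group_b: np.ndarray):
    """Row-wise Welch's t-test on (sites x samples) beta-value matrices.

    Hypothetical helper for illustration; sites with zero variance in both
    groups would divide by zero and need separate handling.
    """
    n_a, n_b = group_a.shape[1], group_b.shape[1]
    mean_a, mean_b = group_a.mean(axis=1), group_b.mean(axis=1)
    # Unbiased per-site variances (ddof=1), kept separate rather than pooled.
    se_a = group_a.var(axis=1, ddof=1) / n_a
    se_b = group_b.var(axis=1, ddof=1) / n_b

    t_stat = (mean_a - mean_b) / np.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a**2 / (n_a - 1) + se_b**2 / (n_b - 1))
    # Two-sided p-value from Student's t distribution with fractional df.
    p_val = 2.0 * stats.t.sf(np.abs(t_stat), df)
    return t_stat, df, p_val


# Synthetic beta values: 1000 sites, 4 control and 5 treated samples.
rng = np.random.default_rng(0)
ctrl = rng.uniform(0.2, 0.4, size=(1000, 4))
treat = rng.uniform(0.3, 0.6, size=(1000, 5))
t_stat, df, p_val = welch_ttest_rows(ctrl, treat)
```

Every site is tested in a single pass of array arithmetic, with no Python-level loop over cytosines. The result should agree with `scipy.stats.ttest_ind(ctrl, treat, axis=1, equal_var=False)`.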

📦 Installation

To install EpiChronos in development mode:

git clone https://github.com/Rashidmstar12/EpiChronos.git
cd EpiChronos
pip install -e .

Dependencies

EpiChronos is built to be extremely lightweight and requires only:

  • polars >= 0.20.0 (for high-speed lazy-evaluated dataframes)
  • numpy >= 1.24.0 (for vectorized math)
  • scipy >= 1.10.0 (for statistical distributions)
  • plotly >= 5.14.0 (for interactive visualization)
  • pyarrow >= 12.0.0 (for Arrow memory management)
  • pyliftover >= 0.6.1 (for dynamic assembly liftover translation)
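
As an example of what the SciPy dependency provides, a hypergeometric overrepresentation test of the kind listed under Key Features can be sketched with `scipy.stats.hypergeom`. The function name `overrepresentation_pvalue` and the gene counts below are hypothetical; this is not the package's own enrichment routine.

```python
from scipy.stats import hypergeom


def overrepresentation_pvalue(n_universe, n_in_set, n_hits, n_overlap):
    """P(X >= n_overlap) when n_hits genes are drawn from n_universe genes,
    n_in_set of which belong to the gene set (hypothetical helper)."""
    # sf(k - 1) is P(X > k - 1), i.e. P(X >= k) for a discrete variable.
    return float(hypergeom.sf(n_overlap - 1, n_universe, n_in_set, n_hits))


# Made-up numbers: 20,000 background genes, a 200-gene set,
# 500 genes near significant DMLs, 15 of which fall in the set.
p = overrepresentation_pvalue(20_000, 200, 500, 15)
```

With these numbers the expected overlap under the null is 500 × 200 / 20,000 = 5 genes, so an overlap of 15 yields a small p-value. In practice the p-values from all tested gene sets would also need multiple-testing correction.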

⚡ Quick Start

Analyze a full sequencing cohort in under 15 lines of Python:

import epichronos as ec

# 1. Load and align sequencing samples by genomic coordinates
samples = ["Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2"]
filepaths = [f"data/{s}.cov" for s in samples]
metadata = {"Ctrl_1": "Young", "Ctrl_2": "Young", "Treat_1": "Old", "Treat_2": "Old"}

dataset = ec.load_bismark_coverage(filepaths, samples, min_cov=5)
dataset.metadata = metadata

# 2. Call Differentially Methylated Loci & Regions (DMLs / DMRs)
dml_df = ec.call_dmls(dataset, ["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
dmr_df = ec.call_dmrs(dml_df, p_cutoff=0.05, max_dist=1000, min_sites=3)

# 3. Calculate Epigenetic Biological Age (Horvath Clock)
true_ages = {"Ctrl_1": 22.0, "Ctrl_2": 26.0, "Treat_1": 60.0, "Treat_2": 65.0}
clock_df = ec.calculate_biological_age(dataset, clock_name="horvath", chronological_ages=true_ages)

# 4. Export a premium interactive HTML report
ec.generate_report(dataset, dml_df, dmr_df, clock_df, "epichronos_dashboard.html")
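
To make the `p_cutoff`, `max_dist`, and `min_sites` arguments in step 2 more concrete, here is a minimal pure-Python sketch of one common way such parameters are used to merge significant sites into regions. It is an assumption about the general technique, not the logic inside `ec.call_dmrs`; the `Site` type and `group_sites_into_regions` function are hypothetical.

```python
from typing import NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int
    p_value: float


def group_sites_into_regions(sites, p_cutoff=0.05, max_dist=1000, min_sites=3):
    """Return (chrom, start, end, n_sites) for runs of nearby significant sites."""
    regions = []
    run = []

    def flush():
        # Keep the current run only if it has enough supporting sites.
        if len(run) >= min_sites:
            regions.append((run[0].chrom, run[0].pos, run[-1].pos, len(run)))
        run.clear()

    # Sorting NamedTuples orders by chromosome name, then position.
    for site in sorted(s for s in sites if s.p_value < p_cutoff):
        # A new chromosome or a gap wider than max_dist closes the run.
        if run and (site.chrom != run[-1].chrom or site.pos - run[-1].pos > max_dist):
            flush()
        run.append(site)
    flush()
    return regions


sites = [
    Site("chr1", 100, 0.01), Site("chr1", 400, 0.02), Site("chr1", 900, 0.001),
    Site("chr1", 5000, 0.03), Site("chr1", 5200, 0.2), Site("chr2", 150, 0.01),
]
regions = group_sites_into_regions(sites)
print(regions)  # [('chr1', 100, 900, 3)]
```

In this sketch non-significant sites are simply ignored, so they do not interrupt a region. Whether they should is a design choice that a real caller may handle differently.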

💾 Memory Efficiency & RAM Benchmarks

By storing aligned coordinates in memory-efficient Apache Arrow columnar buffers via Polars, EpiChronos avoids the per-object boxing overhead of native Python objects and the garbage-collection pressure typical of R. This enables comprehensive analysis of whole-genome datasets on a standard consumer laptop.

Estimated RAM Footprint (Single File Ingestion)

  • Microarray Data (EPIC v2 / EPIC / 450K) (~930k sites): ~35 MB – 50 MB of RAM
  • Reduced Representation Sequencing (RRBS) (~2M sites): ~80 MB – 120 MB of RAM
  • Whole Genome Sequencing (WGBS) / Nanopore (~28M sites, 1.5 GB file on disk):
    • Unfiltered (Full Genome): ~1.0 GB – 1.2 GB of RAM
    • With Coverage Filtering (min_cov=5): ~500 MB – 700 MB of RAM
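
As a back-of-envelope reading of these estimates, the quoted footprints can be converted into bytes per site. The sketch below uses only the figures listed above and assumes decimal units (1 MB = 10^6 bytes, 1 GB = 10^9 bytes); `bytes_per_site` is a hypothetical helper, not part of the package.

```python
def bytes_per_site(n_sites: int, footprint_bytes: float) -> float:
    """Average in-memory bytes per CpG site implied by a quoted footprint."""
    return footprint_bytes / n_sites


# (site count, low estimate, high estimate), taken from the list above.
footprints = {
    "Microarray (~930k sites)": (930_000, 35e6, 50e6),
    "RRBS (~2M sites)": (2_000_000, 80e6, 120e6),
    "WGBS unfiltered (~28M sites)": (28_000_000, 1.0e9, 1.2e9),
}

for label, (n_sites, low, high) in footprints.items():
    lo = bytes_per_site(n_sites, low)
    hi = bytes_per_site(n_sites, high)
    print(f"{label}: {lo:.0f}-{hi:.0f} bytes per site")
```

The three unfiltered estimates work out to roughly 36 to 60 bytes per site. The coverage-filtered WGBS figure is left out because the number of sites remaining after filtering is not stated.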

📊 In-Memory Scaling vs. R-Bioconductor

To load and align a single Whole-Genome Bisulfite Sequencing (WGBS) sample (28 million CpGs):

| Pipeline / Tool | Backend | Data Structure | RAM Usage (1 WGBS Sample) |
| --- | --- | --- | --- |
| Traditional R (bsseq / minfi) | R / S4 Objects | Fragmented boxed vectors | 6.0 GB – 12.0 GB (Often hits the memory wall) |
| EpiChronos v0.2.0 | Python / Polars / Arrow | Contiguous native Arrow buffers | 0.5 GB – 1.2 GB (Order-of-magnitude reduction) |

📐 Pipeline Blueprint

Raw Methylation Input
  ├── Bismark Coverage (.cov)
  ├── Microarray Beta-Value Matrix
  └── Long-Read bedGraph
       │
       ▼
 epichronos.core.MethylationDataset (Polars-aligned coordinate framework)
  ├── filter_by_coverage()
  ├── filter_by_variance()
  └── impute_missing()
       │
       ├──────────────────────────────┐
       ▼                              ▼
epichronos.stats               epichronos.clocks
  ├── call_dmls()                ├── calculate_biological_age()
  └── call_dmrs()                └── (Cohort & Ref-mean Imputation)
       │                              │
       └──────────────┬───────────────┘
                      ▼
               epichronos.viz
                 └── generate_report() -> Standalone HTML Report
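
The clocks branch of the blueprint pairs age calculation with cohort-mean and reference-mean imputation. A minimal NumPy sketch of that idea follows, assuming a generic linear clock (intercept plus weighted sum of beta values). The coefficients, reference means, and the `impute_then_score` function are made up for illustration; published clocks may apply additional transformations that are not shown here.

```python
import numpy as np


def impute_then_score(betas, coefs, intercept, reference_means):
    """Fill missing clock CpGs, then apply a linear clock.

    betas: (samples x CpGs) array with NaN marking missing sites.
    Hypothetical helper, not the EpiChronos implementation.
    """
    betas = np.asarray(betas, dtype=float)
    coefs = np.asarray(coefs, dtype=float)
    reference_means = np.asarray(reference_means, dtype=float)

    missing = np.isnan(betas)
    n_observed = (~missing).sum(axis=0)
    # Cohort mean per CpG over the samples where the site was observed.
    col_sums = np.where(missing, 0.0, betas).sum(axis=0)
    cohort_means = np.divide(
        col_sums, n_observed,
        out=np.full(betas.shape[1], np.nan), where=n_observed > 0,
    )
    # Fall back to the reference mean for CpGs missing in every sample.
    fill = np.where(n_observed > 0, cohort_means, reference_means)
    filled = np.where(missing, fill, betas)
    return intercept + filled @ coefs


betas = [[0.2, np.nan, np.nan],
         [0.4, 0.6, np.nan]]
ages = impute_then_score(
    betas, coefs=[10.0, 20.0, -5.0], intercept=30.0,
    reference_means=[0.5, 0.5, 0.8],
)
print(ages)  # [40. 42.]
```

Here the second CpG is filled from the cohort (0.6) and the third, absent in both samples, from the reference mean (0.8), so the score is computed for every sample instead of failing on missing sites.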

🛡️ License

Distributed under the MIT License. See LICENSE for more information.
