A High-Performance Unified Downstream DNA Methylation & Biological Aging Analysis Suite in Python

EpiChronos

EpiChronos is a high-performance, unified downstream DNA methylation and biological aging analysis suite. Written in memory-efficient, multi-threaded Python (built on Polars and NumPy/SciPy), EpiChronos overcomes the memory barriers and platform fragmentation of traditional R-Bioconductor pipelines, providing a scalable solution that integrates microarray, sequencing (WGBS, RRBS, EM-seq), and long-read methylation data in a single tool.


🚀 Key Features

  • Unified Multi-Platform Support: Read and align standard Bismark .cov sequencing files, array-based beta matrices, and coordinate datasets seamlessly into a common coordinate-centric format.
  • Order-of-Magnitude Performance Gains: Leverage a fully multi-threaded Polars data engine to run DML/DMR calling on millions of cytosines in seconds, bypassing the R-Bioconductor memory wall.
  • Vectorized Welch's t-test with Welch–Satterthwaite df: Statistical comparisons between phenotypic cohorts using parallelized matrix algebra, with Welch–Satterthwaite degrees of freedom so that unequal group variances do not introduce the errors of a pooled-variance test.
  • Assembly-Aware Epigenetic Aging Clocks: Built-in Horvath, Hannum, and Pacemaker clock calculations with dynamic coordinate liftover mapping (GRCh37/GRCh38) powered by pyliftover to prevent silent coordinate mismatches.
  • Robust Missing-Value Imputation: High-fidelity cohort-mean and standard public reference-mean imputation for missing CpGs in sparse sequencing samples, resolving a major bottleneck where missing sites cause clock calculations to crash.
  • Hypergeometric Pathway Enrichment (MSigDB Hallmarks): High-speed over-representation analysis using the MSigDB Hallmark gene sets, which are licensed under CC BY 4.0.
  • Interactive Standalone HTML Reports: Compile quality control, global PCA projections, Volcano plots of differentially methylated loci (DMLs), and epigenetic age acceleration graphs into a single shareable interactive dashboard.
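As an illustration of the Welch's t-test feature listed above, the following is a minimal sketch of a row-wise Welch test with Welch–Satterthwaite degrees of freedom, written with plain NumPy and SciPy. It is not EpiChronos's own implementation; the function name `welch_t_rows` and the toy data are assumptions made for this example.

```python
import numpy as np
from scipy import stats


def welch_t_rows(a: np.ndarray, b: np.ndarray):
    """Row-wise Welch's t-test (hypothetical helper, not part of EpiChronos).

    a, b: arrays of shape (n_sites, n_samples_in_group) holding beta values.
    Returns (t, df, p), each of length n_sites.
    """
    n_a, n_b = a.shape[1], b.shape[1]
    mean_a, mean_b = a.mean(axis=1), b.mean(axis=1)

    # Squared standard error of each group mean, from unbiased variances.
    se2_a = a.var(axis=1, ddof=1) / n_a
    se2_b = b.var(axis=1, ddof=1) / n_b
    se2 = se2_a + se2_b

    # Sites with zero variance in both groups yield NaN instead of raising.
    with np.errstate(divide="ignore", invalid="ignore"):
        t = (mean_a - mean_b) / np.sqrt(se2)
        # Welch-Satterthwaite approximation of the degrees of freedom.
        df = se2**2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))

    # Two-sided p-value from the Student t survival function.
    p = 2.0 * stats.t.sf(np.abs(t), df)
    return t, df, p


# Toy data: 5 CpG sites, 4 control samples and 6 treated samples.
rng = np.random.default_rng(0)
ctrl = rng.uniform(0.2, 0.4, size=(5, 4))
treat = rng.uniform(0.5, 0.9, size=(5, 6))
t_stat, dof, p_val = welch_t_rows(ctrl, treat)
```

Because every step is an array operation along `axis=1`, all sites are tested in one pass with no Python-level loop. The result can be cross-checked against `scipy.stats.ttest_ind(ctrl, treat, axis=1, equal_var=False)`.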

📦 Installation

To install EpiChronos in development mode:

git clone https://github.com/Rashidmstar12/EpiChronos.git
cd EpiChronos
pip install -e .

Dependencies

EpiChronos is built to be extremely lightweight and requires only:

  • polars >= 0.20.0 (for high-speed lazy-evaluated dataframes)
  • numpy >= 1.24.0 (for vectorized math)
  • scipy >= 1.10.0 (for statistical distributions)
  • plotly >= 5.14.0 (for interactive visualization)
  • pyarrow >= 12.0.0 (for Arrow memory management)
  • pyliftover >= 0.6.1 (for dynamic assembly liftover translation)

⚡ Quick Start

Analyze a full sequencing cohort in under 15 lines of Python:

import epichronos as ec

# 1. Load and align sequencing samples by genomic coordinates
samples = ["Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2"]
filepaths = [f"data/{s}.cov" for s in samples]
metadata = {"Ctrl_1": "Young", "Ctrl_2": "Young", "Treat_1": "Old", "Treat_2": "Old"}

dataset = ec.load_bismark_coverage(filepaths, samples, min_cov=5)
dataset.metadata = metadata

# 2. Call Differentially Methylated Loci & Regions (DMLs / DMRs)
dml_df = ec.call_dmls(dataset, ["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
dmr_df = ec.call_dmrs(dml_df, p_cutoff=0.05, max_dist=1000, min_sites=3)

# 3. Calculate Epigenetic Biological Age (Horvath Clock)
true_ages = {"Ctrl_1": 22.0, "Ctrl_2": 26.0, "Treat_1": 60.0, "Treat_2": 65.0}
clock_df = ec.calculate_biological_age(dataset, clock_name="horvath", chronological_ages=true_ages)

# 4. Export a premium interactive HTML report
ec.generate_report(dataset, dml_df, dmr_df, clock_df, "epichronos_dashboard.html")
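The `p_cutoff`, `max_dist`, and `min_sites` arguments passed to `call_dmrs` in step 2 suggest a distance-based grouping of significant sites. The sketch below shows one common way such grouping can work. It is an assumption about the general technique, not a description of EpiChronos internals, and `group_sites_into_regions` is a hypothetical name.

```python
from typing import Iterable, List, Tuple

Site = Tuple[str, int, float]       # (chromosome, position, p-value)
Region = Tuple[str, int, int, int]  # (chromosome, start, end, n_sites)


def group_sites_into_regions(
    sites: Iterable[Site],
    p_cutoff: float = 0.05,
    max_dist: int = 1000,
    min_sites: int = 3,
) -> List[Region]:
    """Greedily merge significant sites that lie close together (illustrative)."""
    # Keep only significant sites, ordered by chromosome and then position.
    # Assumption: "significant" means p strictly below the cutoff.
    significant = sorted((chrom, pos) for chrom, pos, p in sites if p < p_cutoff)

    runs: List[List[Tuple[str, int]]] = []
    for chrom, pos in significant:
        # Extend the open run if this site is on the same chromosome and
        # within max_dist of the previous site; otherwise start a new run.
        same_run = (
            bool(runs)
            and runs[-1][0][0] == chrom
            and pos - runs[-1][-1][1] <= max_dist
        )
        if same_run:
            runs[-1].append((chrom, pos))
        else:
            runs.append([(chrom, pos)])

    # Report only runs supported by at least min_sites sites.
    return [
        (run[0][0], run[0][1], run[-1][1], len(run))
        for run in runs
        if len(run) >= min_sites
    ]


toy_sites = [
    ("chr1", 100, 0.01), ("chr1", 400, 0.02), ("chr1", 900, 0.001),
    ("chr1", 5000, 0.03), ("chr1", 5200, 0.5),
    ("chr2", 150, 0.01), ("chr2", 300, 0.04),
]
regions = group_sites_into_regions(toy_sites)
```

In this toy input only the three neighbouring chr1 sites form a region. The isolated site at position 5000 and the two-site chr2 cluster fall below `min_sites`.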

💾 Memory Efficiency & RAM Benchmarks

By storing aligned coordinates in memory-efficient Apache Arrow columnar buffers via Polars, EpiChronos avoids the per-object boxing overhead of native Python objects and the garbage-collection pressure typical of R. This makes it feasible to analyze whole-genome datasets on a standard consumer laptop.

Estimated RAM Footprint (Single File Ingestion)

  • Microarray Data (EPIC v2 / EPIC / 450K) (~930k sites): ~35 MB – 50 MB of RAM
  • Reduced Representation Sequencing (RRBS) (~2M sites): ~80 MB – 120 MB of RAM
  • Whole Genome Sequencing (WGBS) / Nanopore (~28M sites, 1.5 GB file on disk):
    • Unfiltered (Full Genome): ~1.0 GB – 1.2 GB of RAM
    • With Coverage Filtering (min_cov=5): ~500 MB – 700 MB of RAM
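To make the effect of coverage filtering concrete, here is a small NumPy sketch of a minimum-coverage filter over the methylated and unmethylated read counts that a Bismark coverage file provides. The function name, the `>=` comparison, and the toy values are assumptions for illustration. EpiChronos itself is described as using Polars, and its exact filtering rule is not documented here.

```python
import numpy as np


def filter_by_min_coverage(positions, count_meth, count_unmeth, min_cov=5):
    """Drop sites whose total read depth is below min_cov (illustrative)."""
    coverage = count_meth + count_unmeth
    keep = coverage >= min_cov  # assumption: the threshold is inclusive
    # Beta value = methylated reads / total reads, computed only for kept sites.
    beta = count_meth[keep] / coverage[keep]
    return positions[keep], beta, coverage[keep]


# Toy columns for five CpG sites on one chromosome.
positions = np.array([10468, 10470, 10483, 10488, 10492], dtype=np.int32)
count_meth = np.array([3, 0, 8, 1, 12], dtype=np.int32)
count_unmeth = np.array([1, 2, 2, 0, 4], dtype=np.int32)

kept_pos, beta, cov = filter_by_min_coverage(positions, count_meth, count_unmeth)

# Contiguous buffers shrink in proportion to the rows removed.
bytes_before = positions.nbytes
bytes_after = kept_pos.nbytes
```

Here two of the five sites reach a depth of 5 or more, so the position column shrinks from 20 bytes to 8. The same proportional saving applies to every column stored in a contiguous buffer, which is consistent with the lower footprint quoted for filtered data.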

📊 In-Memory Scaling vs. R-Bioconductor

To load and align a single Whole-Genome Bisulfite Sequencing (WGBS) sample (28 million CpGs):

| Pipeline / Tool | Backend | Data Structure | RAM Usage (1 WGBS Sample) |
|---|---|---|---|
| Traditional R (bsseq / minfi) | R / S4 Objects | Fragmented boxed vectors | 6.0 GB – 12.0 GB (Often hits the memory wall) |
| EpiChronos v0.2.0 | Python / Polars / Arrow | Contiguous native Arrow buffers | 0.5 GB – 1.2 GB (Order-of-magnitude reduction) |

📐 Pipeline Blueprint

Raw Methylation Input
  ├── Bismark Coverage (.cov)
  ├── Microarray Beta-Value Matrix
  └── Long-Read bedGraph
       │
       ▼
 epichronos.core.MethylationDataset (Polars-aligned coordinate framework)
  ├── filter_by_coverage()
  ├── filter_by_variance()
  └── impute_missing()
       │
       ├──────────────────────────────┐
       ▼                              ▼
epichronos.stats               epichronos.clocks
  ├── call_dmls()                ├── calculate_biological_age()
  └── call_dmrs()                └── (Cohort & Ref-mean Imputation)
       │                              │
       └──────────────┬───────────────┘
                      ▼
               epichronos.viz
                 └── generate_report() -> Standalone HTML Report
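
The imputation step that feeds the clock branch of the blueprint can be sketched as follows. This is a hedged illustration only: it assumes missing values are encoded as NaN, that cohort means take precedence over reference means, and that the clock is a plain linear predictor. The helper names, weights, and intercept are made up for the example and are not real clock coefficients. Published clocks such as Horvath's also apply a further transformation to the linear score, which is omitted here.

```python
import numpy as np


def impute_missing(betas: np.ndarray, reference_means: np.ndarray) -> np.ndarray:
    """Fill NaNs with the cohort mean of each site; if a site is missing in
    every sample, fall back to a reference mean (hypothetical helper)."""
    observed = ~np.isnan(betas)
    counts = observed.sum(axis=1)
    sums = np.where(observed, betas, 0.0).sum(axis=1)

    # Cohort mean per site; stays NaN where no sample has a value.
    cohort_mean = np.divide(
        sums, counts, out=np.full(sums.shape, np.nan), where=counts > 0
    )
    fill = np.where(counts > 0, cohort_mean, reference_means)
    return np.where(observed, betas, fill[:, None])


def linear_clock(betas: np.ndarray, weights: np.ndarray, intercept: float) -> np.ndarray:
    """Weighted sum of CpG beta values per sample (generic linear predictor)."""
    return intercept + weights @ betas


# Rows are clock CpGs, columns are samples; NaN marks an uncovered site.
betas = np.array([
    [0.2, np.nan, 0.4],
    [np.nan, np.nan, np.nan],
    [0.9, 0.7, 0.8],
])
reference_means = np.array([0.3, 0.5, 0.8])  # illustrative values only
toy_weights = np.array([10.0, -4.0, 20.0])   # not real clock coefficients

filled = impute_missing(betas, reference_means)
scores = linear_clock(filled, toy_weights, intercept=5.0)
```

Without the imputation step, any NaN in a clock CpG would propagate through the weighted sum and leave that sample's score undefined, which is the failure mode the feature list describes for sparse sequencing samples.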

🛡️ License

Distributed under the MIT License. See LICENSE for more information.
