A High-Performance Unified Downstream DNA Methylation & Biological Aging Analysis Suite in Python
EpiChronos
EpiChronos is a high-performance, unified downstream DNA methylation and biological aging analysis suite. Written in memory-efficient, multi-threaded Python (built on Polars and NumPy/SciPy), EpiChronos overcomes the memory barriers and platform fragmentation of traditional R-Bioconductor pipelines, providing a scalable solution that integrates microarray, sequencing (WGBS, RRBS, EM-seq), and long-read methylation data in a single tool.
Key Features
- Unified Multi-Platform Support: Read and align standard Bismark .cov sequencing files, array-based beta matrices, and coordinate datasets into a common coordinate-centric format.
- Order-of-Magnitude Performance Gains: Leverage a fully multi-threaded Polars data engine to run DML/DMR calling on millions of cytosines in seconds, bypassing the R-Bioconductor memory wall.
- Vectorized Welch's t-test with Welch–Satterthwaite degrees of freedom: Statistical comparisons between phenotypic cohorts computed with vectorized matrix algebra, using per-group variances and Welch–Satterthwaite degrees of freedom so that unequal variances do not introduce pooled-variance errors.
- Assembly-Aware Epigenetic Aging Clocks: Built-in Horvath, Hannum, and Pacemaker clock calculations with dynamic coordinate liftover mapping (GRCh37/GRCh38), powered by pyliftover, to prevent silent coordinate mismatches.
- Robust Missing-Value Imputation: Cohort-mean and standard public reference-mean imputation for missing CpGs in sparse sequencing samples, resolving a major bottleneck where missing sites cause clock calculations to crash.
- Hypergeometric Pathway Enrichment (MSigDB Hallmarks): High-speed overrepresentation analysis using MSigDB Hallmark gene sets, which are licensed under CC BY 4.0.
- Interactive Standalone HTML Reports: Compile quality control, global PCA projections, Volcano plots of differentially methylated loci (DMLs), and epigenetic age acceleration graphs into a single shareable interactive dashboard.
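
The Welch's t-test feature listed above can be illustrated with a short NumPy/SciPy sketch. This is not EpiChronos's internal code: the function name `welch_ttest_rows` and the `(n_sites, n_samples)` matrix layout are assumptions made for illustration. It only shows how the statistic and the Welch–Satterthwaite degrees of freedom can be computed for all sites at once.

```python
import numpy as np
from scipy import stats


def welch_ttest_rows(group_a: np.ndarray, group_b: np.ndarray):
    """Row-wise Welch's t-test for two (n_sites, n_samples) beta-value matrices.

    Hypothetical helper for illustration; not part of the EpiChronos API.
    """
    n_a, n_b = group_a.shape[1], group_b.shape[1]
    mean_a, mean_b = group_a.mean(axis=1), group_b.mean(axis=1)

    # Unbiased per-site variances (ddof=1), scaled to squared standard errors.
    se2_a = group_a.var(axis=1, ddof=1) / n_a
    se2_b = group_b.var(axis=1, ddof=1) / n_b

    # The test statistic uses the two variances separately (no pooling).
    t_stat = (mean_a - mean_b) / np.sqrt(se2_a + se2_b)

    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))

    # Two-sided p-value from the Student t survival function.
    p_value = 2.0 * stats.t.sf(np.abs(t_stat), df)
    return t_stat, df, p_value


# Synthetic beta values: 5 sites, 4 control samples and 6 treated samples.
rng = np.random.default_rng(0)
ctrl = rng.uniform(0.1, 0.9, size=(5, 4))
treat = rng.uniform(0.1, 0.9, size=(5, 6))
t_stat, df, p_value = welch_ttest_rows(ctrl, treat)
```

Sites with zero variance in both groups yield NaN in this sketch and would need separate handling.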
Installation
To install EpiChronos in development mode:
git clone https://github.com/Rashidmstar12/EpiChronos.git
cd EpiChronos
pip install -e .
Dependencies
EpiChronos is built to be extremely lightweight and requires only:
- polars >= 0.20.0 (for high-speed lazy-evaluated dataframes)
- numpy >= 1.24.0 (for vectorized math)
- scipy >= 1.10.0 (for statistical distributions)
- plotly >= 5.14.0 (for interactive visualization)
- pyarrow >= 12.0.0 (for Arrow memory management)
- pyliftover >= 0.6.1 (for dynamic assembly liftover translation)
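
To confirm that an environment satisfies these minimum versions, a small standard-library check along the following lines may help. The helper names `version_tuple` and `check_dependencies` are hypothetical, and the version parsing is deliberately simple (it ignores suffixes such as `rc1`), so treat it as a sketch rather than a full version-specifier implementation.

```python
from importlib import metadata

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "polars": "0.20.0",
    "numpy": "1.24.0",
    "scipy": "1.10.0",
    "plotly": "5.14.0",
    "pyarrow": "12.0.0",
    "pyliftover": "0.6.1",
}


def version_tuple(text: str) -> tuple:
    """Turn '1.24.0' into (1, 24, 0); non-numeric suffixes are ignored."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '2.0' compares correctly against '2.0.0'.
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts)


def check_dependencies(minimums: dict) -> dict:
    """Return a status string per package: 'ok', 'missing', or 'too old (...)'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


for name, status in check_dependencies(MINIMUMS).items():
    print(f"{name}: {status}")
```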
Quick Start
Analyze a full sequencing cohort in under 15 lines of Python:
import epichronos as ec
# 1. Load and align sequencing samples by genomic coordinates
samples = ["Ctrl_1", "Ctrl_2", "Treat_1", "Treat_2"]
filepaths = [f"data/{s}.cov" for s in samples]
metadata = {"Ctrl_1": "Young", "Ctrl_2": "Young", "Treat_1": "Old", "Treat_2": "Old"}
dataset = ec.load_bismark_coverage(filepaths, samples, min_cov=5)
dataset.metadata = metadata
# 2. Call Differentially Methylated Loci & Regions (DMLs / DMRs)
dml_df = ec.call_dmls(dataset, ["Ctrl_1", "Ctrl_2"], ["Treat_1", "Treat_2"])
dmr_df = ec.call_dmrs(dml_df, p_cutoff=0.05, max_dist=1000, min_sites=3)
# 3. Calculate Epigenetic Biological Age (Horvath Clock)
true_ages = {"Ctrl_1": 22.0, "Ctrl_2": 26.0, "Treat_1": 60.0, "Treat_2": 65.0}
clock_df = ec.calculate_biological_age(dataset, clock_name="horvath", chronological_ages=true_ages)
# 4. Export a premium interactive HTML report
ec.generate_report(dataset, dml_df, dmr_df, clock_df, "epichronos_dashboard.html")
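
The `p_cutoff`, `max_dist`, and `min_sites` arguments passed to `call_dmrs` in step 2 suggest a distance-based merge of significant loci. The sketch below shows one plausible reading of those parameters in plain Python. The `Site` and `Region` types, the function `merge_sites_into_regions`, and the strict `p < p_cutoff` comparison are assumptions for illustration, not the library's actual algorithm.

```python
from typing import NamedTuple


class Site(NamedTuple):
    chrom: str
    pos: int
    p_value: float


class Region(NamedTuple):
    chrom: str
    start: int
    end: int
    n_sites: int


def merge_sites_into_regions(sites, p_cutoff=0.05, max_dist=1000, min_sites=3):
    """Chain significant sites on one chromosome when neighbours lie within max_dist bp."""
    significant = sorted(
        (s for s in sites if s.p_value < p_cutoff),
        key=lambda s: (s.chrom, s.pos),
    )
    regions, current = [], []

    def close_run():
        # Keep the run only if it contains enough sites.
        if len(current) >= min_sites:
            regions.append(
                Region(current[0].chrom, current[0].pos, current[-1].pos, len(current))
            )

    for site in significant:
        extends_run = (
            bool(current)
            and site.chrom == current[-1].chrom
            and site.pos - current[-1].pos <= max_dist
        )
        if not extends_run:
            close_run()
            current = []
        current.append(site)
    close_run()
    return regions


sites = [
    Site("chr1", 100, 0.01), Site("chr1", 600, 0.02), Site("chr1", 900, 0.50),
    Site("chr1", 1500, 0.001), Site("chr1", 5000, 0.01), Site("chr1", 5200, 0.03),
    Site("chr2", 100, 0.001),
]
regions = merge_sites_into_regions(sites)
print(regions)  # [Region(chrom='chr1', start=100, end=1500, n_sites=3)]
```

In this example the non-significant site at position 900 is skipped, the two-site run at 5000–5200 is dropped by `min_sites=3`, and the lone chr2 site never forms a region.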
Memory Efficiency & RAM Benchmarks
By storing aligned coordinates in memory-efficient Apache Arrow columnar buffers via Polars, EpiChronos avoids the per-value boxing overhead of Python objects and the garbage-collection pressure typical of R pipelines. This enables comprehensive analysis of whole-genome datasets on a standard consumer laptop.
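
A back-of-envelope calculation shows why fixed-width columnar storage is so much smaller than boxed Python objects. The four-column layout below is an assumption chosen for illustration, not the actual EpiChronos schema, and the estimate ignores validity bitmaps, dictionaries, and allocator overhead.

```python
import struct
import sys

N_SITES = 28_000_000  # CpG count quoted in this README for one WGBS sample

# Assumed fixed-width columns (hypothetical layout): bytes per value.
COLUMN_BYTES = {
    "chrom (dictionary code, uint32)": 4,
    "position (uint32)": 4,
    "beta (float32)": 4,
    "coverage (uint32)": 4,
}

# Columnar: every value occupies exactly its fixed width in a contiguous buffer.
columnar_bytes = N_SITES * sum(COLUMN_BYTES.values())

# Boxed: every value is a separate Python object, plus one pointer to it in a list.
pointer_size = struct.calcsize("P")
boxed_per_value = sys.getsizeof(1.0) + pointer_size
boxed_bytes = N_SITES * len(COLUMN_BYTES) * boxed_per_value

print(f"columnar: {columnar_bytes / 1e6:,.0f} MB")  # 448 MB for this layout
print(f"boxed:    {boxed_bytes / 1e6:,.0f} MB")
print(f"ratio:    {boxed_bytes / columnar_bytes:.1f}x")
```

The columnar figure depends only on the assumed column widths. The boxed figure depends on the interpreter's object size, so it is computed at run time rather than quoted.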
Estimated RAM Footprint (Single File Ingestion)
- Microarray Data (EPIC v2 / EPIC / 450K) (~930k sites): ~35 MB – 50 MB of RAM
- Reduced Representation Sequencing (RRBS) (~2M sites): ~80 MB – 120 MB of RAM
- Whole Genome Sequencing (WGBS) / Nanopore (~28M sites, 1.5 GB file on disk):
  - Unfiltered (Full Genome): ~1.0 GB – 1.2 GB of RAM
  - With Coverage Filtering (min_cov=5): ~500 MB – 700 MB of RAM
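
Coverage filtering reduces memory because low-depth sites are dropped at ingestion. The sketch below parses the standard six-column, tab-separated Bismark coverage format (chromosome, start, end, methylation percentage, methylated count, unmethylated count) using only the standard library. The function `read_bismark_cov` is a hypothetical stand-in and makes no claim about how `ec.load_bismark_coverage` is implemented.

```python
import csv
import io


def read_bismark_cov(handle, min_cov=5):
    """Yield-free reader: return (chrom, pos, beta, coverage) for sites with enough reads."""
    rows = []
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) != 6:
            continue  # skip blank or malformed lines
        chrom, start, _end, _pct, n_meth, n_unmeth = row
        meth, unmeth = int(n_meth), int(n_unmeth)
        coverage = meth + unmeth
        # max(..., 1) guards against division by zero when min_cov is 0.
        if coverage < max(min_cov, 1):
            continue
        # Recompute beta from counts instead of trusting the rounded percentage.
        rows.append((chrom, int(start), meth / coverage, coverage))
    return rows


# Three synthetic sites; the middle one has only 2 reads and is filtered out.
example = io.StringIO(
    "chr1\t10468\t10468\t80.0\t8\t2\n"
    "chr1\t10470\t10470\t50.0\t1\t1\n"
    "chr1\t10483\t10483\t100.0\t5\t0\n"
)
sites = read_bismark_cov(example, min_cov=5)
print(sites)  # [('chr1', 10468, 0.8, 10), ('chr1', 10483, 1.0, 5)]
```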
In-Memory Scaling vs. R-Bioconductor
To load and align a single Whole-Genome Bisulfite Sequencing (WGBS) sample (28 million CpGs):
| Pipeline / Tool | Backend | Data Structure | RAM Usage (1 WGBS Sample) |
|---|---|---|---|
| Traditional R (bsseq / minfi) | R / S4 Objects | Fragmented boxed vectors | 6.0 GB – 12.0 GB (often hits the memory wall) |
| EpiChronos v0.2.0 | Python / Polars / Arrow | Contiguous native Arrow buffers | 0.5 GB – 1.2 GB (order-of-magnitude reduction) |
Pipeline Blueprint
Raw Methylation Input
 ├── Bismark Coverage (.cov)
 ├── Microarray Beta-Value Matrix
 └── Long-Read bedGraph
       │
       ▼
epichronos.core.MethylationDataset (Polars-aligned coordinate framework)
 ├── filter_by_coverage()
 ├── filter_by_variance()
 └── impute_missing()
       │
       ├───────────────────────────────┐
       ▼                               ▼
epichronos.stats                epichronos.clocks
 ├── call_dmls()                 ├── calculate_biological_age()
 └── call_dmrs()                 └── (Cohort & Ref-mean Imputation)
       │                               │
       └───────────────┬───────────────┘
                       ▼
                epichronos.viz
                 └── generate_report() -> Standalone HTML Report
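
The clocks branch of the blueprint combines imputation with a per-sample age estimate. As a minimal sketch of that idea, the code below fills missing CpGs with the cohort mean and then applies a linear clock. The weights and intercept are made-up numbers, not real Horvath or Hannum coefficients, and the function names are hypothetical. The published Horvath clock also applies an age transformation to the linear predictor, which is omitted here.

```python
import numpy as np


def impute_cohort_mean(betas: np.ndarray) -> np.ndarray:
    """Replace NaNs in a (n_sites, n_samples) matrix with each site's cohort mean."""
    site_means = np.nanmean(betas, axis=1, keepdims=True)
    return np.where(np.isnan(betas), site_means, betas)


def linear_clock_age(betas: np.ndarray, weights: np.ndarray, intercept: float) -> np.ndarray:
    """Predicted age per sample: intercept + sum over sites of weight * beta."""
    return intercept + weights @ betas


# Three clock CpGs (rows) across three samples (columns), with two missing values.
betas = np.array([
    [0.2, 0.4, np.nan],
    [0.5, np.nan, 0.7],
    [0.9, 0.8, 0.7],
])
weights = np.array([10.0, 20.0, 30.0])  # illustrative only
intercept = 5.0

filled = impute_cohort_mean(betas)
predicted = linear_clock_age(filled, weights, intercept)

# One common definition of age acceleration: predicted minus chronological age.
chronological = np.array([40.0, 45.0, 50.0])
acceleration = predicted - chronological
print(predicted)     # approximately [44. 45. 43.]
print(acceleration)  # approximately [ 4.  0. -7.]
```

A site that is missing in every sample stays NaN under cohort-mean imputation, which is the case where a reference-mean fallback becomes necessary.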
License
Distributed under the MIT License. See LICENSE for more information.