unimeth

Unimeth: A Unified Transformer Framework for DNA Methylation Detection from Nanopore Reads

Unimeth is a unified deep learning framework for accurate and efficient detection of DNA methylation (5mC, 6mA) from Oxford Nanopore sequencing data. Built on a transformer-based architecture, Unimeth supports multiple sequencing chemistries (R9.4.1, R10.4.1 4kHz/5kHz), handles both plant and mammalian genomes, and achieves state-of-the-art performance across diverse genomic contexts.


🧬 Features

  • Unified Detection: Simultaneously detects 5mC (CpG, CHG, CHH) and 6mA methylation.
  • Multi-Chemistry Support: Compatible with R9.4.1, R10.4.1 4kHz, and R10.4.1 5kHz chemistries.
  • Patch-Based Transformer: Captures contextual dependencies between neighboring methylation sites.
  • Multi-Phase Training: Pre-training, read-level fine-tuning, and site-level calibration for robust performance.
  • Low False Positive Rate: Especially effective in non-CpG contexts and low-methylation regions.
  • Easy to Use: Standard input/output formats (POD5, BAM, BED).

📦 Installation

Prerequisites

  • Python 3.12+
  • Dorado for basecalling

Install from Source

git clone https://github.com/sekeyWang/unimeth.git
cd unimeth

conda create -n unimeth python=3.12
conda activate unimeth

pip install -e .

Run unimeth -v to verify the installation; it should print the installed version.
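
The same check can be made from Python, for example at the top of a pipeline script. This is a minimal sketch, assuming the package is installed under the distribution name unimeth; installed_version is a hypothetical helper and not part of Unimeth itself.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "unimeth") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    The default name "unimeth" is an assumption about how the package is
    registered in the environment.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("unimeth is not installed in this environment")
    else:
        print(f"unimeth {found}")
```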


🚀 Quick Start

1. Basecalling and Alignment

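Use dorado to basecall and align your nanopore reads. The --emit-moves option keeps the move table in the output BAM. The command below only basecalls; to obtain aligned reads, pass a reference with dorado's --reference option or run dorado aligner afterwards: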

dorado basecaller --emit-moves dna_r10.4.1_e8.2_400bps_sup@v5.0.0 pod5/ > calls.bam

2. Download model checkpoints and sample data

  • Model: Download unimeth_r10.4.1_5kHz_5mC.pt from Google Drive to the checkpoints folder
  • Sample Data: Download the demo dataset with gdown:
mkdir demo
pip install gdown
gdown --folder https://drive.google.com/drive/folders/1Gu7hgOQbHSUULG1MXjdE_qJ3na-6AdLi -O demo/

The demo dataset includes:

  • demo.bam - aligned reads
  • subset_18.pod5 - raw signal data

3. Methylation Calling with Unimeth

Run Unimeth to detect methylation (use --accelerator to enable multiple GPUs if available):

unimeth \
--pod5_dir demo/subset_18.pod5 \
--bam_dir demo/demo.bam \
--model_dir checkpoints/unimeth_r10.4.1_5kHz_5mC.pt \
--out_dir results/arab.bed \
--cpg 1 \
--chg 1 \
--chh 1 \
--batch_size 256 \
--pore_type R10.4.1 \
--frequency 5khz \
--dorado_version 0.71
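
When the call is scripted, it can help to check the inputs before starting a long run. The sketch below rebuilds the command above as an argument list and refuses to launch if an input file is missing. It uses only the flags and values shown in that command; build_unimeth_command and run_unimeth are hypothetical helper names, not part of Unimeth.

```python
import shlex
import subprocess
from pathlib import Path


def build_unimeth_command(pod5, bam, model, out, batch_size=256):
    """Assemble the unimeth call from the Quick Start as an argument list."""
    return [
        "unimeth",
        "--pod5_dir", str(pod5),
        "--bam_dir", str(bam),
        "--model_dir", str(model),
        "--out_dir", str(out),
        "--cpg", "1",
        "--chg", "1",
        "--chh", "1",
        "--batch_size", str(batch_size),
        "--pore_type", "R10.4.1",
        "--frequency", "5khz",
        "--dorado_version", "0.71",
    ]


def run_unimeth(cmd, inputs):
    """Run the command only if every input path exists."""
    missing = [str(p) for p in inputs if not Path(p).exists()]
    if missing:
        raise FileNotFoundError("missing input(s): " + ", ".join(missing))
    print("Running:", shlex.join(cmd))
    return subprocess.run(cmd, check=True)
```

For the demo data, the inputs to check would be demo/subset_18.pod5, demo/demo.bam, and checkpoints/unimeth_r10.4.1_5kHz_5mC.pt.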

4. Output

Unimeth writes read-level methylation calls in TSV format. A sample of the output:

Chromosome Ref pos Strand Dorado pred Read id Read pos Motif type Pred positive Pred negative Pred(0/1) .
Chr2 15338477 - 9 28752a76-7007-40d7-8ede-f2939fe2ab26 0 [CpG] 0.985 0.014 0 .
Chr2 15338471 - 5 28752a76-7007-40d7-8ede-f2939fe2ab26 6 [CHG] 0.990 0.009 0 .
Chr2 15338465 - 6 28752a76-7007-40d7-8ede-f2939fe2ab26 12 [CHH] 0.998 0.001 0 .
Chr2 15338462 - -1 28752a76-7007-40d7-8ede-f2939fe2ab26 15 [CHH] 0.998 0.001 0 .
Chr2 15338457 - -1 28752a76-7007-40d7-8ede-f2939fe2ab26 20 [CHH] 0.999 0.000 0 .
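
Read-level calls are often summarised per genomic site. The sketch below parses rows like the ones above and reports, for each site, the fraction of reads whose Pred(0/1) value is 1. It assumes whitespace- or tab-separated fields in the column order of the sample header; parse_calls and site_fractions are hypothetical helpers. This is a plain per-site count, not Unimeth's site-level calibration.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Call(NamedTuple):
    chrom: str
    ref_pos: int
    strand: str
    read_id: str
    motif: str
    pred: int  # value of the Pred(0/1) column


def parse_calls(lines: Iterable[str]) -> list[Call]:
    """Parse read-level rows; field order is assumed from the sample header."""
    calls = []
    for line in lines:
        f = line.split()
        # Skip blank lines and the header (its second token is not an integer).
        if len(f) < 10 or not f[1].isdigit():
            continue
        calls.append(
            Call(f[0], int(f[1]), f[2], f[4], f[6].strip("[]"), int(f[9]))
        )
    return calls


def site_fractions(calls: Iterable[Call]) -> dict:
    """Map (chrom, pos, strand, motif) to (reads with pred == 1, reads, fraction)."""
    counts = defaultdict(lambda: [0, 0])
    for c in calls:
        entry = counts[(c.chrom, c.ref_pos, c.strand, c.motif)]
        entry[0] += c.pred
        entry[1] += 1
    return {site: (m, n, m / n) for site, (m, n) in counts.items()}


# Two rows copied from the sample output above.
SAMPLE = """\
Chr2 15338477 - 9 28752a76-7007-40d7-8ede-f2939fe2ab26 0 [CpG] 0.985 0.014 0 .
Chr2 15338471 - 5 28752a76-7007-40d7-8ede-f2939fe2ab26 6 [CHG] 0.990 0.009 0 .
"""

if __name__ == "__main__":
    for site, (m, n, frac) in site_fractions(parse_calls(SAMPLE.splitlines())).items():
        print(site, m, n, frac)
```

For a real run, pass an open file handle on the output file to parse_calls instead of SAMPLE.splitlines().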

🧪 Models

We provide pre-trained models for:

  • Plant 5mC (R10.4.1 5kHz, R9.4.1)
  • Human CpG (R10.4.1 5kHz/4kHz, R9.4.1)
  • 6mA Detection (R10.4.1)

Download models from the Google Drive page.


📊 Performance Highlights

  • Outperforms DeepPlant, Dorado, Rockfish, and DeepMod2 in cross-species benchmarks.
  • Superior accuracy in repetitive regions (centromeres, transposons).
  • Lower false positive rates in CHH and 6mA contexts.
  • Robust to batch effects and unseen species.

For detailed benchmarks, see the manuscript.


📁 Input/Output Formats

Input Format    Description
POD5            Raw nanopore signals
BAM             Basecalled and aligned reads

Output Format   Description
tsv             Per-read methylation calls with modified

📚 Citation

If you use Unimeth in your research, please cite:

Wang S, Xiao Y, Sheng T, et al. Unimeth: A unified transformer framework for accurate DNA methylation detection from nanopore reads[J]. bioRxiv, 2025: 2025.12.05.692231.


📄 License

This project is licensed under the BSD 3-Clause Clear License. See LICENSE for details.


📬 Contact
