DENSECALL2

DenseCall2: de novo base-calling of modifications using nanopore sequencing

Overview

DenseCall2 is an updated base-caller built on an optimised Conformer architecture for nanopore-signal processing, enabling simultaneous base-calling and modification detection.

Requirements

Hardware

  • RAM: 2 GB minimum; 16 GB or more recommended.
  • CPU: 4 cores minimum, ≥ 2.3 GHz per core.
  • GPU: NVIDIA RTX 4090 or newer (required for DenseCall2).

Benchmarks were collected on an ASUSTeK SVR TS700-E9-RS8 workstation
(Xeon Silver 4214 @ 2.20 GHz, 64 GB RAM, RTX 4090 24 GB).

Software

Supported operating systems

  • Linux: Ubuntu 22.04 or newer.
  • Windows and macOS are not yet supported.

Python

  • Version 3.10 or higher is required.
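
The software requirements above can be checked before installation with a short script. This is a minimal sketch using only the Python standard library; the function name `check_requirements` is hypothetical and not part of DenseCall2. It covers the Python version, operating system family, and CPU core count only, and does not inspect the Ubuntu release, RAM, or GPU.

```python
import os
import platform
import sys

MIN_PYTHON = (3, 10)  # from the Python requirement above
MIN_CORES = 4         # from the CPU requirement above


def check_requirements(python_version, system, cpu_count):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if tuple(python_version[:2]) < MIN_PYTHON:
        problems.append("Python 3.10 or higher is required")
    if system != "Linux":
        problems.append("only Linux is supported")
    if cpu_count is None or cpu_count < MIN_CORES:
        problems.append("at least 4 CPU cores are required")
    return problems


# Check the machine this script is running on.
issues = check_requirements(sys.version_info, platform.system(), os.cpu_count())
print("OK" if not issues else "\n".join(issues))
```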

Installation

Densecall2

First, set up a new environment and install the necessary Python packages using conda and pip:

# 1. Create a new conda environment
conda create -n densecall python=3.10 -y
conda activate densecall

# 2. Install the DenseCall2 package from PyPI
pip install densecall

# Or download and install DenseCall2 from source
git clone https://github.com/LuChenLab/DENSECALL2.git
cd DENSECALL2
pip install -r requirements.txt
python setup.py develop

# 3. Install flash-attn
pip install flash-attn==2.8.3 --no-build-isolation --no-cache-dir
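
After the steps above, one way to confirm that both packages are visible to the active environment is to look them up without importing them. This is a sketch under assumptions: `densecall` is assumed to be the import name of the PyPI package, `flash_attn` is the import name of flash-attn, and the helper `missing_modules` is hypothetical.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


# "densecall" is an assumed import name; "flash_attn" is flash-attn's import name.
missing = missing_modules(["densecall", "flash_attn"])
print("all modules found" if not missing else f"missing: {', '.join(missing)}")
```

Finding a module this way only shows that it is installed; it does not prove that flash-attn was built against a compatible CUDA and PyTorch version.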

Basecalling

Modcall

After installing DenseCall2, download the pre-trained human-specific models from Pre-trained basecalling models. Available models are dna_r9.4.1_hac_CG@v1.0.tar.gz for r9.4.1 data and dna_r10.4.1_hac_CG@v1.0.tar.gz for r10.4.1 data.

DenseCall2 converts .fast5 or .pod5 signal files into .sam output. Run the commands below to basecall with modification detection:

# Activate the Densecall2 conda environment
conda activate densecall

# Extract the downloaded model
tar -xzvf dna_r9.4.1_hac_CG@v1.0.tar.gz

# Basecall the signal files with modification calling and write a .sam file
densecall basecaller dna_r9.4.1_hac_CG@v1.0 /path/to/signal/ \
--mod --chunksize 12000 --overlap 600 \
--reference chr22.mmi --recursive --alignment-threads 12 > mod.sam
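
The --chunksize and --overlap options control how a long raw signal is cut into fixed-size pieces before it reaches the model. The exact scheme DenseCall2 uses is not described here, so the following is only a sketch under stated assumptions: chunks advance by chunksize minus overlap samples, and a final chunk is aligned to the end of the signal so that no samples are dropped. The function `chunk_starts` is hypothetical.

```python
def chunk_starts(signal_length, chunksize, overlap):
    """Start offsets of fixed-size chunks whose neighbours share `overlap` samples."""
    if not 0 <= overlap < chunksize:
        raise ValueError("overlap must be non-negative and smaller than chunksize")
    if signal_length <= chunksize:
        return [0]  # a short read fits in a single chunk
    stride = chunksize - overlap  # assumed step between consecutive chunks
    starts = list(range(0, signal_length - chunksize + 1, stride))
    # Assumed tail handling: add one end-aligned chunk to cover the remainder.
    if starts[-1] + chunksize < signal_length:
        starts.append(signal_length - chunksize)
    return starts


# With the values from the command above and a 30,000-sample signal:
print(chunk_starts(30000, 12000, 600))
```

Under these assumptions a larger overlap gives the model more shared context at chunk boundaries at the cost of processing more samples overall.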
                     

Normal basecall

Without the --mod option, DenseCall2 performs standard basecalling only. The example below writes the result to result.fq:

densecall basecaller dna_r9.4.1_hac_CG@v1.0 /path/to/signal/ \
--chunksize 12000 --overlap 600 \
--recursive >result.fq
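
A quick sanity check on the output is to count reads and bases and compute the mean quality. The sketch below uses only the standard library and assumes that result.fq is a well-formed four-line-per-record FASTQ file with Phred+33 qualities; `read_fastq` and `summarize` are hypothetical helpers, not part of DenseCall2.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return  # end of input (or a truncated final record)
        header, seq, _plus, qual = record
        # The read name is the first whitespace-delimited token after '@'.
        yield header[1:].split()[0], seq, qual


def summarize(records):
    """Return (read count, base count, arithmetic mean Phred score)."""
    n_reads = n_bases = n_quals = qsum = 0
    for _name, seq, qual in records:
        n_reads += 1
        n_bases += len(seq)
        n_quals += len(qual)
        qsum += sum(ord(c) - 33 for c in qual)  # Phred+33 encoding assumed
    mean_q = qsum / n_quals if n_quals else 0.0
    return n_reads, n_bases, mean_q


example = ["@r1 extra\n", "ACGT\n", "+\n", "IIII\n", "@r2\n", "AC\n", "+\n", "!!\n"]
print(summarize(read_fastq(example)))
```

To run it on real output, pass an open file handle instead of the example list, for instance `summarize(read_fastq(open("result.fq")))`.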

(optional) Training your own basecalling model

densecall train trains a DenseCall2 model.

To train a model on your own reads, first download a pre-trained modification model from Remora, then basecall the reads with --save-ctc to generate the training data.

remora model download 
densecall basecaller dna_r10.4.1_e8.2_400bps_hac@v3.5.2 ./chr1_fast5 --batchsize 64 --chunksize 5000 \
--reference chr1.mmi --recursive --save-ctc --min-accuracy-save-ctc 0.9 \
--alphabet NACZGT \
--modified-codes Z \
--modified-base-model /path/to/dna_r10.4.1_e8.2_400bps_hac_v3.5.1_5mc_CG_v2.pt \
--max-reads 100000 --overlap 100 > r10_train_data/test.sam
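
The --alphabet NACZGT and --modified-codes Z options extend the usual base alphabet with one extra symbol, Z, for the modified base. As an illustration only, the sketch below shows how a sequence with marked modifications could be turned into integer training labels. It assumes Bonito-style conventions, namely that labels are indices into the alphabet string and that index 0 (N) is the CTC blank, and it assumes that Z stands for a modified C. The helpers `mark_modified` and `encode` are hypothetical.

```python
ALPHABET = "NACZGT"  # value passed to --alphabet; index 0 ("N") assumed to be the blank


def mark_modified(sequence, modified_positions, canonical="C", code="Z"):
    """Replace the canonical base with the modified code at 0-based positions."""
    bases = list(sequence)
    for pos in modified_positions:
        if bases[pos] != canonical:
            raise ValueError(f"position {pos} is {bases[pos]!r}, expected {canonical!r}")
        bases[pos] = code
    return "".join(bases)


def encode(sequence, alphabet=ALPHABET):
    """Map each symbol to its index in the alphabet string."""
    lookup = {symbol: index for index, symbol in enumerate(alphabet)}
    try:
        return [lookup[symbol] for symbol in sequence]
    except KeyError as exc:
        raise ValueError(f"symbol {exc.args[0]!r} is not in alphabet {alphabet!r}") from None


labelled = mark_modified("ACGTCG", [1])
print(labelled, encode(labelled))
```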

Then train a new model from scratch on the generated data:

densecall train test --directory r10_train_data/ -f --batch 64 --epochs 30 \
--no-quantile-grad-clip --lr 0.002 --alphabet NACZGT \
--config conformer.toml --new --compile

All training calls use Automatic Mixed Precision to speed up training.

Note that flash-attn must be installed manually (step 3 of Installation), because its packaging system prevents it from being listed as a normal dependency.

Downstream Analysis

Results were analyzed with modkit, the ONT tool that processes BAM files containing MM/ML tags and generates summary statistics. This study used modkit's "validate" and "pileup" commands.
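
To make the MM/ML tags concrete, the sketch below decodes them for a single read following the SAM tags specification: MM lists, per modification, how many occurrences of the canonical base to skip before each call, and ML holds one 8-bit probability per call. This is an illustration, not a replacement for modkit. It assumes a forward-strand read, one modification code per MM entry, and "+" strand modifications only; `parse_mm_ml` is a hypothetical helper.

```python
def parse_mm_ml(seq, mm, ml):
    """Return (position, code, probability) for each modification call.

    seq: read sequence in its original orientation
    mm:  MM tag value without the "MM:Z:" prefix, e.g. "C+m?,1,0;"
    ml:  list of integers (0-255) from the ML tag
    """
    calls = []
    ml_values = iter(ml)
    for entry in filter(None, mm.split(";")):
        fields = entry.split(",")
        spec, skips = fields[0], [int(x) for x in fields[1:]]
        base, strand, code = spec[0], spec[1], spec[2:].rstrip(".?")
        if strand != "+":
            raise ValueError("this sketch only handles '+' strand modifications")
        # Positions of the canonical base; "N" in MM means any base.
        candidates = [i for i, b in enumerate(seq) if base == "N" or b == base]
        index = -1
        for skip in skips:
            index += skip + 1  # skip `skip` candidate bases, then take the next one
            # ML value v covers the interval [v/256, (v+1)/256); use its midpoint.
            calls.append((candidates[index], code, (next(ml_values) + 0.5) / 256))
    return calls


print(parse_mm_ml("ACGTCCGA", "C+m?,1,0;", [255, 0]))
```

For reads aligned to the reverse strand, the SEQ field in a SAM/BAM record is reverse-complemented while MM still refers to the original orientation, so this simplified decoder would need additional handling there.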

Citing

A preprint will be uploaded soon.

License

...

Acknowledgements

We thank the Bonito developers for providing the source code. DenseCall2 is built on the basic framework of Bonito's code. (The save-ctc component and the conversion of the Conformer-based model's outputs into modcall sequences were adapted from Bonito's code in accordance with its license.)
