DENSECALL2
DenseCall2: de novo base-calling of DNA modifications using nanopore sequencing
Overview
DenseCall2 is an updated base-caller built on an optimised Conformer architecture for nanopore-signal processing, enabling simultaneous base-calling and modification detection.
Requirements
Hardware
- RAM: 2 GB minimum; 16 GB or more recommended.
- CPU: 4 cores minimum, ≥ 2.3 GHz per core.
- GPU: NVIDIA RTX 4090 or newer (required for DenseCall2).
Benchmarks were collected on an ASUSTeK SVR TS700-E9-RS8 workstation
(Xeon Silver 4214 @ 2.20 GHz, 64 GB RAM, RTX 4090 24 GB).
Software
Supported operating systems
- Linux: Ubuntu 22.04 or newer.
- Windows and macOS are not yet supported.
Python
- Version 3.10 or higher is required.
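The requirements above can be checked before installation with a short preflight script. This is a minimal sketch, not part of DenseCall2: the function name `check_environment` is hypothetical, the core count reported by Python is the number of logical cores, and the presence of `nvidia-smi` on the PATH is used only as a rough hint that an NVIDIA driver is installed (it does not verify the GPU model).

```python
import os
import shutil
import sys


def check_environment(min_python=(3, 10), min_cores=4):
    """Return a list of human-readable problems; an empty list means all checks passed.

    Hypothetical helper for illustration; it is not shipped with DenseCall2.
    """
    problems = []

    # Python 3.10 or higher is required.
    if sys.version_info[:2] < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python {wanted}+ is required, found {found}")

    # os.cpu_count() reports logical cores and may return None.
    cores = os.cpu_count() or 0
    if cores < min_cores:
        problems.append(f"At least {min_cores} CPU cores are required, found {cores}")

    # Assumption: a working NVIDIA driver installs nvidia-smi on the PATH.
    if shutil.which("nvidia-smi") is None:
        problems.append("nvidia-smi not found on PATH; the NVIDIA driver may be missing")

    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

An empty result only means that these three coarse checks passed; RAM size and the exact GPU model still need to be confirmed by hand.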
Installation
Densecall2
First, set up a new environment and install the necessary Python packages using conda and pip:
# 1. Create a new conda environment
conda create -n densecall python=3.10 -y
conda activate densecall
# 2. Download and install Densecall2 from source
git clone https://github.com/LuChenLab/DENSECALL2.git
cd DENSECALL2
pip install -r requirements.txt
python setup.py develop
# Or install Densecall2 package from PyPI
pip install densecall
Basecalling of FAST5 files
After installing DenseCall2, download the pre-trained human-specific models from [Pre-trained basecalling models](). Two models are available: `dna_r9.4.1_hac@v1.0.tar.gz` for r9.4.1 data and `dna_r10.4.1_hac@v1.0.tar.gz` for r10.4.1 data.
DenseCall2 basecalls `.fast5` files and writes the results in `.sam` format. Run the commands below to perform basecalling:
# Activate the DenseCall2 conda environment
conda activate densecall
# Navigate to the directory that holds the downloaded models
cd /path/to/Densecall2/densecall/models/
# Extract the model
# Note: the .tar.gz files must already have been downloaded to this directory
tar -xzvf dna_r9.4.1_hac@v1.0.tar.gz
# Basecall the .fast5 files and write the result to a .sam file
densecall basecaller dna_r9.4.1_hac@v1.0 /path/to/fast5_data/ --mod --chunksize 12000 --overlap 600 \
    --reference chr22.mmi --recursive --alignment-threads 12 > mod.sam
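The `--chunksize 12000 --overlap 600` options in the command above suggest that each raw signal is processed in fixed-size windows that share samples with their neighbours. The sketch below shows one plain way such windows could be laid out. It is an assumption for illustration only, not taken from the DenseCall2 source: the function name `chunk_bounds` is hypothetical, and the real implementation may pad or offset the final chunk differently.

```python
def chunk_bounds(n_samples, chunksize=12000, overlap=600):
    """Return (start, end) sample indices of overlapping windows over a signal.

    Illustrative sliding-window layout; DenseCall2's actual chunking may differ.
    """
    if not 0 <= overlap < chunksize:
        raise ValueError("overlap must be non-negative and smaller than chunksize")
    if n_samples <= 0:
        return []

    stride = chunksize - overlap  # each window starts this many samples after the previous one
    bounds = []
    start = 0
    while True:
        end = min(start + chunksize, n_samples)  # the last window is truncated, not padded
        bounds.append((start, end))
        if end >= n_samples:
            break
        start += stride
    return bounds


if __name__ == "__main__":
    # A 30,000-sample signal is covered by three windows; neighbours share 600 samples.
    print(chunk_bounds(30000))
```

With the defaults shown, consecutive windows start 11,400 samples apart, so a larger overlap costs more computation per read but gives each window boundary more shared context.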
Conformer Models
The densecall.conformer package requires flash-attn.
To install flash-attn, run the following command:
pip install flash-attn==2.8.3 --no-build-isolation --no-cache-dir
(Optional) Training your own basecalling model
`densecall train` trains a DenseCall2 model.
To train a model on your own reads, first download a trained modified-base model from Remora, then basecall the reads with `--save-ctc` to produce the training data:
remora model download
densecall basecaller dna_r10.4.1_e8.2_400bps_hac@v3.5.2 ./chr1_fast5 --batchsize 64 --chunksize 5000 \
--reference chr1.mmi --recursive --save-ctc --min-accuracy-save-ctc 0.9 \
--alphabet NACZGT \
--modified-codes Z \
--modified-base-model /path/to/dna_r10.4.1_e8.2_400bps_hac_v3.5.1_5mc_CG_v2.pt \
--max-reads 100000 --overlap 100 >r10_train_data/test.sam
To train a new model from scratch:
densecall train test --directory r10_train_data/ -f --batch 64 --epochs 30 \
--no-quantile-grad-clip --lr 0.002 --alphabet NACZGT \
--config conformer.toml --new --compile
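Both commands above pass `--alphabet NACZGT`, which extends the usual four-base alphabet with an extra symbol `Z`. The sketch below illustrates how such an alphabet could map training targets to integer labels. It rests on two assumptions that are not confirmed by this README: that index 0 (`N`) is reserved as the CTC blank, as in Bonito, and that `Z` denotes the modified form of `C` (suggested by the `5mc_CG` Remora model). The names `encode` and `to_canonical` are hypothetical.

```python
ALPHABET = "NACZGT"          # value passed to --alphabet
MODIFIED_CODES = {"Z": "C"}  # assumption: Z is the modified form of C


def encode(sequence, alphabet=ALPHABET):
    """Map each base to its index in the alphabet.

    Index 0 (N) is assumed to be the CTC blank, so it may not appear in a target.
    """
    labels = []
    for base in sequence:
        index = alphabet.find(base)
        if index <= 0:
            raise ValueError(f"base {base!r} is not a valid target symbol")
        labels.append(index)
    return labels


def to_canonical(sequence, modified_codes=MODIFIED_CODES):
    """Replace each modified symbol with its canonical base."""
    return "".join(modified_codes.get(base, base) for base in sequence)


if __name__ == "__main__":
    target = "ACZGT"
    print(encode(target))        # integer labels for the target sequence
    print(to_canonical(target))  # the same sequence without modification symbols
```

Treating the modification as an extra output symbol is what lets a single model emit bases and modification calls in one pass, which matches the simultaneous base-calling and modification detection described in the Overview.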
All training calls use Automatic Mixed Precision to speed up training.
flash-attn must be installed manually (see Conformer Models above) because its packaging system prevents it from being listed as a normal dependency.
Downstream Analysis
Results were analysed with the ONT tool modkit, which reads BAM files carrying MM/ML modification tags and produces summary statistics. This study used modkit's `validate` and `pileup` functions.
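The MM/ML tags that modkit consumes are defined in the SAM optional-fields specification: MM lists, per modification, how many occurrences of the canonical base to skip before each call, and ML holds one probability per call, scaled to 0..255. The sketch below decodes these tags for a single read. It is a simplified illustration, not DenseCall2 or modkit code: the function name `parse_mm_ml` is hypothetical, the read and tag values are made up, only forward-strand entries with a single modification code are handled, and each ML value is reported as the midpoint of its probability bin.

```python
def parse_mm_ml(seq, mm, ml):
    """Decode MM/ML tags into (read_position, modification_code, probability) tuples.

    seq: read sequence in its original sequenced orientation
    mm:  MM tag value, e.g. "C+m?,1,0;"
    ml:  list of ML tag integers (0..255), one per call, in MM order
    """
    calls = []
    ml_index = 0
    for entry in filter(None, mm.split(";")):
        header, *deltas = entry.split(",")
        base, strand = header[0], header[1]
        code = header[2:].rstrip("?.")  # drop the optional '?' or '.' flag
        if strand != "+":
            raise ValueError("only '+' strand entries are handled in this sketch")

        # Positions in the read of the canonical base (N matches any base).
        positions = [i for i, b in enumerate(seq) if base == "N" or b == base]

        cursor = -1
        for delta in deltas:
            cursor += int(delta) + 1  # skip `delta` occurrences, then take the next one
            probability = (ml[ml_index] + 0.5) / 256  # midpoint of the ML bin
            calls.append((positions[cursor], code, probability))
            ml_index += 1
    return calls


if __name__ == "__main__":
    # Made-up read: the C bases sit at positions 1, 3, 4 and 7.
    for position, code, probability in parse_mm_ml("ACGCCGTC", "C+m?,1,0;", [230, 12]):
        print(position, code, round(probability, 3))
```

For production work, a library such as pysam or modkit itself is the safer choice, since both also handle reverse-strand reads and multiple modification codes.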
Citing
A preprint will be uploaded soon.
License
GNU General Public License v3.0
Acknowledgements
We thank the Bonito developers for providing the source code. DenseCall2 is built on the basic framework of Bonito's code. (The save-ctc functionality and the conversion of Conformer-based model outputs to modcall sequences are adapted from Bonito's code in accordance with its license.)