A deep neural network basecaller for nanopore sequencing.
Xron (ˈkairɑn) is a methylation basecaller that can identify m6A methylation modifications from ONT direct RNA sequencing data.
It uses a deep learning CNN+RNN+CTC structure to perform end-to-end basecalling for the nanopore sequencer.
The name is inherited from Chiron.
Built with PyTorch and Python 3.8+.
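To make the CTC part of the CNN+RNN+CTC structure more concrete, here is a minimal sketch of greedy CTC decoding in plain Python. It is an illustration only, not Xron's decoder: the label alphabet, the blank index and the toy scores are assumptions, and the `--beam` option shown later suggests Xron itself uses beam search rather than this greedy rule.

```python
from itertools import groupby
from typing import Sequence

BLANK = 0            # assumption: index 0 is the CTC blank symbol
ALPHABET = "-ACGU"   # hypothetical labels: blank plus four RNA bases


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]]) -> str:
    """Collapse per-frame label scores into a base sequence (greedy CTC)."""
    # 1. Pick the highest-scoring label at every time step.
    best_path = [
        max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores
    ]
    # 2. Merge consecutive repeats of the same label.
    collapsed = [label for label, _ in groupby(best_path)]
    # 3. Drop blanks; a blank between two equal labels keeps both of them.
    return "".join(ALPHABET[label] for label in collapsed if label != BLANK)


# Toy network output: one row of scores per signal frame.
scores = [
    [0.10, 0.80, 0.05, 0.03, 0.02],  # A
    [0.10, 0.80, 0.05, 0.03, 0.02],  # A again, merged with the previous frame
    [0.90, 0.03, 0.03, 0.02, 0.02],  # blank
    [0.10, 0.80, 0.05, 0.03, 0.02],  # A, a new base because a blank came first
    [0.10, 0.05, 0.05, 0.70, 0.10],  # G
]
print(ctc_greedy_decode(scores))  # prints "AAG"
```

A modification-aware model would presumably add at least one extra label (for example a methylated A) to the alphabet; the collapse rule stays the same.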
m6A-aware RNA basecall one-liner:
xron call -i <input_fast5_folder> -o <output_folder> -m models/ENEYFT --boostnano
Install
For either installation method, we recommend creating a virtual environment first using conda or venv. With conda, for example:
conda create --name YOUR_VIRTUAL_ENVIRONMENT python=3.8
conda activate YOUR_VIRTUAL_ENVIRONMENT
Then you can install from our PyPI repository or install the newest version from the GitHub repository.
Install
pip install xron
Xron requires PyTorch 1.11.0 or later. If you have not yet installed PyTorch, install it by following the guide from the official repository.
Basecall
Before basecalling with Xron, you need to download the models from our AWS S3 bucket by running xron init:
xron init
This automatically downloads the models and puts them into the models folder. Sample code for m6A-aware basecalling and m6A site identification is provided in the xron-samples folder. To run Xron on raw fast5 files:
xron call -i ${INPUT_FAST5} -o ${OUTPUT} -m models/ENEYFT --fast5 --beam 50 --chunk_len 2000
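If you need to basecall several folders, the command above can be assembled from a script. The sketch below only builds and prints the command lines using the flags shown above; the helper name `build_call_command` and the `run1`/`run2` folder names are hypothetical, and nothing is executed.

```python
import shlex
from typing import List


def build_call_command(
    fast5_dir: str,
    out_dir: str,
    model: str = "models/ENEYFT",
    beam: int = 50,
    chunk_len: int = 2000,
) -> List[str]:
    """Return the argument list for one `xron call` run on raw fast5 files."""
    return [
        "xron", "call",
        "-i", fast5_dir,
        "-o", out_dir,
        "-m", model,
        "--fast5",
        "--beam", str(beam),
        "--chunk_len", str(chunk_len),
    ]


# Hypothetical input folders; replace with your own paths.
for run in ["run1", "run2"]:
    cmd = build_call_command(f"fast5/{run}", f"basecalls/{run}")
    # Dry run: print a copy-pasteable command line instead of executing it.
    print(shlex.join(cmd))
```

To actually launch a run, pass the list to `subprocess.run(cmd, check=True)` inside the activated environment.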
Segmentation using NHMM
Prepare chunk dataset
Xron also includes a non-homogeneous HMM (NHMM) for signal re-squiggling. To use it, first extract the chunks and basecalled sequences using the prepare module:
xron prepare -i ${FAST5_FOLDER} -o ${CHUNK_FOLDER} --extract_seq --basecaller guppy --reference ${REFERENCE} --mode rna_meth --extract_kmer -k 5 --chunk_len 4000 --write_correction
Replace FAST5_FOLDER, CHUNK_FOLDER and REFERENCE with your basecalled fast5 folder, your output folder and the path to the reference genome FASTA file, respectively.
Realign the signal using NHMM
Then run the NHMM to realign ("resquiggle") the signal.
xron relabel -i ${CHUNK_FOLDER} -m ${MODEL} --device $DEVICE
This will generate a paths.py file under CHUNK_FOLDER which gives the kmer segmentation of the chunks.
Training
To train a new Xron model on your own data, prepare a training dataset that includes a signal file (chunks.npy), labelled sequences (seqs.npy) and the sequence length for each read (seq_lens.npy), then run the Xron supervised training module:
xron train -i chunks.npy --seq seqs.npy --seq_len seq_lens.npy --model_folder OUTPUT_MODEL_FOLDER
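Before starting a training run it can help to sanity-check the three files. The sketch below assumes the arrays are aligned on their first axis (one row per chunk) and that seqs.npy holds one string per chunk; the actual layout Xron expects is not specified here, so treat the shapes, dtypes and the helper name `check_dataset` as assumptions.

```python
import numpy as np


def check_dataset(chunks: np.ndarray, seqs: np.ndarray, seq_lens: np.ndarray) -> int:
    """Verify that the three training arrays are consistent; return the chunk count."""
    n = chunks.shape[0]
    # Assumption: every array has one entry per chunk along axis 0.
    if seqs.shape[0] != n or seq_lens.shape[0] != n:
        raise ValueError(
            f"size mismatch: chunks={n}, seqs={seqs.shape[0]}, seq_lens={seq_lens.shape[0]}"
        )
    # Assumption: seqs stores strings, so each stored length should match.
    for i, (seq, length) in enumerate(zip(seqs, seq_lens)):
        if len(seq) != int(length):
            raise ValueError(f"row {i}: sequence has {len(seq)} bases, seq_lens says {length}")
    return n


# Toy in-memory dataset with made-up values, standing in for the .npy files.
chunks = np.zeros((3, 2000), dtype=np.float32)   # 3 signal chunks of 2000 samples
seqs = np.array(["ACGU", "AAM", "GG"])           # hypothetical labels, M = m6A
seq_lens = np.array([4, 3, 2])

print(check_dataset(chunks, seqs, seq_lens))  # prints 3
```

With real data, load the arrays with `np.load("chunks.npy")`, `np.load("seqs.npy")` and `np.load("seq_lens.npy")` and pass them to the same function.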
Training an Xron model from scratch is hard, so we recommend fine-tuning our model by specifying the --load flag. For example, to fine-tune the provided ENEYFT model (trained on the cross-linked ENE dataset and fine-tuned on the Yeast dataset):
xron train -i chunks.npy --seq seqs.npy --seq_len seq_lens.npy --model_folder models/ENEYFT --load