De novo construction of isoforms from long-read data
isONform: an algorithm for recovering isoforms from long-read sequencing data
Installation
Dependencies
networkx
ordered-set
matplotlib
parasail
edlib
pyinstrument
namedtuple
recordclass
Installation guide
- Create a new environment for isONform (Python 3.7 or later is required):
conda create -n isonform python=3.10 pip
conda activate isonform
- Install isONcorrect and SPOA (strongly recommended):
pip install isONcorrect
conda install -c bioconda spoa
- Install other dependencies of isONform:
conda install networkx
pip install ordered-set
conda install matplotlib
pip install parasail
pip install pyinstrument
conda install -c cerebis recordclass
- Clone this repository.
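After installation, a quick check that the listed dependencies are importable can save a failed run later. The snippet below is a minimal sketch, not part of isONform: the helper name `missing_modules` is hypothetical, and it assumes that the pip package `ordered-set` is imported as `ordered_set`.

```python
from importlib.util import find_spec

# Import names for the packages listed under "Dependencies".
# Assumption: the pip package "ordered-set" is imported as "ordered_set".
REQUIRED_MODULES = [
    "networkx",
    "ordered_set",
    "matplotlib",
    "parasail",
    "edlib",
    "pyinstrument",
    "recordclass",
]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Run it inside the activated `isonform` environment so that it inspects the right interpreter.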
Introduction
This tool generates isoforms from clustered and corrected long reads.
To do so, it builds a graph with the networkx API and simplifies it with strategies such as bubble popping and node merging.
The final isoforms are generated with spoa.
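To illustrate what node merging means on a read graph, here is a small generic sketch. It is not isONform's implementation: it uses plain dictionaries instead of networkx to stay dependency-free, it assumes an acyclic graph, and the read representation (each read as a list of node labels) and both function names are hypothetical.

```python
def build_graph(reads):
    """Build successor/predecessor maps from reads given as lists of node labels."""
    succ, pred = {}, {}
    for path in reads.values():
        for node in path:
            succ.setdefault(node, set())
            pred.setdefault(node, set())
        for u, v in zip(path, path[1:]):
            succ[u].add(v)
            pred[v].add(u)
    return succ, pred


def merge_linear_chains(succ, pred):
    """Group nodes into maximal unbranched chains (assumes an acyclic graph)."""
    chains = []
    for node in succ:
        preds = pred[node]
        # Skip nodes that merely continue a chain started by their only predecessor.
        if len(preds) == 1 and len(succ[next(iter(preds))]) == 1:
            continue
        chain = [node]
        # Extend while the path neither branches nor joins another path.
        while len(succ[chain[-1]]) == 1:
            nxt = next(iter(succ[chain[-1]]))
            if len(pred[nxt]) != 1:
                break
            chain.append(nxt)
        chains.append(tuple(chain))
    return chains


# Two hypothetical reads; the second one skips node "C".
reads = {"read1": ["A", "B", "C", "D", "E"], "read2": ["A", "B", "D", "E"]}
succ, pred = build_graph(reads)
print(merge_linear_chains(succ, pred))
```

Each returned tuple is a run of nodes that every read traverses in the same way, so it can be treated as a single merged node. The branch at "B" and the join at "D" stay separate because they distinguish the two paths.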
Output
The algorithm produces two files:
- mapping.txt records which reads were grouped into each consensus. It has the following form:
Line 1: consensusID
Line 2: list of read names
- spoa.fa contains the final isoforms in FASTA format:
Line 1: >consensusID
Line 2: consensus sequence
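The two-line layouts described above can be read with a few lines of Python. This is a sketch under stated assumptions rather than an official parser. The function names are hypothetical. The separator used in the read-name list is not specified here, so the code accepts commas, whitespace, and Python-style brackets and quotes.

```python
import re


def parse_mapping(lines):
    """Pair each consensus ID line with the read-name line that follows it."""
    rows = [line.strip() for line in lines if line.strip()]
    mapping = {}
    for cons_id, names in zip(rows[0::2], rows[1::2]):
        # Assumption: names are separated by commas and/or whitespace,
        # possibly wrapped in brackets and quotes (Python list style).
        mapping[cons_id] = re.findall(r"[^\s,\[\]'\"]+", names)
    return mapping


def parse_two_line_fasta(lines):
    """Read records of the form '>consensusID' followed by one sequence line."""
    rows = [line.strip() for line in lines if line.strip()]
    records = {}
    for header, seq in zip(rows[0::2], rows[1::2]):
        if not header.startswith(">"):
            raise ValueError(f"expected a FASTA header, got: {header!r}")
        records[header[1:]] = seq
    return records


# In-memory examples with made-up IDs, read names, and sequence.
print(parse_mapping(["0\n", "['read1', 'read2']\n"]))
print(parse_two_line_fasta([">0\n", "ACGT\n"]))
```

Both functions accept any iterable of lines, so an open file handle can be passed directly.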
Running the code
To run the test analysis pipeline:
./generateTestResults.sh </path/to/input/reference.fa> <output_root>
To generate simulated isoforms for testing:
python generateTestCases.py --ref /path/to/Isoform_Test_data.fa \
--sim_genome_len 1344 --nr_reads 10 --outfolder testout \
--coords 50 100 150 200 250 300 350 400 450 500 \
--probs 0.4 0.4 0.4 0.4 0.4 --n_isoforms 8
Actual algorithm
To run the actual algorithm:
python main.py --fastq /path/to/isoforms.fa --k 9 --w 10 --xmin 14 --xmax 80 --exact --max_seqs_to_spoa 200 --max_bubblesize 2 --delta_len 3 --outfolder testout
Credits
Please cite [1] when using isONform.