De novo construction of isoforms from long-read data

isONform: an algorithm for recovering isoforms from long-read sequencing data

Table of contents

  1. Installation
  2. Introduction
  3. Output
  4. Running the code
    1. Actual algorithm
  5. Credits

Installation

Dependencies

  1. networkx
  2. ordered-set
  3. matplotlib
  4. parasail
  5. edlib
  6. pyinstrument
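  7. namedtuple (provided by the collections module in the Python standard library; no separate installation needed)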
  8. recordclass

Installation guide

  1. Create a new environment for isONform (Python 3.7 or later is required):
    conda create -n isonform python=3.10 pip
    conda activate isonform
  2. Install isONcorrect and SPOA (strongly recommended):
    pip install isONcorrect
    conda install -c bioconda spoa
  3. Install other dependencies of isONform:
    conda install networkx
    pip install ordered-set
    conda install matplotlib
    pip install parasail
    pip install edlib
    pip install pyinstrument
    conda install -c cerebis recordclass
  4. Clone this repository.
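
Once the environment is set up, it can help to confirm that the listed dependencies are importable before running anything. The following is a minimal sketch, not part of isONform: the helper name missing_modules is hypothetical, and the import names (for example ordered_set for the ordered-set package) are assumptions about how each package is laid out.

```python
import importlib.util

# Import names assumed for the dependencies listed above
# (e.g. the ordered-set package is assumed to import as ordered_set).
REQUIRED_MODULES = [
    "networkx",
    "ordered_set",
    "matplotlib",
    "parasail",
    "edlib",
    "pyinstrument",
    "recordclass",
]


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Run it inside the activated isonform environment; any name it reports can then be installed with the commands from the installation guide.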

Introduction

isONform generates isoforms from clustered and corrected long reads. It builds a graph from the reads using the networkx API and then simplifies that graph with strategies such as bubble popping and node merging. The final isoforms are generated with spoa.
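
To make the idea of node merging concrete, here is a minimal sketch under stated assumptions. It is not isONform's implementation: reads are represented as lists of hypothetical node labels, plain dictionaries stand in for the networkx graph so the example has no dependencies, and the function names build_graph and merge_linear_nodes are invented for illustration. Bubble popping is not covered.

```python
def build_graph(reads):
    """Return (successors, predecessors) adjacency maps for the reads.

    `reads` maps a read name to an ordered list of node labels. The labels
    are hypothetical stand-ins for whatever the real graph is built from.
    """
    successors = {}
    predecessors = {}
    for labels in reads.values():
        for label in labels:
            successors.setdefault(label, set())
            predecessors.setdefault(label, set())
        for source, target in zip(labels, labels[1:]):
            successors[source].add(target)
            predecessors[target].add(source)
    return successors, predecessors


def merge_linear_nodes(successors, predecessors):
    """Merge v into u wherever u -> v is u's only exit and v's only entry.

    Modifies the adjacency maps in place and returns a dict mapping each
    surviving node to the list of labels merged into it.
    """
    members = {node: [node] for node in successors}
    changed = True
    while changed:
        changed = False
        for u in list(successors):
            if len(successors[u]) != 1:
                continue
            (v,) = successors[u]
            if v == u or len(predecessors[v]) != 1:
                continue
            # u inherits v's outgoing edges, then v is removed.
            successors[u] = successors.pop(v)
            for w in successors[u]:
                predecessors[w].discard(v)
                predecessors[w].add(u)
            del predecessors[v]
            members[u].extend(members.pop(v))
            changed = True
            break  # adjacency changed, so restart the scan
    return members


# Two reads that share a prefix and a suffix but differ in the middle.
reads = {"r1": ["a", "b", "c", "d"], "r2": ["a", "b", "x", "d"]}
successors, predecessors = build_graph(reads)
members = merge_linear_nodes(successors, predecessors)
print(members)  # {'a': ['a', 'b'], 'c': ['c'], 'd': ['d'], 'x': ['x']}
```

In this toy input the shared prefix a, b collapses into one node, while the two alternative middle nodes c and x stay separate between a and d. That remaining structure is the kind of bubble a bubble-popping step would have to decide whether to keep or collapse.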

Output

The algorithm produces two files:

- mapping.txt records which reads were merged into which consensus. Each record spans two lines:
Line 1: consensusID
Line 2: list of read names

- spoa.fa contains the isoforms in FASTA format:
Line 1: >consensusID
Line 2: consensus sequence
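
The two formats described above can be read with a few lines of Python. This is a sketch under assumptions rather than code shipped with isONform: the function names read_mapping and read_fasta are hypothetical, and because the delimiter used inside the list of read names is not specified here, the read-name line is kept as a raw string instead of being split.

```python
def read_mapping(lines):
    """Pair each consensus ID line with the read-name line that follows it.

    The read-name line is returned unparsed, since its delimiter is not
    specified in the format description.
    """
    stripped = [line.strip() for line in lines if line.strip()]
    if len(stripped) % 2 != 0:
        raise ValueError("expected an even number of non-empty lines")
    return dict(zip(stripped[0::2], stripped[1::2]))


def read_fasta(lines):
    """Return {consensus ID: sequence} from FASTA-formatted lines."""
    records = {}
    current_id = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current_id = line[1:]
            records[current_id] = ""
        elif current_id is None:
            raise ValueError("sequence line found before any '>' header")
        else:
            # Tolerate sequences wrapped over several lines.
            records[current_id] += line
    return records
```

Both functions accept any iterable of lines, so an open file handle works directly, for example read_fasta(open("spoa.fa")) if the file sits in the current directory.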

Running the code

To run the test analysis pipeline:

./generateTestResults.sh </path/to/input/reference.fa> <output_root>

To generate simulated isoforms for testing:

python generateTestCases.py --ref /path/to/Isoform_Test_data.fa \
    --sim_genome_len 1344 --nr_reads 10 --outfolder testout \
    --coords 50 100 150 200 250 300 350 400 450 500 \
    --probs 0.4 0.4 0.4 0.4 0.4 --n_isoforms 8

Actual algorithm

To run the actual algorithm:

python main.py --fastq /path/to/isoforms.fa --k 9 --w 10 --xmin 14 --xmax 80 --exact --max_seqs_to_spoa 200 --max_bubblesize 2 --delta_len 3 --outfolder testout

Credits

Please cite [1] when using isONform.
