De novo construction of isoforms from long-read data

isONform - Reference-free isoform reconstruction from long read sequencing data

Table of contents

  1. Installation
  2. Introduction
  3. Input data
  4. Running isONform
    1. Running a test
  5. Outputs
  6. Contact
  7. Credits

Installation

Via pip

pip install isONform

This command also installs isONform's dependencies:

  1. networkx
  2. ordered-set
  3. matplotlib
  4. parasail
  5. edlib
  6. pyinstrument
  7. namedtuple
  8. recordclass

From github source

  1. Create a new environment for isONform (Python 3.7 or later is required):
    conda create -n isonform python=3.10 pip
    conda activate isonform

  2. Install isONcorrect and SPOA:
    pip install isONcorrect
    conda install -c bioconda spoa

  3. Install other dependencies of isONform:
    conda install networkx
    pip install parasail

  4. Clone this repository.

Introduction

isONform reconstructs isoforms from clustered and error-corrected long reads. To do so, it builds a graph using the networkx API and simplifies it with strategies such as bubble popping and node merging. The final isoform sequences are generated with spoa.
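To give a feel for what node merging means on such a graph, here is a minimal, generic sketch that collapses unbranched paths of a small directed acyclic graph into single nodes. It is not isONform's implementation: it uses plain dictionaries instead of networkx so that it runs with the standard library alone, it assumes an acyclic graph without duplicate edges, and the function name and node labels are hypothetical.

```python
def merge_linear_chains(edges):
    """Collapse unbranched paths of a DAG into single nodes.

    Toy illustration only (assumes an acyclic graph without duplicate edges).
    Returns (chains, merged_edges), where every chain is a tuple of the
    original node labels that were merged together.
    """
    succ, pred, nodes = {}, {}, []
    for u, v in edges:
        for n in (u, v):
            if n not in succ:
                succ[n], pred[n] = [], []
                nodes.append(n)
        succ[u].append(v)
        pred[v].append(u)

    def mergeable(u, v):
        # u -> v is the only edge leaving u and the only edge entering v
        return len(succ[u]) == 1 and len(pred[v]) == 1

    chains, chain_of = [], {}
    for n in nodes:
        # Skip nodes that get absorbed into the chain of their predecessor
        if len(pred[n]) == 1 and mergeable(pred[n][0], n):
            continue
        chain = [n]
        while len(succ[chain[-1]]) == 1 and mergeable(chain[-1], succ[chain[-1]][0]):
            chain.append(succ[chain[-1]][0])
        chains.append(tuple(chain))
        for member in chain:
            chain_of[member] = tuple(chain)

    # Keep only the edges that connect two different merged nodes
    merged_edges = sorted(
        {(chain_of[u], chain_of[v]) for u, v in edges if chain_of[u] != chain_of[v]}
    )
    return chains, merged_edges


# Hypothetical graph: a linear start, two alternative paths (c or d), a linear end
toy_edges = [("s", "a"), ("a", "b"), ("b", "c"), ("b", "d"),
             ("c", "e"), ("d", "e"), ("e", "t")]
chains, merged_edges = merge_linear_chains(toy_edges)
print(chains)
print(merged_edges)
```

In this toy graph the linear stretches s-a-b and e-t are merged, while the two alternative paths through c and d remain as separate nodes between them, which is the kind of structure a bubble-popping step would then have to resolve.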

Input data

The isON pipeline takes as input .fastq files generated with long-read sequencing technologies (ONT or PacBio) from which the barcodes have been removed. Please make sure to run the pipeline only on data that have been processed with LIMA (PacBio data) or Pychopper (ONT data), so that all barcodes are removed from the reads.

Running isONform

To only run the isONform algorithm:

isONform_parallel --fastq_folder /path/to/input/files --t <nr_cores> --outfolder /path/to/outfolder --split_wrt_batches

Note: Please always use absolute paths to the files and folders.
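When the call is scripted, the paths can be made absolute before the command is assembled. The following is a minimal sketch that uses only the flags shown in the command above; the helper name build_isonform_command and the folder names are hypothetical.

```python
import os
import shlex


def build_isonform_command(fastq_folder, outfolder, cores):
    """Return the isONform_parallel argument list with absolute paths."""
    return [
        "isONform_parallel",
        "--fastq_folder", os.path.abspath(fastq_folder),
        "--t", str(cores),
        "--outfolder", os.path.abspath(outfolder),
        "--split_wrt_batches",
    ]


# Hypothetical folder names, resolved relative to the current directory
cmd = build_isonform_command("corrected_reads", "isonform_out", cores=4)
print(" ".join(shlex.quote(arg) for arg in cmd))

# To actually start the run (requires isONform to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list to subprocess.run avoids shell quoting problems for paths that contain spaces.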

The full isON pipeline (isONclust, isONcorrect, isONform) is run via the isON_pipeline.sh script:

./isON_pipeline.sh --raw_reads </absolute/path/to/raw_reads.fq>  --outfolder <outfolder>  --num_cores <num_cores> --isONform_folder <isONform_folder> --iso_abundance <iso_abundance> --mode <mode>

(Please note that this requires isONclust and isONcorrect to be installed in addition to isONform.)

For more information about the arguments of the isON_pipeline.sh script, run:

./isON_pipeline.sh --help

Outputs

isONform outputs three main files: transcriptome.fasta, mapping.txt, and support.txt. Each isoform that isONform reconstructs receives an id of the form x_y_z.

'x' denotes the isONclust cluster that the isoform stems from. Because reads are processed in batches of 1000 reads, as in isONcorrect, 'y' denotes the batch from which the isoform was reconstructed. 'z' is an identifier that makes the id of each reconstructed isoform unique. mapping.txt lists the original reads from which each isoform was reconstructed. support.txt gives the support of each isoform (i.e. the number of original reads that make up the isoform).
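The id scheme can be split back into its three parts when post-processing the results. The sketch below assumes that every header line of transcriptome.fasta starts with the x_y_z id; the example ids and sequences are made up, and the helper names are hypothetical. The layout of mapping.txt and support.txt is not described here, so these files are not parsed.

```python
from collections import Counter
from typing import NamedTuple


class IsoformId(NamedTuple):
    cluster: str  # 'x': isONclust cluster the isoform stems from
    batch: str    # 'y': batch from which the isoform was reconstructed
    uid: str      # 'z': identifier that makes the id unique


def parse_isoform_id(header):
    """Split a FASTA header such as '>12_0_3' into its x, y and z parts."""
    name = header.lstrip(">").split()[0]
    parts = name.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected an id of the form x_y_z, got {name!r}")
    return IsoformId(*parts)


def isoforms_per_cluster(fasta_lines):
    """Count the reconstructed isoforms per isONclust cluster."""
    counts = Counter()
    for line in fasta_lines:
        if line.startswith(">"):
            counts[parse_isoform_id(line).cluster] += 1
    return counts


# Made-up records standing in for the lines of transcriptome.fasta
example = [">12_0_3", "ACGT", ">12_1_7", "ACGTT", ">40_0_1", "GGCA"]
print(parse_isoform_id(example[0]))
print(isoforms_per_cluster(example))
```

For a real run, the list can be replaced by an open file handle on transcriptome.fasta, since the function only iterates over lines.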

Contact

If you encounter any problems, please raise an issue on the issues page. You can also contact the developer of this repository via: alexander.petri[at]math.su.se

Credits

Please cite [1] when using isONform.

  1. Petri, A. J., & Sahlin, K. (2023). isONform: reference-free transcriptome reconstruction from Oxford Nanopore data. Bioinformatics, 39(Supplement_1), i222-i231. https://academic.oup.com/bioinformatics/article/39/Supplement_1/i222/7210488

Please additionally cite [2] and [3] when running the full pipeline.

  2. Sahlin, K., & Medvedev, P. (2020). De novo clustering of long-read transcriptome data using a greedy, quality-value based algorithm. Journal of Computational Biology, 27(4), 472-484.
  3. Sahlin, K., & Medvedev, P. (2021). Error correction enables use of Oxford Nanopore technology for reference-free transcriptome analysis. Nature Communications, 12, 2. https://doi.org/10.1038/s41467-020-20340-8
