
De novo error-correction of long-read transcriptome reads.

Project description


isONcorrect is a tool for error-correcting Oxford Nanopore cDNA reads. It is designed to handle highly variable coverage and exon variation within reads, and it achieves a median error rate of about 0.5-1% after correction. It leverages regions shared between reads from different isoforms to achieve low error rates even for low-abundance transcripts. See the preprint for details.

Processing and error correction of full-length ONT cDNA reads is achieved by running the pipeline pychopper --> isONclust --> isONcorrect.

isONcorrect is distributed as a Python package supported on Linux / OSX with Python v>=3.4.


Using conda

Conda is the preferred way to install isONcorrect.

  1. Create and activate a new environment called isoncorrect
conda create -n isoncorrect python=3 pip
source activate isoncorrect
  2. Install isONcorrect
pip install isONcorrect
  3. You should now have 'isONcorrect' installed; try it:
isONcorrect --help

Each time you start or log in to your server/computer, you need to activate the conda environment "isoncorrect" before running isONcorrect:

source activate isoncorrect
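
Once the environment is active, a quick check can confirm that the executable is discoverable and that the interpreter meets the Python v>=3.4 requirement stated above. The following is a minimal sketch, not part of isONcorrect: the helper name `check_environment` is hypothetical, and it only looks the command up on PATH without running it.

```python
import shutil
import sys

# Minimum Python version stated in this README (v>=3.4).
MIN_PYTHON = (3, 4)


def check_environment(command="isONcorrect", version=None):
    """Return a list of human-readable problems; an empty list means OK.

    Hypothetical helper for illustration; it is not shipped with isONcorrect.
    `version` defaults to the running interpreter's (major, minor) version.
    """
    if version is None:
        version = tuple(sys.version_info[:2])
    else:
        version = tuple(version[:2])
    problems = []
    if version < MIN_PYTHON:
        problems.append("Python %d.%d is older than 3.4" % version)
    # shutil.which returns None when the command is not found on PATH.
    if shutil.which(command) is None:
        problems.append("'%s' was not found on PATH" % command)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

If the script prints nothing, both checks passed; a "not found on PATH" warning usually means the conda environment has not been activated in the current shell.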

Using pip

To install isONcorrect, run:

pip install isONcorrect

pip will install the dependencies automatically for you. pip is Python's official package installer and is included in most Python versions. If you do not have pip, it can be easily installed and then upgraded with pip install --upgrade pip.

Downloading source from GitHub


Make sure the dependencies listed below are installed (installation links below). Versions in parentheses are suggested, as isONcorrect has not been tested with earlier versions of these libraries. However, isONcorrect may also work with earlier versions of these libraries.

In addition, please make sure you use python version >=3.

With these dependencies installed, run:

git clone
cd isONcorrect

Testing installation

You can verify successful installation by running isONcorrect on this small dataset. Simply download the test dataset and run:

isONcorrect --fastq [path to isONcorrect basefolder]/test_data/isoncorrect/0.fastq --outfolder [output path]
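
After the test run finishes, it can be useful to confirm that the output is well-formed FASTQ. The following is a minimal sketch under stated assumptions: `validate_fastq` is a hypothetical helper (not part of isONcorrect), it assumes the common four-line FASTQ layout, and the path of the corrected file inside the output folder is left to the caller because it is not stated here.

```python
def validate_fastq(path):
    """Return the number of records, raising ValueError on malformed input.

    Hypothetical helper for illustration. Assumes four lines per record
    (header, sequence, '+' separator, qualities) with no wrapped sequences.
    """
    count = 0
    with open(path) as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file, or a trailing blank line
            seq = handle.readline().rstrip("\r\n")
            plus = handle.readline()
            qual = handle.readline().rstrip("\r\n")
            record = count + 1
            if not header.startswith("@"):
                raise ValueError("record %d: header does not start with '@'" % record)
            if not plus.startswith("+"):
                raise ValueError("record %d: separator does not start with '+'" % record)
            if len(seq) != len(qual):
                raise ValueError("record %d: sequence and quality lengths differ" % record)
            count += 1
    return count
```

Running it on both the input file and the corrected output gives two record counts that can be compared by eye as a rough sanity check.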



For a file with raw ONT cDNA reads, the following pipeline is recommended (bash script provided below):

  1. Get full-length ONT cDNA sequences produced by pychopper (a.k.a. cdna_classifier)
  2. Cluster the full-length reads, where a cluster corresponds to a gene/gene-family
  3. Make fastq files of each cluster
  4. Correct individual clusters
  5. Optional: join reads from separate clusters back into a single file

The script below shows a specific pipeline to go from raw reads raw_reads.fq to corrected full-length reads all_corrected_reads.fq (please modify/remove arguments as needed).


# isonano pipeline to get high quality full length reads from transcripts

# Step 1: pychopper (a.k.a. cdna_classifier) extracts full-length reads
cdna_classifier.py  raw_reads.fq outfolder/reads_full_length.fq \
                      [-t cores]  [-w outfolder/rescued.fq  -u outfolder/unclassified.fq  -S outfolder/stats.txt]

# Steps 2-3: cluster the reads and write one fastq file per cluster
isONclust  --ont --fastq outfolder/reads_full_length.fq \
             --outfolder outfolder/clustering  [--t cores]
isONclust write_fastq --N 1 --clusters outfolder/clustering/final_clusters.csv \
          --fastq outfolder/reads_full_length.fq --outfolder  outfolder/clustering/fastq_files

# Step 4: correct each cluster
run_isoncorrect --fastq_folder outfolder/clustering/fastq_files  --outfolder outfolder/correction/

# Step 5 (optional): merge the corrected reads of all clusters into one file
touch all_corrected_reads.fq
for f in outfolder/correction/*/corrected_reads.fastq; do
  cat "$f" >> all_corrected_reads.fq
done
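
The optional merge step can also be done from Python, which avoids shell quoting issues. This is a minimal sketch, assuming the layout used by the loop above (one subfolder per cluster, each holding a corrected_reads.fastq); `merge_corrected_reads` and its parameters are hypothetical names, not part of isONcorrect. Unlike the `touch` plus `>>` approach, it overwrites the merged file instead of appending to it.

```python
import glob
import os
import shutil


def merge_corrected_reads(cluster_root, merged_path, filename="corrected_reads.fastq"):
    """Concatenate <cluster_root>/*/<filename> into merged_path.

    Hypothetical helper for illustration. Returns the number of
    per-cluster files that were merged. merged_path is overwritten.
    """
    pattern = os.path.join(cluster_root, "*", filename)
    # Sort for a deterministic order (plain string sort of the paths).
    cluster_files = sorted(glob.glob(pattern))
    with open(merged_path, "wb") as merged:
        for cluster_file in cluster_files:
            with open(cluster_file, "rb") as handle:
                # Stream the bytes across without loading whole files in memory.
                shutil.copyfileobj(handle, merged)
    return len(cluster_files)
```

A call such as `merge_corrected_reads(cluster_root, "all_corrected_reads.fq")`, with `cluster_root` set to the folder that contains the per-cluster subfolders, returns how many cluster files were found; a return value of 0 suggests the folder is wrong.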

isONcorrect does not need ONT reads to be full-length (i.e., produced by pychopper), but unless you have specific other goals, it is advised to run pychopper for any kind of downstream analysis to guarantee full-length reads.


The output of run_isoncorrect is one file per cluster, with headers identical to those of the original reads.
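
Because the headers are preserved, corrected reads can be matched back to their originals by header. The following is a minimal sketch of such a check, assuming four-line FASTQ records; `fastq_headers` and `unexpected_headers` are hypothetical helper names, not part of isONcorrect.

```python
def fastq_headers(path):
    """Return the set of header lines (without the leading '@').

    Hypothetical helper for illustration. Assumes four lines per record,
    so every fourth line starting from the first is a header.
    """
    headers = set()
    with open(path) as handle:
        for line_number, line in enumerate(handle):
            if line_number % 4 == 0 and line.strip():
                headers.add(line.rstrip("\r\n")[1:])
    return headers


def unexpected_headers(original_fastq, corrected_fastq):
    """Headers found in the corrected file but absent from the original."""
    return fastq_headers(corrected_fastq) - fastq_headers(original_fastq)
```

An empty set from `unexpected_headers` means every corrected read can be traced back to an original read.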


Please cite [1] when using isONcorrect.

  1. Kristoffer Sahlin, Botond Sipos, Phillip L James, Daniel Turner, Paul Medvedev (2020) Link.


GPL v3.0, see LICENSE.txt.


Files for isONcorrect, version 0.0.1: isONcorrect-0.0.1.tar.gz (34.8 kB), source distribution.
