Reconstructs viral consensus sequences from a set of ONT reads.

NGSpeciesID

NGSpeciesID is a tool for clustering and consensus forming of targeted ONT reads. This repository is a modified version of [isONclust](https://github.com/ksahlin/isONclust), where consensus and polishing features have been added.

NGSpeciesID is distributed as a python package supported on Linux / OSX with python v3.6. [![Build Status](https://travis-ci.org/ksahlin/NGSpeciesID.svg?branch=master)](https://travis-ci.org/ksahlin/NGSpeciesID).

Table of Contents

  • [INSTALLATION](#INSTALLATION)
      • [Using conda](#Using-conda)
      • [Testing installation](#testing-installation)

  • [USAGE](#USAGE)
      • [Removing primers](#removing-primers)
      • [Output](#Output)

  • [CREDITS](#CREDITS)

  • [LICENCE](#LICENCE)

INSTALLATION

NOTE: If you are experiencing issues (e.g. [this one](https://github.com/rvaser/spoa/issues/26)) with the third party tools [spoa](https://github.com/rvaser/spoa) or [medaka](https://github.com/nanoporetech/medaka) in the all-in-one installation instructions below, please install the tools manually with their respective installation instructions [here](https://github.com/rvaser/spoa#installation) and [here](https://github.com/nanoporetech/medaka#installation).

### Using conda

Conda is the preferred way to install NGSpeciesID.

  1. Create and activate a new environment called NGSpeciesID

` conda create -n NGSpeciesID python=3.6 pip `
` conda activate NGSpeciesID `

  2. Install NGSpeciesID

` pip install NGSpeciesID `
` conda install --yes -c conda-forge -c bioconda medaka==0.11.5 openblas==0.3.3 spoa racon minimap2 `

  3. You should now have ‘NGSpeciesID’ installed; try it:

` NGSpeciesID --help `

Each time you start a new session on your server or computer, activate the conda environment “NGSpeciesID” before running NGSpeciesID:

` conda activate NGSpeciesID `
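If a run fails right after installation, it can help to confirm that the tools named in the install commands are visible on the `PATH` of the active environment. The following is a minimal sketch, not part of NGSpeciesID; the function name `missing_tools` is hypothetical, and it assumes each package exposes an executable with the same name as the package.

```python
import shutil

# Tool names are taken from the install commands above. That each package
# provides an executable with exactly this name is an assumption.
REQUIRED_TOOLS = ["NGSpeciesID", "medaka", "spoa", "racon", "minimap2"]


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on the current PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

Run it inside the activated conda environment; an empty result suggests the executables are at least discoverable, although it does not verify their versions.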

### Testing installation

Assuming you are in the NGSpeciesID directory, you can test the installation with

` python NGSpeciesID --ont --fastq test/sample_h1.fastq --outfolder ~/tmp/sample_h1 --consensus --medaka `

If the test file test/sample_h1.fastq is not available in your local installation of NGSpeciesID, you can find it [here](https://github.com/ksahlin/NGSpeciesID/tree/master/test) (file size 390 kB). You can also download the file from the terminal on Linux or macOS with

` curl -LO https://raw.githubusercontent.com/ksahlin/NGSpeciesID/master/test/sample_h1.fastq `

The file is downloaded to your current folder and named sample_h1.fastq. Assuming you were in the NGSpeciesID directory when you downloaded the file, you can now run

` python NGSpeciesID --ont --fastq sample_h1.fastq --outfolder ~/tmp/sample_h1 --consensus --medaka `
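Before running the test, it may be worth checking that the downloaded file really contains reads, since a download that saved a web page instead of the raw file would make the run fail. The sketch below is not part of NGSpeciesID; `looks_like_fastq` is a hypothetical helper, and it assumes the common four-line FASTQ layout (header, sequence, `+`, qualities) without line wrapping.

```python
from itertools import islice


def looks_like_fastq(path, max_records=100):
    """Check that the first records follow the four-line FASTQ layout.

    Assumes unwrapped FASTQ: '@header', sequence, '+', qualities.
    """
    with open(path) as handle:
        lines = [line.rstrip("\r\n") for line in islice(handle, 4 * max_records)]
    # An empty file or an incomplete record cannot be valid.
    if not lines or len(lines) % 4 != 0:
        return False
    for i in range(0, len(lines), 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            return False
        # Each base needs exactly one quality character.
        if len(seq) != len(qual):
            return False
    return True
```

For example, `looks_like_fastq("sample_h1.fastq")` should return `True` for a correctly downloaded file and `False` for an HTML page.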

USAGE

NGSpeciesID needs a fastq file generated by an Oxford Nanopore basecaller.

` NGSpeciesID --ont --consensus --medaka --fastq [reads.fastq] --outfolder [/path/to/output] `

The argument `--ont` is shorthand for `--k 13 --w 20`. These arguments can be set manually without the `--ont` flag. Specify the number of cores with `--t`.

NGSpeciesID can also run with racon as the polisher. For example,

` NGSpeciesID --ont --consensus --racon --racon_iter 3 --fastq [reads.fastq] --outfolder [/path/to/output] `

will polish the consensus sequences with racon three times.
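When NGSpeciesID is run over many samples from a script, the two command lines above can be assembled programmatically. This is a minimal sketch under stated assumptions: `build_command` is a hypothetical helper, not an NGSpeciesID API, it only uses the flags shown in this section, and it assumes the core-count option is spelled `--t`.

```python
def build_command(fastq, outfolder, polisher="medaka", racon_iter=None, threads=None):
    """Assemble an NGSpeciesID argument list from the flags shown above."""
    if polisher not in ("medaka", "racon"):
        raise ValueError("polisher must be 'medaka' or 'racon'")
    cmd = ["NGSpeciesID", "--ont", "--consensus", "--" + polisher]
    # --racon_iter only makes sense together with --racon.
    if polisher == "racon" and racon_iter is not None:
        cmd += ["--racon_iter", str(racon_iter)]
    cmd += ["--fastq", fastq, "--outfolder", outfolder]
    # Assumption: the number of cores is passed as --t.
    if threads is not None:
        cmd += ["--t", str(threads)]
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the standard library, which avoids shell quoting problems with unusual file names.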

### Removing primers

If custom primers are expected in the reads, they can be detected and removed. The primer file is expected to be in fasta format. Here is an example of a primer file:

` >MCB869_ONT_R `
` CGATCAATCCCCTAACAAACTAGG `
` >MCB398_ONT_F `
` TACCATGAGGACAAATATCATTCTG `

NGSpeciesID searches for primers in a window of X bp (a parameter, default 150 bp) at the beginning and end of each consensus.

Trimming of primers is performed after consensus forming and can be invoked as

` NGSpeciesID --ont --consensus --medaka --fastq [reads.fastq] --outfolder [/path/to/output] --primer_file [primers.fa] `
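To illustrate the idea of searching only the ends of a consensus, the sketch below reads a primer file in fasta format and reports which primers occur in the first or last 150 bp. It is a simplified illustration and not NGSpeciesID's implementation: it uses exact string matching on both strands, whereas the tool's own search may tolerate mismatches, and the function names are hypothetical.

```python
def read_fasta(path):
    """Parse a fasta file into a {name: sequence} dictionary."""
    records, name = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                name = line[1:].split()[0]
                records[name] = ""
            elif name is not None:
                records[name] += line.upper()
    return records


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an A/C/G/T sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_in_windows(consensus, primers, window=150):
    """Names of primers found exactly, on either strand, in the first or
    last `window` bases of `consensus`. Exact matching is a simplification."""
    head, tail = consensus[:window], consensus[-window:]
    hits = []
    for name, seq in primers.items():
        for candidate in (seq, reverse_complement(seq)):
            if candidate in head or candidate in tail:
                hits.append(name)
                break
    return hits
```

A primer that only occurs in the middle of a long consensus is not reported, which mirrors the window restriction described above.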

NGSpeciesID can also remove universal tails. Trimming of tails is performed after consensus forming and can be invoked as

` NGSpeciesID --ont --consensus --medaka --fastq [reads.fastq] --outfolder [/path/to/output] --remove_universal_tails `

The two options are mutually exclusive, i.e., only one of them can be run.

### Output

The output consists of the polished consensus sequences along with some information about clustering.

  • Polished consensus sequence(s). A folder named “medaka_cl_id_X” (or “racon_cl_id_X” when racon is the polisher) is created for each predicted consensus. Each such folder contains a file “consensus.fasta”, which is the final output of NGSpeciesID.

  • Draft spoa consensus sequences of each of the clusters are given as consensus_reference_X.fasta (where X is a number).

  • The final cluster information is given in a tsv file final_clusters.tsv present in the specified output folder.
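The folder layout described in the list above can be walked with a few lines of Python to gather all final consensus files from one run. This is a minimal sketch, assuming the folder and file names given above; `collect_consensus` is a hypothetical helper, not part of NGSpeciesID.

```python
from pathlib import Path


def collect_consensus(outfolder):
    """Map each polished-consensus folder name to its consensus.fasta path.

    Assumes the naming described above: medaka_cl_id_X or racon_cl_id_X
    folders inside the output folder, each holding consensus.fasta.
    """
    found = {}
    for pattern in ("medaka_cl_id_*", "racon_cl_id_*"):
        for folder in sorted(Path(outfolder).glob(pattern)):
            fasta = folder / "consensus.fasta"
            if fasta.is_file():
                found[folder.name] = fasta
    return found
```

Calling `collect_consensus("/path/to/output")` returns an empty dictionary when no consensus was formed, which is a convenient signal to check in a batch script.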

In the cluster TSV file, the first column is the cluster ID and the second column is the read accession. For example:

` 0 read_X_acc `
` 0 read_Y_acc `
` ... `
` n read_Z_acc `

If there are n reads, there will be n rows. Some reads might be singletons. The rows are ordered by cluster size (largest first).
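The two-column layout of final_clusters.tsv makes it straightforward to group reads by cluster. The sketch below is an illustration rather than part of NGSpeciesID; `read_clusters` is a hypothetical helper, and it assumes the columns are separated by a tab character.

```python
from collections import defaultdict


def read_clusters(tsv_path):
    """Group read accessions by cluster ID from a two-column TSV file.

    Assumes one 'cluster_id<TAB>read_accession' pair per row.
    """
    clusters = defaultdict(list)
    with open(tsv_path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue
            cluster_id, read_acc = line.split("\t", 1)
            clusters[cluster_id].append(read_acc)
    return dict(clusters)
```

Cluster sizes then follow from `{cid: len(reads) for cid, reads in read_clusters(path).items()}`, and singletons are the clusters of size one.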

CREDITS

Please cite [1] when using NGSpeciesID.

  1. TBA

LICENCE

GPL v3.0, see [LICENSE.txt](https://github.com/ksahlin/NGSpeciesID/blob/master/LICENCE.txt).
