De novo clustering of long-read transcriptome reads.
NGSpeciesID
NGSpeciesID is a tool for clustering and consensus forming of targeted ONT reads. This repository is a modified version of [isONclust](https://github.com/ksahlin/isONclust), to which consensus and polishing features have been added.
NGSpeciesID is distributed as a Python package supported on Linux and OSX with Python v3.6. [![Build Status](https://travis-ci.org/ksahlin/NGSpeciesID.svg?branch=master)](https://travis-ci.org/ksahlin/NGSpeciesID)
Table of Contents
* [INSTALLATION](#INSTALLATION)
  * [Using conda](#Using-conda)
  * [Testing installation](#testing-installation)
* [USAGE](#USAGE)
  * [Output](#Output)
  * [Parameters](#Parameters)
* [CREDITS](#CREDITS)
* [LICENCE](#LICENCE)
INSTALLATION
### Using conda
Conda is the preferred way to install NGSpeciesID.
1. Create and activate a new environment called NGSpeciesID:
   `conda create -n NGSpeciesID python=3.6 pip`
   `source activate NGSpeciesID`
2. Install NGSpeciesID:
   `pip install NGSpeciesID`
   `conda install --yes -c bioconda medaka`
3. You should now have NGSpeciesID installed; try it:
   `NGSpeciesID --help`
   Each time you start or log in to your server or computer, activate the conda environment NGSpeciesID before running NGSpeciesID:
   `source activate NGSpeciesID`
4. Install [medaka](https://github.com/nanoporetech/medaka).
### Testing installation
TBD
USAGE
NGSpeciesID needs a fastq file generated by an Oxford Nanopore basecaller.
`NGSpeciesID --ont --consensus --medaka --fastq [reads.fastq] --outfolder [/path/to/output]`
The argument `--ont` simply means `--k 13 --w 20`. These arguments can be set manually without the `--ont` flag. Specify the number of cores with `--t`.
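For scripted runs, the invocation above can be assembled programmatically. The following is a minimal sketch, not part of NGSpeciesID itself: the helper name `build_ngspeciesid_command` and its parameters are hypothetical, and it assumes the flags are spelled with double hyphens (`--ont`, `--k`, `--w`, `--t`) as in the usage line.

```python
import shlex


def build_ngspeciesid_command(fastq, outfolder, cores=None, use_ont_preset=True):
    """Return the NGSpeciesID argument list for the usage shown above.

    Hypothetical helper: only flags named in this README are used.
    """
    cmd = ["NGSpeciesID"]
    if use_ont_preset:
        # --ont is described as shorthand for --k 13 --w 20.
        cmd.append("--ont")
    else:
        # Set the two values manually instead of using the preset.
        cmd += ["--k", "13", "--w", "20"]
    cmd += ["--consensus", "--medaka",
            "--fastq", str(fastq),
            "--outfolder", str(outfolder)]
    if cores is not None:
        # Number of cores, as described for --t.
        cmd += ["--t", str(cores)]
    return cmd


# Print a shell-safe version of the command for inspection.
example = build_ngspeciesid_command("reads.fastq", "/path/to/output", cores=4)
print(" ".join(shlex.quote(arg) for arg in example))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` inside the activated conda environment.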
### Output
The output consists of clustering and consensus information.
The final cluster information is given in the TSV file `final_clusters.tsv` in the specified output folder.
Draft spoa consensus sequences for each cluster are given as `consensus_reference_X.fasta` (where X is a number).
A folder named `medaka_cl_id_X` is created for each spoa consensus. Each medaka output folder contains medaka's output, including the final polished consensus, which medaka names `consensus.fasta`.
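To gather the polished consensus files after a run, the folder naming described above can be used directly. This is a sketch under assumptions: the function name `find_polished_consensus` is hypothetical, and because the exact nesting of the `medaka_cl_id_X` folders inside the output folder is not stated here, the search is recursive.

```python
from pathlib import Path

MEDAKA_PREFIX = "medaka_cl_id_"


def find_polished_consensus(outfolder):
    """Map each cluster ID X to its medaka_cl_id_X/consensus.fasta path.

    Hypothetical helper; searches the output folder recursively because
    the exact folder depth is an assumption.
    """
    polished = {}
    pattern = "**/" + MEDAKA_PREFIX + "*/consensus.fasta"
    for fasta in sorted(Path(outfolder).glob(pattern)):
        # The cluster ID is whatever follows the folder-name prefix.
        cluster_id = fasta.parent.name[len(MEDAKA_PREFIX):]
        polished[cluster_id] = fasta
    return polished
```

Calling `find_polished_consensus("/path/to/output")` returns a dictionary whose keys are the X values as strings.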
In the cluster TSV-file, the first column is the cluster ID and the second column is the read accession. For example:
`0 read_X_acc`
`0 read_Y_acc`
`...`
`n read_Z_acc`
If there are n reads, there will be n rows. Some reads might be singletons. The rows are ordered by cluster size (largest first).
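The two-column layout described above can be loaded into a mapping from cluster ID to read accessions. This is a minimal sketch, assuming the file is tab-separated with no header row; the function names `read_clusters` and `singletons` are hypothetical.

```python
from collections import defaultdict


def read_clusters(tsv_path):
    """Read final_clusters.tsv into {cluster_id: [read_accession, ...]}.

    Assumes two tab-separated columns and no header line.
    """
    clusters = defaultdict(list)
    with open(tsv_path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            cluster_id, read_acc = line.split("\t", 1)
            clusters[cluster_id].append(read_acc)
    return dict(clusters)


def singletons(clusters):
    """Return the IDs of clusters that contain exactly one read."""
    return [cid for cid, reads in clusters.items() if len(reads) == 1]
```

For example, `read_clusters("/path/to/output/final_clusters.tsv")` groups every row by its first column, and `singletons(...)` on the result lists the clusters that hold a single read.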
CREDITS
Please cite [1] when using NGSpeciesID.
TBA
LICENCE
GPL v3.0, see [LICENSE.txt](https://github.com/ksahlin/NGSpeciesID/blob/master/LICENCE.txt).