NGSpeciesID

De novo clustering of long-read transcriptome reads.

These details have not been verified by PyPI

Project links

Homepage

NGSpeciesID is a tool for clustering and consensus forming of targeted Oxford Nanopore Technologies (ONT) reads. This repository is a modified version of [isONclust](https://github.com/ksahlin/isONclust), to which consensus and polishing features have been added.

NGSpeciesID is distributed as a Python package supported on Linux / OSX with Python 3.6. [![Build Status](https://travis-ci.org/ksahlin/NGSpeciesID.svg?branch=master)](https://travis-ci.org/ksahlin/NGSpeciesID)

* [INSTALLATION](#INSTALLATION)
  * [Using conda](#Using-conda)
  * [Testing installation](#testing-installation)
* [USAGE](#USAGE)
  * [Output](#Output)
  * [Parameters](#Parameters)
* [CREDITS](#CREDITS)
* [LICENCE](#LICENCE)

INSTALLATION

### Using conda

Conda is the preferred way to install NGSpeciesID.

1. Create and activate a new environment called NGSpeciesID:

   `conda create -n NGSpeciesID python=3.6 pip`
   `source activate NGSpeciesID`

2. Install NGSpeciesID:

   `pip install NGSpeciesID`
   `conda install --yes -c bioconda medaka`

3. You should now have 'NGSpeciesID' installed; try it:

   `NGSpeciesID --help`

Each time you start a new shell or log in to your server or computer, activate the conda environment "NGSpeciesID" before running NGSpeciesID: `source activate NGSpeciesID`

Install [medaka](https://github.com/nanoporetech/medaka).

### Testing installation

TBD

USAGE

NGSpeciesID needs a fastq file generated by an Oxford Nanopore basecaller.

`NGSpeciesID --ont --consensus --medaka --fastq [reads.fastq] --outfolder [/path/to/output]`

The argument `--ont` simply means `--k 13 --w 20`. These arguments can be set manually without the `--ont` flag. Specify the number of cores with `--t`.
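For scripted runs, the command above can be assembled and launched from Python. The following is a minimal sketch, not part of NGSpeciesID itself: the helper names `build_command` and `run_ngspeciesid` are hypothetical, and it assumes that the `NGSpeciesID` executable is on the `PATH` of the active conda environment and that the core-count option is spelled `--t`.

```python
import shlex
import subprocess


def build_command(fastq, outfolder, cores=None):
    """Assemble the NGSpeciesID call from the usage line as an argument list.

    Hypothetical helper: only flags mentioned in this README are used.
    """
    cmd = [
        "NGSpeciesID",
        "--ont",          # per the README, shorthand for k=13 and w=20
        "--consensus",
        "--medaka",
        "--fastq", str(fastq),
        "--outfolder", str(outfolder),
    ]
    if cores is not None:
        # Assumption: the core-count flag is spelled --t.
        cmd += ["--t", str(cores)]
    return cmd


def run_ngspeciesid(fastq, outfolder, cores=None):
    """Run NGSpeciesID and raise CalledProcessError if it exits non-zero."""
    cmd = build_command(fastq, outfolder, cores)
    print("Running:", " ".join(shlex.quote(part) for part in cmd))
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the FASTQ path or output folder contains spaces.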

### Output

The output consists of clustering and consensus information.

* The final cluster information is given in a TSV file, `final_clusters.tsv`, in the specified output folder.
* Draft spoa consensus sequences of each cluster are given as `consensus_reference_X.fasta` (where X is a number).
* A folder named `medaka_cl_id_X` is created for each spoa consensus. Each medaka output folder contains medaka's output, including the final polished consensus, which medaka names `consensus.fasta`.
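The file layout described above can be walked programmatically. The sketch below is illustrative only: `collect_outputs` is a hypothetical helper, and it assumes that the draft FASTA files and the medaka folders sit directly in the output folder and that the number X in `medaka_cl_id_X` matches the X in `consensus_reference_X.fasta`.

```python
import re
from pathlib import Path


def collect_outputs(outfolder):
    """Map each cluster number X to its draft and polished consensus files.

    Assumes drafts and medaka folders are direct children of `outfolder`
    and share the same number X (an assumption, not confirmed here).
    """
    outfolder = Path(outfolder)
    results = {}
    for draft in outfolder.glob("consensus_reference_*.fasta"):
        match = re.fullmatch(r"consensus_reference_(\d+)\.fasta", draft.name)
        if match is None:
            continue  # skip files whose X is not a plain number
        cluster_id = int(match.group(1))
        polished = outfolder / f"medaka_cl_id_{cluster_id}" / "consensus.fasta"
        results[cluster_id] = {
            "draft": draft,
            # None when medaka produced no consensus for this cluster
            "polished": polished if polished.is_file() else None,
        }
    return results
```

A draft without a matching polished file is reported with `None` instead of being dropped, so incomplete medaka runs stay visible.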

In the cluster TSV-file, the first column is the cluster ID and the second column is the read accession. For example:

`0 read_X_acc`
`0 read_Y_acc`
`...`
`n read_Z_acc`

If there are n reads, there will be n rows. Some reads might be singletons. The rows are ordered by cluster size (largest first).
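To work with the cluster assignments, the two-column file can be grouped by cluster ID. This is a minimal sketch under stated assumptions: the helper names `read_clusters` and `cluster_sizes` are hypothetical, and the file is assumed to be tab-separated with no header line, cluster ID in the first column and read accession in the second.

```python
import csv
from collections import defaultdict


def read_clusters(tsv_path):
    """Group read accessions by cluster ID from a two-column TSV file."""
    clusters = defaultdict(list)
    with open(tsv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            if len(row) < 2:
                continue  # skip blank or malformed lines
            cluster_id, read_acc = row[0], row[1]
            clusters[cluster_id].append(read_acc)
    return dict(clusters)


def cluster_sizes(clusters):
    """Return (cluster_id, size) pairs, largest cluster first."""
    return sorted(
        ((cid, len(reads)) for cid, reads in clusters.items()),
        key=lambda item: item[1],
        reverse=True,
    )
```

Clusters of size one in the result of `cluster_sizes` correspond to the singleton reads mentioned above.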

CREDITS

Please cite [1] when using NGSpeciesID.

LICENCE

GPL v3.0, see [LICENCE.txt](https://github.com/ksahlin/NGSpeciesID/blob/master/LICENCE.txt).

Release history

Version	Release date
0.3.0	Jun 26, 2023
0.1.3	Dec 7, 2021
0.1.2.2	Oct 13, 2021
0.1.2.1	Sep 12, 2021
0.1.2	Sep 12, 2021
0.1.1.1	Nov 26, 2020
0.1.1.0	Nov 24, 2020
0.1.0.1	Nov 21, 2020
0.1.0.0	May 29, 2020
0.0.9.1	May 15, 2020
0.0.9	May 3, 2020
0.0.8.5	Mar 3, 2020
0.0.8.4	Feb 16, 2020
0.0.8.3	Feb 16, 2020
0.0.8.2	Feb 16, 2020
0.0.8.1	Feb 16, 2020
0.0.8	Feb 16, 2020
0.0.7 (this version)	Feb 16, 2020
0.0.6	Feb 11, 2020

Download files

Source Distribution

NGSpeciesID-0.0.7.tar.gz (541.4 kB)

Uploaded Feb 16, 2020 Source

Hashes for NGSpeciesID-0.0.7.tar.gz

Algorithm	Hash digest
SHA256	`12a0639e7e861dfe2f7768da8b2afb33ae4f71e3b885b4e840e2a53f3d55bccc`
MD5	`bd43877c7731300be6274c4328f7d08a`
BLAKE2b-256	`7c1a52799599834d2cbc63ed1ca0601ce1da0e15b5cf2f141c5f3107297ee582`
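A downloaded source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the Python standard library. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for NGSpeciesID-0.0.7.tar.gz
EXPECTED_SHA256 = "12a0639e7e861dfe2f7768da8b2afb33ae4f71e3b885b4e840e2a53f3d55bccc"


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("NGSpeciesID-0.0.7.tar.gz")` returns `False` if the archive was truncated or altered in transit.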