
Rapid and accurate classification of FASTQ(A) sequence files


SeqWho - An accurate and rapid FASTQ(A) file origin classifier

This is the official SeqWho Repository.

SeqWho is a reliable and extremely rapid program that determines the identity of a FASTQ(A) sequencing file: both its source protocol and its species of origin. It uses an alignment-free algorithm built on a Random Forest classifier, which learns from biases in k-mer frequencies and repeat sequence identity. SeqWho classifies files with greater than 96% accuracy.
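The k-mer frequency features mentioned above can be illustrated with a short sketch. This is not SeqWho's code or API: the function names `iter_fastq_sequences` and `kmer_frequencies` are hypothetical, and it assumes plain 4-line FASTQ records, k-mers restricted to A/C/G/T, and counts normalized to sum to 1. A vector like this is the kind of input a Random Forest (for example scikit-learn's `RandomForestClassifier`) could be trained on; the repeat-sequence features are not shown.

```python
from collections import Counter
from itertools import product
from typing import Iterable, Iterator, List


def iter_fastq_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield the sequence line of each 4-line FASTQ record (hypothetical helper)."""
    it = iter(lines)
    for header in it:
        if not header.strip():
            continue  # skip blank lines between records
        seq = next(it, "")
        next(it, None)  # '+' separator line
        next(it, None)  # quality line
        yield seq.strip().upper()


def kmer_frequencies(seqs: Iterable[str], k: int = 3) -> List[float]:
    """Return normalized counts for all 4**k ACGT k-mers, in lexicographic order."""
    counts: Counter = Counter()
    for seq in seqs:
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # ignore k-mers containing N or other symbols
                counts[kmer] += 1
    alphabet = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(alphabet)
    return [counts[km] / total for km in alphabet]
```

For example, `kmer_frequencies(iter_fastq_sequences(open("reads.fastq")), k=3)` (file name hypothetical) returns a 64-element feature vector for one file. A fixed, lexicographically ordered alphabet keeps vectors from different files aligned feature by feature.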

You can find the documentation for SeqWho at: https://daehwankimlab.github.io/seqwho/

First time setup

SeqWho is written in Python 3. For optimal performance, we recommend running it in a conda environment built from the environment.yml file included with SeqWho.

Please read https://daehwankimlab.github.io/seqwho/manual/ for more details.

Download a pre-trained SeqWho index

| Species | Libraries | Index |
|---|---|---|
| Human, Mouse | Amplicon, ChIP-Seq, WGS, WES, miRNA-Seq, RNA-Seq, Bisulfite-Seq, DNase-Seq, ATAC-Seq | Index (md5sum), Training File List, Testing File List |
| Human, Mouse, Rattus norvegicus | ChIP-Seq, WGS, RNA-Seq | Index (md5sum), Training File List, Testing File List |
| Human, Mouse, Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Saccharomyces cerevisiae | ChIP-Seq, WGS, RNA-Seq | Index (md5sum), Training File List, Testing File List |
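Each pre-trained index is published with an md5sum, so a downloaded index can be checked before use. The sketch below uses only Python's standard `hashlib`; the function names and the index file name in the usage note are hypothetical, and the expected checksum is whatever the md5sum link for that index lists.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes at EOF
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_index(path: str, expected_md5: str) -> bool:
    """Return True if the file's MD5 matches the published checksum."""
    return md5sum(path) == expected_md5.strip().lower()
```

Usage: `verify_index("seqwho_index.tar.gz", "<published md5>")` (both values placeholders). Reading in chunks keeps memory use flat even for large index files.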

Current release

v1.0.3 - Added option to select number of reads drawn from files during model building

v1.0.2 - Removed extra commas in some fields to facilitate CSV conversion

v1.0.1 - Addition of test files and scripts

v1.0.0 - Initial public release
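Release v1.0.3 lets the user choose how many reads are drawn from each file during model building, but the release note does not say how those reads are selected. One common way to draw a fixed number of reads uniformly from a file of unknown length, in a single pass, is reservoir sampling (Algorithm R). The sketch below is a hypothetical illustration of that technique, not SeqWho's sampling method; `reservoir_sample` and its parameters are illustrative names.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], n: int, seed: Optional[int] = None) -> List[T]:
    """Uniformly sample up to n items from a stream in one pass (Algorithm R)."""
    rng = random.Random(seed)  # seeded generator for reproducible draws
    reservoir: List[T] = []
    for i, rec in enumerate(records):
        if i < n:
            reservoir.append(rec)  # fill the reservoir with the first n items
        else:
            j = rng.randint(0, i)  # inclusive bounds: keep item i with probability n/(i+1)
            if j < n:
                reservoir[j] = rec
    return reservoir
```

Applied to a stream of reads, this keeps at most `n` reads in memory regardless of file size, and every read has the same chance of being selected. If the stream holds fewer than `n` items, all of them are returned.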



Download files


Source distribution: seqwho-1.0.3.tar.gz (27.7 kB)

Built distribution: seqwho-1.0.3-py3-none-any.whl (30.4 kB), Python 3
