Simple program to run BLAST more efficiently on multicore systems, with rough taxonomic annotation using BASTA LCA

Project description

justblast

This is a simple program that runs BLAST more efficiently on multicore machines. It also offers a simple extension to compute and plot the last common ancestor (LCA) using Tim Kahlke's BASTA, and it accepts input in FASTQ format.
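
As an illustration of what accepting FASTQ input involves, here is a minimal sketch of a FASTQ-to-FASTA conversion. It is not justblast's own code: the function names are hypothetical, and it assumes the common layout of exactly four lines per record (header, sequence, `+` separator, qualities).

```python
from typing import Iterable, Iterator, Tuple


def fastq_to_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTQ text.

    Assumption: every record spans exactly four lines.
    """
    record = []
    for line in lines:
        line = line.rstrip("\n")
        if not line and not record:
            continue  # skip blank lines between records
        record.append(line)
        if len(record) == 4:
            header, seq, plus, _qual = record
            if not header.startswith("@") or not plus.startswith("+"):
                raise ValueError(f"malformed FASTQ record: {header!r}")
            yield header[1:], seq  # drop the leading '@'
            record = []
    if record:
        raise ValueError("truncated FASTQ record at end of input")


def format_fasta(records: Iterable[Tuple[str, str]]) -> str:
    """Render (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

With a file on disk this could be used as `format_fasta(fastq_to_fasta(open("reads.fastq")))`, where `reads.fastq` is a placeholder name.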

Requirements

READ THIS BEFORE INSTALLING

To run this program you need the BLAST+ tools installed on your machine; follow NCBI's installation instructions for BLAST+. The installation instructions below will try to install BASTA. However, BASTA requires the LevelDB database, which must be installed on your system. On Debian-based Linux systems you can do:

sudo apt-get update
sudo apt-get install python-leveldb

If you don't have administrative privileges, contact your system administrator.

On macOS:

brew install leveldb
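
Before moving on, it can help to confirm that both prerequisites are visible from Python. The sketch below is only a convenience check, not part of justblast. The executable name `blastn` and the module name `leveldb` are assumptions: your workflow may use another BLAST+ program, and your BASTA version may rely on a different LevelDB binding.

```python
import importlib.util
import shutil
from typing import Dict


def check_prerequisites(
    blast_program: str = "blastn",    # assumption: the BLAST+ program you plan to use
    leveldb_module: str = "leveldb",  # assumption: module provided by python-leveldb
) -> Dict[str, bool]:
    """Report whether a BLAST+ executable is on PATH and a LevelDB module is importable."""
    return {
        blast_program: shutil.which(blast_program) is not None,
        leveldb_module: importlib.util.find_spec(leveldb_module) is not None,
    }


if __name__ == "__main__":
    for name, found in check_prerequisites().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```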

A note for Compute Canada Users

Before installing, load the Python packages and the LevelDB modules with:

module load nixpkgs/16.09 scipy-stack/2018b # python modules
module load gcc/5.4.0 leveldb/1.20 # leveldb

Installation

justblast is on the PyPI repository and can be installed with:

python3 -m pip install justblast

Or you can clone this repository and run the setup.py install command.

If you do not have admin privileges, you can add the --user option.

Usage

You can explore the options by typing:

justblast -h

and you will get

usage: justblast [-h] [-e EVALUE] [-p PERCENT_ID] [-m MAX_TARGET_SEQS]
            [-q QUERY_COVERAGE] [-c CPUS] [-i] [-o OUT_FILENAME] [-f OUTFMT]
            query db

positional arguments:
  query                 Fasta file with query sequences
  db                    path to blast database

optional arguments:
  -h, --help            show this help message and exit
  -e EVALUE, --evalue EVALUE
                        evalue for blast search (default: 10)
  -p PERCENT_ID, --percent_id PERCENT_ID
                        Minimum percent identity on blast search (default: 0)
  -m MAX_TARGET_SEQS, --max_target_seqs MAX_TARGET_SEQS
                        Number of aligned sequences to keep (default: 500)
  -q QUERY_COVERAGE, --query_coverage QUERY_COVERAGE
                        Minimum query coverage to retain (default: None)
  -c CPUS, --cpus CPUS  Number of cpus to use (default: -1)
  -i, --identify        Whether to use basta to assign taxopnomy to the hits
                        based on LCA. This is a rough estimate and should be
                        revised carefully (default: False)
  -o OUT_FILENAME, --out_filename OUT_FILENAME
                        name of output (filtered) file (default: hit.hits)
  -f OUTFMT, --outfmt OUTFMT
                        Custom format for BLAST (default: qseqid sseqid pident
                        evalue qcovs qlen length staxid stitle)

There are only two positional arguments: the query file and the path to the BLAST database. Most of the optional arguments filter and/or modify the BLAST search. The two exceptions are identify, which runs BASTA, and cpus, which can be tailored to your machine (by default it uses all cores). NOTE: if you are on Compute Canada you HAVE to pass this value, and it must match the number of cores you requested.
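
To make the relationship between these options and a BLAST search concrete, here is a rough sketch of how they could translate into a `blastn` call. This is an assumption about the mapping, not justblast's actual implementation: justblast may use another BLAST+ program, may filter query coverage after the search instead of passing it to BLAST, and may parallelise by splitting the query instead of using `-num_threads`. The `blastn` flags themselves are standard BLAST+ options; the function names are hypothetical.

```python
import os
from typing import List, Optional

# Default columns taken from the help text above.
DEFAULT_OUTFMT = "qseqid sseqid pident evalue qcovs qlen length staxid stitle"


def resolve_cpus(cpus: int = -1) -> int:
    """Interpret -1 as 'use every core', mirroring the documented default."""
    if cpus == -1:
        return os.cpu_count() or 1
    if cpus < 1:
        raise ValueError("cpus must be -1 or a positive integer")
    return cpus


def build_blastn_command(
    query: str,
    db: str,
    evalue: float = 10.0,
    percent_id: float = 0.0,
    max_target_seqs: int = 500,
    query_coverage: Optional[float] = None,
    cpus: int = 1,
    outfmt: str = DEFAULT_OUTFMT,
) -> List[str]:
    """Assemble an argument list for blastn (hypothetical mapping)."""
    cmd = [
        "blastn",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-perc_identity", str(percent_id),
        "-max_target_seqs", str(max_target_seqs),
        "-num_threads", str(resolve_cpus(cpus)),
        "-outfmt", f"6 {outfmt}",  # tabular output with custom columns
    ]
    if query_coverage is not None:
        # Assumption: coverage is enforced by BLAST itself, per HSP.
        cmd += ["-qcov_hsp_perc", str(query_coverage)]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. On a scheduler-managed cluster, pass the number of cores you were allocated instead of relying on `-1`, since `os.cpu_count()` reports every core on the node.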

Notes on the BASTA run

justblast performs a rough assignment of taxonomy based on BASTA. Here I use the following parameters:

-m 10: A minimum of 10 hits have to agree to assign the given taxonomy
-n 50: Uses the top 50 hits to make the assignment, regardless of your MAX_TARGET_SEQS
The rest of the parameters are either left at their defaults or reuse the values passed to the blast search.
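
To give an intuition for how these two parameters interact, here is a deliberately simplified sketch of an LCA assignment. It is not BASTA's algorithm, which is more elaborate; it assumes each hit's lineage is already available as a list of taxon names ordered from the broadest rank to the most specific one, and the function name is hypothetical.

```python
from typing import List, Sequence


def lowest_common_ancestor(
    lineages: Sequence[Sequence[str]],
    min_hits: int = 10,  # plays the role of -m
    top_n: int = 50,     # plays the role of -n
) -> List[str]:
    """Return the longest lineage prefix shared by the top hits.

    Assumption: `lineages` is sorted best hit first. An empty list means
    that no assignment was made.
    """
    considered = lineages[:top_n]
    if len(considered) < min_hits:
        return []  # too few hits to support an assignment
    common: List[str] = []
    for ranks in zip(*considered):  # walk the ranks in parallel
        if len(set(ranks)) != 1:
            break  # hits disagree at this rank; stop one level above
        common.append(ranks[0])
    return common
```

For instance, if six hits end in Gammaproteobacteria and five in Alphaproteobacteria, the shared prefix stops at the phylum, which is why the result should be treated as a rough estimate.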

For basta to run your outfmt must contain AT LEAST:

qseqid
sseqid
length
evalue
pident
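
If you pass a custom format with -f, a small check like the following can catch a missing column before a long search starts. This is a sketch with hypothetical names; it only assumes that the format is a whitespace-separated list of column names, as in the default shown in the help text.

```python
from typing import List

# Columns listed above as the minimum BASTA needs.
REQUIRED_BASTA_FIELDS = ("qseqid", "sseqid", "length", "evalue", "pident")


def missing_basta_fields(outfmt: str) -> List[str]:
    """Return the required columns that are absent from an outfmt string."""
    present = set(outfmt.split())
    return [field for field in REQUIRED_BASTA_FIELDS if field not in present]


if __name__ == "__main__":
    default = "qseqid sseqid pident evalue qcovs qlen length staxid stitle"
    print(missing_basta_fields(default))                 # nothing missing
    print(missing_basta_fields("qseqid sseqid pident"))  # length and evalue missing
```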

Dummy Example

Let's say that you have a fasta file called seqs.fasta, and you want to run a blast against the nucleotide database (nt) located in your home folder (/home/user). You want to restrict your blast to an evalue of 1E-10 and a percent identity of 95%, and retrieve only 50 target sequences that have a query coverage of over 90%. You also want to roughly explore the taxonomic landscape using BASTA. Then you can call the program with:

justblast seqs.fasta /home/user/nt -e 1E-10 -p 95 -m 50 -q 90 -i -o results.hits

This will generate a hits file named results.hits containing the following columns (note that the outfmt was left at its default):

qseqid
sseqid
pident
evalue
qcovs
qlen
length
staxid
stitle

It will also generate a file called results_annotated.hits that, besides the columns above, contains the column lineage.

Finally, it will produce a PDF called results_taxadist.pdf with histograms of all the taxonomic levels identified.
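
For downstream analysis, the hits file can be loaded with the standard library. The sketch below assumes that results.hits is tab-delimited with no header row, as is usual for BLAST tabular output; if justblast writes a header line or another delimiter, adjust accordingly. The function name is hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, Sequence

# Default column order, as listed above.
DEFAULT_COLUMNS = (
    "qseqid", "sseqid", "pident", "evalue", "qcovs",
    "qlen", "length", "staxid", "stitle",
)


def read_hits(
    lines: Iterable[str],
    columns: Sequence[str] = DEFAULT_COLUMNS,
) -> Iterator[Dict[str, str]]:
    """Yield one dict per hit, keyed by column name.

    Assumption: tab-delimited rows without a header line.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # ignore empty lines
        yield dict(zip(columns, row))


if __name__ == "__main__":
    with open("results.hits", newline="") as handle:
        for hit in read_hits(handle):
            print(hit["qseqid"], hit["staxid"], hit["pident"])
```

For results_annotated.hits, the same reader could be reused by passing `columns=DEFAULT_COLUMNS + ("lineage",)`, assuming the lineage column is appended last.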

Project details

Release history

2020.0.4 (this version): Sep 21, 2020
2020.0.3: Jun 3, 2020
2020.0.1: Apr 16, 2020
2020.0.0: Feb 13, 2020
2019.0.8: Dec 14, 2019
2019.0.7: Nov 21, 2019
2019.0.6: Nov 21, 2019
2019.0.5: Nov 21, 2019
2019.0.4: Nov 12, 2019
2019.0.3: Nov 12, 2019
2019.0.2: Nov 12, 2019

Download files

Source Distribution

justblast-2020.0.4.tar.gz (11.7 kB), uploaded Sep 21, 2020, Source

Built Distributions

justblast-2020.0.4-py3.6.egg (22.0 kB), uploaded Sep 21, 2020, Egg
justblast-2020.0.4-py3-none-any.whl (12.2 kB), uploaded Sep 21, 2020, Python 3

File details

Details for the file justblast-2020.0.4.tar.gz.

File metadata

Download URL: justblast-2020.0.4.tar.gz
Upload date: Sep 21, 2020
Size: 11.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.5

File hashes

Hashes for justblast-2020.0.4.tar.gz
Algorithm	Hash digest
SHA256	`24c81eae423610beca378b4a34852b263a9e069b2592008b6f84ac2203be0484`
MD5	`68f8b3825ac2b880a7aaddee17a038e6`
BLAKE2b-256	`27a709716699e681043a21d57b0a377578d0cacf353210e4ba93dccff6d46614`
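
A downloaded file can be compared against the SHA256 digest listed above with Python's standard library. This is a minimal sketch with hypothetical function names; the path you pass is wherever you saved the download.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Return True when the file's digest equals the published one."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest copied from the table above for the source distribution.
    expected = "24c81eae423610beca378b4a34852b263a9e069b2592008b6f84ac2203be0484"
    print(matches_expected("justblast-2020.0.4.tar.gz", expected))
```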

File details

Details for the file justblast-2020.0.4-py3.6.egg.

File metadata

Download URL: justblast-2020.0.4-py3.6.egg
Upload date: Sep 21, 2020
Size: 22.0 kB
Tags: Egg
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.5

File hashes

Hashes for justblast-2020.0.4-py3.6.egg
Algorithm	Hash digest
SHA256	`b2138b5c26ff584e078bdc53ae258d8d6d08ea2c2abb27ffddc8fbb4eee5aef9`
MD5	`783663b799a5a135b03eed0ade0fe84d`
BLAKE2b-256	`ccb182b881af890abd306bc30b5e4f69a241d7559d0172cdc17aaccfef073446`

File details

Details for the file justblast-2020.0.4-py3-none-any.whl.

File metadata

Download URL: justblast-2020.0.4-py3-none-any.whl
Upload date: Sep 21, 2020
Size: 12.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/50.3.0 requests-toolbelt/0.9.1 tqdm/4.49.0 CPython/3.7.5

File hashes

Hashes for justblast-2020.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bbf4407e8022ec596ce32481f195ca94e8bcd95589e169e6d4d92174cdc83d6f`
MD5	`d6e36720fe320e81fead1b7b2aecd98a`
BLAKE2b-256	`e59c9e3cc38737c32525d9cd335898563225013f15177ddb9c27efa2b38be3bf`
