Lightweight high-level Python 3 API for NCBI BLAST
Blastpy3
Lightweight high-level Python 3 API for NCBI BLAST+ blastn
Blastn
This class contains the wrapper for blastn and requires NCBI BLAST+ 2.2.28 or later to be installed.
Setting up a Blastn object: creating the subject database
When a Blastn object is instantiated, a BLAST database is built from the user-provided subject sequence file. The database files are written to a temporary directory. The following parameters can be customized at instantiation:
- ref_path: Path to the reference fasta file (not gzipped). Mandatory
- makeblastdb_exec: Path of the makeblastdb executable. Default = "makeblastdb"
- makeblastdb_opt: makeblastdb command line options as a string. Default = ""
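These parameters end up on a makeblastdb command line. The sketch below shows one plausible way to assemble it, following the command echoed in the example output (makeblastdb -dbtype nucl -input_type fasta -in ... -out ...). The function name build_makeblastdb_cmd is hypothetical, and inserting makeblastdb_opt right after the executable is an assumption rather than documented Blastn behaviour.

```python
import os
import shlex


def build_makeblastdb_cmd(ref_path, db_dir, makeblastdb_exec="makeblastdb", makeblastdb_opt=""):
    """Hypothetical helper: return (argument list, database path) for a nucleotide DB.

    The argument order mirrors the command shown in the example output; where the
    extra makeblastdb_opt options are placed is an assumption.
    """
    # Database name = FASTA file name without its extension ("subject.fa" -> "subject")
    db_name = os.path.splitext(os.path.basename(ref_path))[0]
    db_path = os.path.join(db_dir, db_name)
    cmd = [makeblastdb_exec]
    # User options given as one string are split the way a shell would split them
    cmd += shlex.split(makeblastdb_opt)
    cmd += ["-dbtype", "nucl", "-input_type", "fasta", "-in", ref_path, "-out", db_path]
    return cmd, db_path
```

The returned list can be passed directly to subprocess.run(cmd, check=True), which avoids shell quoting issues with paths that contain spaces.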
To make sure the database files are deleted at the end of execution, use the Blastn object in a with statement. Alternatively, call the rm_db method once you have finished using the Blastn object.
Code
with Blastn(ref_path="./subject.fa") as blastn:
    print(blastn)
Output
CREATE DATABASE: makeblastdb -dbtype nucl -input_type fasta -in subject.fa -out temp_dir
MAKEBLASTDB CLASS Parameters list
db_dir /tmp/tmplbkdwzm2
db_path /tmp/tmplbkdwzm2/Yeast
makeblastdb_exec makeblastdb
makeblastdb_opt
ref_path ./data/Yeast.fa
verbose False
Cleaning up blast DB files for "subject"
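The cleanup described above follows Python's context-manager protocol: __exit__ runs when the with block ends, even if an exception was raised inside it. The class below is a minimal hypothetical stand-in (TempDatabaseDir is not part of Blastpy3) that only models the temporary directory and an rm_db method, to illustrate why the with statement guarantees deletion.

```python
import shutil
import tempfile
from pathlib import Path


class TempDatabaseDir:
    """Hypothetical stand-in for the cleanup behaviour of a Blastn object."""

    def __init__(self):
        # The directory is created at instantiation, like the Blastn database directory
        self.db_dir = Path(tempfile.mkdtemp())

    def rm_db(self):
        # ignore_errors makes repeated calls (manual call + __exit__) harmless
        shutil.rmtree(self.db_dir, ignore_errors=True)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Called on normal exit and when the with body raises
        self.rm_db()
        return False  # do not swallow exceptions
```

Without a with statement, forgetting to call rm_db leaves the temporary files behind until the operating system cleans its temporary directory.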
Calling a Blastn object: running blastn and returning a list of hits
The "align" method of a Blastn object can then be called with either a query FASTA file (query_path) or a sequence string (query_seq). The following parameters can be customized when calling a Blastn object:
- query_path: Path to a FASTA file containing the query sequences (not gzipped). Mandatory unless query_seq is provided
- query_seq: Query sequence as a string, used instead of query_path
- blast_exec: Path of the BLAST executable. Default = "blastn"
- blastn_opt: Blastn command line options as a string. Default = ""
- task: Type of blast to be performed ('blastn' 'blastn-short' 'dc-megablast' 'megablast' 'rmblastn'). Default = "dc-megablast"
- evalue: E-value cutoff for retaining alignments. Default = 1
- best_query_hit: find and return only the best hit per query. Default = False
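As a rough illustration of how these call parameters could translate into a blastn command, here is a sketch. The function build_blastn_cmd is hypothetical; the -task, -evalue and -outfmt "6 std qseq" arguments are taken from the command echoed in the example output, while placing blastn_opt right after the executable and writing query_seq to a temporary FASTA file are assumptions about how such a wrapper might work.

```python
import shlex
import tempfile

VALID_TASKS = ("blastn", "blastn-short", "dc-megablast", "megablast", "rmblastn")


def build_blastn_cmd(db_path, query_path=None, query_seq=None, blast_exec="blastn",
                     blastn_opt="", task="dc-megablast", evalue=1):
    """Hypothetical helper returning (argument list, temporary FASTA path or None)."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task}")
    # Exactly one query source must be given
    if (query_path is None) == (query_seq is None):
        raise ValueError("provide exactly one of query_path or query_seq")
    tmp_file = None
    if query_seq is not None:
        # Assumption: a raw sequence is written to a temporary FASTA file for blastn
        with tempfile.NamedTemporaryFile("w", suffix=".fa", delete=False) as fh:
            fh.write(">query\n" + query_seq + "\n")
        query_path = tmp_file = fh.name
    cmd = [blast_exec] + shlex.split(blastn_opt)
    cmd += ["-task", task, "-evalue", str(evalue),
            # Tabular output: the 12 standard columns plus the aligned query sequence
            "-outfmt", "6 std qseq",
            "-query", query_path, "-db", db_path]
    return cmd, tmp_file
```

When a temporary file is returned, the caller is responsible for deleting it after blastn has run.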
The call returns a list containing one BlastHit object for each query hit found in the subject, or None if no hit was found. If best_query_hit is set to True, only the best hit for each query sequence in the query file is returned.
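The return rules above (None when empty, optionally one hit per query) can be sketched as follows. The Hit record and select_hits function are hypothetical, and treating "best" as the highest bit score, with ties broken by the lower E-value, is an assumption since the criterion is not documented here.

```python
from collections import namedtuple

# Minimal hypothetical hit record; the real BlastHit has more fields
Hit = namedtuple("Hit", "q_id s_id evalue bscore")


def select_hits(hits, best_query_hit=False):
    """Return the hits, None if there are none, or only the best hit per query.

    Assumption: "best" = highest bit score, ties broken by the smaller E-value.
    """
    if not hits:
        return None
    if not best_query_hit:
        return list(hits)
    best = {}
    for hit in hits:
        current = best.get(hit.q_id)
        # Compare (bit score, -evalue): higher score wins, then smaller E-value
        if current is None or (hit.bscore, -hit.evalue) > (current.bscore, -current.evalue):
            best[hit.q_id] = hit
    # dicts keep insertion order, so queries appear in the order first seen
    return list(best.values())
```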
Code
with Blastn(ref_path="./subject.fa") as blastn:
    hit_list = blastn(query_path="./query.fa")
    # hit_list is None when no hit is found
    for hit in hit_list or []:
        print(hit)
Output
CREATE DATABASE: makeblastdb -dbtype nucl -input_type fasta -in ./subject.fa -out /tmp/tmp1ZBlfT/subject
MAKE BLAST: blastn -num_threads 4 -task dc-megablast -evalue 1 -outfmt "6 std qseq" -dust no -query ./query.fa -db /tmp/tmp1ZBlfT/subject
2 hits found
HIT 0 Query query1:0-48(+)
Subject subject:19-67(+)
Lenght : 48 Identity : 100.0% Evalue : 2e-23 Bit score : 87.8
Aligned query seq : GCATGCTCGATCAGTAGCTCTCAGTACGCATACGCTAGCATCACGACT
HIT 1 Query query2:0-48(+)
Subject subject:89-137(+)
Lenght : 48 Identity : 100.0% Evalue : 2e-23 Bit score : 87.8
Aligned query seq : CGCATCGACTCGATCTGATCAGCTCACAGTCAGCATCAGCTACGATCA
Cleaning up blast DB files for "subject"
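The MAKE BLAST line shows that blastn is run with -outfmt "6 std qseq", i.e. tab-separated rows with the 12 standard BLAST+ columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) followed by qseq. The sketch below parses such rows into records named after the BlastHit fields. The names are hypothetical, and converting BLAST's 1-based inclusive coordinates to 0-based half-open ones is an assumption that matches the example output (a 48 bp hit reported as 0-48).

```python
from collections import namedtuple

# Column order of BLAST+ tabular output produced with -outfmt "6 std qseq"
FIELDS = ("q_id", "s_id", "identity", "length", "mis", "gap",
          "q_start", "q_end", "s_start", "s_end", "evalue", "bscore", "q_seq")
BlastRecord = namedtuple("BlastRecord", FIELDS + ("q_orient", "s_orient"))


def _to_zero_based(start, end):
    """Convert 1-based inclusive BLAST coordinates to 0-based half-open plus strand."""
    if start <= end:
        return start - 1, end, "+"
    # Reversed coordinates indicate a match on the minus strand
    return end - 1, start, "-"


def parse_outfmt6(text):
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        (q_id, s_id, pident, length, mis, gapopen,
         qs, qe, ss, se, evalue, bscore, q_seq) = line.split("\t")
        q_start, q_end, q_orient = _to_zero_based(int(qs), int(qe))
        s_start, s_end, s_orient = _to_zero_based(int(ss), int(se))
        # Note: the "std" gap column counts gap openings
        records.append(BlastRecord(q_id, s_id, float(pident), int(length), int(mis),
                                   int(gapopen), q_start, q_end, s_start, s_end,
                                   float(evalue), float(bscore), q_seq, q_orient, s_orient))
    return records
```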
BlastHit
Python object representing a hit found by blastn. The object contains the following public fields:
- id: Auto incremented unique identifier [INT]
- q_id: Query sequence name [STR]
- s_id: Subject sequence name [STR]
- identity: Percentage identity of the hit [FLOAT 0:100]
- length: Length of the hit [INT >=0]
- mis: Number of mismatches in the hit [INT >=0]
- gap: Number of gaps in the hit [INT >=0]
- q_start: Hit start position of the query sequence [INT >=0]
- q_end: Hit end position of the query sequence [INT >=0]
- s_start: Hit start position of the subject sequence [INT >=0]
- s_end: Hit end position of the subject sequence [INT >=0]
- evalue: E value of the alignment [FLOAT >=0]
- bscore: Bit score of the alignment [FLOAT >=0]
- q_seq: Sequence of the query aligned on the subject sequence [STR]
- q_orient: Orientation of the query sequence [+ or -]
- s_orient: Orientation of the subject sequence [+ or -]
The validity of numeric values is checked upon instantiation. Invalid values raise an AssertionError.
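A minimal sketch of this kind of assertion-based validation, using a dataclass with a __post_init__ hook. CheckedHit is hypothetical and covers only a subset of the BlastHit fields; the ranges follow the field list above.

```python
from dataclasses import dataclass


@dataclass
class CheckedHit:
    """Hypothetical subset of BlastHit fields with the documented range checks."""
    identity: float = 100.0
    length: int = 10
    mis: int = 0
    gap: int = 0
    evalue: float = 0.0
    bscore: float = 0.0

    def __post_init__(self):
        # Invalid values raise AssertionError, as described for BlastHit
        assert 0 <= self.identity <= 100, "identity must be within [0, 100]"
        for name in ("length", "mis", "gap"):
            value = getattr(self, name)
            assert isinstance(value, int) and value >= 0, f"{name} must be an int >= 0"
        assert self.evalue >= 0 and self.bscore >= 0, "evalue and bscore must be >= 0"
```

One design caveat: assert statements are removed when Python runs with the -O flag, so checks that must always run are usually written as explicit raise ValueError(...) statements instead.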
BlastHit objects can return a comprehensive report of themselves as an ordered dictionary:
Code
# Interactive import
from BlastHit import BlastHit
# Create a default BlastHit object
h = BlastHit()
# Call the report method
h.get_report(full=True)
Output
OrderedDict([('Query', 'query:0-10(+)'), ('Subject', 'subject:0-10(+)'), ('Identity', 100.0), ('Evalue', 0.0), ('Bit Score', 0.0), ('Hit length', 10), ('Number of gap', 0), ('Number of mismatch', 0)])
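To make the structure of this report explicit, the hypothetical function below rebuilds an OrderedDict with the same keys, in the same order, as the output above. Its defaults mirror the default BlastHit shown; the effect of the full flag of get_report is not modelled.

```python
from collections import OrderedDict


def hit_report(q_id="query", q_start=0, q_end=10, q_orient="+",
               s_id="subject", s_start=0, s_end=10, s_orient="+",
               identity=100.0, evalue=0.0, bscore=0.0, length=10, gap=0, mis=0):
    """Hypothetical helper reproducing the key order of the report shown above."""
    return OrderedDict([
        # Locations are formatted as "name:start-end(strand)"
        ("Query", f"{q_id}:{q_start}-{q_end}({q_orient})"),
        ("Subject", f"{s_id}:{s_start}-{s_end}({s_orient})"),
        ("Identity", identity),
        ("Evalue", evalue),
        ("Bit Score", bscore),
        ("Hit length", length),
        ("Number of gap", gap),
        ("Number of mismatch", mis),
    ])
```

Because the key order is fixed, the keys of one report can serve as a header row and the values of each report as a line of a tab-separated summary table.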
Testing the pyBlast module
The module can easily be tested with pytest:
- Install pytest with pip
pip install pytest
- Run the tests with py.test-2.7 -v
Example output of a successful run. Note that some tests may fail because the DNA sequences are sampled at random and blastn is a heuristic algorithm.
========================================== test session starts ===========================================
platform linux2 -- Python 2.7.5 -- py-1.4.27 -- pytest-2.7.0 -- /usr/bin/python
rootdir: /home/adrien/Programming/Python/pyBlast, inifile:
collected 21 items
test_pyBlast.py::test_BlastHit[4.16866907958-57-98-69-88-12-100-43-1.40452897105-47.3666242716] PASSED
test_pyBlast.py::test_BlastHit[-1-7-10-20-73-54-25-45-98.7921480151-45.2397166228] xfail
test_pyBlast.py::test_BlastHit[8.92741377413--1-100-36-34-33-14-71-18.8547135761-97.6604693294] xfail
test_pyBlast.py::test_BlastHit[10.5987790458-46--1-45-78-81-86-86-73.8740266727-56.887410005] xfail
test_pyBlast.py::test_BlastHit[66.8213911219-62-48--1-91-10-60-20-88.7850139735-81.7901609219] xfail
test_pyBlast.py::test_BlastHit[86.6626174287-29-83-34--1-53-57-68-17.9799756069-7.83036609495] xfail
test_pyBlast.py::test_BlastHit[5.23985331666-43-85-33-7--1-14-3-74.2130782704-88.9289495285] xfail
test_pyBlast.py::test_BlastHit[75.6935977321-8-78-68-10-39--1-74-44.1447867052-22.5203082483] xfail
test_pyBlast.py::test_BlastHit[39.8692596061-60-5-49-77-9-31--1-2.59963139531-46.3133849683] xfail
test_pyBlast.py::test_BlastHit[15.7192632366-24-92-1-64-82-83-90--1-75.5540618409] xfail
test_pyBlast.py::test_BlastHit[18.6627439886-34-57-60-5-45-26-40-77.7840842678--1] xfail
test_pyBlast.py::test_Blastn[blastn-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[blastn-Random queries] xfail
test_pyBlast.py::test_Blastn[blastn-short-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[blastn-short-Random queries] xfail
test_pyBlast.py::test_Blastn[dc-megablast-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[dc-megablast-Random queries] xfail
test_pyBlast.py::test_Blastn[megablast-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[megablast-Random queries] xfail
test_pyBlast.py::test_Blastn[rmblastn-Queries from Subject] PASSED
test_pyBlast.py::test_Blastn[rmblastn-Random queries] xfail
================================== 6 passed, 15 xfailed in 5.91 seconds ==================================
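The test IDs mention two kinds of queries, "Queries from Subject" and "Random queries". The sketch below shows one way such inputs could be generated with the standard library; the function names are hypothetical and this is not the actual test code. Windows copied from the subject are expected to be found by blastn, whereas purely random sequences may or may not produce chance hits, so results involving them can vary between runs.

```python
import random

BASES = "ACGT"


def random_dna(length, rng=random):
    """Random DNA string with uniform base composition."""
    return "".join(rng.choice(BASES) for _ in range(length))


def sample_from_subject(subject, length, rng=random):
    """Return (start, end, window) for a random window of the subject (0-based, half-open)."""
    start = rng.randrange(0, len(subject) - length + 1)
    return start, start + length, subject[start:start + length]


def to_fasta(records):
    """Format (name, sequence) pairs as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records)
```

Passing a seeded random.Random instance as rng makes a failing run reproducible.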
Dependencies
- NCBI BLAST+ 2.2.28+
- Python package pytest:
pip install pytest
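Before using the wrapper, it can help to check that the BLAST+ executables are on the PATH and recent enough. This is a sketch under the assumption that running a BLAST+ tool with -version prints a first line of the form "blastn: 2.2.28+"; the helper names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_VERSION = (2, 2, 28)


def parse_blast_version(text):
    """Extract (major, minor, patch) from output such as 'blastn: 2.2.28+'."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)\+", text)
    if match is None:
        raise ValueError(f"cannot find a BLAST+ version in {text!r}")
    return tuple(int(x) for x in match.groups())


def check_blast_install(tools=("makeblastdb", "blastn")):
    """Return {tool: version tuple, or None if the tool is not on PATH}."""
    found = {}
    for tool in tools:
        if shutil.which(tool) is None:
            found[tool] = None
            continue
        out = subprocess.run([tool, "-version"], capture_output=True,
                             text=True, check=True).stdout
        found[tool] = parse_blast_version(out)
    return found
```

Since versions are tuples, all(v is not None and v >= MIN_VERSION for v in check_blast_install().values()) tells whether the requirement is met.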
Authors and Contact
Adrien Leger - 2015