Queries the BOLD database to identify taxa from COI sequences.
Project description
This script accepts FASTA files containing COI sequences and queries the BOLD database (http://boldsystems.org/) to identify the taxa they belong to.
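To make the expected input concrete, here is a minimal sketch of reading FASTA records with only the standard library. It is not the parser bold_retriever itself uses; the function name `read_fasta` and the example record are hypothetical.

```python
from typing import Dict, Iterable


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into a {sequence_id: sequence} mapping."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # Header line: the ID is the first whitespace-delimited token.
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence data may be wrapped over several lines.
            records[current_id] += line.upper()
    return records


# Hypothetical record, used only to illustrate the format.
example = """>OTU_99 hypothetical example record
aactttatattttatttttggagcttgatc
AGGAATAGTAGGAACTTCTTTAAG
"""
sequences = read_fasta(example.splitlines())
```

Each header line starts with `>` and the sequence ID is the text up to the first space, which is presumably what ends up in the `seq_id` column of the results.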
Run this way
clone repository:
cd $USERAPPL
git clone https://github.com/carlosp420/bold_retriever.git
install dependencies (python2.7):
cd bold_retriever
module load biopython-env
pip install -r requirements.txt
run software
Choose one of the databases available from BOLD (http://www.boldsystems.org/index.php/resources/api?type=idengine) and pass it with the -db argument:
COX1_SPECIES
COX1
COX1_SPECIES_PUBLIC
COX1_L640bp
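As an illustration of how the database choice can be restricted to the names listed above, here is a hedged sketch using `argparse`. It is not the script's actual argument parser; only the `-f` and `-db` flags and the four database names come from this README, and `build_parser` is a hypothetical helper.

```python
import argparse

# Database names as listed in this README.
BOLD_DATABASES = ["COX1_SPECIES", "COX1", "COX1_SPECIES_PUBLIC", "COX1_L640bp"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that rejects unknown database names (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Sketch of a command line that mirrors -f and -db."
    )
    parser.add_argument("-f", dest="fasta", required=True,
                        help="FASTA file with COI sequences")
    parser.add_argument("-db", dest="db", required=True,
                        choices=BOLD_DATABASES,
                        help="BOLD database to query")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-f", "ZA2013-0565.fasta", "-db", "COX1_SPECIES"])
```

With `choices`, an unknown name such as `COI` makes `argparse` print an error and exit before any query is sent.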
For example:
python bold_retriever.py -f ZA2013-0565.fasta -db COX1_SPECIES
output:
seq_id | bold_id | similarity | division | class | order | family | species | collection_country
---|---|---|---|---|---|---|---|---
OTU_99 | FBNE064-11 | 1 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius pini | Germany
OTU_99 | NEUFI079-11 | 1 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius pini | Finland
OTU_99 | FBNE172-13 | 0.9937 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius atrifrons | Germany
OTU_99 | FBNE162-13 | 0.9936 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius contumax | Austria
OTU_99 | TTSOW138-09 | 0.9811 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius ovalis | Canada
OTU_99 | CNPAH380-13 | 0.9811 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius | Canada
OTU_99 | CNKOF1602-14 | 0.9811 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius pinidumus | Canada
OTU_99 | NRAS173-11 | 0.9748 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius conjunctus | Canada
OTU_99 | SSBAE2911-13 | 0.9748 | animal | Collembola | None | None | Collembola | Canada
OTU_99 | CNPAQ117-13 | 0.9686 | animal | Insecta | Neuroptera | Hemerobiidae | Hemerobius humulinus | Canada
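Results like these are often filtered by similarity before use. The sketch below keeps only hits at or above a threshold, grouped by `seq_id`. It assumes the output is comma-separated with the column names shown above; the delimiter, the `best_hits` helper and the 0.98 cutoff are assumptions for illustration, not documented behaviour.

```python
import csv
import io
from typing import Dict, List

# Three rows copied from the example output; comma delimiter is an assumption.
SAMPLE = """seq_id,bold_id,similarity,division,class,order,family,species,collection_country
OTU_99,FBNE064-11,1,animal,Insecta,Neuroptera,Hemerobiidae,Hemerobius pini,Germany
OTU_99,FBNE172-13,0.9937,animal,Insecta,Neuroptera,Hemerobiidae,Hemerobius atrifrons,Germany
OTU_99,SSBAE2911-13,0.9748,animal,Collembola,None,None,Collembola,Canada
"""


def best_hits(handle, min_similarity: float = 0.98) -> Dict[str, List[dict]]:
    """Group rows by seq_id, keeping those with similarity >= min_similarity."""
    hits: Dict[str, List[dict]] = {}
    for row in csv.DictReader(handle):
        if float(row["similarity"]) >= min_similarity:
            hits.setdefault(row["seq_id"], []).append(row)
    return hits


hits = best_hits(io.StringIO(SAMPLE))
species = [row["species"] for row in hits["OTU_99"]]
```

For a real results file, pass an open file handle instead of `io.StringIO(SAMPLE)` and adjust the delimiter if the file is tab-separated.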
Speed
bold_retriever uses the Twisted library to make asynchronous calls to BOLD, so requests for different sequences can be in flight at the same time. This reduces the total processing time.
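The effect of asynchronous calls can be illustrated without Twisted. The sketch below uses the standard-library `asyncio` module (Python 3) as a stand-in, and replaces the network request with a short sleep; `fake_bold_query` and `query_all` are hypothetical names, and nothing here contacts BOLD.

```python
import asyncio
import time
from typing import List


async def fake_bold_query(seq_id: str, delay: float = 0.2) -> str:
    """Stand-in for one network request; sleeps instead of contacting BOLD."""
    await asyncio.sleep(delay)
    return f"{seq_id}: done"


async def query_all(seq_ids: List[str]) -> List[str]:
    # gather() runs all the coroutines concurrently and returns results in input order.
    return list(await asyncio.gather(*(fake_bold_query(s) for s in seq_ids)))


start = time.perf_counter()
results = asyncio.run(query_all(["OTU_1", "OTU_2", "OTU_3", "OTU_4", "OTU_5"]))
elapsed = time.perf_counter() - start
```

Five simulated requests of 0.2 s each would take about 1 s one after another; run concurrently, the waits overlap, so the total is close to the duration of a single request.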
Full documentation
See the full documentation at http://bold-retriever.readthedocs.org
History
v1.0.0: Using Twisted for asynchronous calls and increase in speed.
v0.2.4: Reorganizing columns in output file. Querying the API for family name of taxa.
v0.2.2: Killed bug in taxon search.
v0.2.1: Killed bug in scraping web Public_BIN for species ID.
v0.2.0: Scraping web Public_BIN for species ID.
v0.1.9: Added request_id test and option to run function in debug mode.
v0.1.8: Fixed bug for exception when BOLD sends empty list of taxon names.
v0.1.7: Fixed bug for exception when BOLD sends empty list of taxon names.
v0.1.6: Append taxon identification results to file as we get them.
v0.1.5: Additional tests; coverage 92%
v0.1.4: Fixed bug in taxon_search function
v0.1.3: Coverage 75%
v0.1.2: Pep8 and test coverage 69%
v0.1.1: Packaged as Python module.
v0.1.0: You can specify which BOLD database should be used for BLAST of FASTA sequences.
v0.0.7: Catching exception for NULL, list and text returned instead of XML from BOLD.
v0.0.6: Catching exception for malformed XML from BOLD.
v0.0.5: Catch exception when BOLD sends funny data such as {"481541":[]}.
Source Distribution
File details
Details for the file bold_retriever-1.0.0.tar.gz.
File metadata
- Download URL: bold_retriever-1.0.0.tar.gz
- Upload date:
- Size: 25.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | c5fc2f760ba3aaa097cbe21630602e6abd323e01c67d6ebb848f1e74b29856cf
MD5 | 8bed9f1bd826f533f39fa958e0b64765
BLAKE2b-256 | 72deafe1a354d8376ca80955b097374d0b1b713348bd26b0446309d865d6d2fe
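A downloaded archive can be checked against the SHA256 digest above. This is a minimal sketch using `hashlib`; the helper names `sha256_of` and `is_valid_download` are hypothetical, and the path to the downloaded file is whatever location you saved it to.

```python
import hashlib

# SHA256 digest listed above for bold_retriever-1.0.0.tar.gz.
EXPECTED_SHA256 = "c5fc2f760ba3aaa097cbe21630602e6abd323e01c67d6ebb848f1e74b29856cf"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_valid_download(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Compare the file's digest with the expected one, ignoring case."""
    return sha256_of(path) == expected.lower()
```

For example, `is_valid_download("bold_retriever-1.0.0.tar.gz")` should return `True` for an intact copy of the archive.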