Queries the BOLD database to identify taxa from COI sequences
Project description
This script accepts FASTA files containing COI sequences and queries the BOLD database (http://boldsystems.org/) to identify the taxa those sequences belong to.
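As an illustration of the input format, here is a minimal sketch of reading FASTA records with only the standard library. It is not the code bold_retriever itself uses; the function name `read_fasta`, the second record ID and both sequences are made up for the example.

```python
def read_fasta(lines):
    """Yield (seq_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            # The record ID is the first word after ">".
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            # Sequence lines may be wrapped; collect and join them later.
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Hypothetical input: two short, invented sequences.
example = """>OTU_99
ACTTTATATTTTATTTTTGG
AGCATGAGCAGGAATAGTAG
>OTU_100
GGAACTTCATTAAGAATTTT
""".splitlines()

records = list(read_fasta(example))
for seq_id, sequence in records:
    print(seq_id, len(sequence))
```

Each record's ID is what would appear in the seq_id column of the results.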
Run this way
clone repository:
cd $USERAPPL
git clone https://github.com/carlosp420/bold_retriever.git
install dependencies (python2.7):
cd bold_retriever
module load biopython-env
pip install -r requirements.txt
run software
Choose one of the databases that BOLD makes available (see http://www.boldsystems.org/index.php/resources/api?type=idengine) and pass it with the -db argument:
COX1_SPECIES
COX1
COX1_SPECIES_PUBLIC
COX1_L640bp
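The sketch below shows one way a command-line parser could restrict the database argument to the four names listed above. It is a hypothetical parser that only mirrors the -f and -db flags, not the script's actual argument handling; `build_parser`, `fasta_file` and `BOLD_DATABASES` are names chosen for the example.

```python
import argparse

# The four database names listed above.
BOLD_DATABASES = ["COX1_SPECIES", "COX1", "COX1_SPECIES_PUBLIC", "COX1_L640bp"]


def build_parser():
    """Return a parser with the -f and -db flags (hypothetical sketch)."""
    parser = argparse.ArgumentParser(prog="bold_retriever.py")
    parser.add_argument("-f", dest="fasta_file", required=True,
                        help="FASTA file with COI sequences")
    # choices= makes argparse reject any name that is not in the list.
    parser.add_argument("-db", dest="db", required=True,
                        choices=BOLD_DATABASES,
                        help="BOLD database to query")
    return parser


args = build_parser().parse_args(["-f", "ZA2013-0565.fasta", "-db", "COX1_SPECIES"])
print(args.fasta_file, args.db)
```

With this approach an unknown database name stops the program with a usage message before any query is sent.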
For example:
python bold_retriever.py -f ZA2013-0565.fasta -db COX1_SPECIES
output:
seq_id  bold_id       similarity  division  class       order       family        species                collection_country
OTU_99  FBNE064-11    1           animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pini        Germany
OTU_99  NEUFI079-11   1           animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pini        Finland
OTU_99  FBNE172-13    0.9937      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius atrifrons   Germany
OTU_99  FBNE162-13    0.9936      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius contumax    Austria
OTU_99  TTSOW138-09   0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius ovalis      Canada
OTU_99  CNPAH380-13   0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius             Canada
OTU_99  CNKOF1602-14  0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pinidumus   Canada
OTU_99  NRAS173-11    0.9748      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius conjunctus  Canada
OTU_99  SSBAE2911-13  0.9748      animal    Collembola  None        None          Collembola             Canada
OTU_99  CNPAQ117-13   0.9686      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius humulinus   Canada
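Results like these can be filtered after the run. The following is a minimal sketch that keeps only hits at or above a similarity threshold. It assumes the results were saved as tab-separated text with the header row shown above; the real output file may use a different delimiter, so `delimiter` is a parameter. The function name `hits_above` is hypothetical, and the three rows are copied from the sample output.

```python
import csv
import io

# Three rows from the sample output. Tab-separated layout is an assumption.
sample = (
    "seq_id\tbold_id\tsimilarity\tdivision\tclass\torder\tfamily\tspecies\tcollection_country\n"
    "OTU_99\tFBNE064-11\t1\tanimal\tInsecta\tNeuroptera\tHemerobiidae\tHemerobius pini\tGermany\n"
    "OTU_99\tFBNE172-13\t0.9937\tanimal\tInsecta\tNeuroptera\tHemerobiidae\tHemerobius atrifrons\tGermany\n"
    "OTU_99\tNRAS173-11\t0.9748\tanimal\tInsecta\tNeuroptera\tHemerobiidae\tHemerobius conjunctus\tCanada\n"
)


def hits_above(handle, threshold, delimiter="\t"):
    """Return the rows whose similarity is at least `threshold`."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [row for row in reader if float(row["similarity"]) >= threshold]


good = hits_above(io.StringIO(sample), 0.99)
for row in good:
    print(row["bold_id"], row["species"], row["similarity"])
```

To read from a results file instead of the in-memory sample, pass an open file object as `handle`.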
Speed
bold_retriever uses the Twisted library to perform asynchronous calls, which reduces the total processing time.
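To show why asynchronous calls help, here is a small sketch of the general idea. It uses asyncio from the Python 3 standard library rather than Twisted, and it is not bold_retriever's code: `fake_query` is a stand-in that sleeps instead of contacting BOLD, and the sequence IDs are invented.

```python
import asyncio


async def fake_query(seq_id, delay=0.05):
    """Stand-in for one request to BOLD: waits instead of doing network I/O."""
    await asyncio.sleep(delay)
    return seq_id, "done"


async def query_all(seq_ids):
    # gather() schedules every coroutine at once, so the waiting periods
    # overlap instead of adding up; results keep the input order.
    return await asyncio.gather(*(fake_query(s) for s in seq_ids))


results = asyncio.run(query_all(["OTU_1", "OTU_2", "OTU_3"]))
print(results)
```

Because the three waits overlap, the whole batch takes roughly as long as one request rather than three in a row.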
Full documentation
See the full documentation at http://bold-retriever.readthedocs.org
History
v1.0.0: Using Twisted for asynchronous calls, which increases speed.
v0.2.4: Reorganizing columns in output file. Querying the API for family name of taxa.
v0.2.2: Killed bug in taxon search.
v0.2.1: Killed bug in scraping web Public_BIN for species ID.
v0.2.0: Scraping web Public_BIN for species ID.
v0.1.9: Added request_id test and option to run function in debug mode.
v0.1.8: Fixed bug for exception when BOLD sends empty list of taxon names.
v0.1.7: Fixed bug for exception when BOLD sends empty list of taxon names.
v0.1.6: Append taxon identification results to file as we get them.
v0.1.5: Additional tests, coverage 92%.
v0.1.4: Fixed bug in taxon_search function
v0.1.3: Coverage 75%
v0.1.2: Pep8 and test coverage 69%
v0.1.1: Packaged as Python module.
v0.1.0: You can specify which BOLD database should be used for BLAST of FASTA sequences.
v0.0.7: Catching exception for NULL, list and text returned instead of XML from BOLD.
v0.0.6: Catching exception for malformed XML from BOLD.
v0.0.5: Catch exception when BOLD sends funny data such as {"481541":[]}.