Queries the BOLD database to identify taxa from COI sequences
Project description
This script accepts FASTA files containing COI sequences and queries the BOLD database (http://boldsystems.org/) to identify the taxa those sequences belong to.
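To illustrate the kind of input the script expects, here is a minimal sketch of reading FASTA records with the standard library only. It is not the script's own parser (the project installs Biopython, which has its own FASTA reader); the function name `read_fasta` and the example records are hypothetical placeholders.

```python
def read_fasta(lines):
    """Yield (seq_id, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper for illustration; not part of bold_retriever.
    """
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            seq_id = header[0] if header else ""
            chunks = []
        elif seq_id is not None:
            # Sequence data may be wrapped over several lines.
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Placeholder records, not real COI sequences.
EXAMPLE_FASTA = """>example_seq_1 placeholder description
ACGTA
CGTAC
>example_seq_2
TTGACA
"""

for seq_id, sequence in read_fasta(EXAMPLE_FASTA.splitlines()):
    print(seq_id, len(sequence))
```

Each record's identifier is the first word after `>`, which is the value a tool would typically report back alongside its matches.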
Run this way
clone repository:
cd $USERAPPL
git clone https://github.com/carlosp420/bold_retriever.git
install dependencies:
cd bold_retriever
module load biopython-env
pip install -r requirements.txt
run software:
Choose one of the databases available from BOLD (http://www.boldsystems.org/index.php/resources/api?type=idengine) and pass it with the -db argument:
COX1_SPECIES
COX1
COX1_SPECIES_PUBLIC
COX1_L640bp
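As a sketch of how a database name could be checked against the four options listed above, the following uses `argparse` with a fixed set of choices. How `bold_retriever.py` actually parses its arguments is not shown in this document, so `build_parser` and the `dest` names are assumptions; only the `-f` and `-db` flags and the database names come from the text.

```python
import argparse

# The four database names listed above.
BOLD_DATABASES = ["COX1_SPECIES", "COX1", "COX1_SPECIES_PUBLIC", "COX1_L640bp"]


def build_parser():
    """Hypothetical parser; the real script's argument handling may differ."""
    parser = argparse.ArgumentParser(
        description="Identify taxa from COI sequences via BOLD."
    )
    parser.add_argument(
        "-f", dest="fasta_file", required=True,
        help="FASTA file containing COI sequences",
    )
    parser.add_argument(
        "-db", dest="database", required=True, choices=BOLD_DATABASES,
        help="BOLD database to query",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-f", "ZA2013-0565.fasta", "-db", "COX1_SPECIES"])
print(args.fasta_file, args.database)
```

With `choices` set, an unlisted database name makes `argparse` print a usage error and exit instead of sending a bad request.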
For example:
python bold_retriever.py -f ZA2013-0565.fasta -db COX1_SPECIES
output:
bold_id | seq_id | similarity | collection_country | division | taxon | class | order | family
---|---|---|---|---|---|---|---|---
FIDIP558-11 | TE-14-27_FHYP_av | 0.9884 | Finland | animal | Diptera | Insecta | Diptera | None
GBDP6413-09 | TE-14-27_FHYP_av | 0.9242 | None | animal | Ornithomya anchineura | Insecta | Diptera | Hippoboscidae
GBDP2916-07 | TE-14-27_FHYP_av | 0.922 | None | animal | Stenepteryx hirundinis | Insecta | Diptera | Hippoboscidae
GBDP2919-07 | TE-14-27_FHYP_av | 0.9149 | None | animal | Ornithomya biloba | Insecta | Diptera | Hippoboscidae
GBDP2908-07 | TE-14-27_FHYP_av | 0.9078 | None | animal | Ornithoctona sp. P-20 | Insecta | Diptera | Hippoboscidae
GBDP2918-07 | TE-14-27_FHYP_av | 0.9076 | None | animal | Ornithomya chloropus | Insecta | Diptera | Hippoboscidae
GBDP2935-07 | TE-14-27_FHYP_av | 0.8936 | None | animal | Crataerina pallida | Insecta | Diptera | Hippoboscidae
GBMIN26225-13 | TE-14-27_FHYP_av | 0.8889 | None | animal | Lucilia sericata | Insecta | Diptera | Calliphoridae
GBDP5820-09 | TE-14-27_FHYP_av | 0.8833 | None | animal | Coenosia tigrina | Insecta | Diptera | Muscidae
GBMIN26204-13 | TE-14-27_FHYP_av | 0.883 | None | animal | Lucilia cuprina | Insecta | Diptera | Calliphoridae
GBMIN18768-13 | TE-14-27_FHYP_av | 0.8823 | Brazil | animal | Ornithoctona erythrocephala | Insecta | Diptera | Hippoboscidae
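Once results like the sample output above are loaded, a common next step is to keep the highest-similarity match for each query sequence. The sketch below assumes the rows have already been read into tuples; the on-disk format of the output file is not specified here, and `best_hit_per_sequence` is a hypothetical helper, not part of the package.

```python
# A few rows copied from the sample output above:
# (bold_id, seq_id, similarity, taxon)
hits = [
    ("GBDP6413-09", "TE-14-27_FHYP_av", 0.9242, "Ornithomya anchineura"),
    ("FIDIP558-11", "TE-14-27_FHYP_av", 0.9884, "Diptera"),
    ("GBDP2916-07", "TE-14-27_FHYP_av", 0.922, "Stenepteryx hirundinis"),
]


def best_hit_per_sequence(rows):
    """Return {seq_id: row} keeping the row with the highest similarity."""
    best = {}
    for row in rows:
        _bold_id, seq_id, similarity, _taxon = row
        if seq_id not in best or similarity > best[seq_id][2]:
            best[seq_id] = row
    return best


for seq_id, (bold_id, _, similarity, taxon) in best_hit_per_sequence(hits).items():
    print(seq_id, bold_id, similarity, taxon)
```

Note that the top match in the sample is resolved only to order level (Diptera), so a high similarity score does not by itself guarantee a species-level identification.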
Full documentation
See the full documentation at http://bold-retriever.readthedocs.org
History
v0.2.2: Killed bug in taxon search.
v0.2.1: Killed bug in scraping web Public_BIN for species ID.
v0.2.0: Scraping web Public_BIN for species ID.
v0.1.9: Added request_id test and option to run function in debug mode.
v0.1.8: Fixed bug for exception when BOLD sends empty list of taxon names.
v0.1.7: Fixed bug for exception when BOLD sends empty list of taxon names.
v0.1.6: Append taxon identification results to file as we get them.
v0.1.5: Additional tests, coverage 92%
v0.1.4: Fixed bug in taxon_search function
v0.1.3: Coverage 75%
v0.1.2: Pep8 and test coverage 69%
v0.1.1: Packaged as Python module.
v0.1.0: You can specify which BOLD database should be used for BLAST of FASTA sequences.
v0.0.7: Catching exception for NULL, list and text returned instead of XML from BOLD.
v0.0.6: Catching exception for malformed XML from BOLD.
v0.0.5: Catch exception when BOLD sends funny data such as {"481541":[]}.
Hashes for bold_retriever-0.2.2-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 755a9fdb4c1a57211901b61fb07321724919fd656c41ab20ef81d3ba2737fe5d |
MD5 | 873e101f126d40bce84e70c70e3cd070 |
BLAKE2b-256 | bd346fb37d7a6826af98a95168b3428b6059c148e2477d6ca321879900ae0fc4 |
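The published digests can be used to check a downloaded wheel before installing it. The following is a minimal sketch using the standard `hashlib` module; the helper names and the assumption that the wheel sits in the current directory are illustrative only.

```python
import hashlib

# SHA256 digest listed above for bold_retriever-0.2.2-py2.py3-none-any.whl
EXPECTED_SHA256 = "755a9fdb4c1a57211901b61fb07321724919fd656c41ab20ef81d3ba2737fe5d"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of_file(path) == expected.lower()


# Usage, assuming the wheel was downloaded to the current directory:
# matches_expected("bold_retriever-0.2.2-py2.py3-none-any.whl")
```

Reading in chunks keeps memory use constant regardless of file size.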