Queries the BOLD database to identify taxa from COI sequences

This script accepts FASTA files containing COI sequences. It queries the BOLD database (http://boldsystems.org/) to identify taxa from those sequences.
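To make the expected input concrete, here is a minimal sketch of reading FASTA records with only the standard library. It is not the project's own reader (the install steps suggest Biopython is available for that); the function name `read_fasta` and the sequences are made up for illustration.

```python
def read_fasta(lines):
    """Yield (seq_id, sequence) pairs from an iterable of FASTA lines."""
    seq_id, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            header = line[1:].split()
            # The identifier is the first word after ">".
            seq_id, chunks = (header[0] if header else ""), []
        elif seq_id is not None:
            # Sequence data may span several lines; collect and join them.
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


# Hypothetical input: two short, invented sequences.
example = """>OTU_99
ACTTTATATTTTATTTTTGG
AGCATGAGCTGGAATAGTAG
>OTU_100
GGAACTTCATTAAGAATTTT
""".splitlines()

records = list(read_fasta(example))
for seq_id, sequence in records:
    print(seq_id, len(sequence))
```

Each record pairs an identifier such as `OTU_99` with its full sequence; the identifier is what appears in the `seq_id` column of the results.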

Run this way

  • clone repository:

    cd $USERAPPL
    git clone https://github.com/carlosp420/bold_retriever.git
    
  • install dependencies (python2.7):

    cd bold_retriever
    module load biopython-env
    pip install -r requirements.txt
    
  • run software

Choose one of the databases available from BOLD (http://www.boldsystems.org/index.php/resources/api?type=idengine) and pass it with the -db argument:

  • COX1_SPECIES
  • COX1
  • COX1_SPECIES_PUBLIC
  • COX1_L640bp

For example:

    python bold_retriever.py -f ZA2013-0565.fasta -db COX1_SPECIES

  • output:

    seq_id  bold_id       similarity  division  class       order       family        species                collection_country
    OTU_99  FBNE064-11    1           animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pini        Germany
    OTU_99  NEUFI079-11   1           animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pini        Finland
    OTU_99  FBNE172-13    0.9937      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius atrifrons   Germany
    OTU_99  FBNE162-13    0.9936      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius contumax    Austria
    OTU_99  TTSOW138-09   0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius ovalis      Canada
    OTU_99  CNPAH380-13   0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius             Canada
    OTU_99  CNKOF1602-14  0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pinidumus   Canada
    OTU_99  NRAS173-11    0.9748      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius conjunctus  Canada
    OTU_99  SSBAE2911-13  0.9748      animal    Collembola  None        None          Collembola             Canada
    OTU_99  CNPAQ117-13   0.9686      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius humulinus   Canada
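
The listing above can be post-processed, for example to keep only close matches. The sketch below assumes columns are separated by two or more spaces, as in the listing shown here; the actual output file may use a different delimiter, in which case the split would need adjusting. The helper names `parse_hits` and `best_hits` and the threshold are hypothetical.

```python
import re

COLUMNS = ["seq_id", "bold_id", "similarity", "division", "class",
           "order", "family", "species", "collection_country"]


def parse_hits(text):
    """Turn aligned result rows into a list of dicts keyed by column name."""
    hits = []
    for line in text.strip().splitlines():
        # Assumption: columns are separated by runs of 2+ spaces, so
        # species names containing a single space stay in one field.
        fields = re.split(r"\s{2,}", line.strip())
        if fields[0] == "seq_id" or len(fields) != len(COLUMNS):
            continue  # skip the header and any malformed row
        hit = dict(zip(COLUMNS, fields))
        hit["similarity"] = float(hit["similarity"])
        hits.append(hit)
    return hits


def best_hits(hits, threshold=0.99):
    """Keep hits whose similarity is at or above the threshold."""
    return [h for h in hits if h["similarity"] >= threshold]


# Three rows copied from the example output.
sample = """
seq_id  bold_id       similarity  division  class       order       family        species                collection_country
OTU_99  FBNE064-11    1           animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pini        Germany
OTU_99  FBNE172-13    0.9937      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius atrifrons   Germany
OTU_99  SSBAE2911-13  0.9748      animal    Collembola  None        None          Collembola             Canada
"""

hits = parse_hits(sample)
for hit in best_hits(hits):
    print(hit["seq_id"], hit["species"], hit["similarity"])
```

Filtering on similarity is one way to set aside distant matches such as the Collembola row, which belongs to a different class than the other hits.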
    

Speed

bold_retriever uses the Twisted library to perform asynchronous calls, which reduces the total processing time:

[Figure: benchmarks]
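
To illustrate why asynchronous calls help, the sketch below runs several simulated requests concurrently. It uses the standard library's `asyncio` rather than Twisted, and it is not bold_retriever's implementation: `fake_bold_query` is a hypothetical stand-in that sleeps instead of contacting BOLD, and the sequence identifiers are invented.

```python
import asyncio
import time


async def fake_bold_query(seq_id, delay=0.1):
    """Stand-in for one network request; sleeps instead of contacting BOLD."""
    await asyncio.sleep(delay)
    return seq_id, "done"


async def query_all(seq_ids):
    # gather() schedules every coroutine before waiting, so the waits
    # overlap instead of adding up. Results keep the input order.
    return await asyncio.gather(*(fake_bold_query(s) for s in seq_ids))


seq_ids = ["OTU_1", "OTU_2", "OTU_3", "OTU_4", "OTU_5"]

start = time.perf_counter()
results = asyncio.run(query_all(seq_ids))
elapsed = time.perf_counter() - start

print("%d queries finished in %.2f s" % (len(results), elapsed))
```

Five sequential waits of 0.1 s would take about 0.5 s, while the concurrent version finishes in roughly the time of a single wait. The same reasoning applies to real network requests, where most of the time is spent waiting for the server.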

Full documentation

See the full documentation at http://bold-retriever.readthedocs.org

History

  • v1.0.0: Using Twisted for asynchronous calls and increase in speed.
  • v0.2.4: Reorganizing columns in output file. Querying the API for family
    name of taxa.
  • v0.2.2: Killed bug in taxon search.
  • v0.2.1: Killed bug in scraping web Public_BIN for species ID.
  • v0.2.0: Scraping web Public_BIN for species ID.
  • v0.1.9: Added request_id test and option to run function in debug mode.
  • v0.1.8: Fixed bug for exception when BOLD sends empty list of taxon names.
  • v0.1.7: Fixed bug for exception when BOLD sends empty list of taxon names.
  • v0.1.6: Append taxon identification results to file as we get them.
  • v0.1.5: Additional tests, coverage 92%
  • v0.1.4: Fixed bug in taxon_search function
  • v0.1.3: Coverage 75%
  • v0.1.2: Pep8 and test coverage 69%
  • v0.1.1: Packaged as Python module.
  • v0.1.0: You can specify which BOLD database should be used for BLAST of FASTA sequences.
  • v0.0.7: Catching exception for NULL, list and text returned instead of XML from BOLD.
  • v0.0.6: Catching exception for malformed XML from BOLD.
  • v0.0.5: Catch exception when BOLD sends funny data such as {"481541":[]}.

Download files

    Filename                     Size     File type  Python version  Upload date
    bold_retriever-1.0.0.tar.gz  25.6 kB  Source     None            Nov 6, 2014
