
Queries the BOLD database to identify taxa from their COI sequences.

Project description


This script accepts FASTA files containing COI sequences and queries the BOLD database (http://boldsystems.org/) to identify the taxa those sequences belong to.
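As a rough illustration of this first step, here is a minimal Python 3 sketch that reads sequences from FASTA text and builds one ID-engine query URL per sequence. It is not bold_retriever's own code (the project targets Python 2.7 and uses Biopython). The endpoint BOLD_IDS_URL and its db/sequence query parameters are assumptions based on the BOLD v3 ID-engine API, so check them against the BOLD API documentation before relying on them. The toy sequence in the usage section is made up.

```python
from urllib.parse import urlencode

# The four ID-engine databases this README lists for the -db argument.
BOLD_DATABASES = {"COX1_SPECIES", "COX1", "COX1_SPECIES_PUBLIC", "COX1_L640bp"}

# ASSUMPTION: BOLD v3 ID-engine endpoint; verify against the BOLD API page.
BOLD_IDS_URL = "http://www.boldsystems.org/index.php/Ids_xml"


def read_fasta(text):
    """Yield (seq_id, sequence) pairs from FASTA-formatted text."""
    seq_id, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # Keep only the first word of the header line as the sequence ID.
            header = line[1:].split()
            seq_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def build_query_url(sequence, db):
    """Return the (assumed) ID-engine query URL for one sequence."""
    if db not in BOLD_DATABASES:
        raise ValueError(f"unknown BOLD database: {db!r}")
    return BOLD_IDS_URL + "?" + urlencode({"db": db, "sequence": sequence})


if __name__ == "__main__":
    toy_fasta = ">OTU_99 toy example\nACTTTATATTTT\nATTTTTGG\n"  # made-up sequence
    for seq_id, seq in read_fasta(toy_fasta):
        print(seq_id, build_query_url(seq, "COX1_SPECIES"))
```

Multi-line sequences are joined before querying, and an unknown database name fails early instead of producing a request BOLD would reject.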

How to run

  • clone the repository:

    cd $USERAPPL
    git clone https://github.com/carlosp420/bold_retriever.git
  • install the dependencies (Python 2.7):

    cd bold_retriever
    module load biopython-env
    pip install -r requirements.txt
  • run the software:

You have to choose one of the databases available from the BOLD ID engine (http://www.boldsystems.org/index.php/resources/api?type=idengine) and pass it with the -db argument:

  • COX1_SPECIES

  • COX1

  • COX1_SPECIES_PUBLIC

  • COX1_L640bp

For example:

    python bold_retriever.py -f ZA2013-0565.fasta -db COX1_SPECIES
  • output:

    seq_id  bold_id       similarity  division  class       order       family        species                collection_country
    OTU_99  FBNE064-11    1           animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pini        Germany
    OTU_99  NEUFI079-11   1           animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pini        Finland
    OTU_99  FBNE172-13    0.9937      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius atrifrons   Germany
    OTU_99  FBNE162-13    0.9936      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius contumax    Austria
    OTU_99  TTSOW138-09   0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius ovalis      Canada
    OTU_99  CNPAH380-13   0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius             Canada
    OTU_99  CNKOF1602-14  0.9811      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius pinidumus   Canada
    OTU_99  NRAS173-11    0.9748      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius conjunctus  Canada
    OTU_99  SSBAE2911-13  0.9748      animal    Collembola  None        None          Collembola             Canada
    OTU_99  CNPAQ117-13   0.9686      animal    Insecta     Neuroptera  Hemerobiidae  Hemerobius humulinus   Canada
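If the results file is tab-separated with the header row shown above (an assumption; this page does not state the delimiter), a short script can keep only the best-scoring match for each sequence. best_hits is a hypothetical helper, not part of bold_retriever. It also maps the literal string None, which appears in the order and family columns above, to Python's None.

```python
import csv
import io


def best_hits(tsv_text):
    """Map each seq_id to its highest-similarity row (first row wins ties)."""
    best = {}
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        # The output shows the literal string "None" for missing ranks.
        row = {key: (None if value == "None" else value) for key, value in row.items()}
        current = best.get(row["seq_id"])
        if current is None or float(row["similarity"]) > float(current["similarity"]):
            best[row["seq_id"]] = row
    return best


if __name__ == "__main__":
    # Two rows copied from the example output, joined with tabs.
    sample = (
        "seq_id\tbold_id\tsimilarity\tdivision\tclass\torder\tfamily\tspecies\tcollection_country\n"
        "OTU_99\tSSBAE2911-13\t0.9748\tanimal\tCollembola\tNone\tNone\tCollembola\tCanada\n"
        "OTU_99\tFBNE064-11\t1\tanimal\tInsecta\tNeuroptera\tHemerobiidae\tHemerobius pini\tGermany\n"
    )
    hit = best_hits(sample)["OTU_99"]
    print(hit["bold_id"], hit["species"])  # FBNE064-11 Hemerobius pini
```

Ties keep the first row seen, so with the example above both Hemerobius pini hits at similarity 1 would collapse to FBNE064-11.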

Speed

bold_retriever uses the Twisted library to make asynchronous calls to BOLD, which reduces the total processing time.

[Figure: benchmarks]
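This page does not show how bold_retriever drives Twisted, so the following is only an illustration of why asynchronous calls help. It swaps in Python's standard asyncio in place of Twisted, and fake_query is a hypothetical stand-in that sleeps instead of contacting BOLD. Because waiting on one request does not block the others, the total time approaches the slowest batch rather than the sum of all requests. The semaphore caps how many requests are in flight at once.

```python
import asyncio
import time


async def fake_query(seq_id, delay):
    """Hypothetical stand-in for one HTTP request to BOLD (no network used)."""
    await asyncio.sleep(delay)
    return seq_id, f"result for {seq_id}"


async def query_all(seq_ids, delay=0.1, max_concurrent=5):
    # Limit in-flight requests so the server is not flooded.
    semaphore = asyncio.Semaphore(max_concurrent)

    async def limited(seq_id):
        async with semaphore:
            return await fake_query(seq_id, delay)

    # gather() runs the coroutines concurrently and returns results in input order.
    return await asyncio.gather(*(limited(s) for s in seq_ids))


if __name__ == "__main__":
    ids = [f"OTU_{i}" for i in range(10)]
    start = time.perf_counter()
    results = asyncio.run(query_all(ids))
    elapsed = time.perf_counter() - start
    # 10 queries, 5 at a time, 0.1 s each: roughly 0.2 s instead of about 1 s sequentially.
    print(len(results), f"{elapsed:.2f} s")
```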

Full documentation

See the full documentation at http://bold-retriever.readthedocs.org

History

  • v1.0.0: Added Twisted for asynchronous calls, increasing speed.

  • v0.2.4: Reorganized columns in the output file. Queries the API for the family name of taxa.

  • v0.2.2: Fixed bug in taxon search.

  • v0.2.1: Fixed bug in scraping the Public_BIN web page for species IDs.

  • v0.2.0: Scrapes the Public_BIN web page for species IDs.

  • v0.1.9: Added request_id test and option to run function in debug mode.

  • v0.1.8: Fixed bug for exception when BOLD sends empty list of taxon names.

  • v0.1.7: Fixed bug for exception when BOLD sends empty list of taxon names.

  • v0.1.6: Append taxon identification results to file as we get them.

  • v0.1.5: Additional tests; coverage 92%.

  • v0.1.4: Fixed bug in taxon_search function

  • v0.1.3: Coverage 75%

  • v0.1.2: PEP 8 compliance; test coverage 69%.

  • v0.1.1: Packaged as Python module.

  • v0.1.0: You can specify which BOLD database is used for BLAST searches of FASTA sequences.

  • v0.0.7: Catching exception for NULL, list and text returned instead of XML from BOLD.

  • v0.0.6: Catching exception for malformed XML from BOLD.

  • v0.0.5: Catching exception when BOLD sends unexpected data such as {"481541":[]}.
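Entries v0.0.5 to v0.0.7 describe BOLD returning NULL, plain text, JSON-like data such as {"481541":[]}, or malformed XML instead of the expected XML. A hedged sketch of that kind of defensive handling (not bold_retriever's actual code) uses Python's xml.etree.ElementTree and returns an empty list whenever the response is unusable. The element names match, ID, taxonomicidentification, similarity and specimen/collectionlocation/country are assumptions about the BOLD ID-engine XML and should be checked against a real response.

```python
import xml.etree.ElementTree as ET


def parse_matches(body):
    """Return a list of match dicts from a response string, or [] if unusable.

    ASSUMPTION: a valid response looks like
    <matches><match><ID/><taxonomicidentification/><similarity/>...</match></matches>
    """
    if body is None:
        return []
    text = body.strip()
    # NULL, empty bodies, JSON-like payloads and plain text are not XML.
    if not text or text == "NULL" or not text.startswith("<"):
        return []
    try:
        root = ET.fromstring(text)
    except ET.ParseError:
        return []  # malformed XML
    matches = []
    for match in root.iter("match"):
        try:
            similarity = float(match.findtext("similarity", default=""))
        except ValueError:
            similarity = None  # missing or non-numeric score
        matches.append({
            "bold_id": match.findtext("ID"),
            "taxon": match.findtext("taxonomicidentification"),
            "similarity": similarity,
            "country": match.findtext("specimen/collectionlocation/country"),
        })
    return matches
```

Returning an empty list for every failure mode lets the caller treat "no usable answer" uniformly and move on to the next sequence.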



Download files


Source Distribution

bold_retriever-1.0.0.tar.gz (25.6 kB)


File details

Details for the file bold_retriever-1.0.0.tar.gz.


File hashes

Hashes for bold_retriever-1.0.0.tar.gz
    Algorithm    Hash digest
    SHA256       c5fc2f760ba3aaa097cbe21630602e6abd323e01c67d6ebb848f1e74b29856cf
    MD5          8bed9f1bd826f533f39fa958e0b64765
    BLAKE2b-256  72deafe1a354d8376ca80955b097374d0b1b713348bd26b0446309d865d6d2fe
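To check a downloaded copy of the archive against the SHA256 digest listed here, Python's standard hashlib is enough. This is a minimal sketch; sha256_of and verify are hypothetical helper names, and the path you pass is wherever you saved bold_retriever-1.0.0.tar.gz.

```python
import hashlib

# SHA256 digest listed on this page for bold_retriever-1.0.0.tar.gz.
EXPECTED_SHA256 = "c5fc2f760ba3aaa097cbe21630602e6abd323e01c67d6ebb848f1e74b29856cf"


def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    print(verify("bold_retriever-1.0.0.tar.gz"))
```

pip can enforce the same check at install time with its hash-checking mode (a --hash=sha256:... entry in a requirements file).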

