All-in-one solution for discovering novel DNA barcodes

Project description

Introduction

BarcodeFinder automatically discovers novel DNA barcodes with universal primers. It does three things, as listed below.

  • Collect data

    It automatically retrieves data from NCBI GenBank according to restrictions that the user provides, such as gene name, taxonomy, sequence name and organelle. It can also integrate user-provided sequences or alignments.

  • Preprocess data

    BarcodeFinder uses the annotation information in the data to divide each sequence into fragments (gene, spacer, misc_feature), because data collected from GenBank may not be "uniform". For instance, one record may contain a gene together with its upstream and downstream sequences, while another record contains only the gene sequence. The situation is worse for intergenic spacers, where differing annotation styles can cause endless trouble in the subsequent analysis.

    Given that one gene or spacer of a species may be sequenced several times, BarcodeFinder by default removes redundant sequences and keeps only one record for each species. This behavior can be changed. MAFFT is then called for alignment. The direction of each sequence is adjusted and all sequences are reordered.

  • Analyze

    First, BarcodeFinder evaluates the variance of each alignment by calculating Pi, the Shannon index, observed resolution, tree resolution, average terminal branch length, etc. If the result is lower than the given threshold, i.e., the alignment does not have sufficient resolution, the alignment is skipped.

    Next, a sliding-window scan is performed on the alignments that passed the test. High-variance regions (variance "hotspots") are picked, and their upstream and downstream regions are used to find primers.

    Consensus sequences of those conserved regions are generated, and candidate primers are selected with the help of primer3. After BLAST validation, suitable primers are combined to form primer pairs. Only pairs whose PCR product length falls within the given limit are kept. Note that gaps are removed to calculate the real length instead of the alignment length. The resolution of each sub-alignment is then recalculated to remove false-positive primer pairs.

    Finally, primer pairs are sorted by score so that users can easily find the "best" primer pairs.
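
The variance measures and the sliding-window scan described above can be illustrated with a minimal sketch. This is not BarcodeFinder's own code: the function names (shannon_index, nucleotide_diversity, sliding_window), the window and step sizes, the toy alignment and the handling of gaps are all assumptions made for illustration. Pi is taken here as the mean proportion of differing sites over all sequence pairs, and the Shannon index is computed over the distinct sequences found in a window.

```python
from collections import Counter
from itertools import combinations
from math import log2


def shannon_index(seqs):
    """Shannon entropy (in bits) of the distinct sequences in seqs."""
    total = len(seqs)
    probs = [n / total for n in Counter(seqs).values()]
    return sum(-p * log2(p) for p in probs)


def nucleotide_diversity(seqs):
    """Pi: mean proportion of differing sites over all sequence pairs.

    Assumption: columns where either sequence has a gap are ignored.
    """
    values = []
    for a, b in combinations(seqs, 2):
        compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if compared:
            diff = sum(x != y for x, y in compared)
            values.append(diff / len(compared))
    return sum(values) / len(values) if values else 0.0


def sliding_window(alignment, window=4, step=2):
    """Return a list of (start, pi, shannon) for each window."""
    length = len(alignment[0])
    results = []
    for start in range(0, length - window + 1, step):
        chunk = [seq[start:start + window] for seq in alignment]
        results.append(
            (start, nucleotide_diversity(chunk), shannon_index(chunk)))
    return results


# Toy alignment (hypothetical data): four sequences of equal length
alignment = [
    "ACGTACGTAC",
    "ACGTTCGTAC",
    "ACGTAGGTAC",
    "ACGTCCTTAC",
]
windows = sliding_window(alignment)
# The window with the highest Pi is treated as the variance "hotspot"
hotspot = max(windows, key=lambda item: item[1])
print("hotspot starts at column", hotspot[0])
```

In this toy alignment the first window is fully conserved (Pi of 0), so a region like it could serve as a primer site flanking the hotspot. A real pipeline would also apply the threshold test and the tree-based measures mentioned above, which this sketch omits.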

Prerequisite

  • Python 3 (3.5 or above)
  • BLAST+
  • IQ-TREE
  • MAFFT
  • Biopython
  • coloredlogs
  • matplotlib
  • numpy
  • primer3-py
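
A quick way to see which of these prerequisites are already available is sketched below. It is only an illustration, not part of BarcodeFinder: the executable names (mafft, blastn, iqtree) are assumptions and may differ by platform or version, and the helper check_prerequisites is hypothetical. The module names are the import names of the listed packages (Biopython is imported as Bio, primer3-py as primer3).

```python
import importlib.util
import shutil
import sys

# Assumed executable names; adjust them to match the local installation
EXECUTABLES = ("mafft", "blastn", "iqtree")
# Import names of the Python packages listed above
MODULES = ("Bio", "coloredlogs", "matplotlib", "numpy", "primer3")


def check_prerequisites(executables=EXECUTABLES, modules=MODULES):
    """Return a dict mapping each prerequisite name to True if found."""
    status = {}
    for exe in executables:
        # shutil.which returns None when the program is not on PATH
        status[exe] = shutil.which(exe) is not None
    for mod in modules:
        # find_spec returns None when a top-level module is not installed
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


python_ok = sys.version_info >= (3, 5)
print("Python 3.5 or above:", python_ok)
for name, found in check_prerequisites().items():
    print(name, "found" if found else "missing")
```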

Project Information

The source code of BarcodeFinder is available under the AGPLv3 license. For usage and details, please visit BarcodeFinder on GitHub.

Built Distribution

BarcodeFinder-0.9.26-py3-none-any.whl (42.7 kB view details)

Uploaded Python 3

File details

Details for the file BarcodeFinder-0.9.26-py3-none-any.whl.

File metadata

  • Download URL: BarcodeFinder-0.9.26-py3-none-any.whl
  • Size: 42.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.2+

File hashes

Hashes for BarcodeFinder-0.9.26-py3-none-any.whl
Algorithm Hash digest
SHA256 22a4fe9878eeb03cbaa238a79123bf64b775bbc70da62d2a9572ca18c93c4130
MD5 c51baa57a7b56b8f8a00eac82b94abf4
BLAKE2b-256 551155603a3cbf4010b2dfb97c1b8131085bf2b84990935d97e95f4a31f7a7ea
