All-in-one solution for discovering novel DNA barcodes
Project description
Introduction
BarcodeFinder automatically discovers novel DNA barcodes with universal primers. It does three things, listed below.
- Collect data
It automatically retrieves data from NCBI GenBank using restrictions provided by the user, such as gene name, taxonomy, sequence name and organelle. It can also integrate sequences or alignments supplied by the user.
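To illustrate how several restrictions can be combined into a single NCBI search, here is a minimal sketch. The helper `build_genbank_query` is hypothetical and is not BarcodeFinder's own API; it assumes the Entrez field tags `[Gene]`, `[Organism]` and `[Title]`, and the mapping of an organelle to a `[filter]` term is also an assumption.

```python
from typing import Optional


def build_genbank_query(gene: Optional[str] = None,
                        taxon: Optional[str] = None,
                        title: Optional[str] = None,
                        organelle: Optional[str] = None) -> str:
    """Join user restrictions into one Entrez search term (hypothetical helper)."""
    terms = []
    if gene:
        terms.append(f'"{gene}"[Gene]')
    if taxon:
        terms.append(f'"{taxon}"[Organism]')
    if title:
        terms.append(f'"{title}"[Title]')
    if organelle:
        # Assumption: organelle restrictions are expressed as an NCBI [filter] term.
        terms.append(f"{organelle}[filter]")
    if not terms:
        raise ValueError("at least one restriction is required")
    # Every restriction must hold, so the terms are combined with AND.
    return " AND ".join(terms)


print(build_genbank_query(gene="rbcL", taxon="Poaceae"))
```

You could pass the resulting string as the `term` argument of Biopython's `Bio.Entrez.esearch(db="nucleotide", term=...)` to obtain matching record IDs.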
- Preprocess data
BarcodeFinder uses the annotations in each record to divide sequences into fragments (gene, spacer, misc_feature), because data collected from GenBank may not be "uniform". For instance, one record may contain a gene together with its upstream and downstream sequences, while another contains only the gene sequence. The situation is worse for intergenic spacers, where differing annotation styles can cause endless trouble in downstream analysis.
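The splitting step can be sketched as follows. This is only an illustration under simplifying assumptions: features are given as 0-based half-open coordinates, a spacer is taken to be the stretch between two adjacent non-overlapping genes, and the `Feature` type, the `split_fragments` helper and the `geneA-geneB` spacer naming are all hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    name: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    kind: str   # e.g. "gene" or "misc_feature"


def split_fragments(seq: str, features: List[Feature]) -> List[Tuple[str, str, str]]:
    """Cut one record into (name, kind, subsequence) fragments.

    Annotated features come first, followed by the intergenic spacers
    found between adjacent genes (hypothetical helper).
    """
    ordered = sorted(features, key=lambda f: f.start)
    fragments = [(f.name, f.kind, seq[f.start:f.end]) for f in ordered]
    genes = [f for f in ordered if f.kind == "gene"]
    for left, right in zip(genes, genes[1:]):
        # Only non-overlapping neighbouring genes leave a spacer between them.
        if right.start > left.end:
            spacer_name = f"{left.name}-{right.name}"  # hypothetical naming scheme
            fragments.append((spacer_name, "spacer", seq[left.end:right.start]))
    return fragments
```

Because fragments are cut from each record's own annotations, records that include flanking sequence and records that contain only the gene end up contributing comparable pieces.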
Because the same gene or spacer may be sequenced several times for one species, BarcodeFinder by default removes redundant sequences and keeps only one record per species. This behavior can be changed. MAFFT is then called to align the sequences, the direction of each sequence is adjusted, and all sequences are reordered.
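Here is a minimal sketch of the "one record per species" step. The source does not say which record is kept, so keeping the longest ungapped sequence is an assumption, and `keep_one_per_species` is a hypothetical helper. The kept sequences could then be written to FASTA and aligned with a typical MAFFT call such as `mafft --auto input.fasta > aligned.fasta`. Whether BarcodeFinder uses these exact options is not stated here.

```python
from typing import Dict, Iterable, Tuple


def keep_one_per_species(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Reduce (species, sequence) pairs to a single sequence per species.

    Assumption: when a species has several records, the one with the
    longest ungapped sequence is kept.
    """
    best: Dict[str, str] = {}
    for species, seq in records:
        current = best.get(species)
        # Compare lengths without gap characters so alignment padding is ignored.
        if current is None or len(seq.replace("-", "")) > len(current.replace("-", "")):
            best[species] = seq
    return best
```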
- Analyze
First, BarcodeFinder evaluates the variance of each alignment by calculating Pi (nucleotide diversity), the Shannon index, observed resolution, tree resolution, average terminal branch length, and other measures. If the result is below the given threshold, meaning the alignment does not have sufficient resolution, that alignment is skipped.
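Three of the simpler measures can be sketched in plain Python. The formulas below are common textbook definitions and not necessarily BarcodeFinder's exact implementation. Pi is the mean pairwise proportion of differing sites, with gap sites skipped. The Shannon index is averaged over columns using the natural log. Observed resolution is assumed here to be the fraction of distinct sequences. Tree resolution and terminal branch length require a phylogeny (for example from IQ-TREE) and are not covered.

```python
import math
from collections import Counter
from itertools import combinations
from typing import List


def nucleotide_diversity(aln: List[str]) -> float:
    """Pi: mean pairwise proportion of differing sites (gap sites skipped)."""
    per_pair = []
    for a, b in combinations(aln, 2):
        sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
        if sites:
            per_pair.append(sum(x != y for x, y in sites) / len(sites))
    return sum(per_pair) / len(per_pair) if per_pair else 0.0


def mean_shannon_index(aln: List[str]) -> float:
    """Average per-column Shannon entropy (natural log), ignoring gaps."""
    length = len(aln[0])
    total = 0.0
    for i in range(length):
        column = [s[i] for s in aln if s[i] != "-"]
        n = len(column)
        for count in Counter(column).values():
            total -= (count / n) * math.log(count / n)
    return total / length


def observed_resolution(aln: List[str]) -> float:
    """Assumed definition: distinct sequences divided by all sequences."""
    return len(set(aln)) / len(aln)
```

For the alignment `["ACGT", "ACGA", "ACGA"]`, Pi is 1/6, because two of the three pairs differ at one of four sites. Observed resolution is 2/3.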
Next, a sliding-window scan is performed on the alignments that pass the test. High-variance regions (variance "hotspots") are picked, and their upstream and downstream regions are used to find primers.
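The sliding-window scan can be sketched like this, assuming that column Shannon entropy summed over a window is the variance score. BarcodeFinder may score windows differently. The `find_hotspot` helper and its `window`, `step` and `flank` parameters are hypothetical. All coordinates are 0-based and half-open.

```python
import math
from collections import Counter
from typing import List, Tuple

Span = Tuple[int, int]


def column_entropy(aln: List[str], i: int) -> float:
    """Shannon entropy (natural log) of column i, ignoring gaps."""
    column = [s[i] for s in aln if s[i] != "-"]
    n = len(column)
    return -sum((c / n) * math.log(c / n) for c in Counter(column).values())


def find_hotspot(aln: List[str], window: int, step: int = 1,
                 flank: int = 20) -> Tuple[Span, Span, Span]:
    """Return (hotspot, upstream, downstream) spans for the most variable window."""
    length = len(aln[0])
    if window > length:
        raise ValueError("window is longer than the alignment")
    scores = [column_entropy(aln, i) for i in range(length)]
    starts = range(0, length - window + 1, step)
    # The highest summed entropy wins; ties go to the leftmost window.
    best = max(starts, key=lambda s: sum(scores[s:s + window]))
    end = best + window
    upstream = (max(0, best - flank), best)
    downstream = (end, min(length, end + flank))
    return (best, end), upstream, downstream
```

The upstream and downstream spans are the conserved regions in which primers would then be searched.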
A consensus sequence is generated for each conserved flanking region, and candidate primers are selected from it with primer3. After BLAST validation, suitable primers are combined to form primer pairs. Only pairs whose PCR product length falls within the given limits are kept. Gaps are removed before measuring, so the real product length is used instead of the alignment length. The resolution of each sub-alignment is then recalculated to remove false-positive primer pairs.
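Two pieces of this step, the consensus and the gap-free length filter, can be sketched without primer3 or BLAST. The sketch assumes a simple majority-rule consensus that ignores gaps. It also assumes that a pair's "real length" is the mean ungapped length across all sequences between the forward primer start and the reverse primer end. All helper names are hypothetical.

```python
from collections import Counter
from statistics import mean
from typing import List, Tuple


def consensus(aln: List[str]) -> str:
    """Majority-rule consensus; gaps are ignored unless a column is all gaps."""
    out = []
    for i in range(len(aln[0])):
        bases = [s[i] for s in aln if s[i] != "-"]
        out.append(Counter(bases).most_common(1)[0][0] if bases else "-")
    return "".join(out)


def ungapped_length(aln: List[str], start: int, end: int) -> float:
    """Mean product length over all sequences, counting only non-gap characters."""
    return mean(len(s[start:end].replace("-", "")) for s in aln)


def filter_pairs(aln: List[str], pairs: List[Tuple[int, int]],
                 min_len: int, max_len: int) -> List[Tuple[int, int]]:
    """Keep (forward_start, reverse_end) pairs whose real product length fits."""
    return [(f, r) for f, r in pairs
            if min_len <= ungapped_length(aln, f, r) <= max_len]
```

Measuring without gaps matters because an alignment column range can look long while most sequences carry gaps there, which would overstate the actual PCR product.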
Finally, primer pairs are sorted by score so that users can easily find the "best" pairs for their needs.
Prerequisites
- Python3 (3.5 or above)
- BLAST+
- IQTREE
- MAFFT
- Biopython
- coloredlogs
- matplotlib
- numpy
- primer3-py
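Before installing, you can run a quick script to check whether the prerequisites are available. The executable names used here (`blastn`, `makeblastdb`, `iqtree`, `mafft`) are assumptions, because installations differ; IQ-TREE 2, for example, is often installed as `iqtree2`. The Python packages are checked by their import names. Biopython imports as `Bio`, and primer3-py imports as `primer3`.

```python
import importlib.util
import shutil
import sys

# Assumed executable names; adjust them to match your installation.
EXECUTABLES = ["blastn", "makeblastdb", "iqtree", "mafft"]
# Import names of the Python prerequisites.
MODULES = ["Bio", "coloredlogs", "matplotlib", "numpy", "primer3"]


def check_prerequisites() -> dict:
    """Map each prerequisite to True if found, False otherwise."""
    status = {"python>=3.5": sys.version_info >= (3, 5)}
    for exe in EXECUTABLES:
        status[exe] = shutil.which(exe) is not None
    for mod in MODULES:
        # find_spec locates a module without importing it.
        status[mod] = importlib.util.find_spec(mod) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name:15} {'ok' if ok else 'MISSING'}")
```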
Project Information
The source code of BarcodeFinder is available under the AGPLv3 license. For usage and details, please visit BarcodeFinder on GitHub.
File details
Details for the file BarcodeFinder-0.9.35-py3-none-any.whl.
File metadata
- Download URL: BarcodeFinder-0.9.35-py3-none-any.whl
- Upload date:
- Size: 43.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/41.0.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.7.3rc1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 84c44a075b9e98b70abf71e5e1785ea468cc1f618b13a4a863b8f4a04e181a07
MD5 | 16ad671aa0f772f93d2023661d615e42
BLAKE2b-256 | 3a7030f152cc2ecb80568469f8f3aebf87f22b3249e1f2f9e69f689852ca4a50
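To check a downloaded wheel against the SHA256 digest listed above, you can hash the file with Python's standard `hashlib` module. This is a minimal sketch. The helper names `sha256_of` and `verify_wheel` are illustrative, and `pip hash <file>` reports the same digest from the command line.

```python
import hashlib

# SHA256 digest published for BarcodeFinder-0.9.35-py3-none-any.whl.
EXPECTED_SHA256 = "84c44a075b9e98b70abf71e5e1785ea468cc1f618b13a4a863b8f4a04e181a07"


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_wheel(path: str) -> bool:
    """True if the file's SHA256 digest matches the published one."""
    return sha256_of(path) == EXPECTED_SHA256
```

Usage: `verify_wheel("BarcodeFinder-0.9.35-py3-none-any.whl")` returns `True` only for an unmodified copy of the released wheel.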