To get taxa information of sequences from BOLD system
bold_identification
1 Introduction
See https://github.com/linzhi2013/bold_identification.
bold_identification is a package that retrieves the taxonomy information of sequences from BOLD (http://www.boldsystems.org/index.php).
To get the taxonomy information of a sequence from BOLD by hand, you would usually:
(1) open http://www.boldsystems.org/index.php/IDS_OpenIdEngine in a browser;
(2) choose a database;
(3) input the sequence;
(4) click submit and wait for the result;
(5) copy the taxonomy information from the result page.
bold_identification performs the same steps automatically, so you do not have to repeat them by hand for every sequence.
Currently, bold_identification runs only on Mac OS, 64-bit Windows, and Linux.
Be aware that on Windows only the Chrome browser works; Firefox does not.
2 Installation
pip3 install bold_identification
This creates a bold_identification command in the same directory as your pip3 command.
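If the command is not found after installation, a quick check of your PATH can help. The following is a minimal sketch using only the Python standard library; the helper name find_command is hypothetical and is not part of bold_identification.

```python
import shutil


def find_command(name="bold_identification"):
    """Return the full path of an installed command, or None if it is not on PATH.

    find_command is a hypothetical helper for illustration only.
    """
    return shutil.which(name)


path = find_command()
if path:
    print("Found:", path)
else:
    # The command lives next to pip3, so that directory may be missing from PATH.
    print("bold_identification not found on PATH; check the directory of your pip3 command")
```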
3 Usage
run bold_identification
usage: bold_identification [-h] -i <str> [-f <str>] -o <str>
[-d {COX1,COX1_SPECIES,COX1_SPECIES_PUBLIC,COX1_L640bp,ITS,Plant}]
[-n <int>] [-b {Firefox,Chrome}]
[-t <int>] [-r <int>] [-c] [-debug]
To identify taxa of given sequences from BOLD (http://www.boldsystems.org/).
Some sequences can fail to get taxon information, which can be caused by
TimeoutException if your network to the BOLD server is bad.
Those sequences will be output in the file '*.TimeoutException.fasta'.
You can:
1) run another searching with the same command directly (-c option);
2) change the browser (-b option);
3) lengthen the time to wait for each query (-t option);
4) increase submission times (-r option) for a sequence.
Also, the sequences without BOLD matches will be output in the
file *.NoBoldMatchError.fasta
By mengguanliang@genomics.cn.
See https://github.com/linzhi2013/bold_identification.
optional arguments:
-h, --help show this help message and exit
-i <str> input file name
-f <str> input file format [fasta]
-o <str> outfile
-d {COX1,COX1_SPECIES,COX1_SPECIES_PUBLIC,COX1_L640bp,ITS,Plant}
database to search [COX1]
-n <int> how many first top hits will be output. [1]
-b {Firefox,Chrome} browser to be used [Chrome]
-t <int> the time to wait for a query [60]
-r <int> Maximum submission time for a sequence, useful for
handling TimeOutException. [4]
-c continuous mode, jump over the ones already in "-o"
file, will resubmit all the remained. [False]
-debug debug mode output [False]
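As an illustration of how the options above combine, here is a minimal sketch that assembles a command line in Python. The flags and their defaults are taken from the help text; the helper name build_command and the file names seqs.fasta and result.txt are hypothetical.

```python
import subprocess

# Choices copied from the help text above.
DATABASES = ("COX1", "COX1_SPECIES", "COX1_SPECIES_PUBLIC", "COX1_L640bp", "ITS", "Plant")
BROWSERS = ("Firefox", "Chrome")


def build_command(infile, outfile, database="COX1", top_hits=1,
                  browser="Chrome", wait=60, max_submissions=4, resume=False):
    """Build the argument list for one bold_identification run (hypothetical helper)."""
    if database not in DATABASES:
        raise ValueError("unknown database: %s" % database)
    if browser not in BROWSERS:
        raise ValueError("unknown browser: %s" % browser)
    cmd = ["bold_identification", "-i", infile, "-o", outfile,
           "-d", database, "-n", str(top_hits), "-b", browser,
           "-t", str(wait), "-r", str(max_submissions)]
    if resume:
        # -c skips the sequences that are already present in the -o file.
        cmd.append("-c")
    return cmd


# Example: a longer wait per query and more submissions, for a slow network.
cmd = build_command("seqs.fasta", "result.txt", wait=120, max_submissions=6)
print(" ".join(cmd))
# To actually run it (requires the package and a browser): subprocess.run(cmd)
```

After a run that left sequences in the '*.TimeoutException.fasta' file, calling build_command with the same arguments plus resume=True corresponds to option 1) in the help text.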
4 Problems
Cannot download the browsers
This can happen when your network connection is poor.
Solution:
Download the executable driver file manually, then extract the executable and put it in the drivers directory. bold_identification prints more details when you run into this problem.
Browser doesn't work
This happens to me occasionally, and I don't know why. My guess is that the browser driver is not very stable.
Solution:
Try another browser with the -b option.
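Switching browsers can also be scripted. The sketch below tries each browser in turn and stops at the first run that succeeds. It assumes that bold_identification exits with a non-zero status when the browser fails, which this document does not confirm; the helper name run_with_fallback and the file names are hypothetical.

```python
import subprocess


def run_with_fallback(base_cmd, browsers=("Chrome", "Firefox"), runner=subprocess.call):
    """Run base_cmd once per browser until one run returns exit status 0.

    Assumption: a failed browser leads to a non-zero exit status.
    Returns the name of the browser that worked, or None if all failed.
    """
    for browser in browsers:
        status = runner(list(base_cmd) + ["-b", browser])
        if status == 0:
            return browser
    return None


# Hypothetical usage (base_cmd must not already contain -b):
# used = run_with_fallback(["bold_identification", "-i", "seqs.fasta", "-o", "result.txt"])
```

On Windows, where only Chrome works, this fallback offers no benefit.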
5 Citation
When you use bold_identification in your study, please cite:
Guanliang MENG, Chengran ZHOU, et al., Shanlin LIU, Shaoying LIU. Mitogenome and nuclear gene datasets of small mammals on Qinghai-Tibetan Plateau.