Extract any CDS, rRNA, or tRNA gene DNA sequences from a GenBank file.
gbseqextractor
updates
version 0.0.5:
Merged pull request #4 from gopalpeddinti/patch-1, which was needed to fix a Biopython deprecation warning.
version 20201128:
1. Now we can handle CompoundLocation (a feature location with "join")!
2. We can also output the translation for each CDS.
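To make the "join" case concrete, here is a minimal pure-Python sketch of how the parts of a compound location can be concatenated into one sequence. It is an illustration only, not the tool's own code (gbseqextractor relies on Biopython for this). The names extract_joined and reverse_complement and the toy sequence are made up for the example, and coordinates are assumed to be 0-based with an exclusive end.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_joined(genome, parts, strand=1):
    """Concatenate the parts of a join(...) location.

    parts  -- list of (start, end) tuples, 0-based with exclusive end,
              in the order they appear inside join(...)
    strand -- 1 for the plus strand, -1 to model complement(join(...))
    """
    seq = "".join(genome[start:end] for start, end in parts)
    return reverse_complement(seq) if strand == -1 else seq


genome = "AAACCCGGGTTT"
print(extract_joined(genome, [(0, 3), (6, 9)]))             # AAAGGG
print(extract_joined(genome, [(0, 3), (6, 9)], strand=-1))  # CCCTTT
```

The minus-strand case joins the parts first and then reverse-complements the result, which is how a GenBank location written as complement(join(...)) is read.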
1 Introduction
gbseqextractor is a tool that uses Biopython (http://www.biopython.org/) to extract any CDS, rRNA, or tRNA gene DNA sequences from a GenBank file.
2 Installation
pip install gbseqextractor
This creates a gbseqextractor command in the same directory as your pip command.
3 Usage
$ gbseqextractor
usage: gbseqextractor.py [-h] -f <STR> -prefix <STR> [-seqPrefix <STR>]
[-types {CDS,rRNA,tRNA,wholeseq,gene} [{CDS,rRNA,tRNA,wholeseq,gene} ...]] [-cds_translation]
[-gi] [-p] [-t] [-s] [-l] [-rv] [-F]
Extract any CDS or rNRA or tRNA DNA sequences of genes from Genbank file.
Seqid will be the value of '/gene=' or '/product=', if they both were not
present, the gene will not be output!
version 20201128:
Now we can handle compounlocation (feature location with "join")!
We can also output the translation for each CDS (retrived from '/translation=')
Please cite:
Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu,
MitoZ: a toolkit for animal mitochondrial genome assembly, annotation
and visualization, Nucleic Acids Research, https://doi.org/10.1093/nar/gkz173
optional arguments:
-h, --help show this help message and exit
-f <STR> Genbank file
-prefix <STR> prefix of output file. required.
-seqPrefix <STR> prefix of each seq id. default: None
-types {CDS,rRNA,tRNA,wholeseq,gene} [{CDS,rRNA,tRNA,wholeseq,gene} ...]
what kind of genes you want to extract? wholeseq for whole fasta seq. WARNING: Each sequence in the
result files corresponds to ONE feature in the GenBank file, I will NOT combine multiple CDS of the
same gene into ONE! [CDS]
-cds_translation Also output translated CDS (required -types CDS). The translations are retrived directly from the
'/translation=' key word. [False]
-gi use gi number as sequence ID instead of accession number when " gi number is present. (default:
accession number)
-p output the position information on the ID line. Warning: the position on ID line is 0 left-most!
[False]
-t output the taxonomy lineage on ID line [False]
-s output the species name on the ID line [False]
-l output the seq length on the ID line [False]
-rv reverse and complement the sequences if the gene is on minus strand. Always True!
-F only output full length genes,i.e., exclude the genes with '>' or '<' in their location [False]
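The -p and -F options both depend on how a feature location is read. The sketch below is a hypothetical pure-Python helper (parse_simple_location is not part of gbseqextractor) that handles only simple, non-join locations. It assumes that "0 left-most" means a 0-based start with an exclusive end, which is the convention Biopython uses, and that a '<' or '>' marks a partial feature of the kind -F excludes.

```python
import re


def parse_simple_location(text):
    """Parse a simple GenBank location such as '5..20' or 'complement(<5..20)'.

    Returns (start, end, strand, is_partial), where start is 0-based and
    end is exclusive. Locations with join(...) are not handled here.
    """
    strand = 1
    inner = text.strip()

    # A complement(...) wrapper means the feature is on the minus strand.
    wrapped = re.fullmatch(r"complement\((.+)\)", inner)
    if wrapped:
        strand = -1
        inner = wrapped.group(1)

    match = re.fullmatch(r"([<>]?)(\d+)\.\.([<>]?)(\d+)", inner)
    if match is None:
        raise ValueError(f"unsupported location: {text!r}")

    start_flag, start, end_flag, end = match.groups()
    # '<' or '>' on either boundary marks the feature as partial.
    is_partial = bool(start_flag or end_flag)
    # GenBank positions are 1-based and inclusive; shift the start only.
    return int(start) - 1, int(end), strand, is_partial


print(parse_simple_location("5..20"))               # (4, 20, 1, False)
print(parse_simple_location("complement(<5..20)"))  # (4, 20, -1, True)
```

Under these assumptions, a feature annotated as 5..20 in the GenBank file would appear with a start of 4 on the ID line, and the second location would be dropped when -F is given.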
Author
Guanliang MENG
Citation
This script is part of the MitoZ package. If you use it in your work, please cite:
Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu,
MitoZ: a toolkit for animal mitochondrial genome assembly, annotation and visualization, Nucleic Acids Research, https://doi.org/10.1093/nar/gkz173
Since gbseqextractor makes use of Biopython, you should also cite Biopython if you use gbseqextractor in your work:
Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon: “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. Bioinformatics 25 (11), 1422–1423 (2009). https://doi.org/10.1093/bioinformatics/btp163
Please go to http://www.biopython.org/ for more details.