Extract CDS, rRNA, or tRNA DNA sequences of genes from a GenBank file.

Project description

gbseqextractor

1 Introduction

gbseqextractor is a tool that extracts CDS, rRNA, or tRNA DNA sequences of genes from a GenBank file, using Biopython (http://www.biopython.org/).

2 Installation

pip install gbseqextractor

After installation, a gbseqextractor command is created in the same directory as your pip command.
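To confirm that the command ended up somewhere on your PATH, a small check like the one below may help. It is only a sketch: the helper names find_gbseqextractor and show_help are made up for illustration, and the only flag used is -h, which appears in the help text of the tool.

```python
import shutil
import subprocess


def find_gbseqextractor(command="gbseqextractor"):
    """Return the full path of the command if it is on PATH, else None."""
    return shutil.which(command)


def show_help(command="gbseqextractor"):
    """Run `<command> -h` and return its help text, or None if not installed."""
    path = find_gbseqextractor(command)
    if path is None:
        return None
    result = subprocess.run([path, "-h"], capture_output=True, text=True)
    return result.stdout


if __name__ == "__main__":
    location = find_gbseqextractor()
    if location is None:
        print("gbseqextractor not found on PATH; check where pip installs scripts")
    else:
        print("gbseqextractor found at", location)
```

If the command is not found, the directory that holds your pip executable is probably missing from PATH.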

3 Usage

$ gbseqextractor
usage: gbseqextractor.py [-h] -f <STR> [-prefix <STR>] [-seqPrefix <STR>]
                         [-types {CDS,rRNA,tRNA,wholeseq} [{CDS,rRNA,tRNA,wholeseq} ...]]
                         [-gi] [-p] [-t] [-s] [-l] [-rv] [-F]

extract any CDS or rNRA or tRNA DNA sequences of genes from Genbank file.
Note: the position on ID line is 0 left-most! Seqid will be the value of
'/gene=' or '/product=', if they both were not present, the gene will not be
output!

optional arguments:
  -h, --help            show this help message and exit
  -f <STR>              Genbank file
  -prefix <STR>         prefix of output file.
  -seqPrefix <STR>      prefix of each seq id. default: None
  -types {CDS,rRNA,tRNA,wholeseq} [{CDS,rRNA,tRNA,wholeseq} ...]
                        what kind of genes you want to extract? wholeseq for
                        whole fasta seq.[CDS]
  -gi                   use gi number as sequence ID instead of accession
                        number when gi number is present. (default: accession
                        number)
  -p                    output the position information on the ID line [False]
  -t                    output the taxonomy lineage on ID line [False]
  -s                    output the species name on the ID line [False]
  -l                    output the seq length on the ID line [False]
  -rv                   reverse and complement the sequences if the gene is on
                        minus strand [False]
  -F                    only output full length genes [False]
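The help text mentions three conventions: the sequence ID comes from '/gene=' or '/product=', positions are 0-based, and -rv reverse-complements genes on the minus strand. The pure-Python sketch below illustrates them. It is not the tool's actual code. The function names are hypothetical, and two details are assumptions: that '/gene=' takes precedence over '/product=', and that the end coordinate is exclusive, as in Python slices.

```python
# Complement table for unambiguous DNA bases (upper and lower case).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def pick_seq_id(qualifiers):
    """Return the /gene= value, else the /product= value, else None.

    `qualifiers` maps a qualifier name to a list of values. Returning None
    mirrors "the gene will not be output". Preferring gene over product
    is an assumption.
    """
    for key in ("gene", "product"):
        values = qualifiers.get(key)
        if values:
            return values[0]
    return None


def to_zero_based(start, end):
    """Convert a 1-based inclusive GenBank range to 0-based, end-exclusive."""
    return start - 1, end


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def extract(genome, start, end, strand, reverse_minus=False):
    """Slice a gene out of `genome` using 1-based inclusive coordinates.

    With reverse_minus=True (analogous to -rv), a gene on the minus strand
    (strand == -1) is returned reverse-complemented.
    """
    lo, hi = to_zero_based(start, end)
    sub = genome[lo:hi]
    if strand == -1 and reverse_minus:
        return reverse_complement(sub)
    return sub


if __name__ == "__main__":
    genome = "AAATGCCCTAA"
    print(pick_seq_id({"gene": ["COX1"]}), to_zero_based(3, 8))
    print(extract(genome, 3, 8, strand=1))
    print(extract(genome, 3, 8, strand=-1, reverse_minus=True))
```

Without -rv, a minus-strand gene is presumably written as it appears on the plus strand, which is why reverse_minus defaults to False here.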

Author

Guanliang MENG

Citation

This script is part of the MitoZ package. If you use the script in your work, please cite:

MitoZ: A toolkit for mitochondrial genome assembly, annotation and visualization with NGS data. Guanliang Meng, Yiyuan Li, Chentao Yang, Shanlin Liu (in manuscript)

Since gbseqextractor makes use of Biopython, you should also cite it if you use gbseqextractor in your work:

Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon: “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. Bioinformatics 25 (11), 1422–1423 (2009). https://doi.org/10.1093/bioinformatics/btp163

Please go to http://www.biopython.org/ for more details.

Download files

Source Distribution

gbseqextractor-0.0.1.tar.gz (18.0 kB)