Check for internal stop codons in a GenBank or FASTA (CDS) file, then substitute each internal stop codon with NNN.

Project description

polish_genbank

1 Introduction

See https://github.com/linzhi2013/polish_genbank.

This package checks for internal stop codons in a GenBank or FASTA (CDS) file and substitutes each internal stop codon with NNN.
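The core check can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the package's own implementation: the CDS is read on the forward strand from the +1 position, the last complete codon is treated as the terminal stop and left alone, and stop codons are hard-coded for only two NCBI genetic code tables (1, standard; 2, vertebrate mitochondrial). STOP_CODONS and replace_internal_stops are hypothetical names.

```python
# Hypothetical lookup, not part of polish_genbank: stop codons for two NCBI tables.
STOP_CODONS = {
    1: frozenset({"TAA", "TAG", "TGA"}),         # standard code
    2: frozenset({"TAA", "TAG", "AGA", "AGG"}),  # vertebrate mitochondrial code
}


def replace_internal_stops(cds, table=2, replacement="NNN"):
    """Return (new_cds, positions) with internal stop codons replaced.

    Assumes a forward-strand CDS coding from +1. The last complete codon is
    treated as the terminal stop and kept; a trailing partial codon is kept.
    positions holds the 1-based nucleotide start of each replaced codon.
    """
    stops = STOP_CODONS[table]
    n_codons = len(cds) // 3
    codons = [cds[i:i + 3] for i in range(0, n_codons * 3, 3)]
    positions = []
    # Iterate over a copy that excludes the terminal codon.
    for idx, codon in enumerate(codons[:-1]):
        if codon.upper() in stops:
            codons[idx] = replacement
            positions.append(3 * idx + 1)
    return "".join(codons) + cds[n_codons * 3:], positions
```

For example, replace_internal_stops("ATGTAAGGGAGATAA", table=2) returns ("ATGNNNGGGNNNTAA", [4, 10]), because AGA is a stop codon under table 2 but not under table 1. Keeping the replacement three characters long, as with the default NNN, preserves the reading frame of the downstream codons.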

2 Installation

pip3 install polish_genbank

This creates a polish_genbank command in the same directory as your pip3 executable.

3 Usage

Run polish_genbank -h to print the help message:

usage: polish_genbank.py [-h] --in <file> [--format {gb,fa}] [--table <int>]
                         [--ntNs <str>] [--aaNs <str>] --out <file>

Check for the internal stop codon, then substitute the internal stop codon
with NNN. By mengguanliang [] genomics.cn, where [] == @. See
https://github.com/linzhi2013/polish_genbank

optional arguments:
  -h, --help        show this help message and exit
  --in <file>       input genbank file or CDS file (fasta format)
  --format {gb,fa}  the input file format. For fasta file, all sequences are
                    assumed to be forward strand, coding from +1 position [gb]
  --table <int>     The genetic code table used for translation, for fasta
                    input only [2]
  --ntNs <str>      the chars used for substituting an internal stop codon in
                    CDS sequence. [NNN]
  --aaNs <str>      the chars used for substituting an internal stop codon in
                    protein sequence. [X]
  --out <file>      output filename
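The --aaNs option covers the protein side: an internal '*' in the translated sequence is replaced (X by default), which the polish_gb docstring also describes for the 'translation' qualifier. A minimal sketch, assuming a terminal '*' may or may not be present and should be kept when it is; replace_internal_stop_aa is a hypothetical name, not part of the package.

```python
def replace_internal_stop_aa(protein, replacement="X"):
    """Replace internal '*' characters; keep one trailing '*' if present."""
    if protein.endswith("*"):
        body, tail = protein[:-1], "*"
    else:
        body, tail = protein, ""
    # Every remaining '*' sits before the end, so it is an internal stop.
    return body.replace("*", replacement) + tail
```

For example, replace_internal_stop_aa("M*GR*") returns "MXGR*".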

4 Use in scripts

In [1]: from polish_genbank import polish_gb, polish_fasta

In [2]: polish_gb?
Signature: polish_gb(ingb=None, NewInternalStopCodonNT='NNN', NewInternalStopCodonAA='X', logger=None)
Docstring:
Replace the internal stop codon with NNNs on Genbank nt sequence,
and replace the '*' in 'translation' tag (protein sequence) with 'X'

Return:
    A generator.

Usage:

>>> records = polish_gb(ingb='in.gb', NewInternalStopCodonNT='NNN',
...                     NewInternalStopCodonAA='X')
>>> for rec in records:
...     print(rec.id, rec.seq)


In [3]: polish_fasta?
Signature: polish_fasta(infasta=None, NewInternalStopCodonNT='NNN', table=2, logger=None)
Docstring:
Replace the internal stop codon with NNNs.

The infasta file is assumed to be CDS sequences, and coding from +1
position.

Return:
    A generator.

Usage:

>>> records = polish_fasta(infasta='myfile', NewInternalStopCodonNT='NNN', table=2)
>>> for rec in records:
...     print(rec.id, rec.seq)
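Both functions return generators, so records are produced one at a time as you iterate over them. The sketch below mimics that lazy interface for FASTA input using only the standard library; it is not the package's code. read_fasta, polish_fasta_text and TABLE2_STOPS are hypothetical names, the input is a text handle rather than a file name, and only the stop codons of genetic code table 2 are hard-coded.

```python
import io

# Stop codons of NCBI genetic code table 2 (vertebrate mitochondrial),
# chosen to mirror the documented table=2 default; other tables differ.
TABLE2_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def read_fasta(handle):
    """Yield (record_id, sequence) pairs from a FASTA text handle."""
    rec_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if rec_id is not None:
                yield rec_id, "".join(chunks)
            header = line[1:].split()
            rec_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)
    if rec_id is not None:
        yield rec_id, "".join(chunks)


def polish_fasta_text(handle, stops=TABLE2_STOPS, nt_ns="NNN"):
    """Lazily yield (record_id, sequence) with internal stop codons replaced.

    Each sequence is assumed to be a forward-strand CDS coding from +1.
    """
    for rec_id, seq in read_fasta(handle):
        full = len(seq) - len(seq) % 3
        codons = [seq[i:i + 3] for i in range(0, full, 3)]
        # Every complete codon except the last one counts as internal.
        for idx in range(len(codons) - 1):
            if codons[idx].upper() in stops:
                codons[idx] = nt_ns
        yield rec_id, "".join(codons) + seq[full:]


if __name__ == "__main__":
    demo = ">cds1 demo\nATGTAAGGG\nAGATAA\n>cds2\nATGGCCTAG\n"
    for rec_id, seq in polish_fasta_text(io.StringIO(demo)):
        print(rec_id, seq)
```

Run as a script, this prints cds1 ATGNNNGGGNNNTAA and cds2 ATGGCCTAG. Because nothing is read until the loop asks for the next record, a generator like this can also be consumed only once.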

5 Citation

Currently, I have no plans to publish a paper on polish_genbank.

However, since polish_genbank makes use of Biopython, please cite Biopython if you use polish_genbank in your work:

Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon: “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. Bioinformatics 25 (11), 1422–1423 (2009). https://doi.org/10.1093/bioinformatics/btp163

Please go to http://www.biopython.org/ for more details.

Download files

Source Distribution

polish_genbank-0.0.2.tar.gz (18.7 kB)

File details

Details for the file polish_genbank-0.0.2.tar.gz.

File metadata

  • Download URL: polish_genbank-0.0.2.tar.gz
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/39.2.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.6

File hashes

Hashes for polish_genbank-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       5b6b019867690d7b7bb7b9b1fc30f7c577de852651f9128df827fccb24d630aa
MD5          b4d3b9bdc4efe12f4ce90f24d8df7e6f
BLAKE2b-256  01811a6d94ebf6c612af13712f2c7c567d953e7b47722d1c8dc2052a1029ed7e
