Checks a GenBank or FASTA (CDS) file for internal stop codons and replaces each one with NNN.
polish_genbank
1 Introduction
See https://github.com/linzhi2013/polish_genbank.
This package checks a GenBank or FASTA (CDS) file for internal stop codons and replaces each one with NNN.
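To make the idea concrete, here is a minimal pure-Python sketch of masking internal stop codons. It is an illustration only, not the package's own implementation: the function name `mask_internal_stops` is hypothetical, the sequence is assumed to be a forward-strand CDS coding from position +1, and the stop codon set is the one defined by NCBI genetic code table 2 (vertebrate mitochondrial).

```python
# Stop codons of NCBI genetic code table 2 (vertebrate mitochondrial).
TABLE2_STOPS = frozenset({"TAA", "TAG", "AGA", "AGG"})


def mask_internal_stops(cds, stop_codons=TABLE2_STOPS, replacement="NNN"):
    """Replace every in-frame stop codon except the last complete codon.

    Hypothetical helper for illustration; the input is upper-cased first.
    """
    cds = cds.upper()
    # Split into codons starting at +1; the last chunk may be incomplete.
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Index of the last complete codon, which is treated as the terminal one.
    last_full = len(cds) // 3 - 1
    masked = []
    for idx, codon in enumerate(codons):
        if idx < last_full and codon in stop_codons:
            masked.append(replacement)
        else:
            masked.append(codon)
    return "".join(masked)


print(mask_internal_stops("ATGTAAGGGTAA"))
```

In this sketch the final complete codon is never masked, so a genuine terminal stop codon survives while stops earlier in the frame become `NNN`.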
2 Installation
pip3 install polish_genbank
This creates a polish_genbank command in the same directory as your pip3 command.
3 Usage
Run polish_genbank -h to see the available options:
usage: polish_genbank.py [-h] --in <file> [--format {gb,fa}] [--table <int>]
                         [--ntNs <str>] [--aaNs <str>] --out <file>

Check for the internal stop codon, then substitute the internal stop codon
with NNN. By mengguanliang [] genomics.cn, where [] == @. See
https://github.com/linzhi2013/polish_genbank

optional arguments:
  -h, --help        show this help message and exit
  --in <file>       input genbank file or CDS file (fasta format)
  --format {gb,fa}  the input file format. For fasta file, all sequences are
                    assumed to be forward strand, coding from +1 position [gb]
  --table <int>     The genetic code table used for translation, for fasta
                    input only [2]
  --ntNs <str>      the chars used for substituting an internal stop codon in
                    CDS sequence. [NNN]
  --aaNs <str>      the chars used for substituting an internal stop codon in
                    protein sequence. [X]
  --out <file>      output filename
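If you want to drive the command from a Python pipeline, the following sketch assembles the argument list from the flags shown in the help text above. The helper names `build_polish_command` and `run_polish` are hypothetical, and the sketch assumes the polish_genbank command is on your PATH. Since the help text says `--table` applies to FASTA input only, it is passed only when the format is `fa`.

```python
import shutil
import subprocess


def build_polish_command(infile, outfile, fmt="gb", table=None, nt_ns=None):
    """Build the argv list for the polish_genbank command (hypothetical helper)."""
    if fmt not in ("gb", "fa"):
        raise ValueError("fmt must be 'gb' or 'fa'")
    cmd = ["polish_genbank", "--in", infile, "--format", fmt]
    # --table is documented as applying to FASTA input only.
    if fmt == "fa" and table is not None:
        cmd += ["--table", str(table)]
    if nt_ns is not None:
        cmd += ["--ntNs", nt_ns]
    cmd += ["--out", outfile]
    return cmd


def run_polish(infile, outfile, **kwargs):
    """Run the command, failing early if it is not installed."""
    if shutil.which("polish_genbank") is None:
        raise FileNotFoundError("polish_genbank not found on PATH")
    subprocess.run(build_polish_command(infile, outfile, **kwargs), check=True)


print(" ".join(build_polish_command("cds.fa", "cds.polished.fa", fmt="fa", table=2)))
```

Passing the arguments as a list rather than a shell string avoids quoting problems with unusual file names.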
4 Used in scripts
In [1]: from polish_genbank import polish_gb, polish_fasta
In [2]: polish_gb?
Signature: polish_gb(ingb=None, NewInternalStopCodonNT='NNN', NewInternalStopCodonAA='X', logger=None)
Docstring:
Replace the internal stop codon with NNNs on Genbank nt sequence,
and replace the '*' in 'translation' tag (protein sequence) with 'X'
Return:
An generator.
Usage:
>>> records = polish_gb(ingb='in.gb', NewInternalStopCodonNT='NNN',
...                     NewInternalStopCodonAA='X')
>>> for rec in records:
...     print(rec.id, rec.seq)
In [3]: polish_fasta?
Signature: polish_fasta(infasta=None, NewInternalStopCodonNT='NNN', table=2, logger=None)
Docstring:
Replace the internal stop codon with NNNs.
The infasta file is assumed to be CDS sequences, and coding from +1
position.
Return:
An generator.
Usage:
>>> records = polish_fasta(infasta='myfile', NewInternalStopCodonNT='NNN', table=2)
>>> for rec in records:
...     print(rec.id, rec.seq)
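The polish_gb docstring above also mentions replacing '*' with 'X' in the protein translation. A minimal pure-Python sketch of that protein-side substitution follows. The function name `mask_internal_stop_aa` is hypothetical, and the handling of a trailing '*' (kept as the normal end of translation) is an assumption of this sketch, not confirmed behavior of the package.

```python
def mask_internal_stop_aa(protein, replacement="X"):
    """Replace internal '*' characters in a protein string (hypothetical helper).

    Assumption: a single trailing '*' marks the normal end of translation
    and is left untouched.
    """
    has_terminal = protein.endswith("*")
    body = protein[:-1] if has_terminal else protein
    # Every '*' left in the body is an internal stop and gets masked.
    masked = body.replace("*", replacement)
    return masked + ("*" if has_terminal else "")


print(mask_internal_stop_aa("MK*LV*"))
```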
5 Citation
Currently I have no plan to publish polish_genbank.
However, since polish_genbank makes use of Biopython, you should also cite it if you use polish_genbank in your work:
Peter J. A. Cock, Tiago Antao, Jeffrey T. Chang, Brad A. Chapman, Cymon J. Cox, Andrew Dalke, Iddo Friedberg, Thomas Hamelryck, Frank Kauff, Bartek Wilczynski, Michiel J. L. de Hoon: “Biopython: freely available Python tools for computational molecular biology and bioinformatics”. Bioinformatics 25 (11), 1422–1423 (2009). https://doi.org/10.1093/bioinformatics/btp163
Please go to http://www.biopython.org/ for more details.