genbank
Python code to work with Genbank files
This repo contains several classes for working with Genbank files.
The classes nest in this order:
File -> Locus -> Feature
A File yields Locus records when iterated, and each Locus yields its Features.
To use:
from genbank.file import File

file = File('infile.gbk')
for locus in file:
    # print the locus name (assumes Locus exposes a name() accessor)
    print(locus.name())
    for feature in locus:
        print(feature)
You can also build a Locus object from the ground up:
from genbank.locus import Locus
locus = Locus('test', 'actgactgatcgtagctagc')
# then add a feature by parsing text of a genbank feature
locus.read_feature(' CDS 1..10')
# or add one manually by specifying the type, strand, and location
locus.add_feature('CDS',+1,[['10','20']])
locus.write()
which gives:
LOCUS       test 20 bp
FEATURES             Location/Qualifiers
     CDS             1..10
     CDS             10..20
ORIGIN
        1 actgactgat cgtagctagc
//
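The layout of the record above can be reproduced with a few lines of plain Python. This is a minimal sketch, not the package's `write()` implementation: `format_origin` and `format_record` are hypothetical helpers, and the LOCUS line is simplified rather than laid out in the fixed columns of the full flat-file format.

```python
def format_origin(dna, width=60, chunk=10):
    """Format a sequence as ORIGIN lines: a right-aligned 1-based offset,
    then space-separated blocks of `chunk` bases, `width` bases per line."""
    lines = []
    for start in range(0, len(dna), width):
        row = dna[start:start + width]
        blocks = [row[i:i + chunk] for i in range(0, len(row), chunk)]
        lines.append(f"{start + 1:>9} " + " ".join(blocks))
    return lines


def format_record(name, dna, features):
    """Assemble a minimal record; `features` is a list of (key, location) pairs."""
    out = [f"LOCUS       {name} {len(dna)} bp"]
    out.append("FEATURES             Location/Qualifiers")
    for key, location in features:
        # Feature keys start in column 6 and locations in column 22.
        out.append(f"     {key:<16}{location}")
    out.append("ORIGIN")
    out.extend(format_origin(dna))
    out.append("//")
    return "\n".join(out)


record = format_record(
    "test",
    "actgactgatcgtagctagc",
    [("CDS", "1..10"), ("CDS", "10..20")],
)
print(record)
```

Splitting the ORIGIN formatting into its own function keeps the 10-base blocking and the 60-base line width easy to check independently of the feature table.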
This package also allows you to perform various conversions on a given genome file:
$ genbank.py tests/phiX174.gbk -f tabular
'phiX174' 'CDS' (('100', '627'),) {'gene': "G"}
'phiX174' 'CDS' (('636', '1622'),) {'gene': "H"}
'phiX174' 'CDS' (('1659', '3227'),) {'gene': "A"}
'phiX174' 'CDS' (('2780', '3142'),) {'gene': "B"}
'phiX174' 'CDS' (('3142', '3312'),) {'gene': "K"}
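The tabular output above shows each location as a tuple of (start, end) string pairs, the same shape passed to `add_feature` earlier. As a rough illustration of how location text such as `100..627` might map onto that shape, here is a small sketch. `parse_location` is a hypothetical helper written for this example, not a function of the package, and it covers only plain ranges, single positions, `join(...)` and an outer `complement(...)`.

```python
import re


def parse_location(text):
    """Return (strand, pairs) for a feature location string.

    Partial markers (< and >) are stripped. Operators other than an outer
    complement(...) and a join(...) raise ValueError.
    """
    strand = +1
    text = text.strip()
    if text.startswith("complement(") and text.endswith(")"):
        strand = -1
        text = text[len("complement("):-1]
    if text.startswith("join(") and text.endswith(")"):
        text = text[len("join("):-1]
    pairs = []
    for part in text.split(","):
        part = part.strip().replace("<", "").replace(">", "")
        match = re.fullmatch(r"(\d+)(?:\.\.(\d+))?", part)
        if match is None:
            raise ValueError(f"unsupported location: {part!r}")
        start = match.group(1)
        end = match.group(2) or start  # a single position covers one base
        pairs.append((start, end))
    return strand, tuple(pairs)


print(parse_location("100..627"))
print(parse_location("complement(join(1..10,20..30))"))
```

Keeping the coordinates as strings mirrors the tabular output; converting them to integers is left to the caller.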
$ genbank.py tests/phiX174.gbk -f fasta
>phiX174
gtgtgaggttataacgccgaagcggtaaaaattttaatttttgccgctgagggg
ttgaccaagcgaagcgcggtaggttttctgcttaggagtttaatcatgtttcag
$ genbank.py tests/phiX174.gbk -f fna
>phiX174_CDS_[100..627] [gene="G"]
atgtttcagacttttatttctcgccataattcaaactttttttctgataag
>phiX174_CDS_[636..1622] [gene="H"]
atgtttggtgctattgctggcggtattgcttctgctcttgctggtggcgcc
>phiX174_CDS_[1659..3227]
$ genbank.py tests/phiX174.gbk -f faa
>phiX174_CDS_[100..627] [gene="G"]
MFQTFISRHNSNFFSDKLVLTSVTPASSAPVLQTPKATSSTLYFDSLTVNA
>phiX174_CDS_[636..1622] [gene="H"]
MFGAIAGGIASALAGGAMSKLFGGGQKAASGGIQGDVLATDNNTVGMGDAG
>phiX174_CDS_[1659..3227] [gene="A"]
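The faa output is the translation of each CDS sequence shown in the fna output. A minimal sketch of that step follows, assuming the standard genetic code; the package may use a different translation table or handle start codons and ambiguous bases differently. `translate` and `CODON_TABLE` are names chosen for this example.

```python
from itertools import product

# Standard genetic code, with codons enumerated in t, c, a, g order.
BASES = "tcag"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string codon by codon from its first base.

    Unknown codons become 'X', stop codons become '*', and a trailing
    partial codon is ignored.
    """
    dna = dna.lower()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("atgtttcagact"))  # MFQT
```

Applied to the start of the first CDS in the fna output, this gives the same leading residues as the first faa record.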
$ genbank.py tests/phiX174.gbk -f coverage
phiX174 0.965
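One plausible reading of the coverage figure is the fraction of bases that fall inside at least one feature. The sketch below computes that quantity under this assumption; the package's own definition may differ (for example in which feature types it counts), and `coverage` here is a hypothetical helper, not the package's code. It does not handle features that wrap around the origin of a circular genome.

```python
def coverage(length, intervals):
    """Fraction of a `length`-base sequence covered by at least one
    1-based, inclusive (start, end) interval. Overlaps are counted once."""
    covered = 0
    last_end = 0  # highest position already counted
    for start, end in sorted(intervals):
        # Skip the part of this interval that earlier intervals covered.
        start = max(start, last_end + 1)
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / length


# Two features sharing base 10 on a 20-base sequence cover every base once.
print(coverage(20, [(1, 10), (10, 20)]))  # 1.0
```

Sorting the intervals first means overlapping genes, such as the overlapping CDS features in phiX174, are not double counted.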
Print the values of a given key:tag pair, here the gene tag of every CDS feature:
$ genbank.py tests/phiX174.gbk -k CDS:gene > labels.tsv
Change the H of the second gene to something more informative (ideally you would fill this file with columns from another source, such as Excel):
$ perl -pi -e 's/H/Minor spike/' labels.tsv
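The perl one-liner replaces the first H it finds on each line, which is fine for this small file but would also alter any other field that happens to contain an H. A stricter alternative is to replace only fields that match exactly. The sketch below assumes labels.tsv is tab-separated with the label stored as a bare field; that layout is an assumption, the sample rows are made up for illustration, and `relabel` is a hypothetical helper.

```python
def relabel(lines, old, new):
    """Yield each line with every tab-separated field equal to `old`
    replaced by `new`; all other fields are left untouched."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(new if field == old else field for field in fields)


# Hypothetical rows: only the field that is exactly "H" is changed.
rows = ["phiX174\tH\n", "phiX174\tHigh\n"]
for row in relabel(rows, "H", "Minor spike"):
    print(row)
```

To apply this to a real file, read its lines, pass them through `relabel`, and write the joined result back out.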
Now edit all the features of the given key:tag with the updated labels:
$ genbank.py tests/phiX174.gbk -e CDS:gene < labels.tsv | head
LOCUS       phiX174 5386 bp DNA PHG
FEATURES             Location/Qualifiers
     source          1..5386
     rep_origin      13..56
     CDS             100..627
                     /gene="G"
     CDS             636..1622
                     /gene="Minor spike"
     CDS             1659..3227