Code to work with Genbank files

Project description

genbank

Python code to work with Genbank files

This repo contains several classes to help work with Genbank files.

The classes nest as follows: a File contains one or more Locus objects, and each Locus contains Feature objects:

File -> Locus -> Feature

To use:

from genbank.file import File

file = File('infile.gbk')
for locus in file:
	print(locus.name)  # locus name; assumed accessor, may be a method (locus.name()) in some versions
	for feature in locus:
		print(feature)

You can also build a Locus object from the ground up:

from genbank.locus import Locus
locus = Locus('test', 'actgactgatcgtagctagc')
# then add a feature by parsing text of a genbank feature
locus.read_feature('  CDS  1..10')
# or add one manually by specifying the type, strand, and location
locus.add_feature('CDS',+1,[['10','20']])
locus.write()

which gives:

LOCUS       test                      20 bp
FEATURES             Location/Qualifiers
     CDS             1..10
     CDS             10..20
ORIGIN
        1 actgactgat cgtagctagc
//
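
The ORIGIN line above follows the usual Genbank layout: the 1-based position of the first base on the line, right-justified, followed by the sequence in groups of ten. As a minimal sketch of that layout (this is not the package's own writer; `format_origin` and its parameters are hypothetical names used only for illustration):

```python
def format_origin(dna, width=60, group=10):
    """Return ORIGIN-style lines for a DNA string (illustrative helper, not part of genbank)."""
    lines = []
    for start in range(0, len(dna), width):
        chunk = dna[start:start + width]
        # split the line into blocks of `group` bases
        groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        # 1-based position of the first base, right-justified to 9 columns
        lines.append(f"{start + 1:>9} " + " ".join(groups))
    return lines


print("\n".join(format_origin("actgactgatcgtagctagc")))
```

For the 20 bp test locus this reproduces the single sequence line shown in the output above.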

This package also allows you to perform various conversions on a given genome file:

$ genbank.py tests/phiX174.gbk -f tabular
'phiX174'	'CDS'	(('100', '627'),)	{'gene': "G"}
'phiX174'	'CDS'	(('636', '1622'),)	{'gene': "H"}
'phiX174'	'CDS'	(('1659', '3227'),)	{'gene': "A"}
'phiX174'	'CDS'	(('2780', '3142'),)	{'gene': "B"}
'phiX174'	'CDS'	(('3142', '3312'),)	{'gene': "K"}
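
Judging from the sample output above, each tab-separated column of the tabular format looks like a Python literal. If that holds, the lines can be read back with the standard library. This is only a sketch under that assumption; `parse_tabular_line` is a hypothetical helper, not something the package provides:

```python
import ast


def parse_tabular_line(line):
    """Split one tabular line into (name, key, locations, tags)."""
    columns = line.rstrip("\n").split("\t")
    # each column is assumed to be a valid Python literal
    name, key, pairs, tags = (ast.literal_eval(col) for col in columns)
    # coordinates are printed as strings; convert them to integers
    # (this would fail on partial locations such as '<100')
    locations = [(int(start), int(end)) for start, end in pairs]
    return name, key, locations, tags


line = "'phiX174'\t'CDS'\t(('100', '627'),)\t{'gene': \"G\"}"
print(parse_tabular_line(line))
```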

$ genbank.py tests/phiX174.gbk -f fasta
>phiX174
gtgtgaggttataacgccgaagcggtaaaaattttaatttttgccgctgagggg
ttgaccaagcgaagcgcggtaggttttctgcttaggagtttaatcatgtttcag

$ genbank.py tests/phiX174.gbk -f fna
>phiX174_CDS_[100..627] [gene="G"]
atgtttcagacttttatttctcgccataattcaaactttttttctgataag
>phiX174_CDS_[636..1622] [gene="H"]
atgtttggtgctattgctggcggtattgcttctgctcttgctggtggcgcc
>phiX174_CDS_[1659..3227]
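
The fna records above pair a header built from the locus name, feature type, and location with the nucleotides at that location. A rough sketch of how such a record could be assembled by hand, assuming a forward-strand feature with a single start..end span (`cds_record` is a hypothetical name, and the package may build its records differently):

```python
def cds_record(name, dna, start, end, tags):
    """Build one fna-style record for a forward-strand CDS (illustrative only)."""
    # Genbank coordinates are 1-based and inclusive;
    # Python slices are 0-based and end-exclusive
    seq = dna[start - 1:end]
    header = f">{name}_CDS_[{start}..{end}]"
    qualifiers = " ".join(f'[{key}="{value}"]' for key, value in tags.items())
    if qualifiers:
        header += " " + qualifiers
    return header + "\n" + seq


print(cds_record("test", "actgactgatcgtagctagc", 1, 10, {"gene": "x"}))
```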

$ genbank.py tests/phiX174.gbk -f faa
>phiX174_CDS_[100..627] [gene="G"]
MFQTFISRHNSNFFSDKLVLTSVTPASSAPVLQTPKATSSTLYFDSLTVNA
>phiX174_CDS_[636..1622] [gene="H"]
MFGAIAGGIASALAGGAMSKLFGGGQKAASGGIQGDVLATDNNTVGMGDAG
>phiX174_CDS_[1659..3227] [gene="A"]
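
The faa output is the translation of the corresponding fna records. As a minimal sketch of that step, using the standard codon assignments (the package may use a different table or handle start and stop codons differently; `translate` and `CODON_TABLE` are names chosen here for illustration):

```python
from itertools import product

BASES = "tcag"
# amino acids in the conventional TCAG codon order; '*' marks a stop codon
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate complete codons; unknown codons become 'X', a trailing partial codon is dropped."""
    dna = dna.lower()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, usable, 3))


print(translate("atgtttcagacttttatttct"))
```

The input here is the first 21 bases of the gene G record above, and the result, MFQTFIS, matches the start of the corresponding protein.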

$ genbank.py tests/phiX174.gbk -f coverage
phiX174	0.965
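
The README does not define how the coverage value is computed. One plausible reading, and it is only an assumption, is the fraction of bases that fall inside at least one feature. Under that assumption, a sketch with a hypothetical `covered_fraction` helper could look like this:

```python
def covered_fraction(length, intervals):
    """Fraction of positions 1..length covered by at least one (start, end) interval."""
    covered = 0
    last_end = 0  # highest 1-based position already counted
    for start, end in sorted(intervals):
        # skip the part of this interval that overlaps earlier ones
        start = max(start, last_end + 1)
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / length


# the two features of the 20 bp test locus built earlier
print(covered_fraction(20, [(1, 10), (10, 20)]))
```

Overlapping features are counted once, so the two overlapping features of the test locus cover the whole sequence.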

Print the values of a given key:tag pair (here, the gene tag of every CDS feature):

$ genbank.py tests/phiX174.gbk -k CDS:gene > labels.tsv

Change the H label of the second gene to something more informative (ideally you would merge in columns from other sources, such as Excel):

$ perl -pi -e 's/H/Minor spike/' labels.tsv

Now edit all the features of the given key:tag with the updated labels:

$ genbank.py tests/phiX174.gbk -e CDS:gene < labels.tsv | head
LOCUS       phiX174                 5386 bp    DNA      PHG
FEATURES             Location/Qualifiers
     source          1..5386
     rep_origin      13..56
     CDS             100..627
                     /gene="G"
     CDS             636..1622
                     /gene="Minor spike"
     CDS             1659..3227

Release history

Version	Release date
0.110	Oct 9, 2023
0.109	Oct 9, 2023
0.105	Aug 31, 2023
0.103	Aug 31, 2023
0.101	Aug 30, 2023
0.100	Aug 29, 2023
0.96	Jun 3, 2023
0.81	May 12, 2023
0.75	Apr 11, 2023
0.69	Feb 8, 2023
0.61	Jan 20, 2023 (this version)
0.60	Jan 20, 2023
0.55	Dec 27, 2022
0.53	Dec 27, 2022
0.39	Nov 17, 2022
0.38	Nov 16, 2022
0.37	Nov 16, 2022
0.33	Nov 9, 2022
0.22	Aug 10, 2022
0.12	Jul 7, 2022
0.6	Apr 7, 2022
0.5	Apr 7, 2022
0.3	Mar 10, 2022
0.2	Mar 9, 2022

Download files

Source Distribution

genbank-0.61.tar.gz (26.0 kB), uploaded Jan 20, 2023

Built Distribution

genbank-0.61-py3-none-any.whl (25.9 kB, Python 3), uploaded Jan 20, 2023

Hashes for genbank-0.61.tar.gz
Algorithm	Hash digest
SHA256	`b40bb5195d16829e01a9a4af20a1a8e6a3a924c6f1d1054aed5597f0f9f0e5b6`
MD5	`6510c0517d321fbf9cfab5ab110f5bb2`
BLAKE2b-256	`972b11c01a24840aa04e0a8b59967c5a8cd527cec928e70d5bce6ae85b32ce7e`

Hashes for genbank-0.61-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6f9efd86ad526129b6e779ea17c6305a2419a1ed196e2b44da6e6e135758ba6d`
MD5	`03b29c15b825c519eac37a3664fe78fb`
BLAKE2b-256	`2bbefa87daf08dfa1197a2baedb04a7b8f87bba191d5eb37f49a79bacebcbfeb`