Skip to main content

Code to work with Genbank files

Project description

genbank

Python code to work with Genbank files

This repo contains several classes to help work with Genbank files

The flow goes:

File -> Locus -> Feature

To use:

from genbank.file import File

file = File('infile.gbk')
for locus in file:
	print(name)
	for feature in locus:
		print(feature)

You can also build a Locus object from the ground up:

from genbank.locus import Locus
locus = Locus('test', 'actgactgatcgtagctagc')
# then add a feature by parsing text of a genbank feature
locus.read_feature('  CDS  1..10')
# or add manually by specifing the type,strand,location
locus.add_feature('CDS',+1,[['10','20']])
locus.write()

which gives:

LOCUS       test                      20 bp
FEATURES             Location/Qualifiers
     CDS             1..10
     CDS             10..20
ORIGIN
        1 actgactgat cgtagctagc
//

This package also allows you to perform various conversions on a given genome file:

$ genbank.py tests/phiX174.gbk -f tabular
'phiX174'	'CDS'	(('100', '627'),)	{'gene': "G"}
'phiX174'	'CDS'	(('636', '1622'),)	{'gene': "H"}
'phiX174'	'CDS'	(('1659', '3227'),)	{'gene': "A"}
'phiX174'	'CDS'	(('2780', '3142'),)	{'gene': "B"}
'phiX174'	'CDS'	(('3142', '3312'),)	{'gene': "K"}

$ genbank.py tests/phiX174.gbk -f fasta
>phiX174
gtgtgaggttataacgccgaagcggtaaaaattttaatttttgccgctgagggg
ttgaccaagcgaagcgcggtaggttttctgcttaggagtttaatcatgtttcag

$ genbank.py tests/phiX174.gbk -f fna
>phiX174_CDS_[100..627] [gene="G"]
atgtttcagacttttatttctcgccataattcaaactttttttctgataag
>phiX174_CDS_[636..1622] [gene="H"]
atgtttggtgctattgctggcggtattgcttctgctcttgctggtggcgcc
>phiX174_CDS_[1659..3227]

$ genbank.py tests/phiX174.gbk -f faa
>phiX174_CDS_[100..627] [gene="G"]
MFQTFISRHNSNFFSDKLVLTSVTPASSAPVLQTPKATSSTLYFDSLTVNA
>phiX174_CDS_[636..1622] [gene="H"]
MFGAIAGGIASALAGGAMSKLFGGGQKAASGGIQGDVLATDNNTVGMGDAG
>phiX174_CDS_[1659..3227] [gene="A"]

$ genbank.py tests/phiX174.gbk -f coverage
phiX174	0.965

Print out the features of the given key:tag

$ genbank.py tests/phiX174.gbk -k CDS:gene > labels.tsv

Change the H of the second gene to something more informative: (ideally you will have columns from other sources, like excel)

perl -pi -e 's/H/Minor spike/' labels.tsv

Now edit all the features of the given key:tag with the updated labels:

$ genbank.py tests/phiX174.gbk -e CDS:gene < labels.tsv | head
LOCUS       phiX174                 5386 bp    DNA      PHG
FEATURES             Location/Qualifiers
     source          1..5386
     rep_origin      13..56
     CDS             100..627
                     /gene="G"
     CDS             636..1622
                     /gene="Minor spike"
     CDS             1659..3227

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

genbank-0.101.tar.gz (28.4 kB view details)

Uploaded Source

Built Distribution

genbank-0.101-py3-none-any.whl (28.5 kB view details)

Uploaded Python 3

File details

Details for the file genbank-0.101.tar.gz.

File metadata

  • Download URL: genbank-0.101.tar.gz
  • Upload date:
  • Size: 28.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for genbank-0.101.tar.gz
Algorithm Hash digest
SHA256 9204b91c10c01ac6f9bdd9c1e5cf5f8944e3b75be4ddbe596c16bda0e6e880bd
MD5 2b589b129582cb7ad25dbb14765ab95b
BLAKE2b-256 0189c70d0d7e3be2461957d89ef049429000c3245321d2e347d31263e20bee29

See more details on using hashes here.

File details

Details for the file genbank-0.101-py3-none-any.whl.

File metadata

  • Download URL: genbank-0.101-py3-none-any.whl
  • Upload date:
  • Size: 28.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for genbank-0.101-py3-none-any.whl
Algorithm Hash digest
SHA256 b8a5afb9706b17d656a37e54d35b41242fe04b0d784c3d4ed303a6f2326d54ba
MD5 0e22cc4572cefc3e6d88df11a7b40bcb
BLAKE2b-256 4616843a1585493062053041a1b52d03f0a72a157bd65f68030d2b5a84f9238d

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page