Skip to main content

Code to work with Genbank files

Project description

genbank

Python code to work with Genbank files

This repo contains several classes to help work with Genbank files

The flow goes:

File -> Locus -> Feature

To use:

from genbank.file import File

file = File('infile.gbk')
for locus in file:
	print(name)
	for feature in locus:
		print(feature)

You can also build a Locus object from the ground up:

from genbank.locus import Locus
locus = Locus('test', 'actgactgatcgtagctagc')
# then add a feature by parsing text of a genbank feature
locus.read_feature('  CDS  1..10')
# or add manually by specifing the type,strand,location
locus.add_feature('CDS',+1,[['10','20']])
locus.write()

which gives:

LOCUS       test                      20 bp
FEATURES             Location/Qualifiers
     CDS             1..10
     CDS             10..20
ORIGIN
        1 actgactgat cgtagctagc
//

This package also allows you to perform various conversions on a given genome file:

$ genbank.py tests/phiX174.gbk -f tabular
'phiX174'	'CDS'	(('100', '627'),)	{'gene': "G"}
'phiX174'	'CDS'	(('636', '1622'),)	{'gene': "H"}
'phiX174'	'CDS'	(('1659', '3227'),)	{'gene': "A"}
'phiX174'	'CDS'	(('2780', '3142'),)	{'gene': "B"}
'phiX174'	'CDS'	(('3142', '3312'),)	{'gene': "K"}

$ genbank.py tests/phiX174.gbk -f fasta
>phiX174
gtgtgaggttataacgccgaagcggtaaaaattttaatttttgccgctgagggg
ttgaccaagcgaagcgcggtaggttttctgcttaggagtttaatcatgtttcag

$ genbank.py tests/phiX174.gbk -f fna
>phiX174_CDS_[100..627] [gene="G"]
atgtttcagacttttatttctcgccataattcaaactttttttctgataag
>phiX174_CDS_[636..1622] [gene="H"]
atgtttggtgctattgctggcggtattgcttctgctcttgctggtggcgcc
>phiX174_CDS_[1659..3227]

$ genbank.py tests/phiX174.gbk -f faa
>phiX174_CDS_[100..627] [gene="G"]
MFQTFISRHNSNFFSDKLVLTSVTPASSAPVLQTPKATSSTLYFDSLTVNA
>phiX174_CDS_[636..1622] [gene="H"]
MFGAIAGGIASALAGGAMSKLFGGGQKAASGGIQGDVLATDNNTVGMGDAG
>phiX174_CDS_[1659..3227] [gene="A"]

$ genbank.py tests/phiX174.gbk -f coverage
phiX174	1.1168

You can also slice the locus to a specified range and only the nucleotides and features that occur within the slice are kept. The command to take the first two hundred bases of the phiX174 genome is shown below.

$ genbank.py tests/phiX174.gbk --slice 1..200
LOCUS       phiX174                  200 bp    DNA             PHG
DEFINITION  phiX174
FEATURES             Location/Qualifiers
     rep_origin      13..56
     CDS             100..>200
                     /gene="G"
                     /note="merged"
ORIGIN
        1 gtgtgaggtt ataacgccga agcggtaaaa attttaattt ttgccgctga ggggttgacc
       61 aagcgaagcg cggtaggttt tctgcttagg agtttaatca tgtttcagac ttttatttct
      121 cgccataatt caaacttttt ttctgataag ctggttctca cttctgttac tccagcttct
      181 tcggcacctg ttttacagac
//

The slice arguement can be paired with all the other output format options:

$ genbank.py tests/phiX174.gbk --slice 1..200 -f coverage
0.51

You can also easily edit features by running multiple commands.

Print out the features of a specified key:tag into a file

$ genbank.py tests/phiX174.gbk -k CDS:gene > labels.tsv

Change the H of the second gene to something more informative: (ideally you will have columns from other sources, like excel)

perl -pi -e 's/H/Minor spike/' labels.tsv

Edit all the features of a specified key:tag with the updated labels:

$ genbank.py tests/phiX174.gbk -e CDS:gene < labels.tsv | head
LOCUS       phiX174                 5386 bp    DNA      PHG
FEATURES             Location/Qualifiers
     source          1..5386
     rep_origin      13..56
     CDS             100..627
                     /gene="G"
     CDS             636..1622
                     /gene="Minor spike"
     CDS             1659..3227

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

genbank-0.110.tar.gz (29.7 kB view details)

Uploaded Source

Built Distribution

genbank-0.110-py3-none-any.whl (29.5 kB view details)

Uploaded Python 3

File details

Details for the file genbank-0.110.tar.gz.

File metadata

  • Download URL: genbank-0.110.tar.gz
  • Upload date:
  • Size: 29.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for genbank-0.110.tar.gz
Algorithm Hash digest
SHA256 f9a6ba243d62934e9e5ed283d8b475ab3dd9a3760ba329a054d5f6df54a8ec45
MD5 7a91c9e02a49b7d8ff7aab9367aa303c
BLAKE2b-256 e27d594c39487c8add11852211bea8cdccca90ceddd7c485ac92bdd622ff8b6b

See more details on using hashes here.

File details

Details for the file genbank-0.110-py3-none-any.whl.

File metadata

  • Download URL: genbank-0.110-py3-none-any.whl
  • Upload date:
  • Size: 29.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.17

File hashes

Hashes for genbank-0.110-py3-none-any.whl
Algorithm Hash digest
SHA256 7e102dbb12c900906f92cb8fb2a27f27de43f8b936f6073c0205e35a4ddd1a61
MD5 c8e4851794a26641f3e62fa6ea484462
BLAKE2b-256 75a0b2d749bacb9a3eb4a1005553bb7bd839b9fb6d63abfbbff2d3fcba5a580f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page