
A tool to annotate microbial genomes

Project description

Introduction

Genotate is a tool to annotate prokaryotic and phage genomes. It uses scrolling amino-acid windows in all six frames to distinguish between windows that belong to protein coding gene regions and those that belong to noncoding regions, in order to determine the coding frame at every position along the genome.

  • Unlike other currently available gene callers, Genotate does not rely on start and stop codons to predict coding genes.
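The window idea can be sketched in Python. This is a minimal illustration, not Genotate's code: it translates all six reading frames with the standard genetic code and slides a fixed-size amino-acid window along each translation. The frame labels (+1 to +3 forward, -1 to -3 reverse), the window size and the step are assumptions chosen for illustration.

```python
from itertools import product

# Standard genetic code, codons ordered TCAG x TCAG x TCAG ('*' = stop).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; codons with unknown bases (e.g. N) become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frames(seq):
    """Return {+1, +2, +3, -1, -2, -3: protein} (hypothetical frame labels)."""
    seq = seq.upper()
    rev = seq.translate(COMPLEMENT)[::-1]  # reverse complement
    frames = {}
    for offset in range(3):
        frames[offset + 1] = translate(seq[offset:])
        frames[-(offset + 1)] = translate(rev[offset:])
    return frames


def windows(protein, size, step=1):
    """Yield (index, window) pairs of a sliding amino-acid window."""
    for i in range(0, len(protein) - size + 1, step):
        yield i, protein[i:i + size]
```

Each window would then be handed to a classifier that scores it as coding or noncoding; that model is not described here and is not part of the sketch.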

To install Genotate:

 pip install genotate

To run Genotate, you only need to specify a FASTA-formatted genome file. The command to run it with the phage models on the provided phiX174 genome is:

 genotate.py test/phiX174.fasta -o predictions.gb

To use the partially trained bacterial/archaeal models, add the --bacteria flag. Instead of a FASTA-formatted file, you can also provide a GenBank-formatted file, in which case Genotate uses only the genomic sequence.

 genotate.py test/mycoplasma.gbff.gz -o predictions.gb --bacteria
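If you prefer to drive Genotate from a Python script, a thin wrapper around the command line is enough. This sketch only uses the arguments shown in this README (the input file, `-o` and `--bacteria`); it assumes the pip install puts `genotate.py` on your PATH, and the helper names are hypothetical.

```python
import subprocess


def build_genotate_command(genome_path, output_path, bacteria=False):
    """Assemble the command line shown above (hypothetical helper)."""
    cmd = ["genotate.py", genome_path, "-o", output_path]
    if bacteria:
        # switch to the partially trained bacterial/archaeal models
        cmd.append("--bacteria")
    return cmd


def run_genotate(genome_path, output_path, bacteria=False):
    """Run Genotate; raises subprocess.CalledProcessError on a non-zero exit."""
    cmd = build_genotate_command(genome_path, output_path, bacteria)
    subprocess.run(cmd, check=True)
```

For example, `run_genotate("test/mycoplasma.gbff.gz", "predictions.gb", bacteria=True)` runs the same command as the bacterial example above.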

A GPU is recommended, since prokaryotic genomes take a long time to run. Genotate automatically tries to run on a GPU and falls back to the CPU if none is found.


† Genotate outputs 'coding region' predictions in GenBank format. These should match the true coding gene regions, but they are not genes per se, since they are not based on start and stop codons. They are, however, all trimmed to a stop codon once Genotate has determined which translation table the genome uses (i.e. whether it performs stop codon readthrough).

There are three main phases to the Genotate workflow:

  1. window classification
  2. change-point detection
  3. refinement
    • analyze stop codons
    • merge adjacent regions
    • split regions on stop
    • adjust ends to a stop
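The description here does not say which change-point method Genotate uses. As a hypothetical illustration of the second phase, the sketch below applies penalized optimal partitioning, a generic change-point technique, to a sequence of per-window class labels (for example, 0 for noncoding and ±1 to ±3 for the six frames; these labels are assumed, not Genotate's output format). Each segment costs the number of windows that disagree with its majority label plus a fixed penalty, so the penalty controls how readily a new segment is opened.

```python
from collections import Counter


def optimal_partition(labels, penalty):
    """Penalized optimal partitioning of a label sequence.

    Returns (start, end, label) segments, half-open, where label is the
    segment's majority label. Cost of a segment = mismatches + penalty.
    """
    n = len(labels)
    best = [0.0] + [float("inf")] * n  # best[j]: min cost of labels[:j]
    prev = [0] * (n + 1)               # start of the last segment in best[j]
    for j in range(1, n + 1):
        counts = Counter()
        top = 0
        # grow the candidate last segment labels[i:j] leftwards
        for i in range(j - 1, -1, -1):
            counts[labels[i]] += 1
            top = max(top, counts[labels[i]])
            cost = best[i] + (j - i - top) + penalty
            if cost < best[j]:
                best[j] = cost
                prev[j] = i
    # backtrack from the end to recover the segments
    segments = []
    j = n
    while j > 0:
        i = prev[j]
        label = Counter(labels[i:j]).most_common(1)[0][0]
        segments.append((i, j, label))
        j = i
    return segments[::-1]
```

This dynamic program is O(n²) in the number of windows, which is fine for a sketch but would need pruning (e.g. PELT) or a different method on a full bacterial genome.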

Genotate determines the translation table by analyzing the initial coding gene region predictions. A stop codon that is read through can show up in two ways: either it appears in the middle of a coding gene region, or the region is broken into two pieces at that stop codon. If one of the three standard stop codons (TAA, TAG, TGA) is significantly overrepresented both in the middle of and between predicted gene regions, that stop codon can be assumed to be read through. With the stop codon usage known, adjacent coding regions in the same frame are merged if there is no stop codon between them. The regions are then split on any internal stop codons, and their ends are adjusted to the nearest stop codon.
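The readthrough check described above can be sketched as follows. This is an assumption-laden illustration rather than Genotate's implementation: it works on the forward strand only, treats regions as half-open (start, end) coordinates whose lengths are multiples of three, and replaces Genotate's unspecified significance test with two hypothetical thresholds, `min_count` and `min_fraction`. `max_gap` (how far apart two same-frame regions may be and still count as one broken region) is likewise hypothetical.

```python
from collections import Counter

STOP_CODONS = ("TAA", "TAG", "TGA")


def count_stops(seq, regions, max_gap=30):
    """Count in-frame stops inside regions and between same-frame neighbours."""
    counts = Counter()
    # case 1: stop codon in the middle of a predicted region
    for start, end in regions:
        for p in range(start, end - 3, 3):  # the region's last codon is excluded
            if seq[p:p + 3] in STOP_CODONS:
                counts[seq[p:p + 3]] += 1
    # case 2: region broken into two pieces at the stop codon
    for frame in range(3):
        same = sorted(r for r in regions if r[0] % 3 == frame)
        for (_, e1), (s2, _) in zip(same, same[1:]):
            if 0 <= s2 - e1 <= max_gap:
                # last codon of the first region through the gap before the second
                for p in range(e1 - 3, s2, 3):
                    if seq[p:p + 3] in STOP_CODONS:
                        counts[seq[p:p + 3]] += 1
    return counts


def readthrough_candidates(counts, min_count=2, min_fraction=0.5):
    """Stops that dominate the in-region/junction counts (hypothetical rule)."""
    total = sum(counts.values())
    if total == 0:
        return []
    return [c for c in STOP_CODONS
            if counts[c] >= min_count and counts[c] / total >= min_fraction]
```

In a genome that uses TGA for tryptophan, for example, in-frame TGA would dominate these counts while TAA and TAG stay rare.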

** The end opposite the stop codon is not adjusted to a valid start codon, because Genotate does not (yet) have a translation initiation site detection method, so the beginning of a gene call may be off by a few codons.
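The merge, split and end-adjustment steps can be sketched on the forward strand as below. The function names, the half-open coordinate convention and the forward-strand restriction are assumptions for illustration; `readthrough` would come from a stop-codon analysis like the one described earlier. Consistent with the note above, only the 3' end is moved to a stop codon; the start coordinate is never changed.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})


def in_frame_stops(seq, start, end, stops):
    """Start positions of in-frame stop codons lying fully inside seq[start:end]."""
    return [p for p in range(start, end - 2, 3) if seq[p:p + 3] in stops]


def merge_same_frame(seq, regions, stops):
    """Merge consecutive same-frame regions when no stop codon lies in the gap."""
    merged = []
    for frame in range(3):
        current = None
        for start, end in sorted(r for r in regions if r[0] % 3 == frame):
            if current and not in_frame_stops(seq, current[1], start, stops):
                current = (current[0], max(current[1], end))
            else:
                if current:
                    merged.append(current)
                current = (start, end)
        if current:
            merged.append(current)
    return sorted(merged)


def split_on_stops(seq, regions, stops):
    """Cut each region after every internal in-frame stop codon."""
    pieces = []
    for start, end in regions:
        for p in in_frame_stops(seq, start, end, stops):
            if p > start:  # drop a piece that would be only a stop codon
                pieces.append((start, p + 3))  # piece keeps its stop codon
            start = p + 3
        if start < end:
            pieces.append((start, end))
    return pieces


def extend_to_stop(seq, regions, stops):
    """Move each 3' end forward, codon by codon, until it ends on a stop."""
    adjusted = []
    for start, end in regions:
        while seq[end - 3:end] not in stops and end + 3 <= len(seq):
            end += 3
        adjusted.append((start, end))  # start is left as-is (no start-codon search)
    return adjusted


def refine(seq, regions, readthrough=()):
    """Merge, split, then adjust ends, treating `readthrough` codons as sense codons."""
    stops = STANDARD_STOPS - set(readthrough)
    merged = merge_same_frame(seq, regions, stops)
    return extend_to_stop(seq, split_on_stops(seq, merged, stops), stops)
```

For example, with codons ATG AAA TGA CCC GGG TAA AAA CCC TAG and regions (0, 6), (9, 15), (18, 27), treating TGA as read through merges the first two regions and extends them to the TAA, giving (0, 18) and (18, 27); with all three standard stops the result is (0, 9), (9, 18), (18, 27).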

Currently, the best way to visualize the predictions is in a genome viewer application such as Artemis from the Sanger Institute. Loading the example phiX174.gb GenBank file into Artemis shows the gene layout.

The Genotate gene calls in the output predictions.gb file can then be loaded using the 'File > Read An Entry' menu, and the predictions will be overlaid as grey 'coding regions' in the gene layout window.

Genotate calls 9 of the 10 known coding genes of phiX174, including the fully nested genes B and K, which highlights its advantage over currently available gene callers.



Download files


Source Distribution

genotate-0.17.tar.gz (57.5 MB)

Uploaded Source

Built Distribution


genotate-0.17-cp39-cp39-macosx_12_0_x86_64.whl (57.5 MB)

Uploaded CPython 3.9, macOS 12.0+ x86-64

File details

Details for the file genotate-0.17.tar.gz.

File metadata

  • Download URL: genotate-0.17.tar.gz
  • Size: 57.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.9.17

File hashes

Hashes for genotate-0.17.tar.gz
Algorithm Hash digest
SHA256 ca17086f581289a37d160aded7bafc70195c649494e147db6ad5fd593b92f469
MD5 2c819bbefaecdf46f68a37307698bae8
BLAKE2b-256 c7016e5f902e074bffd2bc5a8ab3da4fb1a59bf64642924fb7d3297599f36625


File details

Details for the file genotate-0.17-cp39-cp39-macosx_12_0_x86_64.whl.


File hashes

Hashes for genotate-0.17-cp39-cp39-macosx_12_0_x86_64.whl
Algorithm Hash digest
SHA256 5604cf8092bdb7b77fd7b3119aecf87655681d1134f0d621e83315a2ac8f04e0
MD5 3062cae34f9b98bf1e645c9d36c93252
BLAKE2b-256 c78603f9acefa849240c0221d18df264b8a7e73f458623b005ac73eff94c1072

