A tool for doing enrichment tests of functional groupings of genes across genomes and lineages.

Project description

Keggm - a small suite of tools I use to analyse microbial genomes

This is currently a very barebones package. Only two functions are fully operational.

enrichm looks for metabolic blocks which are enriched in your genomes compared to some background set. This is a quick way to get an idea of how your genomes differ from the background in terms of certain metabolic functions. You can specify your own blocks and use custom protein names, which makes it quite extensible.
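To make the idea of "enriched compared to some background" concrete, here is a minimal sketch using a one-sided Fisher's exact test on counts of genomes that carry a block. This is an assumption for illustration only: the text above does not say which statistical test enrichm uses, and the function name and its parameters are hypothetical.

```python
from math import comb


def enrichment_p_value(hits_in_group, group_size, hits_in_background, background_size):
    """One-sided Fisher's exact test (hypergeometric upper tail).

    Returns the probability of seeing at least `hits_in_group` carriers of a
    block in the group of interest if carriers were spread at random across
    the group and the background. Hypothetical helper, not part of keggm.
    """
    total = group_size + background_size
    total_hits = hits_in_group + hits_in_background
    max_hits = min(group_size, total_hits)
    # Sum the hypergeometric probabilities for k = observed .. maximum possible.
    tail = sum(
        comb(total_hits, k) * comb(total - total_hits, group_size - k)
        for k in range(hits_in_group, max_hits + 1)
    )
    return tail / comb(total, group_size)


# Made-up counts: 4 of 5 genomes of interest carry the block,
# compared with 2 of 20 background genomes.
p = enrichment_p_value(4, 5, 2, 20)
print(f"p = {p:.4f}")
```

A small p-value suggests the block is over-represented in the group of interest. With many blocks tested at once, some multiple-testing correction would normally be applied as well.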

completm is a small tool to aid exploring what functions your genome can perform. It creates a “completeness” matrix which gives you an idea of whether your genome shows the potential to perform each metabolic block. It also creates a matrix with the protein names which contributed to that completeness. This can help you check whether a metabolic block was complete due to proteins which are normally poorly annotated and, where there are complementary proteins, which ones were present. This will be expanded to provide a list of “complete” modules for each organism based on some user threshold.
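A minimal sketch of how such a completeness matrix could be computed, assuming a block is represented as a list of steps and each step is a set of interchangeable (complementary) protein names. The function name, the data layout, and the protein names are hypothetical and are not taken from completm itself.

```python
def block_completeness(block_steps, genome_proteins):
    """Return (fraction of steps covered, proteins found for each step)."""
    contributing = []
    covered = 0
    for alternatives in block_steps:
        # Any one of the complementary proteins is enough to cover a step.
        found = sorted(alternatives & genome_proteins)
        if found:
            covered += 1
        contributing.append(found)
    return covered / len(block_steps), contributing


# Hypothetical block: three steps, the second has two complementary proteins.
example_block = [{"protA"}, {"protB1", "protB2"}, {"protC"}]
genomes = {
    "genome_1": {"protA", "protB2"},
    "genome_2": {"protA", "protB1", "protC"},
}

completeness = {}  # genome name -> fraction of the block that is present
contributors = {}  # genome name -> proteins that covered each step
for name, proteins in genomes.items():
    completeness[name], contributors[name] = block_completeness(example_block, proteins)

print(completeness)
print(contributors)
```

Keeping the contributing proteins next to the score is what lets you see which of the complementary proteins was actually present, and whether a "complete" call rests on a single poorly annotated protein.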

In the works

plots aims to create some small visualisations to better parse the completeness results. It consists of heatmaps, to quickly scan across genomes, and will later include arrow diagrams of each metabolic block where each arrow represents a protein. The arrows will be coloured based on the organisms which had the relevant proteins.

overlap tries to identify whether organisms have the potential to supplement each other. It does this by looking for metabolic blocks which are only partly complete in one organism but where the rest of the metabolic block can be found in another organism.
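One way to picture this check, as a rough sketch rather than the actual overlap implementation: treat each block as a list of steps (each step a set of alternative proteins), find the steps one organism is missing, and test whether a second organism supplies all of them. All names and the data layout here are hypothetical.

```python
def supplemented_blocks(blocks, recipient_proteins, donor_proteins):
    """Map block name -> indices of missing steps that the donor can supply.

    A block is reported only if the recipient covers some but not all of its
    steps, and every missing step is covered by the donor.
    """
    result = {}
    for name, steps in blocks.items():
        missing = [
            index
            for index, alternatives in enumerate(steps)
            if not (alternatives & recipient_proteins)
        ]
        partly_present = 0 < len(missing) < len(steps)
        if partly_present and all(steps[i] & donor_proteins for i in missing):
            result[name] = missing
    return result


# Hypothetical blocks and protein sets.
blocks = {
    "block_1": [{"protA"}, {"protB"}, {"protC"}],
    "block_2": [{"protD"}, {"protE"}],
}
recipient = {"protA", "protB", "protD"}
donor = {"protC"}

print(supplemented_blocks(blocks, recipient, donor))
```

Here only block_1 is reported: the recipient lacks its third step and the donor has it, whereas nobody supplies the missing step of block_2.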

TODO:

  1. Make a better test suite

  2. Make it usable on the command line (for convenience)

  3. Implement auxiliary non-enrichment features into main software
    1. Plots with options to compare completeness

    2. Visualisation of overlap within module across multiple genomes similar to Symbiodinium+coral paper

    3. Some kind of colourisation of KEGG pathways to give you a broad idea of what’s present within a pathway.

  4. Eventually, if I can, make the network stuff robust enough for the potential automated discovery of novel metabolism

  5. Add more customisability - make it so that any user can create their own extra KEGG data for use in this software
    1. requires a few auxiliary tools to augment permanent databases

  6. Unify Database scraping and production to be a single command


Other todo:

  1. Make it multithreaded/multiprocessor at the comparison stage (current scale of comparisons poses no speed issue)

    Implement using the producer-consumer model of multithreading

  2. Investigate optimised Boschloo’s test and whether it has reasonable runtime (unlikely)

  3. Implement a logging system for better debugging - will make my life easier.
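For the producer-consumer item above, a minimal sketch using only the standard library might look like the following. The `run_comparisons` function and the `compare` callback are hypothetical stand-ins for whatever the comparison stage actually does. Note that threads in standard CPython share the global interpreter lock, so CPU-bound comparisons would likely need the same pattern built on `multiprocessing` instead.

```python
import queue
import threading


def run_comparisons(tasks, compare, n_workers=4):
    """Run compare(task) for every task using a producer-consumer queue."""
    task_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    def consumer():
        while True:
            task = task_queue.get()
            if task is None:  # sentinel: no more work for this worker
                break
            value = compare(task)
            with results_lock:
                results[task] = value

    workers = [threading.Thread(target=consumer) for _ in range(n_workers)]
    for worker in workers:
        worker.start()

    # Producer: enqueue every task, then one sentinel per worker.
    for task in tasks:
        task_queue.put(task)
    for _ in workers:
        task_queue.put(None)

    for worker in workers:
        worker.join()
    return results


# Toy usage: the "comparison" is just squaring a number.
squares = run_comparisons([1, 2, 3], lambda x: x * x, n_workers=2)
print(squares)
```

Tasks need to be hashable here because they are used as dictionary keys for the results.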


Download files

Source Distribution

keggm-0.0.1.zip (35.2 kB view details)

Uploaded Source

Built Distribution

keggm-0.0.1-py2.py3-none-any.whl (44.0 kB view details)

Uploaded Python 2, Python 3

File details

Details for the file keggm-0.0.1.zip.

File metadata

  • Download URL: keggm-0.0.1.zip
  • Upload date:
  • Size: 35.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for keggm-0.0.1.zip
Algorithm Hash digest
SHA256 0994ba442b4b741f24faa3b1e7cf160f23a9c2f138de989d7ea926ffdcbec039
MD5 8f50124641b6e2584cd138170b8a587e
BLAKE2b-256 d9ba240ce889185407d6780e30560e902d67d594be1d831293a2facf221389f3

File details

Details for the file keggm-0.0.1-py2.py3-none-any.whl.

File hashes

Hashes for keggm-0.0.1-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 c0ab873424388839fb13732d970e307011be33b853ad59dbdf9d64fc73ee7395
MD5 b1588fe5c0b8cea8bf14ca01ae23e347
BLAKE2b-256 733f7a3c235b16c69e63a75382bca12ed5a7f96e11a5dc1d13ff6bf072c004af
