
The tool calculates the completeness of each KEGG pathway for a set of protein sequences.


kegg-pathways-completeness tool

The tool calculates the completeness of each KEGG module pathway for a set of protein sequences.

A detailed explanation is given in the Theory section at the bottom of this README.

Required files:

This repository ships a pre-generated set of required files. The current version of the data is stored in pathways_data, and the graphs are saved in .pkl (pickle) format.

About graphs: to generate the graphs, every pathway was parsed with the networkx library. Each graph is provided as a .png image in png and as a .dot file in dots. The pathway structure and the weight of each KO can easily be checked in the .png image. Instructions on how to build graphs.pkl are provided.

Latest update:

  • 07/03/2024: 481 KEGG modules.

Previous updates:

  • 27/04/2023: 475 modules.
  • MGnify pipeline-v5 uses 394 modules.

If you need to update the existing pathway data and graphs, follow these instructions.

Calculate pathways completeness

This script requires either an hmmsearch table produced by searching the annotated sequences against the KEGG profiles (preferred) OR a file with a list of KOs. If you don't have this table, follow the instructions on how to generate it.
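For orientation, here is a minimal sketch of turning an hmmsearch table into per-sequence KO sets. It assumes the standard `hmmsearch --tblout` layout (column 1 is the target sequence name, column 3 is the query profile name, i.e. the KO); the function `kos_from_tblout` is hypothetical and not part of this tool, the example rows are abbreviated, and no E-value or score threshold is applied.

```python
from collections import defaultdict


def kos_from_tblout(lines):
    """Map each sequence name to the set of KOs it hit (hypothetical helper).

    Assumes hmmsearch --tblout columns: 1 = target (sequence), 3 = query (KO profile).
    """
    hits = defaultdict(set)
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        sequence, ko = fields[0], fields[2]
        hits[sequence].add(ko)
    return dict(hits)


# Abbreviated example rows (a real --tblout line has more columns).
example = [
    "# target name  accession  query name  accession  E-value",
    "contig_1_1  -  K00844  -  1e-50",
    "contig_1_2  -  K01810  -  2e-40",
    "contig_2_1  -  K00844  -  3e-30",
]
hits = kos_from_tblout(example)
all_kos = set().union(*hits.values())  # every KO found in the input
print(sorted(all_kos))  # ['K00844', 'K01810']
```

The per-sequence mapping corresponds loosely to a per-contig view, while the union of all KOs corresponds to a whole-input view.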

Run using conda

conda create --name kegg-env python  # include python so that pip3 installs into this environment
conda activate kegg-env

pip3 install -r requirements.txt

export INPUT="tests/fixtures/give_pathways/test_pathway.txt"  # path to hmm-result table
export OUTPUT="test-out"  # prefix for output

# hmmtable as input
python3 bin/give_pathways.py \
  -i ${INPUT} \
  -o ${OUTPUT}

# KOs list as input
python3 bin/give_pathways.py \
  -l 'tests/test_data/test-input/test_kos.txt' \
  -o ${OUTPUT}

An example of the output can be found here.

  • *kegg_pathways.tsv: pathway completeness calculated over all KOs in the given input file.
  • *kegg_contigs.tsv: pathway completeness calculated for each contig (the first column contains the contig name).

Run using docker

Results are written to the results folder; the final annotated pathways will be in results/pathways.

export INPUT="/absolute/path/to/hmm-result-table"  # absolute path to the hmm-result table
docker \
    run \
    -i \
    --workdir=/results \
    --volume=`pwd`/results:/results:rw \
    --volume=${INPUT}:/files/input_table.tsv:ro \
    quay.io/microbiome-informatics/kegg-completeness:v1.1 \
    /tools/run_pathways.sh \
    -i /files/input_table.tsv

Plot pathways completeness

NOTE: plotting requires graphviz to be installed.
To see which edges were chosen when calculating the completeness of each graph, add the --plot-pathways argument:

python3 bin/give_pathways.py -i ${INPUT} -o ${OUTPUT} --plot-pathways

You can also run plotting script separately:

python3 bin/plot_completeness_graphs.py -i output_with_pathways_completeness

Example:

M00050.png

More examples for the test data are available here.
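Graphviz renders graphs described in DOT format, the same format used for the files in dots. As a rough sketch (not the plotting script's actual code), the following writes a pathway graph as DOT text and highlights a chosen path in red. The function `to_dot` and the `(u, v, label)` edge layout are assumptions for illustration.

```python
def to_dot(name, edges, chosen=()):
    """Render (u, v, label) edges as a Graphviz digraph; chosen edges are drawn red and bold."""
    chosen = set(chosen)
    lines = [f'digraph "{name}" {{', "  rankdir=LR;"]  # lay the pathway out left to right
    for u, v, label in edges:
        style = " color=red penwidth=2" if (u, v, label) in chosen else ""
        lines.append(f'  {u} -> {v} [label="{label}"{style}];')
    lines.append("}")
    return "\n".join(lines)


# Pathway "A (B,C)": node 0 is the start, node 1 the end; B and C are parallel edges.
edges = [(0, 2, "A"), (2, 1, "B"), (2, 1, "C")]
dot = to_dot("example", edges, chosen=[(0, 2, "A"), (2, 1, "B")])
print(dot)
```

Saved to a file, the text can be rendered with the Graphviz command line, e.g. `dot -Tpng example.dot -o example.png`.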

Theory:

Pathways to graphs

KEGG represents each pathway as an expression of KOs, for example: A ((B,C) D,E) (A+F)
where A, B, C, D, E, F are KOs, and:
  • space means AND
  • comma means OR
  • plus joins essential components
  • minus marks an optional component

Each expression was recursively converted into a directed graph using NetworkX. The first node is numbered 0 and the last node is numbered 1. Each edge corresponds to a KO.
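The conversion can be illustrated with a small recursive-descent parser. This is a sketch under stated assumptions, not the tool's implementation: it stores a plain list of edges instead of a NetworkX graph, assumes comma binds more loosely than space (so `(B,C) D,E` reads as `((B or C) and D) or E`), and does not handle KEGG's `--` placeholder or a leading `-`. The names `Parser` and `to_graph` are hypothetical.

```python
import re
from itertools import count


class Parser:
    """Hypothetical recursive-descent parser for KEGG-style module definitions."""

    def __init__(self, text):
        text = " ".join(text.split())  # collapse whitespace: one space is the AND operator
        self.tokens = re.findall(r"[A-Za-z0-9]+|[(),+\- ]", text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self):
        token = self.tokens[self.pos]
        self.pos += 1
        return token

    def expr(self):
        """Alternatives separated by ',' (OR)."""
        alternatives = [self.steps()]
        while self.peek() == ",":
            self.take()
            alternatives.append(self.steps())
        return ("or", alternatives) if len(alternatives) > 1 else alternatives[0]

    def steps(self):
        """Steps separated by ' ' (AND, in series)."""
        items = [self.group()]
        while self.peek() == " ":
            self.take()
            items.append(self.group())
        return ("and", items) if len(items) > 1 else items[0]

    def group(self):
        """Components joined by '+' (essential) or '-' (optional)."""
        parts = [(False, self.unit())]
        while self.peek() in ("+", "-"):
            optional = self.take() == "-"
            parts.append((optional, self.unit()))
        return ("complex", parts) if len(parts) > 1 else parts[0][1]

    def unit(self):
        token = self.take()
        if token == "(":
            node = self.expr()
            self.take()  # consume the matching ')'
            return node
        return ("ko", token)


def to_graph(expression):
    """Return edges (u, v, ko, optional); node 0 is the start, node 1 the end."""
    edges = []
    new_node = count(2)  # 0 and 1 are reserved for start and end

    def build(node, start, end, optional):
        kind, body = node
        if kind == "ko":
            edges.append((start, end, body, optional))
        elif kind == "or":
            for alternative in body:  # OR: parallel edges between the same two nodes
                build(alternative, start, end, optional)
        else:
            # AND / complex: chain the parts in series through fresh intermediate nodes.
            if kind == "and":
                parts = [(optional, part) for part in body]
            else:
                parts = [(optional or opt, part) for opt, part in body]
            current = start
            for i, (part_optional, part) in enumerate(parts):
                nxt = end if i == len(parts) - 1 else next(new_node)
                build(part, current, nxt, part_optional)
                current = nxt

    build(Parser(expression).expr(), 0, 1, False)
    return edges


edges = to_graph("A ((B,C) D,E) (A+F)")
print(len(edges))  # 7
```

For this example, B and C become two parallel edges between the same pair of nodes, which is how OR is represented, while E forms a separate branch that bypasses the `(B,C) D` steps.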

ex1.png

Completeness

To calculate pathway completeness, each graph is made weighted. The default weight of each edge is 0.
Suppose annotation has predicted a set of KOs. If a KO from this set is present in the pathway, the corresponding edge receives weight 1 (or 0 if the edge is optional, or a different value if the edge is connected by +).
The script then searches for the path with the largest total weight from node 0 to node 1; this weight is graph_weight. max_graph_weight is calculated in the same way, assuming that all KOs are present.

completeness = graph_weight / max_graph_weight * 100%
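The formula can be sketched on a hand-built graph for `A (B,C) D`. This follows a literal reading of the rules above (weight 1 for a present, non-optional KO, otherwise 0) and assumes the graph is acyclic; the special weighting of `+` components is not reproduced, and `path_weight` and `completeness` are hypothetical names rather than the tool's functions.

```python
from collections import defaultdict


def path_weight(edges, present):
    """Largest total weight over paths from node 0 to node 1 in a DAG of (u, v, ko, optional) edges."""
    outgoing = defaultdict(list)
    for u, v, ko, optional in edges:
        weight = 1 if ko in present and not optional else 0  # optional edges contribute 0
        outgoing[u].append((v, weight))

    memo = {1: 0}  # node 1 is the end of the pathway

    def best(node):
        # Memoised search for the heaviest path from `node` to node 1.
        if node not in memo:
            memo[node] = max(w + best(v) for v, w in outgoing[node])
        return memo[node]

    return best(0)


def completeness(edges, present):
    """graph_weight / max_graph_weight * 100, with max_graph_weight computed as if all KOs were present."""
    all_kos = {ko for _, _, ko, _ in edges}
    graph_weight = path_weight(edges, present)
    max_graph_weight = path_weight(edges, all_kos)
    return graph_weight / max_graph_weight * 100


# "A (B,C) D": B and C are parallel edges between nodes 2 and 3.
edges = [(0, 2, "A", False), (2, 3, "B", False), (2, 3, "C", False), (3, 1, "D", False)]
print(round(completeness(edges, {"A", "C"}), 2))  # 66.67
```

With A and C annotated, the best path A, C, D has weight 2 out of a maximum of 3.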

ex2.png



Download files


Source Distribution

kegg_pathways_completeness-1.0.2.tar.gz (20.9 kB)

Built Distribution

kegg_pathways_completeness-1.0.2-py3-none-any.whl (24.9 kB)

File hashes

Hashes for kegg_pathways_completeness-1.0.2.tar.gz:

  Algorithm     Hash digest
  SHA256        70dde6ce82fad1de40f58e8a3637a8e518507ec63bfaef3c1ab3a839679379bd
  MD5           668112592804f955a79833af387c5817
  BLAKE2b-256   ce7913f369324a67756ebcf689401c0151360f7e322cb7e110739edfd80368bd


Hashes for kegg_pathways_completeness-1.0.2-py3-none-any.whl:

  Algorithm     Hash digest
  SHA256        cb83ce049534558f6474d99bfc7b5b08bbf98bc94f41ab3ea8f27b8fa430ff83
  MD5           b59897369679f72f2291bb9162e9a41d
  BLAKE2b-256   e0def4102b7d349e219b6c4710791e49a27adaa6e7a28e89f8cfc2108d85cf79

