The tool counts completeness of each KEGG pathway for protein sequences.
kegg-pathways-completeness tool
The tool computes the completeness of each KEGG module pathway for a set of protein sequences.
Please read the Theory section at the bottom of this README for a detailed explanation.
Required files:
- list of KEGG modules in KO notation (for example, all_pathways.txt)
- list of classes of KEGG modules (for example, all_pathways_class.txt)
- list of names of KEGG modules (for example, all_pathways_names.txt)
- graphs constructed from each module (for example, graphs.pkl)
This repository ships a pre-generated set of the required files. The current version of the data is saved in pathways_data, and the graphs are saved in pkl format.
About graphs: to generate the graphs, all pathways were parsed with the networkx library. Every graph is available in .png format in png and in .dot format in dots. The pathway and the weight of each KO can be checked easily in the .png image. Instructions on how to build graphs.pkl are provided.
Latest update:
- 07/03/2024 has 481 KEGG modules.
Previous updates:
- 27/04/2023 has 475 modules.
- MGnify pipeline-v5 uses 394 modules.
If you need to update the existing pathway data and graphs, follow these instructions.
Calculate pathways completeness
This script requires either an hmmsearch table obtained by searching annotated sequences against the KEGG profiles (preferred) or a file with a list of KOs. If you don't have this table, follow the instructions on how to generate it.
Run using conda
conda create --name kegg-env
conda activate kegg-env
pip3 install -r requirements.txt
export INPUT="tests/fixtures/give_pathways/test_pathway.txt" # path to hmm-result table
export OUTPUT="test-out" # prefix for output
# hmmtable as input
python3 bin/give_pathways.py \
-i ${INPUT} \
-o ${OUTPUT}
# KOs list as input
python3 bin/give_pathways.py \
-l 'tests/test_data/test-input/test_kos.txt' \
-o ${OUTPUT}
See an example of the output here.
- *kegg_pathways.tsv has pathway completeness calculated from all KOs in the given input file.
- *kegg_contigs.tsv has pathway completeness calculated per contig (the first column contains the contig name).
Run using docker
Results can be found in the results folder. Final annotated pathways will be in results/pathways.
export INPUT="path to hmm-result table"
docker \
run \
-i \
--workdir=/results \
--volume=`pwd`/results:/results:rw \
--volume=${INPUT}:/files/input_table.tsv:ro \
quay.io/microbiome-informatics/kegg-completeness:v1.1 \
/tools/run_pathways.sh \
-i /files/input_table.tsv
Plot pathways completeness
NOTE: please install graphviz
If you want to see which edges were chosen to complete the graph, you can plot them by adding the --plot-pathways argument.
python3 bin/give_pathways.py -i ${INPUT} -o ${OUTPUT} --plot-pathways
You can also run plotting script separately:
python3 bin/plot_completeness_graphs.py -i output_with_pathways_completeness
More examples for the test data can be found here.
Theory:
Pathways to graphs
KEGG represents each pathway as an expression over KOs.
For example: A ((B,C) D,E) (A+F)
where A, B, C, D, E, F are KOs
space means AND
comma means OR
plus means essential component
minus means optional component
Each expression was recursively converted into a directed graph using NetworkX. The first node is numbered 0 and the last node is numbered 1. Each edge corresponds to a KO.
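The conversion can be illustrated with a minimal sketch in plain Python. This is not the code used by the tool, which builds NetworkX graphs; it returns a plain list of (source, target, KO) edges instead. The names parse_expression and expression_to_edges are hypothetical. The sketch makes two assumptions that are not stated above: a comma binds more loosely than a space, so (B,C) D,E is read as "((B and then D, with B replaceable by C) or E)", and a token such as A+F is kept as a single edge label rather than being split into components.

```python
import re
from itertools import count


def parse_expression(expression):
    """Parse a KO expression into nested lists.

    Result shape: a list of alternatives (OR); each alternative is a list of
    terms in sequence (AND); each term is a KO string or a nested list.
    """
    # Parentheses and commas are tokens; whitespace only separates terms.
    tokens = re.findall(r"[(),]|[^\s(),]+", expression)
    pos = 0

    def alternatives():
        nonlocal pos
        options = [sequence()]
        while pos < len(tokens) and tokens[pos] == ",":
            pos += 1  # skip ","
            options.append(sequence())
        return options

    def sequence():
        nonlocal pos
        terms = []
        while pos < len(tokens) and tokens[pos] not in (",", ")"):
            if tokens[pos] == "(":
                pos += 1  # skip "("
                terms.append(alternatives())
                pos += 1  # skip ")"
            else:
                terms.append(tokens[pos])
                pos += 1
        return terms

    return alternatives()


def expression_to_edges(expression):
    """Return (source, target, KO) edges; node 0 is the start, node 1 the end."""
    edges = []
    fresh = count(2)  # 0 and 1 are reserved for the first and last node

    def build(options, start, end):
        for terms in options:  # every alternative spans start -> end
            node = start
            for i, term in enumerate(terms):
                nxt = end if i == len(terms) - 1 else next(fresh)
                if isinstance(term, str):
                    edges.append((node, nxt, term))
                else:
                    build(term, node, nxt)
                node = nxt

    build(parse_expression(expression), 0, 1)
    return edges


edges = expression_to_edges("A ((B,C) D,E) (A+F)")
for edge in edges:
    print(edge)
```

Alternatives such as B and C become parallel edges between the same pair of nodes, which is why an edge list (or a multigraph) is used here rather than a simple adjacency mapping.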
Completeness
To compute pathway completeness, each graph was made weighted. The default weight of each edge is 0.
Suppose annotation predicts a set of KOs. If a KO is present in the pathway, the corresponding edge receives weight = 1 (or 0 if the edge is optional, or another value if the edge is connected by +).
The script then searches for the path with the greatest total weight from node 0 to node 1 (graph_weight). max_graph_weight is calculated under the assumption that all KOs are present.
completeness = graph_weight / max_graph_weight * 100%
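As a rough illustration of this calculation, the sketch below follows the description literally on a small hand-written edge list. It is an assumption-laden example, not the tool's implementation: every present KO gets weight 1, optional (-) and complex (+) components are not modelled, and the weights stored in graphs.pkl may differ. The function names heaviest_path and completeness and the edge list itself are hypothetical.

```python
from functools import lru_cache


def heaviest_path(edges, present):
    """Largest total weight over all paths from node 0 to node 1.

    edges: iterable of (source, target, KO); the graph must be acyclic.
    present: set of KOs whose edges get weight 1 (all others get 0).
    """
    outgoing = {}
    for source, target, ko in edges:
        weight = 1 if ko in present else 0
        outgoing.setdefault(source, []).append((target, weight))

    @lru_cache(maxsize=None)
    def best_from(node):
        if node == 1:  # reached the last node
            return 0
        return max(w + best_from(target) for target, w in outgoing[node])

    return best_from(0)


def completeness(edges, predicted_kos):
    """completeness = graph_weight / max_graph_weight * 100"""
    all_kos = {ko for _, _, ko in edges}
    graph_weight = heaviest_path(edges, set(predicted_kos))
    max_graph_weight = heaviest_path(edges, all_kos)
    return graph_weight / max_graph_weight * 100


# Hypothetical graph for the expression "A ((B,C) D,E) F"
edges = [
    (0, 2, "A"),
    (2, 4, "B"),
    (2, 4, "C"),
    (4, 3, "D"),
    (2, 3, "E"),
    (3, 1, "F"),
]

print(completeness(edges, {"A", "B", "F"}))  # 75.0
```

Here the heaviest path for the predicted set is A, B, D, F with weight 3 (D is missing), while the heaviest path with all KOs present has weight 4, giving 3 / 4 * 100 = 75.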