This tool computes the completeness of each KEGG pathway module for a set of protein sequences.

kegg-pathways-completeness tool

This tool computes the completeness of each KEGG pathway module for a given set of KEGG orthologues (KOs) based on their presence/absence. The current version of this tool includes 482 KEGG modules (updated 02/07/2024).

Please read the Theory section at the bottom of this README for a detailed explanation.

Input example

Output example

  • *.summary.kegg_pathways.tsv (example) contains module completeness calculated across all KOs in the given input file.
  • *.summary.kegg_contigs.tsv (example) contains module completeness calculated per contig (the first column contains the contig name). It is produced when contig annotations are provided with -i.

Optional:

  • pathways_plots/ (example): folder containing PNG images and graphs, generated with the --plot-pathways argument.
  • with_weights.*.tsv: example of output generated with the --include-weights argument. Each KO is followed by its weight in brackets.

Check more examples of different output files here.

Installation

This tool is published on PyPI and Bioconda:

Install with pip

pip install kegg-pathways-completeness

Install with bioconda

Follow bioconda instructions

Install from source using venv/conda env (not the best option)

conda create --name kegg-env python
conda activate kegg-env

pip3 install -r requirements.txt

How to run

Quick start

# for list of KOs
give_pathways -l {INPUT_LIST}

# per contig annotation with KOs
give_pathways -i {INPUT_FILE}

Run with test examples

# hmmtable as input
python3 kegg_pathways_completeness/bin/give_pathways.py \
  -i 'tests/fixtures/give_pathways/test_pathway.txt' \
  -o test_pathway

# KOs list as input
python3 kegg_pathways_completeness/bin/give_pathways.py \
  -l 'tests/fixtures/give_pathways/test_kos.txt' \
  -o test_list_kos

Run using docker

Results are written to the results folder. Final annotated pathways are generated in results/pathways.

export INPUT="path to hmm-result table"
docker \
    run \
    -i \
    --workdir=/results \
    --volume=`pwd`/results:/results:rw \
    --volume=${INPUT}:/files/input_table.tsv:ro \
    quay.io/microbiome-informatics/kegg-completeness:v1.1 \
    /tools/run_pathways.sh \
    -i /files/input_table.tsv

Input arguments description

Required arguments:

input file:

One of the following inputs is required:

  • input table (-i/--input): hmmsearch table (example) produced by searching annotated sequences against the KEGG profiles DB (preferable). If you don't have this table, follow these instructions to generate it.
  • file with KOs list (-l/--input-list): file containing a comma-separated list of KOs (example).
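
As a minimal sketch of preparing the -l input, the Python snippet below writes a comma-separated KO list to a file and builds the matching give_pathways command. The KO identifiers, the file name kos.txt and the output prefix my_sample are placeholders chosen for illustration, and putting the whole list on a single line is an assumption about the expected layout.

```python
from pathlib import Path


def write_ko_list(kos, path):
    """Write the unique KOs, sorted, as one comma-separated line."""
    unique = sorted(set(kos))
    Path(path).write_text(",".join(unique) + "\n")
    return unique


kos = ["K00942", "K00001", "K00942"]      # placeholder KO identifiers
written = write_ko_list(kos, "kos.txt")   # placeholder file name

# Command line using only the flags documented in this README (-l and -o).
command = ["give_pathways", "-l", "kos.txt", "-o", "my_sample"]
print(" ".join(command))
```

The command is only printed here; running it requires the package to be installed.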

Optional arguments:

  • output prefix (-o/--outname): prefix for output tables (-o test_list_kos in the example above)
  • add weight information to output files (-w/--include-weights): the output table will contain the weight of each KO edge in the pathway graph. For example, K00942(0.25) means that the KO has an importance of 0.25 in the given pathway. Example of output
  • plot present KOs in pathways (-p/--plot-pathways): generates a PNG containing a schematic representation of the pathway. Present KOs are marked with red edges. Example: M00002
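
The KO(weight) notation produced by --include-weights can be read back with a regular expression. The sketch below is an illustration only: the sample string is made up, and the separator between entries in the real output table is not assumed, since the pattern simply picks out every KO(weight) pair it finds.

```python
import re

# A KO identifier (K followed by five digits) and its weight in brackets.
KO_WEIGHT = re.compile(r"(K\d{5})\((\d+(?:\.\d+)?)\)")


def parse_weights(text):
    """Return a {KO: weight} mapping for every KO(weight) pair in text."""
    return {ko: float(weight) for ko, weight in KO_WEIGHT.findall(text)}


example = "K00942(0.25) K00001(1)"   # made-up sample, not real tool output
print(parse_weights(example))        # {'K00942': 0.25, 'K00001': 1.0}
```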

pathways data: modules information and graphs

This repository contains a set of pre-generated files. Module information files can be found in pathways_data. The repository also contains module pathways pre-parsed into graph format. To generate the graphs, all pathways were parsed with the NetworkX library. The graph for every module is available in .png format in the png folder and in .dot format in the dots folder. The pathway and the weight of each KO can be easily checked in the .png image.

There is no need to regenerate these files in order to run the tool. The instructions for regenerating the graphs and the module pathway information are provided for updates and for understanding the process.

modules information:

graphs:

  • graphs constructed from each module (-g/--graphs) (latest graphs.pkl)

Plot pathway completeness

NOTE: please make sure you have graphviz installed

You can also run the plotting script separately:

plot_completeness_graphs.py -i output_with_pathways_completeness

Example

M00050.png

More examples for test data here

Theory:

Pathways to graphs

KEGG provides a representation of each pathway as a specific expression of KOs. Example: A ((B,C) D,E) (A+F), where:

  • A, B, C, D, E, F are KOs
  • space == AND
  • comma == OR
  • plus == essential component
  • minus == optional component
  • minus minus == missing optional component (replaced with K0000 with weight 0 (example))
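
To make the operator rules above concrete, here is a small recursive-descent parser written for illustration. It is not the parser used by this tool. It assumes that comma binds most loosely, space binds tighter, and +/- bind tightest, which is consistent with the example expression. The node names ("ko", "and", "or", "complex") are invented for this sketch, and the "--" marker is not handled.

```python
import re

TOKEN = re.compile(r"\w+|[(),+\-]|\s+")


def parse(definition):
    """Parse a module expression into nested tuples (illustrative only)."""
    tokens = [" " if t.isspace() else t for t in TOKEN.findall(definition.strip())]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        pos += 1
        return tokens[pos - 1]

    def atom():
        if peek() == "(":
            take()                      # opening bracket
            node = alternatives()
            take()                      # closing bracket
            return node
        return ("ko", take())

    def complex_():
        # '+' adds an essential component, '-' adds an optional one.
        required, optional = [atom()], []
        while peek() in ("+", "-"):
            (required if take() == "+" else optional).append(atom())
        if len(required) == 1 and not optional:
            return required[0]
        return ("complex", required, optional)

    def steps():
        # Space means AND: consecutive steps of the pathway.
        parts = [complex_()]
        while peek() == " ":
            take()
            parts.append(complex_())
        return parts[0] if len(parts) == 1 else ("and", parts)

    def alternatives():
        # Comma means OR: alternative routes.
        parts = [steps()]
        while peek() == ",":
            take()
            parts.append(steps())
        return parts[0] if len(parts) == 1 else ("or", parts)

    return alternatives()


tree = parse("A ((B,C) D,E) (A+F)")
print(tree)
```

For the example expression, the top-level node is an "and" of three steps: A, then a choice between "(B or C) followed by D" and E, then the complex A+F.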

Each expression was converted into a directed graph using NetworkX. The first node is node 0 and the last one is node 1. Each edge corresponds to a KO.

ex1.png

Completeness

In order to compute pathway completeness, each edge in the graph is weighted. The default weight of each edge is 0.

Given a set of predicted KOs, if a KO is present in the pathway, the corresponding edge is assigned weight = 1 (or 0 if the edge is optional, or another value if the edge is connected by +). The script then searches for the most relevant path from node 0 to node 1 by graph_weight. max_graph_weight is then calculated under the assumption that all KOs are present.

completeness = graph_weight/max_graph_weight * 100%
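
The formula can be illustrated with a small pure-Python sketch on the example expression A ((B,C) D,E) (A+F). This is not the tool's implementation, which uses NetworkX and pre-computed weights. The edge list, the intermediate node numbers, the unit weights and the 0.5 split for the A+F complex are all assumptions made for this example, and "most relevant path" is interpreted here as the path with the highest total weight.

```python
from functools import lru_cache

# (from_node, to_node, KO, weight); node 0 is the start, node 1 is the end.
# Hypothetical graph for A ((B,C) D,E) (A+F); weights are assumed values.
EDGES = [
    (0, 2, "A", 1.0),
    (2, 3, "B", 1.0),
    (2, 3, "C", 1.0),
    (3, 4, "D", 1.0),
    (2, 4, "E", 1.0),
    (4, 5, "A", 0.5),   # A+F modelled as two half-weight edges (assumption)
    (5, 1, "F", 0.5),
]


def best_path_weight(edges, present=None):
    """Highest total weight over paths 0 -> 1; absent KOs contribute 0.

    present=None means every KO is treated as present.
    """
    outgoing = {}
    for u, v, ko, weight in edges:
        gained = weight if present is None or ko in present else 0.0
        outgoing.setdefault(u, []).append((v, gained))

    @lru_cache(maxsize=None)
    def best_from(node):
        if node == 1:
            return 0.0
        options = (g + best_from(v) for v, g in outgoing.get(node, []))
        return max(options, default=float("-inf"))

    return best_from(0)


def completeness(edges, present):
    graph_weight = best_path_weight(edges, present)
    max_graph_weight = best_path_weight(edges)
    return graph_weight / max_graph_weight * 100


print(completeness(EDGES, {"A", "B", "D"}))   # 87.5
```

With these assumed weights, the set {A, B, D} follows the A, B, D, A+F route and collects 3.5 of a possible 4.0 because F is missing. The weights shipped with the tool may be scaled differently, so the numbers here only demonstrate the ratio.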

ex2.png


