The tool computes the completeness of each KEGG pathway module for protein sequences.

Project description

kegg-pathways-completeness tool

This tool computes the completeness of each KEGG pathway module for a given set of KEGG orthologues (KOs) based on their presence/absence. The current version of this tool has 495 KEGG modules (updated 06/12/2024).

Please read the Theory section at the bottom of this README for a detailed explanation.

Input example:

per-contig annotation with KOs (ideally obtained from an hmmscan annotation; see instructions);

list of KOs.

Output example:

*_pathways.tsv (example) contains the completeness of each module, calculated from all KOs in the given input file.

Optional:

*_contigs.tsv (example) contains module completeness calculated per contig (the first column contains the contig name).
pathways_plots/ (example) is a folder containing the PNG images and graphs generated with the --plot-pathways argument.
*.with_weights.tsv (example) is the output generated with the --include-weights argument. Each KO is followed by its weight in brackets.

Check more examples of different output files here.

Installation

This tool is published on PyPI and Bioconda.
A Docker container is available on DockerHub and Quay.

Install with pip

pip install kegg-pathways-completeness

Install with bioconda

Follow bioconda instructions

Docker

docker pull quay.io/biocontainers/kegg-pathways-completeness

Install from source or for development

git clone https://github.com/EBI-Metagenomics/kegg-pathways-completeness-tool.git
cd kegg-pathways-completeness-tool
pip install .

How to run

Quick start

# --- Using list of KOs
# simple run
give_completeness -l {INPUT_LIST} 

# test example:
# give_completeness \
#   --input-list 'tests/fixtures/give_completeness/test_kos.txt' \
#   --outprefix test_list_kos \
#   --list-separator ','

# --- per contig annotation with KOs
# simple run
give_completeness -i {INPUT_FILE}

# test example:
# give_completeness \
#   --input 'tests/fixtures/give_completeness/test_pathway.txt' \
#   --outprefix test_pathway \
#   --add-per-contig
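
When the tool is called from a Python pipeline, the same invocations can be assembled programmatically. The sketch below is only an illustration: the helper name build_give_completeness_cmd and its keyword arguments are hypothetical, and only the command-line flags themselves come from this README. It builds the argument list without running anything.

```python
import shlex


def build_give_completeness_cmd(input_path, is_list=False, list_separator=None,
                                outdir=None, outprefix=None, include_weights=False,
                                plot_pathways=False, add_per_contig=False):
    """Assemble an argv list for give_completeness (hypothetical helper).

    Only flags documented in this README are used.
    """
    cmd = ["give_completeness", "--input-list" if is_list else "--input", input_path]
    if is_list and list_separator is not None:
        cmd += ["--list-separator", list_separator]
    if outdir is not None:
        cmd += ["--outdir", outdir]
    if outprefix is not None:
        cmd += ["--outprefix", outprefix]
    if include_weights:
        cmd.append("--include-weights")
    if plot_pathways:
        cmd.append("--plot-pathways")
    if add_per_contig:
        cmd.append("--add-per-contig")
    return cmd


cmd = build_give_completeness_cmd(
    "tests/fixtures/give_completeness/test_pathway.txt",
    outprefix="test_pathway",
    add_per_contig=True,
)
print(shlex.join(cmd))
# give_completeness --input tests/fixtures/give_completeness/test_pathway.txt --outprefix test_pathway --add-per-contig
```

To actually execute the command, pass the list to subprocess.run(cmd, check=True) in an environment where the tool is installed.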

Input arguments description

Required arguments:

input file:

An input file must be supplied with one of the following options:

input table (-i/--input): hmmsearch table (example) obtained by running the annotated sequences against the KEGG profiles DB (preferred). If you don't have this table, follow the instructions to generate it.
file with KOs list (-l/--input-list): file with a list of KOs (example).

Optional arguments:

KOs separator for list option (-s/--list-separator): default is , (comma)
output directory (-o/--outdir): default is the current working directory
output prefix (-r/--outprefix): prefix for output tables (-r test_kos in example)
add weight information to output files (-w/--include-weights). The output table will contain the weight of each KO edge in the pathway graph, for example K00942(0.25) means that the KO has 0.25 importance in the given pathway. Example of output.
plot present KOs in pathways (-p/--plot-pathways): generates a PNG containing a schematic representation of the pathway. Present KOs are marked with red edges. Example: M00002.
generate a table with per-contig module completeness (-m/--add-per-contig): generates *_contigs.tsv with the completeness of each module calculated per contig. This option only makes sense with --input, which carries contig information. A list of KOs carries no contig information, so *_contigs.tsv and *_pathways.tsv would be identical. Example.
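
The KO(weight) notation produced by --include-weights, such as K00942(0.25), is easy to pull apart with a regular expression. The following is a minimal sketch, not part of the tool: the function name parse_weighted_kos is hypothetical, and the sample string (including the second KO and the comma between entries) is an assumption made for illustration. It relies only on the K-plus-five-digits form of KO identifiers and the bracketed weight described above.

```python
import re

# A KO identifier (K + five digits) followed by a weight in brackets.
WEIGHTED_KO = re.compile(r"(K\d{5})\(([0-9]*\.?[0-9]+)\)")


def parse_weighted_kos(field):
    """Return {KO: weight} for every KO(weight) token found in the text."""
    return {ko: float(weight) for ko, weight in WEIGHTED_KO.findall(field)}


# Illustrative input; the real column layout of *.with_weights.tsv may differ.
print(parse_weighted_kos("K00942(0.25),K00001(0.5)"))
# {'K00942': 0.25, 'K00001': 0.5}
```

If the same KO appears more than once in the text, the last weight wins; collect the pairs into a list instead if every occurrence matters.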

pathways_data: modules information and graphs

This repository contains a set of pre-generated files. Module information files can be found in pathways_data. The repository also contains the module pathways pre-parsed into graph format. The graphs were generated by parsing all pathways with the NetworkX library.

modules information:

list of KEGG modules in KOs notation (-a/--definitions) (latest all_pathways.txt)
list of classes of KEGG modules (-c/--classes) (latest all_pathways_class.txt)
list of names of KEGG modules (-n/--names) (latest all_pathways_names.txt)

graphs:

graphs constructed from each module (-g/--graphs) (latest graphs.pkl)

The latest release has a plots archive with images and graphviz files for all modules. The graph of every module is provided in .png format in the png folder, with the corresponding graphviz file in the graphs folder. The pathway and the weight of each KO can be easily checked in the .png image.

There is no need to regenerate these files in order to run the tool.

All commands for generating module data and instructions for creating graphs are also available, both for updating the files and for understanding the process.

Pathways visualization

NOTE: please make sure you have graphviz installed

You can also run the plotting script separately:

Plot modules of interest

plot_modules_graphs.py -l tests/fixtures/plot_modules_graphs/modules_list.txt

Plot graphs knowing completeness

plot_modules_graphs.py -i tests/outputs/give_completeness/test_kos_pathways.tsv

Example

More examples for test data here.

Theory:

Pathways to graphs

KEGG provides a representation of each pathway as a specific expression of KOs.
Example: A ((B,C) D,E) -- (A+F-G), where:

A, B, C, D, E, F, G are KOs
space == AND
comma == OR
plus == essential component
minus == optional component
minus minus == missing optional component (replaced with K00000, which has weight 0; example: KEGG, corresponding graph)
new line == mediator (example: KEGG, corresponding graph)

------ Mediator addition note ------

Some modules have a DEFINITION in which KOs are separated by line breaks. Those KOs are interpreted as mediators. The lines are connected with the AND operator. Each line is considered to play a crucial role in the module, which is why it strongly influences the weight assignment.
The full list of those modules is given in definition_separated.txt.

How to handle mediators is a difficult question for the current implementation and is still under debate.

------------------------------------------------

Each expression was converted into a directed graph using NetworkX. The first node is node 0 and the last one is node 1. Each edge corresponds to a KO.
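
To make the conversion concrete, here is a minimal pure-Python sketch that turns the AND/OR part of an expression into a list of (start, end, KO) edges. It is not the tool's NetworkX implementation: the function names are hypothetical, the numbering of intermediate nodes is arbitrary, and it assumes that a comma binds more loosely than a space, which is how the example A ((B,C) D,E) reads. The +, -, -- and new-line operators are not handled.

```python
import re
from itertools import count


def tokenize(expression):
    # KO names, parentheses and commas; whitespace is dropped, so adjacent items mean AND.
    return re.findall(r"[A-Za-z0-9]+|[(),]", expression)


def parse_or(tokens, pos):
    """expr := seq (',' seq)*"""
    seq, pos = parse_and(tokens, pos)
    alternatives = [seq]
    while pos < len(tokens) and tokens[pos] == ",":
        seq, pos = parse_and(tokens, pos + 1)
        alternatives.append(seq)
    return ("OR", alternatives), pos


def parse_and(tokens, pos):
    """seq := atom+ where atom is a KO or a parenthesised expr"""
    steps = []
    while pos < len(tokens) and tokens[pos] not in (",", ")"):
        if tokens[pos] == "(":
            sub, pos = parse_or(tokens, pos + 1)
            pos += 1  # skip the closing ")"
            steps.append(sub)
        else:
            steps.append(("KO", tokens[pos]))
            pos += 1
    return ("AND", steps), pos


def add_edges(node, start, end, edges, fresh):
    kind, payload = node
    if kind == "KO":
        edges.append((start, end, payload))
    elif kind == "OR":
        # Alternatives share the same start and end nodes.
        for alternative in payload:
            add_edges(alternative, start, end, edges, fresh)
    else:
        # AND: chain the steps through newly numbered intermediate nodes.
        current = start
        for i, step in enumerate(payload):
            nxt = end if i == len(payload) - 1 else next(fresh)
            add_edges(step, current, nxt, edges, fresh)
            current = nxt


def expression_to_edges(expression):
    tree, _ = parse_or(tokenize(expression), 0)
    edges = []
    add_edges(tree, 0, 1, edges, count(2))  # node 0 is the first node, node 1 the last
    return edges


for edge in expression_to_edges("A ((B,C) D,E)"):
    print(edge)
# (0, 2, 'A')
# (2, 3, 'B')
# (2, 3, 'C')
# (3, 1, 'D')
# (2, 1, 'E')
```

B and C become parallel edges between the same pair of nodes, so a graph structure that stores these edges has to allow more than one edge between two nodes.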

Completeness

In order to compute pathway completeness, each edge in the graph is weighted. The default weight of each edge is 0.

Given a set of predicted KOs, if a KO is present in the pathway, the corresponding edge is assigned weight = 1 (or 0 if the edge is optional, or another value if the edge is connected by +). The script then searches for the most relevant path from node 0 to node 1; the weight of that path is graph_weight. max_graph_weight is calculated in the same way under the assumption that all KOs are present.

completeness = graph_weight/max_graph_weight * 100%
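
The formula can be tried on a toy graph. The sketch below is a simplified illustration and not the tool's code: the edge list encodes A ((B,C) D,E), the function names are hypothetical, and the weights are made-up values chosen so that every complete route from node 0 to node 1 sums to 1. The "most relevant path" is taken here to be the path with the largest total weight, and the graph is assumed to be acyclic.

```python
from functools import lru_cache

# (start, end, KO, weight); weights are illustrative assumptions.
EDGES = [
    (0, 2, "A", 0.5),
    (2, 3, "B", 0.25),
    (2, 3, "C", 0.25),
    (3, 1, "D", 0.25),
    (2, 1, "E", 0.5),
]


def best_path_weight(edges, present, source=0, target=1):
    """Largest total weight over all source-to-target paths.

    Edges whose KO is not in `present` contribute 0.
    """
    outgoing = {}
    for start, end, ko, weight in edges:
        outgoing.setdefault(start, []).append((end, weight if ko in present else 0.0))

    @lru_cache(maxsize=None)
    def best_from(node):
        if node == target:
            return 0.0
        candidates = [w + best_from(nxt) for nxt, w in outgoing.get(node, [])]
        # A node with no outgoing edges cannot reach the target.
        return max(candidates, default=float("-inf"))

    return best_from(source)


def completeness(edges, present):
    all_kos = {ko for _, _, ko, _ in edges}
    graph_weight = best_path_weight(edges, present)
    max_graph_weight = best_path_weight(edges, all_kos)
    return graph_weight / max_graph_weight * 100


print(completeness(EDGES, {"A", "E"}))  # 100.0
print(completeness(EDGES, {"A", "B"}))  # 75.0
print(completeness(EDGES, set()))       # 0.0
```

With these weights, A followed by E is a full route through the module and scores 100%, while A and B without D cover only three quarters of the best route.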

Project details

Owner: Microbiome Informatics

Maintainers: chrisata, kates_ebi, sndyrgrs

Release history

1.4.3 (Apr 7, 2026)
1.4.2 (Apr 2, 2026)
1.4.0 (Mar 30, 2026)
1.3.0 (Jan 28, 2025)
1.2.1 (Jan 20, 2025)
1.2.0 (Jan 20, 2025), this version
1.1.0 (Jan 17, 2025)
1.0.5 (Jul 8, 2024)
1.0.4 (Jul 3, 2024)
1.0.3 (Jul 2, 2024)
1.0.2 (Jun 21, 2024)
1.0.1 (Jun 20, 2024)

Download files


Source Distribution

kegg_pathways_completeness-1.2.0.tar.gz (117.6 kB)

Uploaded Jan 20, 2025 Source

Built Distribution

kegg_pathways_completeness-1.2.0-py3-none-any.whl (123.3 kB)

Uploaded Jan 20, 2025 Python 3

File details

Details for the file kegg_pathways_completeness-1.2.0.tar.gz.

File metadata

Download URL: kegg_pathways_completeness-1.2.0.tar.gz
Upload date: Jan 20, 2025
Size: 117.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for kegg_pathways_completeness-1.2.0.tar.gz
Algorithm	Hash digest
SHA256	`9859918ab68a6d34e1710463b420d67182c265eaf51a4be6ad29872ce6217619`
MD5	`27fe506159e7f1cc2e238a16139bd6a6`
BLAKE2b-256	`00edcb6e01434359bf1cfb19bf29b814eb9b840088ab19bd6ef1f30774318c58`


Provenance

The following attestation bundles were made for kegg_pathways_completeness-1.2.0.tar.gz:

Publisher: python-publish.yml on EBI-Metagenomics/kegg-pathways-completeness-tool

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kegg_pathways_completeness-1.2.0.tar.gz
- Subject digest: 9859918ab68a6d34e1710463b420d67182c265eaf51a4be6ad29872ce6217619
- Sigstore transparency entry: 163874882
- Sigstore integration time: Jan 20, 2025
Source repository:
- Permalink: EBI-Metagenomics/kegg-pathways-completeness-tool@dc302057d8c93491b265eb8558a0c929f6207ef8
- Branch / Tag: refs/heads/master
- Owner: https://github.com/EBI-Metagenomics
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@dc302057d8c93491b265eb8558a0c929f6207ef8
- Trigger Event: workflow_run

File details

Details for the file kegg_pathways_completeness-1.2.0-py3-none-any.whl.

File metadata

Download URL: kegg_pathways_completeness-1.2.0-py3-none-any.whl
Upload date: Jan 20, 2025
Size: 123.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.0.1 CPython/3.12.8

File hashes

Hashes for kegg_pathways_completeness-1.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`562c3281a0174ef5c7d2ba5955183fda528f7bf3d96edea74dbf9161ba78874f`
MD5	`0d8e5908595da08ffe4baa2d19797be6`
BLAKE2b-256	`89e8a6bf7a412d1526ec3365d0aa4f010e4d591a5b2f751e78cbfc6cee1b7fcc`


Provenance

The following attestation bundles were made for kegg_pathways_completeness-1.2.0-py3-none-any.whl:

Publisher: python-publish.yml on EBI-Metagenomics/kegg-pathways-completeness-tool

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: kegg_pathways_completeness-1.2.0-py3-none-any.whl
- Subject digest: 562c3281a0174ef5c7d2ba5955183fda528f7bf3d96edea74dbf9161ba78874f
- Sigstore transparency entry: 163874885
- Sigstore integration time: Jan 20, 2025
Source repository:
- Permalink: EBI-Metagenomics/kegg-pathways-completeness-tool@dc302057d8c93491b265eb8558a0c929f6207ef8
- Branch / Tag: refs/heads/master
- Owner: https://github.com/EBI-Metagenomics
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@dc302057d8c93491b265eb8558a0c929f6207ef8
- Trigger Event: workflow_run
