This tool computes the completeness of each KEGG pathway for protein sequences.
kegg-pathways-completeness tool
This tool computes the completeness of KEGG pathway modules for a given set of KEGG Orthologues (KOs) based on their presence/absence.
The current version includes 570 KEGG modules (updated 19/01/2026).
Please read the Theory & Background section for a detailed explanation.
Table of Contents
- Installation
- Prerequisites
- Quick Start
- Detailed Usage
- Module Data Files
- Output Files
- Theory & Background
- Updating Module Data
- Complete Workflow
- Citation
Installation
The tool is available via PyPI, Bioconda, and Docker.
Install with pip
pip install kegg-pathways-completeness
Install with bioconda
conda install -c bioconda kegg-pathways-completeness
See bioconda recipe for details.
Docker
docker pull quay.io/biocontainers/kegg-pathways-completeness
Install from source (for development)
git clone https://github.com/EBI-Metagenomics/kegg-pathways-completeness-tool.git
cd kegg-pathways-completeness-tool
pip install -e .
Prerequisites
- Python: 3.8 or higher
- graphviz: Required for pathway visualization (install via system package manager)
- HMMER (optional): For annotating protein sequences with KOs
Quick Start
The tool uses the pre-generated files modules_table.tsv and graphs.pkl, described in Module Data Files.
Option 1: From a list of KOs
Input format (example): File with KO identifiers
K00001,K00002,K00003
command:
give_completeness \
--input-list kos_list.txt \
--list-separator ',' \
--outprefix my_analysis
Option 2: From per-contig KO annotations
Input format (example): Tab-separated file with contig names and KOs
contig_1 K00001 K00002 K00003
contig_2 K00004 K00005
command:
give_completeness \
--input ko_annotations.tsv \
--outprefix my_analysis
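The two input formats are closely related: pooling the KOs of every contig yields a flat KO list. The following is a minimal Python sketch of that relationship, assuming the tab-separated layout shown above. The helper name `read_contig_kos` is hypothetical and is not part of the package.

```python
import csv
import io


def read_contig_kos(handle):
    """Map each contig name to the set of KOs listed on its line.

    Assumes one contig per line, tab-separated, contig name first.
    """
    contig_kos = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        contig, kos = row[0], row[1:]
        contig_kos.setdefault(contig, set()).update(ko for ko in kos if ko)
    return contig_kos


# The example annotations from above, held in memory instead of a file.
example = "contig_1\tK00001\tK00002\tK00003\ncontig_2\tK00004\tK00005\n"
contig_kos = read_contig_kos(io.StringIO(example))

# Pool all contigs into one comma-separated KO list (the Option 1 format).
all_kos = sorted(set().union(*contig_kos.values()))
print(",".join(all_kos))
```

Writing the printed line to a file would give an input suitable for --input-list with the default separator.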
Detailed Usage
give_completeness
Calculate KEGG pathway module completeness from KO annotations.
Required Arguments
Input (choose one):
- -i, --input <FILE>: Tab-separated file with contig names and KOs
- -l, --input-list <FILE>: List of KOs, separated by delimiter
Module data:
- -t, --modules-table <FILE>: Module information in TSV format (columns: module, definition, name, class). Default: uses packaged kegg_pathways_completeness/pathways_data/modules_table.tsv
- -g, --graphs <FILE>: Custom graphs file (default: uses packaged kegg_pathways_completeness/pathways_data/graphs.pkl)
Optional Arguments
- -s, --list-separator <CHAR>: Separator for --input-list (default: ,)
- -o, --outdir <DIR>: Output directory (default: current directory)
- -r, --outprefix <PREFIX>: Prefix for output files (default: summary.kegg)
- -m, --add-per-contig: Generate per-contig completeness table
- -w, --include-weights: Include KO weights in output (e.g., K00942(0.25))
- -p, --plot-pathways: Generate pathway visualization plots
- -v, --verbose: Enable verbose logging
Examples
# Basic usage with KO list
give_completeness \
--input-list kos.txt \
--modules-table kegg_pathways_completeness/pathways_data/modules_table.tsv \
--graphs kegg_pathways_completeness/pathways_data/graphs.pkl \
--outprefix sample1
# Full analysis with per-contig results, weights, and plots
give_completeness \
--input ko_annotations.tsv \
--outprefix sample1 \
--add-per-contig \
--include-weights \
--plot-pathways \
--outdir results/
# Using custom module data
give_completeness \
--input ko_annotations.tsv \
--modules-table custom_modules.tsv \
--graphs custom_graphs.pkl \
--outdir custom_analysis
plot_modules_graphs
Generate pathway visualization with KOs highlighted.
Note: Requires graphviz to be installed.
Required Arguments
Input (choose one):
- -i, --input-completeness <FILE>: Completeness output from give_completeness
- -m, --modules <ID> [<ID> ...]: Module IDs to plot (can be specified multiple times)
- -l, --modules-file <FILE>: File containing module IDs (one per line)
Graphs:
-g, --graphs <FILE>: Graphs pickle file (default:pathways_data/graphs.pkl)
Optional Arguments
- -s, --file-separator <CHAR>: Separator in modules file (default: newline)
- -o, --outdir <DIR>: Output directory (default: pathways_plots)
- --use-pydot: Use pydot instead of graphviz backend
Examples
# Plot from completeness results
plot_modules_graphs \
-i sample1_pathways.tsv \
-g kegg_pathways_completeness/pathways_data/graphs.pkl \
-o pathway_plots
# Plot specific modules
plot_modules_graphs \
-m M00001 M00002 M00050 \
-g kegg_pathways_completeness/pathways_data/graphs.pkl
# Plot modules from file
plot_modules_graphs \
-l modules_of_interest.txt \
-g kegg_pathways_completeness/pathways_data/graphs.pkl
# Use pydot backend
plot_modules_graphs \
-i sample1_pathways.tsv \
-g kegg_pathways_completeness/pathways_data/graphs.pkl \
--use-pydot
Output:
- PNG images with pathways (present KOs in red)
- DOT source files (when using --use-pydot)
More visualization examples: test output plots
Module Data Files
The package includes pre-generated data files in pathways_data/:
modules_table.tsv
Unified TSV file with all module information.
Columns:
- module: Module ID (e.g., M00001)
- definition: KEGG module definition in KO notation
- name: Module name/description
- class: Module classification/category
File: modules_table.tsv
graphs.pkl
Pre-parsed NetworkX directed graphs for all modules. Each pathway definition has been converted to a graph structure for completeness calculation.
File: graphs.pkl
Output Files
Pathway completeness table (*_pathways.tsv)
Main output with completeness scores for all detected pathways.
Columns:
- module_accession: Module ID
- completeness: Completeness percentage (0-100)
- pathway_name: Module name
- pathway_class: Module classification
- matching_ko: KOs found in the pathway
- missing_ko: KOs required but not found
Example: test_kos_pathways.tsv
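The completeness table can be post-processed with standard tools. The sketch below uses Python's csv module to list modules at or above a chosen completeness threshold. It assumes the file is tab-separated with a header row that uses the column names listed above. The function name `complete_modules` and the sample rows are made up for illustration.

```python
import csv
import io


def complete_modules(handle, threshold=100.0):
    """Return (module_accession, completeness) pairs at or above threshold."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [
        (row["module_accession"], float(row["completeness"]))
        for row in reader
        if float(row["completeness"]) >= threshold
    ]


# Made-up rows for illustration only; real values come from give_completeness.
sample = (
    "module_accession\tcompleteness\tpathway_name\tpathway_class"
    "\tmatching_ko\tmissing_ko\n"
    "M00001\t100.0\tname A\tclass A\tK00001,K00002\t\n"
    "M00002\t50.0\tname B\tclass B\tK00003\tK00004\n"
)

print(complete_modules(io.StringIO(sample)))
print(complete_modules(io.StringIO(sample), threshold=50.0))
```

For a real run, replace the in-memory sample with an opened *_pathways.tsv file.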
Per-contig completeness (*_contigs.tsv)
Generated with the -m/--add-per-contig flag. Same format as above, but with the contig name as the first column.
Example: test_pathway_contigs.tsv
Weighted output (*.with_weights.tsv)
Generated with the -w/--include-weights flag. Includes the weight of each KO in parentheses (e.g., K00942(0.25) means weight = 0.25).
Example: test_weights_pathways.with_weights.tsv
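The KO(weight) notation is straightforward to split with a regular expression. This is a small sketch: the pattern relies only on the K00942(0.25) form shown above, and the way entries are separated within a field is an assumption that the pattern does not depend on. The function name `parse_weighted_kos` is hypothetical.

```python
import re

# A KO identifier (K followed by five digits) and its weight in parentheses.
WEIGHTED_KO = re.compile(r"(K\d{5})\(([0-9]*\.?[0-9]+)\)")


def parse_weighted_kos(field):
    """Extract a {KO: weight} mapping from a weighted KO field."""
    return {ko: float(weight) for ko, weight in WEIGHTED_KO.findall(field)}


weights = parse_weighted_kos("K00942(0.25),K00001(0.5)")
print(weights)
```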
Pathway plots (pathways_plots/)
Generated with the -p/--plot-pathways flag. Contains:
- PNG images with pathway graphs
- Present KOs highlighted in red
- Missing KOs in black
Example directory: pathways_plots/
Theory & Background
How KEGG modules are represented
KEGG provides pathway definitions as logical expressions of KOs.
Example: (K00844,K12407) (K01810,K06859,K13810) (K00850,K16370) K00918
Notation:
- Space = AND (all components required)
- Comma = OR (any one component required)
- Plus (+) = Essential component
- Minus (-) = Optional component
- Double minus (--) = Missing optional (replaced with K00000 with 0 weight)
- Newline = Mediator (multi-line definitions use AND between lines)
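To make the AND/OR reading concrete, here is a minimal sketch of an evaluator that answers only whether a set of KOs fully satisfies a definition. It is not the tool's parser. It handles spaces, commas, and parentheses only and ignores `+`, `-`, `--`, and multi-line definitions. It also assumes that, within a group, a comma separates alternatives that may each be a space-separated sequence. The name `is_satisfied` is hypothetical.

```python
import re

# KO identifiers plus the three structural characters handled by this sketch.
TOKEN = re.compile(r"K\d{5}|[(),]")


def is_satisfied(definition, present):
    """Return True if the KO set satisfies the AND/OR definition."""
    tokens = TOKEN.findall(definition)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        token = tokens[pos]
        pos += 1
        return token

    def parse_or():
        # Comma = OR between alternatives (assumed lowest precedence).
        value = parse_and()
        while peek() == ",":
            take()
            rhs = parse_and()
            value = value or rhs
        return value

    def parse_and():
        # Adjacent items (separated by spaces) = AND.
        value = parse_atom()
        while peek() not in (None, ",", ")"):
            rhs = parse_atom()
            value = value and rhs
        return value

    def parse_atom():
        token = take()
        if token == "(":
            value = parse_or()
            if take() != ")":
                raise ValueError("unbalanced parentheses")
            return value
        if token in (",", ")"):
            raise ValueError("unexpected token: " + token)
        return token in present

    result = parse_or()
    if pos != len(tokens):
        raise ValueError("unexpected trailing tokens")
    return result


definition = "(K00844,K12407) (K01810,K06859,K13810) (K00850,K16370) K00918"
print(is_satisfied(definition, {"K00844", "K01810", "K00850", "K00918"}))
print(is_satisfied(definition, {"K00844", "K01810", "K00850"}))
```

With one KO from each group plus K00918 the definition is satisfied; dropping K00918 breaks the final AND. The tool itself reports a graded percentage rather than this yes/no answer.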
Pathway to graph conversion
Each KEGG module definition is converted into a directed graph using NetworkX:
- Start node: 0
- End node: 1
- Edges: Represent KOs with assigned weights
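As an illustration of this layout, the sketch below builds the edges for a simple definition made of consecutive steps, each with alternative KOs. It uses plain tuples instead of NetworkX to stay dependency-free. The equal weight per step is an assumption made for this example and is not necessarily how the tool assigns weights. The function name `chain_graph` and the numbering of intermediate nodes are also hypothetical.

```python
def chain_graph(steps):
    """Build a chain from node 0 to node 1 with one edge per alternative KO.

    steps: list of lists; each inner list holds the alternative KOs of a step.
    Returns a list of (source, target, ko, weight) tuples.
    """
    n = len(steps)
    weight = 1.0 / n  # assumption: every step contributes equally
    # Node 0 is the start, node 1 is the end; intermediate nodes are 2, 3, ...
    nodes = [0] + list(range(2, n + 1)) + [1]
    edges = []
    for index, alternatives in enumerate(steps):
        for ko in alternatives:
            edges.append((nodes[index], nodes[index + 1], ko, weight))
    return edges


# (K00844,K12407) (K01810,K06859,K13810) (K00850,K16370) K00918
steps = [
    ["K00844", "K12407"],
    ["K01810", "K06859", "K13810"],
    ["K00850", "K16370"],
    ["K00918"],
]
edges = chain_graph(steps)
for edge in edges:
    print(edge)
```

Alternatives within a step become parallel edges between the same pair of nodes, so any single one of them is enough to cross that step.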
Completeness calculation
Algorithm:
- Each edge in the graph has a weight based on its importance (calculated from pathway structure)
- For a given set of KOs:
  - Present KOs → edge weight = original weight
  - Missing KOs → edge weight = 0
- Find the path from node 0 to node 1 with minimum (current_weight / original_weight) ratio
- Calculate completeness: completeness = (path_weight / max_path_weight) × 100%
Note on mediators: Some modules have multi-line definitions where each line represents a mediator component. All mediators are connected with AND operators. The complete list of modules with mediators is in definition_separated.txt.
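The completeness formula can be illustrated on a toy graph. The sketch below enumerates every path from node 0 to node 1 in a small acyclic edge list, sums the weights of edges whose KO is present, and reports the best path's share of its total weight as a percentage. Selecting the path that retains the largest share is an assumption made for this sketch and may differ from the tool's implementation. The edge weights and the function name `completeness` are illustrative only.

```python
def completeness(edges, present):
    """Best percentage of path weight retained over all 0 -> 1 paths.

    edges: list of (source, target, ko, weight) tuples forming a small DAG.
    present: set of KOs that were found.
    """
    outgoing = {}
    for source, target, ko, weight in edges:
        outgoing.setdefault(source, []).append((target, ko, weight))

    best = 0.0

    def walk(node, have, total):
        nonlocal best
        if node == 1:  # reached the end node
            if total > 0:
                best = max(best, have / total)
            return
        for target, ko, weight in outgoing.get(node, []):
            gained = weight if ko in present else 0.0  # missing KO -> 0
            walk(target, have + gained, total + weight)

    walk(0, 0.0, 0.0)
    return round(best * 100, 2)


# Toy graph: four steps of weight 0.25, the first with two alternatives.
toy_edges = [
    (0, 2, "K00844", 0.25),
    (0, 2, "K12407", 0.25),
    (2, 3, "K01810", 0.25),
    (3, 4, "K00850", 0.25),
    (4, 1, "K00918", 0.25),
]
print(completeness(toy_edges, {"K00844", "K01810", "K00918"}))
print(completeness(toy_edges, {"K12407", "K01810", "K00850", "K00918"}))
```

Exhaustive path enumeration is fine for a toy graph but grows quickly with the number of alternatives.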
Updating Module Data
To update module data to the latest KEGG version, see the update documentation.
The update process includes:
- Fetching latest module definitions from KEGG API
- Generating the unified modules_table.tsv
- Creating NetworkX graphs from module definitions
- Validating and testing the updated data
Complete Workflow
From raw sequences to pathway completeness
# Step 1: Annotate protein sequences using HMMER
# Download KEGG profiles database (KOfam) from KEGG
hmmscan --domtblout hmmer_output.tbl \
--cpu 4 \
profiles.hmm \
sequences.faa
# Step 2: Parse HMMER output to extract KO annotations per contig
parse_hmmer_table \
-i hmmer_output.tbl \
-f sequences.faa \
-t hmmscan \
-o ko_annotations.tsv
# Step 3: Calculate pathway completeness
give_completeness \
-i ko_annotations.tsv \
-t kegg_pathways_completeness/pathways_data/modules_table.tsv \
-r my_sample \
-m \
-w \
-p
# Step 4 (optional): Visualize specific modules
plot_modules_graphs \
-i my_sample_pathways.tsv \
-g kegg_pathways_completeness/pathways_data/graphs.pkl \
-o pathway_plots
See the detailed documentation on HMMER usage and parsing.
Citation
If you use this tool in your research, please cite:
Richardson L, Allen B, Baldi G, Beracochea M, Bileschi ML, Burdett T, et al. MGnify: the microbiome sequence data analysis resource in 2023 [Internet]. Vol. 51, Nucleic Acids Research. Oxford University Press (OUP); 2022. p. D753–9. Available from: http://dx.doi.org/10.1093/nar/gkac1080.
Issues & Contributions: Report bugs or request features on GitHub Issues
License: Apache License 2.0