KEGG Pathway Profiler

KEGG Pathway Profiler is a pathway profiling tool that traverses metabolic pathway graphs and identifies the most complete path through each pathway, given an evaluation set of KEGG orthologs (KOs). It can be used from within Python or through command-line executables. This package is a reimplementation of kegg-pathways-completeness-tool, building on its base code and theory. For any publications or usage, please cite the original implementation and credit the lead developer (see Acknowledgements below).

Installation:

pip install kegg_pathway_profiler

Dependencies:

networkx>=3.0
numpy>=1.9
scipy>=1.11
pandas>=1.0
tqdm

CLI Usage:

Fetching and building the database:

# Fetch the database
mkdir -p data/
download-kegg-pathways.sh data/

# Build the database
build-pathway-database.py \
    -d data/database.pkl.gz \
    -i data/pathway_definitions.tsv \
    -n data/pathway_names.tsv \
    -c data/pathway_classes.tsv

Note: These two steps are intended to be combined into one via the --download argument, but that option still needs to be debugged.
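The Python usage in this README indexes the database as database[id]["definition"], database[id]["name"], and database[id]["classes"], which suggests a nested dictionary keyed by pathway ID. The following is a minimal sketch of assembling and saving such a structure, assuming it is stored as a gzip-compressed pickle; build_database is a hypothetical helper, and the actual layout written by build-pathway-database.py may differ.

```python
import gzip
import pickle
import tempfile
from pathlib import Path


def build_database(definitions, names, classes):
    """Merge three {id_pathway: value} mappings into one nested dict.

    Hypothetical helper for illustration; not part of kegg_pathway_profiler.
    """
    database = {}
    for id_pathway, definition in definitions.items():
        database[id_pathway] = {
            "definition": definition,
            "name": names.get(id_pathway),
            "classes": classes.get(id_pathway),
        }
    return database


# Values for M00001 copied from the Pathway example in this README
definitions = {
    "M00001": (
        "(K00844,K12407,K00845,K25026,K00886,K08074,K00918) "
        "(K01810,K06859,K13810,K15916) (K00850,K16370,K21071,K00918) "
        "(K01623,K01624,K11645,K16305,K16306) K01803 "
        "((K00134,K00150) K00927,K11389) (K01834,K15633,K15634,K15635) "
        "(K01689,K27394) (K00873,K12406)"
    )
}
names = {"M00001": "Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate"}
classes = {
    "M00001": "Pathway modules; Carbohydrate metabolism; Central carbohydrate metabolism"
}

database = build_database(definitions, names, classes)

# Write to a temporary directory, then read back to confirm the round trip
database_path = Path(tempfile.mkdtemp()) / "database.pkl.gz"
with gzip.open(database_path, "wb") as handle:
    pickle.dump(database, handle)
with gzip.open(database_path, "rb") as handle:
    reloaded = pickle.load(handle)
```

A single dictionary keyed by pathway ID keeps each lookup to one step, which matches how the Pathway constructor is fed in the Python usage example.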

Profile pathway coverage

Running:

profile-pathway-coverage.py -i data/test/kos.genomes.tsv -o data/test/pathway-profiler_output -d data/database.pkl.gz
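The --kos input is documented as either one KO per line or a headerless, tab-separated table of [id_genome]<tab>[id_ko]. As a rough sketch, a table in the second form could be written from a Python mapping as follows; write_ko_table and the genome names are hypothetical and only illustrate the file layout.

```python
import csv
import tempfile
from pathlib import Path


def write_ko_table(genome_to_kos, path):
    """Write [id_genome]<tab>[id_ko] rows with no header (hypothetical helper)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        # Sort genomes and KOs so the output is deterministic
        for id_genome, kos in sorted(genome_to_kos.items()):
            for id_ko in sorted(kos):
                writer.writerow([id_genome, id_ko])


# Made-up genome identifiers; the KO identifiers appear elsewhere in this README
genome_to_kos = {
    "genome_A": {"K00844", "K00850"},
    "genome_B": {"K00873"},
}

output_path = Path(tempfile.mkdtemp()) / "kos.genomes.tsv"
write_ko_table(genome_to_kos, output_path)
ko_table_text = output_path.read_text()
```

The resulting file has one row per genome and KO pair, so a genome with many KOs simply repeats its identifier in the first column.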

Output:

Python Usage:

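import kegg_pathway_profiler as kpp

# Load the database
database = kpp.utils.read_pickle("data/database.pkl.gz")

# Build a Pathway object for one KEGG module
# (named id_pathway to avoid shadowing the built-in id())
id_pathway = "M00001"
pathway = kpp.pathways.Pathway(
    id=id_pathway,
    definition=database[id_pathway]["definition"],
    name=database[id_pathway]["name"],
    classes=database[id_pathway]["classes"],
)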
pathway
# ==================
# Pathway(id:M00001)
# ==================
# Properties:
#     - name: Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate
#     - classes: Pathway modules; Carbohydrate metabolism; Central carbohydrate metabolism
#     - number_of_kos: 32
# Definition:
#     (K00844,K12407,K00845,K25026,K00886,K08074,K00918) (K01810,K06859,K13810,K15916) (K00850,K16370,K21071,K00918) (K01623,K01624,K11645,K16305,K16306) K01803 ((K00134,K00150) K00927,K11389) (K01834,K15633,K15634,K15635) (K01689,K27394) (K00873,K12406)

# Evaluate
evaluation_kos = {'K00134',
 'K00150',
 'K00844',
 'K00845',
 'K00850',
 'K00873',
 'K00886',
 'K00918',
 'K00927',
 'K01623',
 'K01624',
 'K01689',
 'K16370',
 'K21071',
 'K25026',
 'K27394',
}
results = pathway.evaluate(evaluation_kos)

# Get coverage only
results["coverage"]
# 0.6666666666666667

# Get most complete path KOs
results["most_complete_path"]
# ['K00844',
#  'K01810',
#  'K00850',
#  'K01623',
#  'K01803',
#  'K00134',
#  'K00927',
#  'K01834',
#  'K01689',
#  'K00873']
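To see where a coverage of about 0.667 can come from, here is a simplified, hand-worked model of the M00001 example. It is not the package's graph traversal. It assumes the usual reading of a KEGG module definition, where spaces separate sequential steps and commas separate alternatives, and it hand-encodes the nine top-level steps instead of parsing the definition string. The names steps, alternative_score, and evaluate_steps are hypothetical.

```python
# Each step is a list of alternatives; each alternative is a list of KOs
# that must all be present. Hand-encoded from the M00001 definition.
steps = [
    [["K00844"], ["K12407"], ["K00845"], ["K25026"], ["K00886"], ["K08074"], ["K00918"]],
    [["K01810"], ["K06859"], ["K13810"], ["K15916"]],
    [["K00850"], ["K16370"], ["K21071"], ["K00918"]],
    [["K01623"], ["K01624"], ["K11645"], ["K16305"], ["K16306"]],
    [["K01803"]],
    [["K00134", "K00927"], ["K00150", "K00927"], ["K11389"]],
    [["K01834"], ["K15633"], ["K15634"], ["K15635"]],
    [["K01689"], ["K27394"]],
    [["K00873"], ["K12406"]],
]

evaluation_kos = {
    "K00134", "K00150", "K00844", "K00845", "K00850", "K00873", "K00886", "K00918",
    "K00927", "K01623", "K01624", "K01689", "K16370", "K21071", "K25026", "K27394",
}


def alternative_score(alternative, kos):
    """Fraction of the KOs in one alternative that are present in kos."""
    return sum(ko in kos for ko in alternative) / len(alternative)


def evaluate_steps(steps, kos):
    """Return (coverage, path) under the simplified step model.

    Coverage is the mean of the best alternative score per step. Ties are
    broken by taking the first alternative listed, which is how max() behaves.
    """
    path = []
    total = 0.0
    for alternatives in steps:
        best = max(alternatives, key=lambda alt: alternative_score(alt, kos))
        total += alternative_score(best, kos)
        path.extend(best)
    return total / len(steps), path


coverage, path = evaluate_steps(steps, evaluation_kos)
# Six of the nine steps are fully satisfied, so coverage is about 0.667
```

Under this model, the steps with no matching KO (the second, fifth, and seventh) contribute zero and fall back to their first listed alternative, which is why KOs absent from the evaluation set, such as K01810, K01803, and K01834, can still appear in the reported path.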

Documentation:

profile-pathway-coverage.py

usage: profile-pathway-coverage.py

    Running: profile-pathway-coverage.py v3.10.14 via Python v/Users/jolespin/miniconda3/envs/kegg_pathway_profiler_env/bin/python3.10 | profile-pathway-coverage.py

options:
  -h, --help            show this help message and exit

I/O arguments:
  -i KOS, --kos KOS     path/to/kos.list[.gz].  Can either be 1 KO per line or a tab-separated table with the following structure: [id_genome]<tab>[id_ko], No header.
  -n NAME, --name NAME  Name of genome. [Default: Filename for --kos]
  -o OUTPUT_DIRECTORY, --output_directory OUTPUT_DIRECTORY
                        path/to/output_directory/ (e.g., kegg_pathway_profiler_output/]
  -d DATABASE, --database DATABASE
                        path/to/database.pkl[.gz] [Default: /Users/jolespin/miniconda3/envs/kegg_pathway_profiler_env/lib/python3.10/site-packages/kegg_pathway_profiler/data/database.pkl.gz]
  --index_name INDEX_NAME
                        Index name for coverage table (e.g., id_genome, id_genome_cluster, id_contig) [Default: id_genome]

Copyright 2024 New Atlantis Labs (jolespin@newatlantis.io)

build-pathway-database.py

usage: build-pathway-database.py

    Running: build-pathway-database.py v3.10.14 via Python v/Users/jolespin/miniconda3/envs/kegg_pathway_profiler_env/bin/python3.10 | build-pathway-database.py

options:
  -h, --help            show this help message and exit

Local arguments:
  -d DATABASE, --database DATABASE
                        path/to/database.pkl[.gz] [Default: /Users/jolespin/miniconda3/envs/kegg_pathway_profiler_env/lib/python3.10/site-packages/kegg_pathway_profiler/data/database.pkl.gz]
  -V DATABASE_VERSION, --database_version DATABASE_VERSION
                        Database version: Adds version information to the following file: path/to/database.version where .pkl extensions are removed [Default: KEGG_v2024.8.23]
  -f, --force           If file exists, then remove file and update it.

Local arguments:
  -i PATHWAY_DEFINITIONS, --pathway_definitions PATHWAY_DEFINITIONS
                        path/to/pathway_definitions.tsv.  [id_pathway]<tab>[definition], No header.
  -n PATHWAY_NAMES, --pathway_names PATHWAY_NAMES
                        path/to/pathway_names.tsv  [id_pathway]<tab>[name], No header.
  -c PATHWAY_CLASSES, --pathway_classes PATHWAY_CLASSES
                        path/to/pathway_classes.tsv.  [id_pathway]<tab>[class], No header.

Download arguments:
  --download            Download directly from http://rest.kegg.jp/
  --intermediate_directory INTERMEDIATE_DIRECTORY
                        Write the intermediate files from http://rest.kegg.jp/ to a directory.  If 'auto' then download to the directory that contains --database called `pathway_data`.
  --no_intermediate_files
                        Don't write intermediate files

Copyright 2024 New Atlantis Labs (jolespin@newatlantis.io)

Acknowledgements:

Ekaterina Sakharova, the developer of the original implementation, kegg-pathways-completeness-tool.
