Genbinesia Pathway Extractor

GPATHEX (Genbinesia Pathway Extractor)

GPATHEX is a simple tool that extracts biological pathway and taxonomic information from the KEGG Database.

⚠️ Derivative Work Notice

GPATHEX is a derivative work based on KEGGTools by Junpeng Fan. This version includes substantial modifications and enhancements while remaining compliant with the MIT License of the original work. A copy of the original KEGGTools MIT License is also included.

Original work: Fan, J. (2018). KEGGTools. https://github.com/FlyPythons/KEGGTools

Major Enhancements from Original

  • ✅ Complete migration to Python 3.11, with support for Python 3.11 and later
  • ✅ Biological big data support (thanks to Polars and Polars-bio) and efficient data storage in the Apache Parquet format (through Polars)
  • ✅ Better NCBI Entrez communication using Biopython
  • ✅ Better KEGG information processing that follows the current KEGG format

Author

  • Maulana Malik Nashrulloh (Division of Biomics Research, Department of Sciences, Generasi Biologi Indonesia Foundation)

Quick Start

Dependencies

Make sure that your system has Python >=3.11 and the following packages/libraries installed:

  • biopython>=1.86
  • polars-bio>=0.19.0
  • polars>=1.37.1
  • matplotlib>=3.10.8
  • seaborn>=0.13.2
  • rich>=14.2.0
  • tqdm>=4.67.1
  • colorama>=0.4.6
  • pyarrow>=21.0.0
  • openpyxl>=3.1.5
  • orjson>=3.11.5
  • aiohttp>=3.13.3
  • httpx>=0.28.1
  • backoff>=2.2.1
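
If you want to confirm that an existing environment satisfies these minimums before running GPATHEX, the check can be sketched with the standard library alone. This is a minimal sketch, not part of GPATHEX: the helper names (parse, meets_minimum, check_dependencies) are hypothetical, and the version comparison is deliberately simplified to the leading numeric components, so pre-release suffixes are ignored.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "biopython": "1.86",
    "polars-bio": "0.19.0",
    "polars": "1.37.1",
    "matplotlib": "3.10.8",
    "seaborn": "0.13.2",
    "rich": "14.2.0",
    "tqdm": "4.67.1",
    "colorama": "0.4.6",
    "pyarrow": "21.0.0",
    "openpyxl": "3.1.5",
    "orjson": "3.11.5",
    "aiohttp": "3.13.3",
    "httpx": "0.28.1",
    "backoff": "2.2.1",
}


def parse(text: str) -> tuple[int, ...]:
    """Turn '1.37.1' into (1, 37, 1); parsing stops at the first non-numeric piece."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions after padding the shorter one with zeros."""
    a, b = parse(installed), parse(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums: dict[str, str] = MINIMUMS) -> dict[str, str]:
    """Return {package: problem} for every missing or outdated package."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (need >={minimum})"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"{installed} installed (need >={minimum})"
    return problems


if __name__ == "__main__":
    if sys.version_info < (3, 11):
        print("Python >=3.11 is required")
    for name, problem in check_dependencies().items():
        print(f"{name}: {problem}")
```

An empty result from check_dependencies() means every listed package is present at or above its minimum version. Installing with pip normally resolves these requirements for you, so the check is mainly useful for hand-built or shared environments.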

Installation

Currently, installation is supported only through pip.

pip install gpathex

Valid, available commands

GPATHEX is run with the gpathex command. The following commands are currently supported and are invoked as gpathex <command>:

Available Commands:
  Command to execute
    download-org         Download KEGG organism information
    download-ko          Download KEGG Orthology (KO) files
    download-proteins    Download protein sequences from NCBI
    process-proteins     Process proteins with KO annotations
    make-db              Create custom KEGG database
    make-keg             Create .keg file from annotations
    plot-keg-kos         Plot KEGG KO hierarchy (ko00001.keg format)
    plot-keg-genes       Plot KEGG gene annotations (from make-keg command)
    plot-keg             Legacy: Plot KEGG annotation results (use plot-keg-kos or plot-keg-genes)
    get-ranks            Get KEGG organism classification ranks
    config               Show or modify configuration
    info                 Show system information and dependencies
    download-taxonomy    Download NCBI taxonomy database

Usage

Get KEGG organisms list:

gpathex download-org \
    --out /path/to/your/organisms.tsv

Download KO files:

gpathex download-ko \
    --org /path/to/your/organisms.tsv \
    --out /path/to/your/ko_files_dir/

Get protein sequences:

gpathex download-proteins \
    --org /path/to/your/organisms.tsv \
    --out /path/to/your/proteins_dir/

Get NCBI Taxonomy Lineage:

gpathex download-taxonomy \
    --out /path/to/your/ncbi_taxonomy.tsv

Create database:

gpathex make-db \
    --org /path/to/your/organisms.tsv \
    --keg /path/to/your/ko_files_dir/ \
    --pep /path/to/your/proteins_dir/ \
    --out /path/to/your/my_kegg_db/
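
The four steps from download-org to make-db depend on each other's outputs, so it can be convenient to drive them from one script. The following is a minimal sketch that only assembles the commands shown above: the function names (build_db_commands, run_all) and the file names under the working directory are hypothetical, and no flags beyond those in the examples are used.

```python
import shlex
import subprocess
from pathlib import Path


def build_db_commands(workdir: Path) -> list[list[str]]:
    """Assemble the download and build commands with consistent paths."""
    org = str(workdir / "organisms.tsv")
    # Directory arguments keep a trailing slash, mirroring the examples above.
    ko_dir = str(workdir / "ko_files_dir") + "/"
    pep_dir = str(workdir / "proteins_dir") + "/"
    db_dir = str(workdir / "my_kegg_db") + "/"
    return [
        ["gpathex", "download-org", "--out", org],
        ["gpathex", "download-ko", "--org", org, "--out", ko_dir],
        ["gpathex", "download-proteins", "--org", org, "--out", pep_dir],
        ["gpathex", "make-db", "--org", org, "--keg", ko_dir,
         "--pep", pep_dir, "--out", db_dir],
    ]


def run_all(commands: list[list[str]]) -> None:
    """Run each command in order and stop at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


# Dry run: print the commands instead of executing them.
for argv in build_db_commands(Path("kegg_work")):
    print(shlex.join(argv))
```

Calling run_all(build_db_commands(Path("kegg_work"))) would execute the steps for real. Because check=True raises on a non-zero exit status, a failed download stops the chain before make-db runs on incomplete inputs.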

The resulting database at /path/to/your/my_kegg_db/ is structured as follows:

my_kegg_db/
├── my_kegg_db.pep.fasta.gz          # Compressed protein sequences 
├── my_kegg_db.sequences.parquet     # Sequence metadata
├── my_kegg_db.annotations.parquet   # KO annotations
├── my_kegg_db.pep2ko.tsv            # Protein-to-KO mapping
├── my_kegg_db.stats.json            # Statistics
└── my_kegg_db.summary.txt           # Human-readable summary
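
To verify that a make-db run produced the complete layout shown above, the expected files can be checked with a few lines of Python. This is a sketch under one assumption taken from the tree: every file name starts with the name of the output directory. The helper name missing_db_files is hypothetical.

```python
from pathlib import Path

# File suffixes taken from the directory listing above.
SUFFIXES = (
    ".pep.fasta.gz",
    ".sequences.parquet",
    ".annotations.parquet",
    ".pep2ko.tsv",
    ".stats.json",
    ".summary.txt",
)


def missing_db_files(db_dir: Path) -> list[str]:
    """Return the expected file names that are absent from db_dir."""
    # Assumption: the file prefix equals the directory name (my_kegg_db).
    prefix = db_dir.name
    return [
        prefix + suffix
        for suffix in SUFFIXES
        if not (db_dir / (prefix + suffix)).is_file()
    ]
```

For example, missing_db_files(Path("/path/to/your/my_kegg_db")) returns an empty list when all six files are present.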

Create .keg file:

gpathex make-keg \
    --keg /path/to/your/ko00001.keg \
    --in /path/to/your/annotations.tsv \
    --out /path/to/your/results.keg

Plot KO hierarchy:

gpathex plot-keg-kos \
    --keg /path/to/your/ko00001.keg \
    --out /path/to/your/kos_plot

By default, the plot is saved in PNG format.

Plot gene annotations:

gpathex plot-keg-genes \
    --keg /path/to/your/results.keg \
    --out /path/to/your/genes_plot

By default, the plot is saved in PNG format.
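
The .keg file written by make-keg is the input of plot-keg-genes, so the two commands are usually run back to back. The sketch below assembles both commands so that the intermediate path is defined only once. The function name build_annotation_commands and the example file names are hypothetical, and only the flags shown above are used.

```python
import shlex


def build_annotation_commands(
    ko_keg: str, annotations: str, results_keg: str, plot_prefix: str
) -> list[list[str]]:
    """Assemble make-keg followed by plot-keg-genes on its output."""
    return [
        ["gpathex", "make-keg", "--keg", ko_keg,
         "--in", annotations, "--out", results_keg],
        ["gpathex", "plot-keg-genes", "--keg", results_keg,
         "--out", plot_prefix],
    ]


# Dry run: print shell-quoted commands for review before running them.
for argv in build_annotation_commands(
    "ko00001.keg", "annotations.tsv", "results.keg", "genes_plot"
):
    print(shlex.join(argv))
```

Each printed line can be pasted into a shell, or the argument lists can be passed to subprocess.run(argv, check=True) to execute them in order.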

Get organism ranks:

gpathex get-ranks \
    --keg /path/to/your/br08610.keg \
    --taxon /path/to/your/ncbi_taxonomy.tsv \
    --out /path/to/your/ranks.tsv

Help

To access the help, use:

gpathex -h

To access the help for a specific command, use:

gpathex <command> -h

Acknowledgments

  • This program is based on KEGGTools by Junpeng Fan (https://github.com/FlyPythons/KEGGTools)
  • This program was developed as part of the research mini-project "PyTax4Fun2: A Python Tool for Functional Profiling and Redundancy Analysis of Bacterial Communities via 16S rRNA Gene Sequences, Featuring Polars for Efficient Processing of Large Genomic Datasets" (Project #BIOMIKA-02), Subproject #BIOMIKA-02.1, funded internally by Generasi Biologi Indonesia Foundation.

Citation

A dedicated publication for this program is not yet available. For citation purposes, please refer to the following technical report:

Nashrulloh, M.M. (2026). GPATHEX: A simple biological pathway and taxonomic information extractor from the KEGG Database (Technical Report No. GBR-TR-BIOMIKA-03/Genbinesia/I/2026). Generasi Biologi Indonesia Foundation. Gresik, Indonesia.

If you wish to cite this repository, you may use the following APA-style reference entry:

Nashrulloh, M.M. (2026). GPATHEX: A simple biological pathway and taxonomic information extractor from the KEGG Database (Version 1.0.0) [Computer software]. https://gitlab.com/biomikalab/gpathex

License

This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.
