Genbinesia Pathway Extractor
GPATHEX (Genbinesia Pathway Extractor)
GPATHEX is a simple biological pathway and taxonomic information extractor from the KEGG Database.
⚠️ Derivative Work Notice
GPATHEX is a derivative work based on KEGGTools by Junpeng Fan.
This version includes substantial modifications and enhancements while maintaining MIT License compliance.
A copy of the original KEGGTools MIT License is also included.
Original work: Fan, J. (2018). KEGGTools. https://github.com/FlyPythons/KEGGTools
Major Enhancements from Original
- ✅ Complete migration to Python 3.11, with support for 3.11+
- ✅ Biological big data support (thanks to Polars and Polars-bio) and efficient data storage in Apache Parquet format (through Polars)
- ✅ Improved NCBI Entrez communication using Biopython
- ✅ Improved KEGG information processing that follows the current KEGG format
Author
- Maulana Malik Nashrulloh (Division of Biomics Research, Department of Sciences, Generasi Biologi Indonesia Foundation)
Quick Start
Dependencies
Make sure that your system has Python >=3.11 and the following packages/libraries installed:
- biopython>=1.86
- polars-bio>=0.19.0
- polars>=1.37.1
- matplotlib>=3.10.8
- seaborn>=0.13.2
- rich>=14.2.0
- tqdm>=4.67.1
- colorama>=0.4.6
- pyarrow>=21.0.0
- openpyxl>=3.1.5
- orjson>=3.11.5
- aiohttp>=3.13.3
- httpx>=0.28.1
- backoff>=2.2.1
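As a quick sanity check, the minimum versions listed above can be compared against what is installed in the current environment. The following is a minimal sketch that uses only the standard library; the helper names (`parse_version`, `meets_minimum`, `check_dependencies`) are hypothetical and not part of GPATHEX, and the version comparison is deliberately simplified (numeric components only), so pre-release tags are ignored.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the dependency list above.
MINIMUMS = {
    "biopython": "1.86",
    "polars-bio": "0.19.0",
    "polars": "1.37.1",
    "matplotlib": "3.10.8",
    "seaborn": "0.13.2",
    "rich": "14.2.0",
    "tqdm": "4.67.1",
    "colorama": "0.4.6",
    "pyarrow": "21.0.0",
    "openpyxl": "3.1.5",
    "orjson": "3.11.5",
    "aiohttp": "3.13.3",
    "httpx": "0.28.1",
    "backoff": "2.2.1",
}


def parse_version(text: str) -> tuple[int, ...]:
    """Keep the leading digits of each dot-separated piece, e.g. '1.37.1' -> (1, 37, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed: str, minimum: str) -> bool:
    """Compare two versions numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_dependencies(minimums=MINIMUMS, get_version=version) -> dict[str, str]:
    """Return {package: problem} for every missing or outdated package."""
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems[name] = f"not installed (need >={minimum})"
            continue
        if not meets_minimum(installed, minimum):
            problems[name] = f"found {installed}, need >={minimum}"
    return problems


if __name__ == "__main__":
    for name, message in check_dependencies().items():
        print(f"{name}: {message}")
```

An empty result means every listed package satisfies its minimum. In practice `pip install gpathex` should resolve these requirements on its own, so this check is mainly useful when diagnosing an existing environment.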
Installation
Currently, installation is supported only through pip:
pip install gpathex
Valid, available commands
GPATHEX is run with the gpathex command. The following subcommands are currently supported and are invoked as gpathex <command>:
Available Commands:
Command to execute
download-org Download KEGG organism information
download-ko Download KEGG Orthology (KO) files
download-proteins Download protein sequences from NCBI
process-proteins Process proteins with KO annotations
make-db Create custom KEGG database
make-keg Create .keg file from annotations
plot-keg-kos Plot KEGG KO hierarchy (ko00001.keg format)
plot-keg-genes Plot KEGG gene annotations (from make-keg command)
plot-keg Legacy: Plot KEGG annotation results (use plot-keg-kos or plot-keg-genes)
get-ranks Get KEGG organism classification ranks
config Show or modify configuration
info Show system information and dependencies
download-taxonomy Download NCBI taxonomy database
Usage
Get KEGG organisms list:
gpathex download-org \
--out /path/to/your/organisms.tsv
Download KO files:
gpathex download-ko \
--org /path/to/your/organisms.tsv \
--out /path/to/your/ko_files_dir/
Get protein sequences:
gpathex download-proteins \
--org /path/to/your/organisms.tsv \
--out /path/to/your/proteins_dir/
Get NCBI Taxonomy Lineage:
gpathex download-taxonomy \
--out /path/to/your/ncbi_taxonomy.tsv
Create database:
gpathex make-db \
--org /path/to/your/organisms.tsv \
--keg /path/to/your/ko_files_dir/ \
--pep /path/to/your/proteins_dir/ \
--out /path/to/your/my_kegg_db/
The resulting database at /path/to/your/my_kegg_db/ is structured as follows:
my_kegg_db/
├── my_kegg_db.pep.fasta.gz # Compressed protein sequences
├── my_kegg_db.sequences.parquet # Sequence metadata
├── my_kegg_db.annotations.parquet # KO annotations
├── my_kegg_db.pep2ko.tsv # Protein-to-KO mapping
├── my_kegg_db.stats.json # Statistics
└── my_kegg_db.summary.txt # Human-readable summary
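To confirm that make-db produced the full layout shown above, the output directory can be checked for each expected file. This is a minimal sketch rather than a GPATHEX feature: the function name `missing_db_files` is hypothetical, and it assumes that every file is prefixed with the database directory name, as in the listing.

```python
from pathlib import Path

# File suffixes copied from the directory listing above.
EXPECTED_SUFFIXES = (
    ".pep.fasta.gz",
    ".sequences.parquet",
    ".annotations.parquet",
    ".pep2ko.tsv",
    ".stats.json",
    ".summary.txt",
)


def missing_db_files(db_dir) -> list[str]:
    """Return the names of expected database files that are absent from db_dir."""
    db_dir = Path(db_dir)
    # Assumption: files share the directory name as a prefix (my_kegg_db.*).
    prefix = db_dir.name
    return [
        prefix + suffix
        for suffix in EXPECTED_SUFFIXES
        if not (db_dir / (prefix + suffix)).is_file()
    ]


if __name__ == "__main__":
    missing = missing_db_files("/path/to/your/my_kegg_db/")
    print("Database looks complete" if not missing else f"Missing: {missing}")
```

An empty list means all six files are present; the check says nothing about whether their contents are valid.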
Create .keg file:
gpathex make-keg \
--keg /path/to/your/ko00001.keg \
--in /path/to/your/annotations.tsv \
--out /path/to/your/results.keg
Plot KO hierarchy:
gpathex plot-keg-kos \
--keg /path/to/your/ko00001.keg \
--out /path/to/your/kos_plot
By default, the plot is saved in PNG format.
Plot gene annotations:
gpathex plot-keg-genes \
--keg /path/to/your/results.keg \
--out /path/to/your/genes_plot
By default, the plot is saved in PNG format.
Get organism ranks:
gpathex get-ranks \
--keg /path/to/your/br08610.keg \
--taxon /path/to/your/ncbi_taxonomy.tsv \
--out /path/to/your/ranks.tsv
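The download and database steps from this Usage section (download-org, download-ko, download-proteins, make-db) can also be chained from a script. The sketch below builds the same command lines shown above and runs them in order; the function names and the working-directory layout are illustrative assumptions, only the subcommands and flags come from the examples above, and directory paths are written without a trailing slash.

```python
import shlex
import subprocess
from pathlib import Path


def build_pipeline(workdir) -> list[list[str]]:
    """Build the gpathex command lines for the download and make-db steps."""
    work = Path(workdir)
    # Hypothetical file layout inside the working directory.
    org = str(work / "organisms.tsv")
    ko_dir = str(work / "ko_files_dir")
    pep_dir = str(work / "proteins_dir")
    db_dir = str(work / "my_kegg_db")
    return [
        ["gpathex", "download-org", "--out", org],
        ["gpathex", "download-ko", "--org", org, "--out", ko_dir],
        ["gpathex", "download-proteins", "--org", org, "--out", pep_dir],
        ["gpathex", "make-db", "--org", org, "--keg", ko_dir,
         "--pep", pep_dir, "--out", db_dir],
    ]


def run_pipeline(workdir, dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for command in build_pipeline(workdir):
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the pipeline at the first failing step.
            subprocess.run(command, check=True)
```

Calling `run_pipeline("/path/to/your/workdir")` only prints the commands; pass `dry_run=False` to execute them. Passing arguments as a list avoids shell quoting problems with paths that contain spaces.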
Help
To access the help, use:
gpathex -h
For help on a specific command, use:
gpathex <command> -h
Acknowledgments
- This program is based on KEGGTools by Junpeng Fan (https://github.com/FlyPythons/KEGGTools)
- This program was developed as part of the research mini-project "PyTax4Fun2: A Python Tool for Functional Profiling and Redundancy Analysis of Bacterial Communities via 16S rRNA Gene Sequences, Featuring Polars for Efficient Processing of Large Genomic Datasets" (Project #BIOMIKA-02), Subproject #BIOMIKA-02.1, funded internally by Generasi Biologi Indonesia Foundation.
Citation
A dedicated publication for this program is not yet available. For citation purposes, please refer to the following technical report:
Nashrulloh, M.M. (2026). GPATHEX: A simple biological pathway and taxonomic information extractor from the KEGG Database (Technical Report No. GBR-TR-BIOMIKA-03/Genbinesia/I/2026). Generasi Biologi Indonesia Foundation. Gresik, Indonesia.
If you wish to cite this repository, you may use the following APA-style reference entry:
Nashrulloh, M.M. (2026). GPATHEX: A simple biological pathway and taxonomic information extractor from the KEGG Database (Version 1.0.0) [Computer software]. https://gitlab.com/biomikalab/gpathex
License
This project is licensed under the GNU General Public License v3.0. See the LICENSE file for details.