
A utility to generate input files for taxonomy propagation and assignment in QIIME/QIIME2 from the NCBI database

Reason this release was yanked:

There is an issue with calling cogent3 under Python 3.14.

GTAXOPROP (Genbinesia Taxonomy Propagator)

GTAXOPROP is a utility to generate input files for taxonomy propagation and assignment in QIIME/QIIME2 from the NCBI database. It converts NCBI accession numbers to QIIME/QIIME2-compatible taxonomy files with API fallback.

⚠️ Derivative Work Notice

GTAXOPROP is a derivative work based on entrez_qiime v2.0 by Christopher C. M. Baker. This version includes substantial modifications and enhancements while maintaining GPL v3 compliance.

Original work: Baker, C.C.M. (2016). entrez_qiime. v2.0. https://github.com/bakerccm/entrez_qiime

Major Enhancements from Original

  • ✅ Complete Python 3 migration
  • ✅ cogent3 integration (replaced PyCogent)
  • ✅ Better NCBI Entrez communication using Biopython
  • ✅ Advanced caching with resume capability
  • ✅ Batch API processing with rate limiting
  • ✅ Improved error handling and logging
  • ✅ Enhanced file encoding detection
  • ✅ Better taxonomy rank handling

Authors

  • Maulana Malik Nashrulloh (Division of Biomics Research, Department of Sciences, Generasi Biologi Indonesia Foundation)
  • Sonia Az Zahra Defi (Department of Biology, Faculty of Mathematics and Natural Sciences, Brawijaya University)
  • Brian Rahardi (Department of Bioinformatics, Faculty of Mathematics and Natural Sciences, Brawijaya University)
  • Muhammad Badrut Tamam (Division of Biomics Research, Department of Sciences, Generasi Biologi Indonesia Foundation & Biology Program, Faculty of Science, Technology, and Education, Muhammadiyah University of Lamongan)
  • Riki Ruhimat (Research Center for Applied Microbiology, Research Organization for Life Sciences, National Research and Innovation Agency)
  • Hessy Novita (Research Center for Veterinary Science, Research Organization for Health, National Research and Innovation Agency)

Quick Start

Dependencies

Make sure that your system has Python >=3.10 and the following packages/libraries installed:

  • tinydb==4.8.2
  • pbr>=6.1.1
  • stevedore>=5.5.0
  • cogent3>=2025.9.8a2
  • biopython>=1.85
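
As a quick check before installing, the following minimal sketch reports which of the packages listed above are already present in the current environment. It uses only the standard library; the helper name installed_versions is an illustrative choice and is not part of GTAXOPROP. It reports versions but does not compare them against the minimums above.

```python
from importlib.metadata import PackageNotFoundError, version
import sys

# Distribution names copied from the dependency list above.
REQUIRED = ["tinydb", "pbr", "stevedore", "cogent3", "biopython"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing.

    Hypothetical helper for illustration; not part of GTAXOPROP.
    """
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, ver in installed_versions(REQUIRED).items():
        print(f"{name}: {ver or 'NOT INSTALLED'}")
```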

Installation

Currently, installation is supported only through pip. To install directly from the GitHub repository:

pip install git+https://github.com/biomikalab/GTAXOPROP.git

For a more stable release, install the wheel package from PyPI:

pip install gtaxoprop

Usage

To use this program, you need the NCBI taxdump and accession2taxid data.
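
To show roughly how these two inputs fit together, here is a minimal sketch that maps an accession to a tax ID and then walks the taxdump tree up to the root, collecting names at the requested ranks. It is not GTAXOPROP's implementation. It assumes the standard NCBI layouts (nodes.dmp and names.dmp with fields separated by tab, pipe, tab; accession2taxid as a tab-separated table with a header line). All tax IDs, names, and the accession in the sample rows are made up for illustration.

```python
def split_dmp(line):
    """Split one taxdump line into its fields."""
    return line.rstrip("\n").rstrip("\t|").split("\t|\t")


def parse_nodes(lines):
    """Return {tax_id: (parent_tax_id, rank)} from nodes.dmp-style lines."""
    nodes = {}
    for line in lines:
        fields = split_dmp(line)
        nodes[int(fields[0])] = (int(fields[1]), fields[2])
    return nodes


def parse_names(lines):
    """Return {tax_id: scientific name} from names.dmp-style lines."""
    names = {}
    for line in lines:
        fields = split_dmp(line)
        if fields[3] == "scientific name":
            names[int(fields[0])] = fields[1]
    return names


def parse_accession2taxid(lines):
    """Return {accession: tax_id}, skipping the header line."""
    rows = iter(lines)
    next(rows)  # header: accession, accession.version, taxid, gi
    return {r.split("\t")[0]: int(r.split("\t")[2]) for r in rows}


def lineage(tax_id, nodes, names, ranks):
    """Walk from tax_id to the root; return one name (or None) per rank."""
    found = {}
    while True:
        parent, rank = nodes[tax_id]
        if rank in ranks:
            found[rank] = names[tax_id]
        if parent == tax_id:  # the root node is its own parent
            break
        tax_id = parent
    return [found.get(rank) for rank in ranks]


# Hypothetical sample rows (not real NCBI records).
NODES = parse_nodes([
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tdomain\t|\n",
    "20\t|\t10\t|\tgenus\t|\n",
    "30\t|\t20\t|\tspecies\t|\n",
])
NAMES = parse_names([
    "1\t|\troot\t|\t\t|\tscientific name\t|\n",
    "10\t|\tExamplebacteria\t|\t\t|\tscientific name\t|\n",
    "20\t|\tExamplus\t|\t\t|\tscientific name\t|\n",
    "30\t|\tExamplus demonstrans\t|\t\t|\tscientific name\t|\n",
])
ACC2TAXID = parse_accession2taxid([
    "accession\taccession.version\ttaxid\tgi",
    "XX000001\tXX000001.1\t30\t0",
])

if __name__ == "__main__":
    ranks = ["domain", "phylum", "genus", "species"]
    print(lineage(ACC2TAXID["XX000001"], NODES, NAMES, ranks))
```

Ranks that are absent from a lineage come back as None in this sketch; how GTAXOPROP itself fills such gaps is not covered here.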

The unpacked contents of nucl_gb.accession2taxid.gz and nucl_wgs.accession2taxid.gz are very large (10 GB+ and 40 GB+ respectively), so manage your disk space accordingly. Alternatively, you may use only one of the two files, but a single file may not cover all of your data.
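
Before starting the downloads, it may help to confirm how much space is actually free. The sketch below uses only the standard library. The 100 GiB threshold mirrors the lower bound of the free-space estimate given in this section, and the helper names are illustrative.

```python
import shutil
from pathlib import Path

GIB = 1024 ** 3
REQUIRED_GIB = 100  # lower bound of the estimate in this section


def free_gib(path):
    """Free space, in GiB, on the filesystem that holds `path`."""
    return shutil.disk_usage(path).free / GIB


def has_room(path, required_gib):
    """True if at least `required_gib` GiB are free at `path`."""
    return free_gib(path) >= required_gib


if __name__ == "__main__":
    home = Path.home()
    print(f"Free space in {home}: {free_gib(home):.1f} GiB")
    if not has_room(home, REQUIRED_GIB):
        print(f"Less than {REQUIRED_GIB} GiB free; consider another location.")
```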

Assuming you have at least 100-150 GB of free space in your home directory (~, i.e. /home/username/), run these commands one by one to set up your data:

cd ~
# Download and unpack the NCBI taxonomy dump (-p also creates missing parent directories)
mkdir -p ~/path/to/your/NCBI/taxdump
cd ~/path/to/your/NCBI/taxdump
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz
tar -zxvf taxdump.tar.gz
# Download and unpack the accession-to-taxid tables
mkdir -p ~/path/to/your/NCBI/accession2taxid
cd ~/path/to/your/NCBI/accession2taxid
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_gb.accession2taxid.gz
gunzip nucl_gb.accession2taxid.gz
wget ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/accession2taxid/nucl_wgs.accession2taxid.gz
gunzip nucl_wgs.accession2taxid.gz
# Merge both tables, keeping only the header line of the first one
cp nucl_gb.accession2taxid nucl_merged.accession2taxid
tail -n+2 nucl_wgs.accession2taxid >> nucl_merged.accession2taxid
# Remove the unmerged copies to free disk space
rm nucl_gb.accession2taxid nucl_wgs.accession2taxid
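
The merge step above keeps the header of the first table and drops the header of the second. For systems without cp and tail, the same idea can be sketched in Python as below. The function name merge_accession2taxid is illustrative and not part of GTAXOPROP. It assumes every input file starts with exactly one header line and ends with a newline.

```python
import shutil


def merge_accession2taxid(sources, destination):
    """Concatenate tables, keeping only the first file's header line.

    Files are streamed in binary mode, so multi-gigabyte inputs are
    never loaded into memory at once.
    """
    with open(destination, "wb") as out:
        for index, source in enumerate(sources):
            with open(source, "rb") as src:
                if index > 0:
                    src.readline()  # skip the repeated header line
                shutil.copyfileobj(src, out)
```

A call such as merge_accession2taxid(["nucl_gb.accession2taxid", "nucl_wgs.accession2taxid"], "nucl_merged.accession2taxid") would then correspond to the cp and tail lines above.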

To propagate taxonomy for Archaea, Bacteria, and Eukaryota:

gtaxoprop -i ~/path/to/your/your_sequences.fasta \
          -o ~/path/to/your/your_taxdumps.txt \
          -g ~/path/to/your/your_execution.log \
          -n ~/path/to/your/NCBI/taxdump/ \
          -a ~/path/to/your/NCBI/accession2taxid/nucl_merged.accession2taxid \
          -r domain,kingdom,phylum,class,order,family,genus,species \
          -d --email your_mail@email.xxx

To propagate taxonomy for viruses:

gtaxoprop -i ~/path/to/your/your_sequences.fasta \
          -o ~/path/to/your/your_taxdumps.txt \
          -g ~/path/to/your/your_execution.log \
          -n ~/path/to/your/NCBI/taxdump/ \
          -a ~/path/to/your/NCBI/accession2taxid/nucl_merged.accession2taxid \
          -r realm,kingdom,phylum,class,order,family,genus,species \
          -d --email your_mail@email.xxx
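
The two invocations above differ only in the first rank passed to -r (domain versus realm). When scripting both runs, the argument list can be assembled as in the sketch below. It uses only the flags shown in the commands above and passes -d through exactly as written there. The function name build_command and its viral parameter are illustrative and not part of GTAXOPROP.

```python
import shlex
from pathlib import Path

CELLULAR_RANKS = ["domain", "kingdom", "phylum", "class",
                  "order", "family", "genus", "species"]
# Per the commands above, the viral rank list swaps "domain" for "realm".
VIRAL_RANKS = ["realm"] + CELLULAR_RANKS[1:]


def build_command(fasta, output, log, taxdump_dir, acc2taxid, email, viral=False):
    """Return the gtaxoprop argument list for one run (hypothetical helper)."""
    ranks = VIRAL_RANKS if viral else CELLULAR_RANKS
    return [
        "gtaxoprop",
        "-i", str(fasta),
        "-o", str(output),
        "-g", str(log),
        "-n", str(taxdump_dir),
        "-a", str(acc2taxid),
        "-r", ",".join(ranks),
        "-d",
        "--email", email,
    ]


if __name__ == "__main__":
    # Placeholder paths from the commands above; "~" is expanded here
    # because it is not expanded when a command runs without a shell.
    base = Path("~/path/to/your").expanduser()
    cmd = build_command(
        base / "your_sequences.fasta",
        base / "your_taxdumps.txt",
        base / "your_execution.log",
        base / "NCBI" / "taxdump",
        base / "NCBI" / "accession2taxid" / "nucl_merged.accession2taxid",
        "your_mail@email.xxx",
        viral=True,
    )
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once the paths point at real files.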

Help

To access the help, use:

gtaxoprop -h

Acknowledgments

  • This program is based on entrez_qiime Version 2.0 by Chris Baker (https://github.com/bakerccm/entrez_qiime)
  • Part of this program was presented at the 4th International Conference on Biological Sciences (ICoBioS 2025) (https://www.icobios.org/)
  • This program was developed as part of the research mini-project "In silico metagenomic assessment of aCPSF1 phylogenetic marker for the identification and classification of archaea using publicly available Metagenomic Whole-genome Shotgun Sequencing data", funded internally by Generasi Biologi Indonesia Foundation.

Citation

A dedicated publication for this program is not yet available. For citation purposes, please refer to the following technical report:

Nashrulloh, M.M., Defi, S.A.Z., Rahardi, B., Tamam, Mh. B., Ruhimat, R., & Novita, H. (2025). GTAXOPROP: A utility to generate input files for taxonomy propagation and assignment in QIIME/QIIME2 from the NCBI database (Technical Report No. GBR-TR-BIOMIKA-01/Genbinesia/IX/2025). Generasi Biologi Indonesia Foundation. Gresik, Indonesia.

If you wish to cite this repository, you may use the following APA-style reference entry:

Nashrulloh, M.M., Defi, S.A.Z., Rahardi, B., Tamam, Mh. B., Ruhimat, R., & Novita, H. (2025). GTAXOPROP: A utility to generate input files for taxonomy propagation and assignment in QIIME/QIIME2 from the NCBI database (Version 1.0.post1) [Computer software]. https://github.com/biomikalab/GTAXOPROP

License

GNU General Public License v3.0 - See LICENSE
