ncbi-tree
ncbi-tree is an open source, cross-platform command-line tool for downloading the latest NCBI taxonomy database and converting it to Newick tree format (.tre), with optional plain-text visualization (.txt).
Quick Start
pip install ncbi-tree
ncbi-tree ./output
That's it! The tool will download the latest NCBI taxonomy, generate phylogenetic trees, and create detailed reports.
Features
- Automatic Download: Fetches the latest taxonomy data from NCBI FTP servers
- Version Tracking: Automatically detects and records the exact server version
- Smart Caching: Skips re-download and re-extraction when files already exist
- Merged Taxa Support: Handles merged taxonomy IDs from merged.dmp
- Name Sanitization: By default, the initial letter of each word is capitalized and spaces are replaced by -. Name formatting is configurable with the --no-sanitize option
- Server-side compatibility: No more blocking on user input in automated environments. Use ncbi-tree ./output --no-prompt-1 to automatically generate all files (core + optional); use ncbi-tree ./output --no-prompt-0 to generate only core files and skip optional files.
Installation
pip install ncbi-tree
Usage
Basic Usage
# Download and build taxonomy tree with default settings
ncbi-tree ./output
# Clean up intermediate files after processing
ncbi-tree ./output --no-cache
# Disable name sanitization (keep original spaces)
ncbi-tree ./output --no-sanitize
# Use custom download URL
ncbi-tree ./output --url https://custom-mirror.org/taxdump.tar.gz
# Combined options
ncbi-tree ./output --no-cache --no-sanitize
Server-side, non-blocking, no-interaction
# Automatically generate all files (core + optional)
ncbi-tree ./output --no-prompt-1
# Generate only core files, skip optional files
ncbi-tree ./output --no-prompt-0
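For scheduled jobs written in Python, the non-interactive flags above can be passed through subprocess. This is a minimal sketch rather than part of ncbi-tree: the helper names build_command and run_ncbi_tree are hypothetical, and it assumes the tool exits with a non-zero status on failure, which the README does not state.

```python
import subprocess


def build_command(output_dir, all_files=True, no_cache=False):
    """Assemble an ncbi-tree command line from the flags documented above."""
    cmd = ["ncbi-tree", output_dir]
    # --no-prompt-1: core + optional files; --no-prompt-0: core files only
    cmd.append("--no-prompt-1" if all_files else "--no-prompt-0")
    if no_cache:
        # Clean up intermediate files after processing
        cmd.append("--no-cache")
    return cmd


def run_ncbi_tree(output_dir, **options):
    """Run the tool; raises CalledProcessError on a non-zero exit status
    (assumption: ncbi-tree signals failure through its exit status)."""
    return subprocess.run(build_command(output_dir, **options), check=True)


# Example (requires ncbi-tree to be installed and on PATH):
# run_ncbi_tree("./output", all_files=False, no_cache=True)
```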
Help
ncbi-tree --help
ncbi-tree --version
Output Files
Core Files (Generated Automatically)
- output.NCBI.tree.tre - Newick tree with NCBI taxonomy IDs only
- output.NCBI.report.txt - Exploratory taxonomy analysis and statistics
- version.txt - Server-timestamped version for the downloaded taxdump.tar.gz
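The ID-only Newick file can be consumed without a tree library. The sketch below turns a Newick string into a child-to-parent map. It assumes every node, internal ones included, carries a label and that labels contain no quotes, whitespace, or branch lengths. The function name newick_to_parent_map and the sample string are illustrative, not part of ncbi-tree.

```python
import re


def newick_to_parent_map(newick):
    """Return {child_label: parent_label} for a fully labeled Newick string."""
    # Tokens are structural characters or runs of anything else (labels)
    tokens = re.findall(r"[(),;]|[^(),;\s]+", newick)
    parent = {}
    stack = []      # one list of child labels per currently open "("
    closed = None   # children of the group just closed, awaiting its label
    for tok in tokens:
        if tok == "(":
            stack.append([])
        elif tok == ")":
            closed = stack.pop()
        elif tok in ",;":
            closed = None
        else:
            # A label directly after ")" names the parent of that group
            if closed is not None:
                for child in closed:
                    parent[child] = tok
                closed = None
            # The labeled node is itself a child of the enclosing group
            if stack:
                stack[-1].append(tok)
    return parent


tree = "((9606,9598)207598,9593)9604;"
print(newick_to_parent_map(tree))
```

Labels are kept as strings; convert them with int() if numeric taxonomy IDs are needed.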
Optional Files (User Prompted)
After core files are generated, you will be prompted:
Would you like to generate optional files (output.NCBI.tree.txt, output.NCBI.named.tree.tre, output.NCBI.ID.to.name.tsv)? [y/N]:
If you answer y, additional files will be generated without re-reading data:
- output.NCBI.tree.txt - Plain-text tree with Unicode box-drawing
- output.NCBI.named.tree.tre - Newick tree with rank:id:name labels
- output.NCBI.ID.to.name.tsv - TSV mapping of IDs to names (TaxID, Name, Rank)
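The TSV mapping is convenient for lookups from Python. The sketch below loads it into a dictionary, assuming three tab-separated columns in the order TaxID, Name, Rank. Whether the file starts with a header row is not stated here, so rows whose first field is not numeric are skipped. The function name read_id_to_name and the sample rows are illustrative.

```python
import csv


def read_id_to_name(lines):
    """Map TaxID -> (name, rank) from tab-separated rows."""
    mapping = {}
    # QUOTE_NONE keeps any quote characters in names as literal text
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 3 or not row[0].isdigit():
            continue  # skip blank lines and a header row, if present
        mapping[int(row[0])] = (row[1], row[2])
    return mapping


sample = [
    "TaxID\tName\tRank\n",
    "9606\tHuman;Homo-Sapiens\tspecies\n",
]
print(read_id_to_name(sample)[9606])
```

For a real file, pass an open file object instead of a list, for example open(path, encoding="utf-8", newline=""), where path points at the generated .tsv file.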
Name Sanitization
By default, taxon names are sanitized for consistent display:
- Spaces replaced with -
- Existing - escaped as <->, which is eventually converted back to -. Configurable by changing name = name.replace('-', '<->') in sanitize_name in core.py.
- Title case applied
- Special characters removed
Default (sanitized):
"Human;Homo-Sapiens"
"Norway-Rat;Rattus-Norvegicus"
With --no-sanitize flag:
"human; Homo sapiens"
"Norway rat; Rattus norvegicus"
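The rules above can be approximated in a few lines. This is a sketch under stated assumptions, not the sanitize_name function from core.py: the order of the steps and the set of characters treated as "special" (here, everything except ASCII letters, digits, -, < and >) are guesses, and the later step that converts <-> back to - is left out.

```python
import re


def sanitize_name_sketch(name):
    """Approximate the documented sanitization rules (hypothetical helper)."""
    # Escape existing hyphens so they stay distinguishable from the
    # hyphens that will replace spaces
    name = name.replace("-", "<->")
    # Title case: capitalize the first letter of each space-separated word
    name = " ".join(w[:1].upper() + w[1:].lower() for w in name.split())
    # Replace spaces with hyphens
    name = name.replace(" ", "-")
    # Assumption: "special characters" means anything outside this set
    return re.sub(r"[^A-Za-z0-9<>-]", "", name)


print(sanitize_name_sketch("Homo sapiens"))   # Homo-Sapiens
print(sanitize_name_sketch("Norway rat"))     # Norway-Rat
```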
Advanced Configuration
Custom Name Display
To customize which name types are displayed, edit NAME_PRIORITIES in ncbi_tree/core.py:
# Default: both common and scientific names
NAME_PRIORITIES = {"genbank common name": 0, "scientific name": 1}
# Result: "Human; Homo sapiens"
# Scientific name only (disable common name)
NAME_PRIORITIES = {"genbank common name": -1, "scientific name": 0}
# Result: "Homo sapiens"
# Common name only (disable scientific name)
NAME_PRIORITIES = {"genbank common name": 0, "scientific name": -1}
# Result: "Human"
Note: A priority of -1 disables that name type; any value >= 0 enables it (lower number = higher priority).
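To make the priority rule concrete, the sketch below shows one way such a dictionary could drive label construction: disabled types are dropped, the rest are ordered by priority and joined with "; ". The function display_name and its input shape (a dict of name type to name) are assumptions for illustration and may not match how core.py uses NAME_PRIORITIES.

```python
NAME_PRIORITIES = {"genbank common name": 0, "scientific name": 1}


def display_name(names_by_type, priorities=NAME_PRIORITIES):
    """Join the enabled name types in priority order (hypothetical helper)."""
    enabled = [
        (priority, names_by_type[name_type])
        for name_type, priority in priorities.items()
        # -1 disables a name type; skip types the taxon does not have
        if priority >= 0 and name_type in names_by_type
    ]
    # Lower number = higher priority, so an ascending sort gives display order
    enabled.sort(key=lambda pair: pair[0])
    return "; ".join(name for _, name in enabled)


names = {"genbank common name": "human", "scientific name": "Homo sapiens"}
print(display_name(names))
print(display_name(names, {"genbank common name": -1, "scientific name": 0}))
```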
Requirements
- Python 3.8 or higher
- requests >= 2.25.0
- tqdm >= 4.50.0
Technical Details
Data Source
- Primary: NCBI Taxonomy Database (https://ftp.ncbi.nlm.nih.gov/pub/taxonomy/)
- Updates: Automatic detection of latest version with timestamp tracking
- Size: ~70-100 MB compressed, ~2.7M+ taxonomy entries at the time of writing (October 2025)
- Format: NCBI taxdump format (nodes.dmp, names.dmp, merged.dmp)
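For readers who want to inspect the raw dump themselves, the .dmp files use a tab, pipe, tab sequence as the field separator and end each row with a tab and a pipe. The sketch below reads the first three columns of nodes.dmp (tax ID, parent tax ID, rank) and the two columns of merged.dmp (old tax ID, new tax ID). The function names and sample rows are illustrative and are not taken from ncbi-tree.

```python
def split_dmp_line(line):
    """Split one taxdump row into its fields."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the row terminator
    return line.split("\t|\t")


def parse_nodes(lines):
    """Return ({tax_id: parent_id}, {tax_id: rank}) from nodes.dmp rows."""
    parent, rank = {}, {}
    for line in lines:
        if not line.strip():
            continue
        fields = split_dmp_line(line)
        tax_id = int(fields[0])
        parent[tax_id] = int(fields[1])
        rank[tax_id] = fields[2]
    return parent, rank


def parse_merged(lines):
    """Return {old_tax_id: new_tax_id} from merged.dmp rows."""
    merged = {}
    for line in lines:
        if line.strip():
            old, new = split_dmp_line(line)[:2]
            merged[int(old)] = int(new)
    return merged


# Shortened sample rows; real nodes.dmp rows carry more columns
nodes = ["1\t|\t1\t|\tno rank\t|\n", "9606\t|\t9605\t|\tspecies\t|\n"]
print(parse_nodes(nodes))
```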
Output Formats
- Newick (.tre): Standard phylogenetic tree format compatible with all major tree viewers
- Text Tree (.txt): Unicode-based visualization for terminal/text viewing
- TSV Mapping (.tsv): Tabular format for database integration and lookups
- Report (.txt): Statistical analysis with rank distribution and depth metrics
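As an illustration of the Newick format, the sketch below serializes a child-to-parent map into a Newick string in which every node is labeled with its ID. It assumes the root lists itself as its own parent, as tax ID 1 does in nodes.dmp, and it uses recursion, which relies on the tree being shallow. The function parent_map_to_newick is a hypothetical helper and not the tool's own conversion code.

```python
from collections import defaultdict


def parent_map_to_newick(parent):
    """Serialize {node: parent} into a fully labeled Newick string."""
    children = defaultdict(list)
    root = None
    for node, par in parent.items():
        if node == par:
            root = node  # the root is recorded as its own parent
        else:
            children[par].append(node)
    if root is None:
        raise ValueError("no root found (a node that is its own parent)")

    def render(node):
        # Sort children so the output is deterministic
        kids = sorted(children.get(node, []))
        if not kids:
            return str(node)
        return "(" + ",".join(render(k) for k in kids) + ")" + str(node)

    return render(root) + ";"


print(parent_map_to_newick({1: 1, 2: 1, 3: 1, 4: 2}))  # ((4)2,3)1;
```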
License
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
Acknowledgments
Source Distribution
File details
Details for the file ncbi_tree-1.1.0.tar.gz.
File metadata
- Download URL: ncbi_tree-1.1.0.tar.gz
- Upload date:
- Size: 21.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5c12af945d3081ddc5d9a36f15836411441b6e608798718ce09863a106eed7a3 |
| MD5 | b5f308ba3559f1954022261f527c4d7b |
| BLAKE2b-256 | c6d316f35bf997c00428360ff51c929814dd1a72f7f490cd9722257c13768a9b |