JolTax is a high-performance, vectorized taxonomy library for Python.
JolTax
A vectorized taxonomy library for Python.
JolTax is a tool for working with large taxonomies like NCBI or GTDB. It stores the entire tree in contiguous NumPy arrays, enabling fast (like a jolt) traversals, clade queries, and mass annotation of datasets using Polars.
Key Features
- Search: Exact and approximate matching to find TaxIDs from strings.
- Clade Queries: Instantly identify all descendants of a TaxID.
- Annotate: Return a Polars DataFrame with the complete canonical taxonomy for any number of TaxIDs, ready to join onto your own datasets.
- Batch Processing: Get Lowest Common Ancestor (LCA) and node-to-node distances for thousands of TaxID pairs at once.
- Array-Based Core: Uses NumPy operations for property lookups and tree traversals.
- Pre-build: Build and save (cache) your taxonomies for instant loading later.
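To give a feel for why an array-based tree makes clade queries cheap, here is a minimal sketch in plain NumPy. It does not show JolTax's actual internals, which are not described here; it assumes a toy tree whose nodes are numbered in pre-order, and the names `parent`, `subtree_sizes`, `clade`, and `in_clade` are hypothetical. Under that numbering every clade occupies one contiguous index range, so "all descendants" becomes a slice and membership becomes a range check.

```python
import numpy as np

# Toy taxonomy (hypothetical data): node index -> parent index.
# The root is its own parent, and nodes are numbered in pre-order:
#   0 root
#     1 Bacteria
#       2 Proteobacteria
#         3 Escherichia
#       4 Firmicutes
#     5 Eukaryota
#       6 Chordata
parent = np.array([0, 0, 1, 2, 1, 0, 5])


def subtree_sizes(parent):
    """Count the nodes in each subtree, the node itself included."""
    size = np.ones(len(parent), dtype=np.int64)
    # With pre-order numbering every child has a larger index than its
    # parent, so one reverse pass accumulates sizes bottom-up.
    for node in range(len(parent) - 1, 0, -1):
        size[parent[node]] += size[node]
    return size


size = subtree_sizes(parent)


def clade(node):
    """Return `node` and all of its descendants as one contiguous range."""
    return np.arange(node, node + size[node])


def in_clade(nodes, ancestor):
    """Vectorized test: which of `nodes` fall inside the clade of `ancestor`?"""
    nodes = np.asarray(nodes)
    return (nodes >= ancestor) & (nodes < ancestor + size[ancestor])


print(clade(1))                # indices of Bacteria and everything below it
print(in_clade([3, 4, 6], 1))  # one boolean per queried node
```

The range check in `in_clade` costs the same per node regardless of tree depth, which is the property that makes this layout attractive for filtering large tables of TaxIDs.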
If you prefer an interactive experience for building and exploring taxonomies, a command-line interface is also available: JolTax-CLI.
Installation
From Bioconda (Recommended)
conda install -c bioconda joltax
From PyPI
pip install joltax
Requires: numpy, polars, rapidfuzz.
Quick Start
from joltax import JolTree
# Build from NCBI dump files (pass the directory containing names.dmp and nodes.dmp)
tree = JolTree('/path/to/ncbi/taxonomy/')
# Save a binary cache in dir "taxonomy_cache" for instant loading later
tree.save('taxonomy_cache')
# Load the cache
tree = JolTree.load('taxonomy_cache')
# Find a TaxID by name; fuzzy=True tolerates misspellings such as 'Escherchia' (fuzzy=False by default)
results = tree.search_name('Escherchia', fuzzy=True)
# Annotate a list of TaxIDs with their full canonical rank lineages
# Returns a Polars DataFrame with columns prefixed by 't_' (e.g., t_phylum, t_genus)
df = tree.annotate([9606, 562])
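# Batch LCA calculation for paired TaxIDs (example values: human vs. mouse, E. coli vs. Escherichia)
ids1 = [9606, 562]
ids2 = [10090, 561]
lcas = tree.get_lca_batch(ids1, ids2)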
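For readers curious what a batched LCA and distance computation can look like on array storage, the following is a rough sketch in plain NumPy. It is not JolTax's implementation; it assumes a toy tree stored as hypothetical `parent` and `depth` arrays, and the helper names `lca_batch` and `distance_batch` are made up for illustration. All pairs are advanced together with array operations, so the Python-level loop runs at most about twice the tree depth rather than once per pair.

```python
import numpy as np

# Toy tree (hypothetical data): parent index and depth of each node.
# The root (index 0) is its own parent and has depth 0.
parent = np.array([0, 0, 1, 2, 1, 0, 5])
depth = np.array([0, 1, 2, 3, 2, 1, 2])


def lca_batch(a, b):
    """Lowest common ancestor for each pair (a[i], b[i])."""
    a = np.array(a, dtype=np.int64)
    b = np.array(b, dtype=np.int64)
    # Step 1: lift the deeper node of each pair until depths match.
    while True:
        deeper_a = depth[a] > depth[b]
        deeper_b = depth[b] > depth[a]
        if not (deeper_a.any() or deeper_b.any()):
            break
        a = np.where(deeper_a, parent[a], a)
        b = np.where(deeper_b, parent[b], b)
    # Step 2: lift both nodes of every unresolved pair until they meet.
    while (a != b).any():
        differ = a != b
        a = np.where(differ, parent[a], a)
        b = np.where(differ, parent[b], b)
    return a


def distance_batch(a, b):
    """Number of edges between a[i] and b[i], measured through their LCA."""
    a = np.asarray(a)
    b = np.asarray(b)
    ancestor = lca_batch(a, b)
    return depth[a] + depth[b] - 2 * depth[ancestor]


pairs_a = [3, 3, 6]
pairs_b = [4, 6, 5]
print(lca_batch(pairs_a, pairs_b))       # one ancestor index per pair
print(distance_batch(pairs_a, pairs_b))  # one edge count per pair
```

The distance formula relies on the fact that the path between two nodes always passes through their lowest common ancestor, so it is the sum of the two depths minus twice the ancestor's depth.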
Documentation
For a detailed API reference and a step-by-step guide, see USAGE.md.
File details
Details for the file joltax-0.3.0.tar.gz.
File metadata
- Download URL: joltax-0.3.0.tar.gz
- Upload date:
- Size: 19.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c023f729d04c3b6b826af48bdd008840568975d93b080dacd093af26c6a233bc |
| MD5 | abf04a8ee399152cba240218daa6d017 |
| BLAKE2b-256 | 5e7356c527d5cd9c04882f2eeff89eb1a3797dd299d0db570d67820fb4e90cf1 |
File details
Details for the file joltax-0.3.0-py3-none-any.whl.
File metadata
- Download URL: joltax-0.3.0-py3-none-any.whl
- Upload date:
- Size: 15.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e976a7915c0463bd367d3ba4e9d445786b050da0e671b1bc53ff8fbdef0f2e1a |
| MD5 | fba84edfa8a0082b370721f240e047df |
| BLAKE2b-256 | b279cd79168e23b12f2f6d0b7ef9e43a35fdc8140c6aa36b44ca020a7dc17846 |