JolTax is a high-performance, vectorized taxonomy library for Python.
JolTax
JolTax is a Python library designed to handle massive taxonomies with extreme efficiency. By representing taxonomy trees as contiguous NumPy arrays and leveraging Polars for mass data handling, it achieves lightning-fast traversals, constant-time clade queries, and rapid mass annotation of large datasets.
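To make the array-based idea concrete, here is a minimal sketch of a taxonomy stored as flat NumPy arrays. It is not JolTax's internal layout: the names `tax_ids`, `parent_ids`, `parent_of`, and `parents` are hypothetical, and the TaxIDs are made-up toy values.

```python
import numpy as np

# Toy taxonomy (made-up IDs): each TaxID and its parent TaxID.
# The root (1) is its own parent.
tax_ids = np.array([1, 10, 11, 20, 21], dtype=np.int64)
parent_ids = np.array([1, 1, 1, 10, 20], dtype=np.int64)

# Dense lookup table: position i holds the parent of TaxID i.
# Unused IDs are marked with -1.
parent_of = np.full(tax_ids.max() + 1, -1, dtype=np.int64)
parent_of[tax_ids] = parent_ids

def parents(query_ids):
    """Return the parent of every query TaxID with a single indexing call."""
    return parent_of[np.asarray(query_ids, dtype=np.int64)]

result = parents([21, 20, 11])
print(result)
```

Because the lookup is a single indexing operation over a contiguous array, the cost per query stays small even when the query list holds millions of IDs.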
Key Features
- Vectorized Performance: Uses hardware-accelerated NumPy operations for million-scale property lookups.
- Memory Efficient: Optimized string store using Polars/Arrow reduces RAM footprint.
- Fuzzy Name Search: Rapid fuzzy matching using RapidFuzz to find TaxIDs from names.
- Instant Clade Queries: Quickly find all descendants of any node (even millions) using optimized range indexing.
- Hyper-Vectorized LCA Search: Batch Lowest Common Ancestor (LCA) search and node-to-node distance calculations at lightning speeds.
- Mass Annotation: Annotate massive TaxID tables with 2,000,000+ rows in under a second using Polars.
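One common way to get clade queries through range indexing is to number the nodes in depth-first pre-order, so that every subtree occupies one contiguous slice. The sketch below illustrates that general technique on a toy tree; it is an assumption about the approach, not JolTax's actual code, and all names (`parent`, `entry`, `size`, `clade`, `in_clade`) are hypothetical.

```python
import numpy as np

# Toy tree as a parent array over node indices 0..5 (node 0 is the root).
parent = np.array([0, 0, 0, 1, 2, 1])
n = len(parent)

# Build child lists once.
children = [[] for _ in range(n)]
for node in range(1, n):
    children[parent[node]].append(node)

order = np.empty(n, dtype=np.int64)  # order[k] = node visited k-th
entry = np.empty(n, dtype=np.int64)  # entry[node] = pre-order position of node
size = np.ones(n, dtype=np.int64)    # subtree sizes, filled bottom-up

# Iterative depth-first walk that assigns pre-order positions.
stack = [0]
k = 0
while stack:
    node = stack.pop()
    entry[node] = k
    order[k] = node
    k += 1
    stack.extend(reversed(children[node]))

# Reverse pre-order visits every child before its parent.
for node in order[::-1]:
    if node != 0:
        size[parent[node]] += size[node]

def clade(node):
    """All nodes in the subtree of `node`, returned as one contiguous slice."""
    start = entry[node]
    return order[start:start + size[node]]

def in_clade(nodes, ancestor):
    """Vectorized membership test: is each node inside the ancestor's clade?"""
    e = entry[np.asarray(nodes)]
    lo = entry[ancestor]
    return (e >= lo) & (e < lo + size[ancestor])

print(clade(1))
print(in_clade([3, 4, 5], 1))
```

After the one-time numbering pass, a clade query is a slice and a membership test is two comparisons, regardless of how many descendants the node has.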
Quick Start
from joltax import JolTree
# Build and process the NCBI taxonomy
tree = JolTree(nodes_file='nodes.dmp', names_file='names.dmp')
# Save for instant loading next time
tree.save('my_taxonomy_cache')
# Re-load in milliseconds (using zero-copy Arrow IPC)
tree = JolTree.load('my_taxonomy_cache')
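# Batch LCA (process 10,000 pairs in <10ms)
# ids1 and ids2 hold the paired TaxIDs; these values and the plain-list input are illustrative assumptions
ids1 = [9606, 562]
ids2 = [10090, 561]
lcas = tree.get_lca_batch(ids1, ids2)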
# Fuzzy search for a name (returns a Polars DataFrame)
results = tree.search_name('Escherchia', fuzzy=True)
print(results)
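The misspelled query 'Escherchia' above is deliberate: fuzzy matching is meant to recover the intended name. The sketch below shows the idea using the standard library's `difflib` as a stand-in for RapidFuzz, so its scoring will differ from JolTax's. The `names_to_taxid` table and the `fuzzy_taxids` helper are hypothetical.

```python
from difflib import get_close_matches

# Tiny illustrative name table (NCBI TaxIDs for these three names).
names_to_taxid = {
    "Escherichia": 561,
    "Escherichia coli": 562,
    "Homo sapiens": 9606,
}

def fuzzy_taxids(query, cutoff=0.8):
    """Return (name, TaxID) pairs whose names are similar to the query."""
    matches = get_close_matches(query, list(names_to_taxid), n=3, cutoff=cutoff)
    return [(name, names_to_taxid[name]) for name in matches]

hits = fuzzy_taxids("Escherchia")
print(hits)
```

Raising `cutoff` makes the match stricter; lowering it returns more, looser candidates.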
Installation
pip install joltax
Requires: numpy, polars, rapidfuzz.
Documentation
For a detailed API reference and a comprehensive "How-To" guide with example workflows, please see USAGE.md.
File details
Details for the file joltax-0.1.2.tar.gz.
File metadata
- Download URL: joltax-0.1.2.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c18acefe5a65bf53b0bc867b9d11893e6e9c5c8ddc59e55db4be132b79aead07 |
| MD5 | 5a5523fe155c12ea5703bb54b27b7c41 |
| BLAKE2b-256 | a2e6a16bbd055e7bff3d3377336a5b1db58724513d5a7df370f98f7ceaf20401 |
File details
Details for the file joltax-0.1.2-py3-none-any.whl.
File metadata
- Download URL: joltax-0.1.2-py3-none-any.whl
- Upload date:
- Size: 11.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ae90419c7a440ce98dedd517869a0bc8298691c1c45bfd73105a1062c3ea447 |
| MD5 | 4a1e0015efc2f18d93cca6351e8a4cdd |
| BLAKE2b-256 | 4dc630b728c8b483e4397f98206ef66381f7cad2d5fb5075e0781bcb635b12ae |