JolTax

High-performance, vectorized taxonomy library for Python.

JolTax is a Python library for working with very large taxonomies efficiently. It represents taxonomy trees as contiguous NumPy arrays and uses Polars for bulk data handling, which gives fast traversals, constant-time clade queries, and rapid annotation of large datasets.
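To make the idea of array-backed trees and constant-time clade queries concrete, here is a minimal sketch of one common technique: storing nodes in depth-first pre-order so that every subtree occupies a contiguous slice. This is an illustration under assumptions, not JolTax's actual internals; the toy TaxIDs and the names `tax_ids`, `subtree_end`, and `clade` are hypothetical.

```python
import numpy as np

# Toy taxonomy as (tax_id, parent_tax_id) pairs; the root (1) is its own parent.
# These IDs are invented for illustration.
edges = [(1, 1), (2, 1), (3, 1), (4, 2), (5, 2), (6, 4)]
parent_of = dict(edges)

children = {}
for tax_id, parent in edges:
    if tax_id != parent:
        children.setdefault(parent, []).append(tax_id)

# Iterative depth-first traversal producing a pre-order listing of the nodes.
order, stack = [], [1]
while stack:
    node = stack.pop()
    order.append(node)
    stack.extend(reversed(children.get(node, [])))

tax_ids = np.array(order)                      # contiguous node storage
pos = {t: i for i, t in enumerate(order)}      # tax_id -> array position
parent_pos = np.array([pos[parent_of[t]] for t in order])

# In pre-order every descendant sits after its ancestor, so walking the
# array backwards accumulates complete subtree sizes into each parent.
sizes = np.ones(len(order), dtype=np.int64)
for i in range(len(order) - 1, 0, -1):
    sizes[parent_pos[i]] += sizes[i]
subtree_end = np.arange(len(order)) + sizes    # exclusive end of each subtree

def clade(tax_id):
    """Return the node and all its descendants as a slice (a view, no copy)."""
    i = pos[tax_id]
    return tax_ids[i:subtree_end[i]]

print(clade(2))
```

Because a clade is just the slice `tax_ids[i:subtree_end[i]]`, locating it costs one position lookup and one slice regardless of how many descendants the node has.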

Key Features

  • Vectorized Performance: Uses vectorized NumPy operations for property lookups across millions of nodes.
  • Memory Efficient: Optimized string store using Polars/Arrow reduces RAM footprint.
  • Fuzzy Name Search: Rapid fuzzy matching using RapidFuzz to find TaxIDs from names.
  • Instant Clade Queries: Find all descendants of any node, even when there are millions of them, using optimized range indexing.
  • Vectorized LCA Search: Batch Lowest Common Ancestor (LCA) search and node-to-node distance calculations (for example, 10,000 LCA pairs in under 10 ms).
  • Mass Annotation: Annotate massive TaxID tables with 2,000,000+ rows in under a second using Polars.
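As an illustration of what a vectorized LCA search can look like, the sketch below lifts whole arrays of node pairs toward the root at once using a parent array and a depth array. It is a simple textbook approach shown under assumptions and may differ from what JolTax does internally; the toy tree and the names `parent`, `depth`, `lca_batch`, and `distance_batch` are hypothetical.

```python
import numpy as np

# Toy tree stored as a parent array indexed by node position.
# Node 0 is the root and is its own parent. In this toy layout every
# parent has a smaller index than its children (an assumption of the sketch).
parent = np.array([0, 0, 0, 1, 1, 3])

# Depth of each node, filled in index order thanks to the layout above.
depth = np.zeros(len(parent), dtype=np.int64)
for i in range(1, len(parent)):
    depth[i] = depth[parent[i]] + 1

def lca_batch(a, b):
    """Lowest common ancestor for each pair (a[k], b[k]), all pairs at once."""
    a = np.asarray(a).copy()
    b = np.asarray(b).copy()
    # Step 1: lift the deeper node of each pair until depths match.
    while (depth[a] != depth[b]).any():
        a = np.where(depth[a] > depth[b], parent[a], a)
        b = np.where(depth[b] > depth[a], parent[b], b)
    # Step 2: lift both nodes together until they meet.
    while (a != b).any():
        differs = a != b
        a = np.where(differs, parent[a], a)
        b = np.where(differs, parent[b], b)
    return a

def distance_batch(a, b):
    """Number of edges on the path between each pair of nodes."""
    a = np.asarray(a)
    b = np.asarray(b)
    return depth[a] + depth[b] - 2 * depth[lca_batch(a, b)]

print(lca_batch([5, 4, 2, 3], [4, 2, 5, 3]))
print(distance_batch([5, 4, 2, 3], [4, 2, 5, 3]))
```

Each loop iteration is a handful of whole-array NumPy operations, so the Python-level loop count is bounded by the tree depth rather than by the number of pairs.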

Quick Start

from joltax import JolTree

# Build and process the NCBI taxonomy
tree = JolTree(nodes_file='nodes.dmp', names_file='names.dmp')

# Save for instant loading next time
tree.save('my_taxonomy_cache')

# Re-load in milliseconds (using zero-copy Arrow IPC)
tree = JolTree.load('my_taxonomy_cache')

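# Batch LCA (process 10,000 pairs in <10ms)
# Example inputs: NCBI TaxIDs for human, mouse, and rat, paired element-wise
# (assumed to be accepted as plain lists; convert to NumPy arrays if required)
ids1 = [9606, 10090]
ids2 = [10090, 10116]
lcas = tree.get_lca_batch(ids1, ids2)

# Fuzzy search for a name (returns a Polars DataFrame)
# The misspelling 'Escherchia' is deliberate, to exercise fuzzy matching
results = tree.search_name('Escherchia', fuzzy=True)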
print(results)
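For readers unfamiliar with fuzzy name lookup, the following sketch shows the general idea of mapping a misspelled name to candidate TaxIDs. It deliberately uses the standard library's `difflib` as a stand-in rather than RapidFuzz, so it does not reflect JolTax's implementation or scoring; the small name table and the function `fuzzy_lookup` are hypothetical.

```python
import difflib

# Tiny name -> NCBI TaxID table, for illustration only.
name_to_taxid = {
    "Escherichia": 561,
    "Escherichia coli": 562,
    "Homo sapiens": 9606,
    "Mus musculus": 10090,
}

def fuzzy_lookup(query, limit=3, cutoff=0.6):
    """Return (name, tax_id) pairs whose names resemble the query, best first."""
    matches = difflib.get_close_matches(
        query, list(name_to_taxid), n=limit, cutoff=cutoff
    )
    return [(name, name_to_taxid[name]) for name in matches]

# The misspelled query from the Quick Start still finds the genus.
for name, tax_id in fuzzy_lookup("Escherchia"):
    print(tax_id, name)
```

A dedicated library such as RapidFuzz offers faster scorers and more matching strategies; the stand-in here only demonstrates the shape of the lookup.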

Installation

pip install joltax

Requires: numpy, polars, rapidfuzz.

Documentation

For a detailed API reference and a comprehensive "How-To" guide with example workflows, please see USAGE.md.
