joltax

High-performance, vectorized taxonomy library for Python.

JolTax is a Python library for working efficiently with very large taxonomies. It represents the taxonomy tree as contiguous NumPy arrays and uses Polars for bulk data handling, which gives fast traversals, constant-time clade queries, and rapid annotation of large datasets.
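As a rough illustration of why an array layout can make clade queries constant-time, the sketch below numbers a toy tree in pre-order so that every clade occupies one contiguous slice. This is a minimal sketch of the general technique, not JolTax's actual implementation; the names `preorder_intervals`, `clade`, `entry`, `exit_`, and `order` are hypothetical.

```python
import numpy as np

# Toy tree stored as a parent array (the root is its own parent):
#   0 -> 1, 2    1 -> 3, 5    2 -> 4
parent = np.array([0, 0, 0, 1, 2, 1])


def preorder_intervals(parent):
    """Return (entry, exit_, order) from an iterative pre-order traversal."""
    n = len(parent)
    children = [[] for _ in range(n)]
    root = -1
    for node, p in enumerate(parent):
        if node == p:
            root = node
        else:
            children[p].append(node)

    entry = np.empty(n, dtype=np.int64)   # position of each node in pre-order
    exit_ = np.empty(n, dtype=np.int64)   # one past the last position in its clade
    order = np.empty(n, dtype=np.int64)   # order[k] = node visited k-th
    counter = 0
    stack = [(root, False)]
    while stack:
        node, finished = stack.pop()
        if finished:
            exit_[node] = counter
            continue
        entry[node] = counter
        order[counter] = node
        counter += 1
        stack.append((node, True))
        for child in reversed(children[node]):
            stack.append((child, False))
    return entry, exit_, order


entry, exit_, order = preorder_intervals(parent)


def clade(node):
    """All nodes in the clade rooted at `node` (itself included), as one slice."""
    return order[entry[node]:exit_[node]]


# Vectorized membership test: which of these nodes fall inside clade 1?
nodes = np.arange(len(parent))
in_clade_1 = (entry[1] <= entry[nodes]) & (entry[nodes] < exit_[1])
```

Once the intervals are built, looking up a clade is two array reads plus a slice, regardless of how many descendants it has.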

Key Features

  • Vectorized Performance: Uses vectorized NumPy operations for property lookups across millions of TaxIDs.
  • Memory Efficient: Stores strings in a Polars/Arrow-backed store to reduce the RAM footprint.
  • Fuzzy Name Search: Uses RapidFuzz for fast fuzzy matching to find TaxIDs from names.
  • Instant Clade Queries: Finds all descendants of any node, even clades with millions of members, using range indexing.
  • Vectorized LCA Search: Batch Lowest Common Ancestor (LCA) search and node-to-node distance calculations.
  • Mass Annotation: Annotates TaxID tables of 2,000,000+ rows in under a second using Polars.
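To make the batch LCA and distance features more concrete, here is a minimal NumPy sketch that processes many node pairs at once using parent and depth arrays. It is an illustration under stated assumptions, not the library's algorithm (JolTax may use a different method); `depths`, `lca_batch`, and `distance_batch` are hypothetical helper names.

```python
import numpy as np

# Toy tree stored as a parent array (the root is its own parent):
#   0 -> 1, 2    1 -> 3, 5    2 -> 4
parent = np.array([0, 0, 0, 1, 2, 1])


def depths(parent):
    """Depth of every node, computed by walking all nodes up in lockstep."""
    depth = np.zeros(len(parent), dtype=np.int64)
    cur = np.arange(len(parent))
    while True:
        not_root = parent[cur] != cur
        if not not_root.any():
            return depth
        depth += not_root          # one more edge for nodes still below the root
        cur = parent[cur]


def lca_batch(parent, depth, a, b):
    """Lowest common ancestor for each pair (a[i], b[i])."""
    a = np.array(a, dtype=np.int64)
    b = np.array(b, dtype=np.int64)
    # Step 1: lift the deeper node of each pair until both sit at equal depth.
    while True:
        a_deeper = depth[a] > depth[b]
        b_deeper = depth[b] > depth[a]
        if not (a_deeper.any() or b_deeper.any()):
            break
        a = np.where(a_deeper, parent[a], a)
        b = np.where(b_deeper, parent[b], b)
    # Step 2: lift both nodes together until they meet.
    while True:
        differ = a != b
        if not differ.any():
            return a
        a = np.where(differ, parent[a], a)
        b = np.where(differ, parent[b], b)


def distance_batch(parent, depth, a, b):
    """Number of edges on the path between a[i] and b[i]."""
    a = np.asarray(a)
    b = np.asarray(b)
    lca = lca_batch(parent, depth, a, b)
    return depth[a] + depth[b] - 2 * depth[lca]


depth = depths(parent)
lcas = lca_batch(parent, depth, [3, 3, 3, 4], [5, 4, 1, 4])
dists = distance_batch(parent, depth, [3, 3, 3, 4], [5, 4, 1, 4])
```

Each loop iteration advances every unfinished pair by one level, so the number of NumPy passes is bounded by the tree height rather than by the number of pairs.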

Quick Start

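import numpy as np

from joltax.joltree import JolTree

# Build and process the NCBI taxonomy from the taxdump files
tree = JolTree(nodes_file='nodes.dmp', names_file='names.dmp')

# Save for instant loading next time
tree.save('my_taxonomy_cache')

# Re-load in milliseconds (using zero-copy Arrow IPC)
tree = JolTree.load('my_taxonomy_cache')

# Example TaxIDs to pair up (562 = Escherichia coli, 561 = Escherichia,
# 9606 = Homo sapiens, 10090 = Mus musculus). Passing NumPy arrays is an
# assumption here; see USAGE.md for the accepted input types.
ids1 = np.array([562, 9606])
ids2 = np.array([561, 10090])

# Batch LCA (process 10,000 pairs in <10ms)
lcas = tree.get_lca_batch(ids1, ids2)

# Fuzzy search for a name (returns a Polars DataFrame);
# 'Escherchia' is deliberately misspelled to exercise the fuzzy match
results = tree.search_name('Escherchia', fuzzy=True)
print(results)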
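For readers unfamiliar with fuzzy name lookup, the sketch below shows the general idea of mapping a misspelled name to candidate TaxIDs. It deliberately swaps in the standard library's `difflib` instead of RapidFuzz so that it runs without extra dependencies, and it does not reproduce how `search_name` works internally; `name_to_taxid` and `fuzzy_taxids` are hypothetical names.

```python
import difflib

# Tiny name -> TaxID table (NCBI TaxIDs) standing in for a full names.dmp
name_to_taxid = {
    "Escherichia": 561,
    "Escherichia coli": 562,
    "Homo sapiens": 9606,
    "Mus musculus": 10090,
}


def fuzzy_taxids(query, limit=3, cutoff=0.6):
    """Return (name, taxid) pairs whose names are similar to `query`, best first."""
    matches = difflib.get_close_matches(
        query, list(name_to_taxid), n=limit, cutoff=cutoff
    )
    return [(name, name_to_taxid[name]) for name in matches]


# The misspelled query still resolves to the Escherichia entries.
hits = fuzzy_taxids("Escherchia")
print(hits)
```

A dedicated library such as RapidFuzz serves the same purpose with much faster scoring, which matters when the name table has millions of entries.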

Installation

From a local checkout of the source tree:

cd joltax
pip install .

Requires: numpy, polars, rapidfuzz.

Documentation

For a detailed API reference and a comprehensive "How-To" guide with example workflows, please see USAGE.md.
