JolTax is a high-performance, vectorized taxonomy library for Python.

JolTax

A vectorized taxonomy library for Python.

JolTax is a tool for working with large taxonomies such as NCBI or GTDB. It stores the entire tree in contiguous NumPy arrays, which enables fast (like a jolt) traversals, clade queries, and bulk annotation of datasets using Polars.

Key Features

  • Search: Exact and approximate matching to find TaxIDs from strings.
  • Clade Queries: Instantly identify all descendants of a TaxID.
  • Annotate: Return a Polars DataFrame with the complete canonical taxonomy for any number of TaxIDs, for easy annotation of your own datasets.
  • Batch Processing: Get Lowest Common Ancestor (LCA) and node-to-node distances for thousands of TaxID pairs at once.
  • Array-Based Core: Uses NumPy operations for property lookups and tree traversals.
  • Pre-build: Build and save (cache) your taxonomies for instant loading later.
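
To illustrate why an array-based layout makes clade queries cheap, here is a minimal sketch. It is not JolTax's actual implementation, which the README does not describe beyond its use of NumPy arrays. Plain Python lists stand in for NumPy arrays, and the toy tree, the `preorder_intervals` helper, and the `clade` helper are all hypothetical names introduced for this example. The idea is that a pre-order numbering places every clade in one contiguous slice.

```python
# Toy taxonomy stored as flat, index-aligned lists (stand-ins for NumPy arrays).
# Node i's parent is parent[i]; the root is its own parent. All data is made up.
names = ["root", "Bacteria", "Eukaryota", "Escherichia", "E. coli", "Homo", "H. sapiens"]
parent = [0, 0, 0, 1, 3, 2, 5]


def preorder_intervals(parent):
    """Return (order, enter, exit_): the clade of node v is order[enter[v]:exit_[v]]."""
    n = len(parent)
    children = [[] for _ in range(n)]
    root = None
    for node, p in enumerate(parent):
        if p == node:
            root = node
        else:
            children[p].append(node)

    order, enter, exit_ = [], [0] * n, [0] * n
    stack = [(root, False)]
    while stack:
        node, done = stack.pop()
        if done:
            # All descendants of node have been emitted by now.
            exit_[node] = len(order)
            continue
        enter[node] = len(order)
        order.append(node)
        stack.append((node, True))
        # Reverse so children are visited in ascending index order.
        for child in reversed(children[node]):
            stack.append((child, False))
    return order, enter, exit_


def clade(node, order, enter, exit_):
    """All descendants of node (including node itself) as one contiguous slice."""
    return order[enter[node]:exit_[node]]


order, enter, exit_ = preorder_intervals(parent)
bacteria_clade = [names[i] for i in clade(1, order, enter, exit_)]
print(bacteria_clade)
```

Once the intervals are built, a clade query needs no traversal: it is a single slice. This is the kind of operation that maps well onto contiguous arrays.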

If you prefer an interactive experience for building and exploring taxonomies, a command-line interface is also available: JolTax-CLI.

Installation

From Bioconda (Recommended)

conda install -c bioconda joltax

From PyPI

pip install joltax

Requires: numpy, polars, rapidfuzz.

Quick Start

from joltax import JolTree

# Build from NCBI dump files (the directory containing names.dmp and nodes.dmp)
tree = JolTree('/path/to/ncbi/taxonomy/')

# Save a binary cache in dir "taxonomy_cache" for instant loading later
tree.save('taxonomy_cache')

# Load the cache
tree = JolTree.load('taxonomy_cache')

# Find a TaxID by name. Fuzzy matching is off by default (fuzzy=False);
# enable it so the deliberately misspelled query below can still match.
results = tree.search_name('Escherchia', fuzzy=True)

# Annotate a list of TaxIDs with their full canonical rank lineages
# Returns a Polars DataFrame with columns prefixed by 't_' (e.g., t_phylum, t_genus)
df = tree.annotate([9606, 562])

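# Batch LCA calculation over pairs of TaxIDs
# (example inputs; assumed here to be paired element-wise: ids1[i] with ids2[i])
ids1 = [9606, 562]   # Homo sapiens, Escherichia coli
ids2 = [10090, 561]  # Mus musculus, Escherichia
lcas = tree.get_lca_batch(ids1, ids2)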
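
For readers unfamiliar with what a batch LCA computes, the following sketch works through it on a toy tree stored as parent and depth arrays. It is an illustration only: the tree, the function names (`compute_depths`, `lca`, `lca_batch`, `distance_batch`), and the element-wise pairing of the two ID lists are assumptions made for this example, and it does not use or reproduce JolTax's API or internals.

```python
# Toy tree as a flat parent array; the root (index 0) is its own parent.
# Indices: 0 root, 1 Bacteria, 2 Eukaryota, 3 Escherichia, 4 E. coli, 5 Homo, 6 H. sapiens
parent = [0, 0, 0, 1, 3, 2, 5]


def compute_depths(parent):
    """Depth of every node (root = 0), filled in by walking up to a known ancestor."""
    depth = [-1] * len(parent)
    for node in range(len(parent)):
        path, cur = [], node
        while depth[cur] < 0 and parent[cur] != cur:
            path.append(cur)
            cur = parent[cur]
        if depth[cur] < 0:
            depth[cur] = 0  # reached the root
        d = depth[cur]
        for v in reversed(path):
            d += 1
            depth[v] = d
    return depth


def lca(a, b, parent, depth):
    """Lowest common ancestor: level the deeper node, then climb in lockstep."""
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        a, b = parent[a], parent[b]
    return a


def lca_batch(ids1, ids2, parent, depth):
    """LCA for each pair (ids1[i], ids2[i])."""
    return [lca(a, b, parent, depth) for a, b in zip(ids1, ids2)]


def distance_batch(ids1, ids2, parent, depth):
    """Number of edges between each pair, via depth[a] + depth[b] - 2 * depth[lca]."""
    return [
        depth[a] + depth[b] - 2 * depth[lca(a, b, parent, depth)]
        for a, b in zip(ids1, ids2)
    ]


depth = compute_depths(parent)
ids1 = [4, 4, 6, 1]
ids2 = [3, 6, 5, 2]
lcas = lca_batch(ids1, ids2, parent, depth)
dists = distance_batch(ids1, ids2, parent, depth)
print(lcas, dists)
```

The distance formula follows from the LCA: the path between two nodes goes up from each node to their common ancestor, so the edge count is the sum of both depths minus twice the ancestor's depth.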

Documentation

For a detailed API reference and a step-by-step guide, see USAGE.md.
