JolTax is a high-performance, vectorized taxonomy library for Python.

JolTax

A vectorized taxonomy library for Python.

JolTax is a tool for working with large taxonomies like NCBI or GTDB. It stores the entire tree in contiguous NumPy arrays, enabling fast (like a jolt) traversals, clade queries, and mass annotation of datasets using Polars.

Key Features

  • Search: Exact and approximate matching to find TaxIDs from strings.
  • Clade Queries: Instantly identify all descendants of a TaxID.
  • Annotate: Return a Polars DataFrame with the complete canonical taxonomy for any number of TaxIDs, ready to join onto your own datasets.
  • Batch Processing: Get Lowest Common Ancestor (LCA) and node-to-node distances for thousands of TaxID pairs at once.
  • Array-Based Core: Uses NumPy operations for property lookups and tree traversals.
  • Pre-build: Build and save (cache) your taxonomies for instant loading later.
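To make the array-based idea concrete, here is a minimal sketch of how a clade query can reduce to a single NumPy comparison once each node has depth-first entry and exit times. It illustrates the general technique only and is not JolTax's internal layout; the toy IDs and the names `euler_intervals` and `clade` are hypothetical.

```python
import numpy as np

# Toy taxonomy (illustrative IDs, not real NCBI TaxIDs).
# parents[i] is the parent of taxids[i]; the root is its own parent.
taxids = np.array([1, 2, 3, 4, 5, 6])
parents = np.array([1, 1, 1, 2, 2, 4])


def euler_intervals(taxids, parents):
    """Return depth-first entry and exit times, aligned with taxids."""
    index = {int(t): i for i, t in enumerate(taxids)}
    children = [[] for _ in taxids]
    root = None
    for i, p in enumerate(parents):
        pi = index[int(p)]
        if pi == i:
            root = i
        else:
            children[pi].append(i)

    entry = np.zeros(len(taxids), dtype=np.int64)
    exit_ = np.zeros(len(taxids), dtype=np.int64)
    clock = 0
    stack = [(root, False)]  # iterative DFS avoids recursion limits
    while stack:
        node, finished = stack.pop()
        if finished:
            exit_[node] = clock
            continue
        entry[node] = clock
        clock += 1
        stack.append((node, True))
        for child in reversed(children[node]):
            stack.append((child, False))
    return entry, exit_


entry, exit_ = euler_intervals(taxids, parents)


def clade(taxid):
    """Return taxid and all of its descendants with one vectorized mask."""
    i = int(np.flatnonzero(taxids == taxid)[0])
    mask = (entry >= entry[i]) & (entry < exit_[i])
    return taxids[mask]


print(clade(2))
```

The traversal runs once at build time; afterwards each clade query is a pair of array comparisons, regardless of how deep the subtree is.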

If you prefer an interactive experience for building and exploring taxonomies, a command-line interface is also available: JolTax-CLI.

Installation

From Bioconda (Recommended)

conda install -c bioconda joltax

From PyPI

pip install joltax

Requires: numpy, polars, rapidfuzz.
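After installing, a quick check like the following can confirm that the package and its dependencies are visible to your interpreter. This is a small standard-library sketch, not part of JolTax; `missing_packages` is a hypothetical helper, and it assumes each import name matches the package name listed above.

```python
from importlib.util import find_spec


def missing_packages(names):
    """Return the names that cannot be located as importable modules."""
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(["joltax", "numpy", "polars", "rapidfuzz"])
print("missing:", missing if missing else "none")
```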

Quick Start

from joltax import JolTree

# Build from NCBI DMP files (pass the directory containing names.dmp and nodes.dmp)
tree = JolTree('/path/to/ncbi/taxonomy/')

# Save a binary cache in dir "taxonomy_cache" for instant loading later
tree.save('taxonomy_cache')

# Load the cache
tree = JolTree.load('taxonomy_cache')

# Find a TaxID by name; fuzzy=True tolerates misspellings such as 'Escherchia' (fuzzy=False by default)
results = tree.search_name('Escherchia', fuzzy=True)

# Annotate a list of TaxIDs with their full canonical rank lineages
# Returns a Polars DataFrame with columns prefixed by 't_' (e.g., t_phylum, t_genus)
df = tree.annotate([9606, 562])

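# Batch LCA calculation for paired TaxIDs (example inputs; pair i is ids1[i], ids2[i])
ids1 = [9606, 562]
ids2 = [10090, 561]
lcas = tree.get_lca_batch(ids1, ids2)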
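For intuition about what a batch LCA involves, the sketch below computes LCAs and node-to-node distances for many pairs at once over plain parent and depth arrays. It is an assumption-laden illustration of the general approach, not how `get_lca_batch` is implemented; the toy tree and the names `lca_batch` and `distance_batch` are hypothetical.

```python
import numpy as np

# Toy tree addressed by dense node index (illustrative data only).
# parent[i] is the index of i's parent; the root (index 0) is its own parent.
parent = np.array([0, 0, 0, 1, 1, 3])
depth = np.array([0, 1, 1, 2, 2, 3])


def lca_batch(a, b):
    """Return the lowest common ancestor for each pair (a[i], b[i])."""
    a = np.asarray(a).copy()
    b = np.asarray(b).copy()
    # Step 1: lift the deeper node of each pair until depths match.
    while True:
        a_deeper = depth[a] > depth[b]
        b_deeper = depth[b] > depth[a]
        if not (a_deeper.any() or b_deeper.any()):
            break
        a[a_deeper] = parent[a[a_deeper]]
        b[b_deeper] = parent[b[b_deeper]]
    # Step 2: climb both sides together until every pair meets.
    while True:
        differ = a != b
        if not differ.any():
            break
        a[differ] = parent[a[differ]]
        b[differ] = parent[b[differ]]
    return a


def distance_batch(a, b):
    """Return the number of edges between each pair, via their LCA."""
    a = np.asarray(a)
    b = np.asarray(b)
    lca = lca_batch(a, b)
    return depth[a] + depth[b] - 2 * depth[lca]


print(lca_batch([5, 3, 5, 5], [4, 2, 5, 3]))
print(distance_batch([5, 3, 5, 5], [4, 2, 5, 3]))
```

Each loop iteration advances every unresolved pair in one array operation, so the number of Python-level steps is bounded by the tree depth rather than by the number of pairs.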

Documentation

For a detailed API reference and a step-by-step guide, see USAGE.md.
