Routines for loading, saving, and manipulating taxonomic trees

Taxonomy

This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.

This library was developed initially as a component in One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It can be used as is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add last common ancestor, traversal, and similar methods to a downstream package's taxonomy implementation.

The library ships with a number of features:

  • Common support for taxonomy handling across Rust and Python
  • Fast, with low(er) memory usage
  • NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
  • Easily extensible (in Rust) to support other formats and operations

Installation

Rust

This library can be added to an existing Cargo.toml file and installed straight from crates.io.

Python

You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:

pip install taxonomy

Python Usage

The Python taxonomy API can open and manipulate all of the formats from the Rust library. Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies inter-operation between different taxonomy systems.

Loading a taxonomy

A Taxonomy can be loaded from a variety of sources.

  1. Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.

  2. Taxonomy.from_ncbi(ncbi_folder: str): loads a Taxonomy from a pair of NCBI dump files. The folder needs to contain the individual files in the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).

  3. Taxonomy.from_json(value: str, /, json_pointer: str): loads a Taxonomy from a JSON-encoded string. The format can be either of the tree or node_link_data types and will be automatically detected (more details on both formats are in the documentation). If json_pointer is specified, the JSON will be traversed to that sub-object before being parsed as a taxonomy.

  4. Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string. Experimental.

  5. Taxonomy.from_gtdb(value: str): loads a Taxonomy from a GTDB-encoded string. Experimental.
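
To illustrate what the json_pointer option of from_json does conceptually, here is a minimal pure-Python sketch of traversing a JSON document to a sub-object before parsing it. It assumes the pointer follows RFC 6901-style syntax ("/key/subkey"); the resolve_pointer helper, the payload layout, and the key names are hypothetical and are not part of the library.

```python
import json


def resolve_pointer(document, pointer):
    """Follow an RFC 6901-style pointer such as "/data/taxonomy" (hypothetical helper)."""
    if pointer == "":
        return document  # an empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a JSON pointer must be empty or start with '/'")
    node = document
    for token in pointer[1:].split("/"):
        # Undo the two escape sequences defined by RFC 6901.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


# Hypothetical payload: the taxonomy is nested under other data.
payload = json.loads(
    '{"meta": {"version": 1},'
    ' "data": {"taxonomy": {"id": "1", "name": "root", "children": []}}}'
)

subtree = resolve_pointer(payload, "/data/taxonomy")
print(subtree["name"])
```

With the library itself, the equivalent step would be to pass the pointer as json_pointer so that only the nested object is parsed as a taxonomy.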

Exporting a taxonomy

The following assume that the taxonomy has been instantiated as a variable named tax.

  1. tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.
  2. tax.to_json_tree(): exports a Taxonomy as a JSON-encoded byte string in a tree format
  3. tax.to_json_node_links(): exports a Taxonomy as a JSON-encoded byte string in a node links format
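
Since the export methods return byte strings, the result usually needs to be decoded before further processing. The sketch below uses a hard-coded bytes value as a stand-in for an exported JSON tree; its keys ("id", "name", "children") are assumptions made for illustration and may not match the library's exact output.

```python
import json

# Stand-in for the bytes an export call might return; the layout is assumed.
raw = (
    b'{"id": "1", "name": "root", "children": '
    b'[{"id": "2", "name": "Bacteria", "children": []}]}'
)

# Decode the bytes to text, then parse the JSON.
tree = json.loads(raw.decode("utf-8"))


def count_nodes(node):
    """Count a node plus everything nested under its "children" key."""
    return 1 + sum(count_nodes(child) for child in node.get("children", []))


print(count_nodes(tree))
```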

Using a taxonomy

The following assume that the taxonomy has been instantiated as a variable named tax. Note that TaxonomyNode is a class with the following schema:

class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str

Note that the tax_id parameters passed to the functions described below are strings. For NCBI taxonomies, this means quoting the integer IDs: 562 -> "562". If you loaded a taxonomy via JSON and you had additional data in your file, you can access it via indexing, for example node["readcount"].
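
If your own code stores NCBI IDs as integers, a small normalizing helper keeps calls consistent. This is a sketch only; as_tax_id is a hypothetical name and not part of the library.

```python
def as_tax_id(value):
    """Return the string form of an integer or string taxonomy ID (hypothetical helper)."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python, so reject it explicitly.
        raise TypeError("taxonomy IDs must be int or str, not bool")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        return value.strip()
    raise TypeError(f"unsupported taxonomy ID type: {type(value).__name__}")


print(as_tax_id(562))
```

The result can then be passed wherever a tax_id string is expected.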

tax.clone() -> Taxonomy

Return a new taxonomy, equivalent to a deep copy.

tax.root -> TaxonomyNode

Points to the root of the taxonomy

tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]

Returns the immediate parent TaxonomyNode of the given tax id.

If at_rank is provided, scans all the nodes in the node's lineage and returns the one at that rank.

Examples:

parent = tax.parent("612")
parent = tax.parent("612", at_rank="species")
# The result will be `None` if we can't find the parent
parent = tax.parent("unknown")
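
To make the at_rank behavior concrete, here is a pure-Python sketch of the lookup on a toy tree. It is not the library's implementation: the NODES table is a simplified, hypothetical stand-in (intermediate ranks are omitted), the function returns IDs rather than TaxonomyNode objects, and it follows the description above by scanning the whole lineage, which is assumed to include the node itself.

```python
# Toy taxonomy: id -> name, rank, and parent id (None for the root).
NODES = {
    "1": {"name": "root", "rank": "no rank", "parent": None},
    "2": {"name": "Bacteria", "rank": "superkingdom", "parent": "1"},
    "561": {"name": "Escherichia", "rank": "genus", "parent": "2"},
    "562": {"name": "Escherichia coli", "rank": "species", "parent": "561"},
}


def parent(tax_id, at_rank=None):
    """Return the parent id, or the id in the lineage at the requested rank."""
    if tax_id not in NODES:
        return None
    if at_rank is None:
        return NODES[tax_id]["parent"]
    # Walk from the node up to the root, stopping at the first matching rank.
    current = tax_id
    while current is not None:
        if NODES[current]["rank"] == at_rank:
            return current
        current = NODES[current]["parent"]
    return None


print(parent("562"))
print(parent("562", at_rank="superkingdom"))
```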

tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])

Same as parent, but also returns the distance to that node, as a (TaxonomyNode, float) tuple.

tax.node(tax_id: str) -> Optional[TaxonomyNode]

Returns the node with that id, or None if not found. You can also use indexing, tax["some_id"], but this raises an exception if the node is not found.

tax.find_all_by_name(name: str) -> List[TaxonomyNode]

Returns all the nodes with that name. For NCBI taxonomies, only scientific names are considered, not synonyms.

tax.children(tax_id: str) -> List[TaxonomyNode]

Returns the direct children of the given tax id.

tax.descendants(tax_id: str) -> List[TaxonomyNode]

Returns all nodes below the given tax id. Equivalent to running tax.children recursively on the initial result of tax.children(tax_id).
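
The relationship between children and descendants can be sketched in plain Python as follows. The PARENTS table is a hypothetical toy tree (intermediate ranks omitted), the functions return IDs instead of TaxonomyNode objects, and the ordering of the results is a choice of this sketch, not a statement about the library.

```python
from collections import defaultdict

# Toy taxonomy: child id -> parent id (None for the root).
PARENTS = {"1": None, "2": "1", "561": "2", "562": "561", "620": "2", "622": "620"}

# Invert the parent table into a parent -> children table.
CHILDREN = defaultdict(list)
for tax_id, parent_id in PARENTS.items():
    if parent_id is not None:
        CHILDREN[parent_id].append(tax_id)


def children(tax_id):
    """Return only the nodes directly below tax_id."""
    return list(CHILDREN.get(tax_id, []))


def descendants(tax_id):
    """Return every node below tax_id by recursing over children."""
    found = []
    for child in children(tax_id):
        found.append(child)
        found.extend(descendants(child))
    return found


print(children("2"))
print(descendants("2"))
```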

tax.lineage(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, including itself.

tax.parents(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id.

tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]

Returns the lowest common ancestor for the 2 given nodes.
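
As an illustration of how lineage, parents, and lca relate, here is a pure-Python sketch on a hypothetical toy tree. It is not the library's algorithm. It assumes a lineage is ordered from the node up to the root and that parents is the lineage without the node itself; both are assumptions of this sketch.

```python
# Toy taxonomy: child id -> parent id (None for the root).
PARENTS = {"1": None, "2": "1", "561": "2", "562": "561", "620": "2", "622": "620"}


def lineage(tax_id):
    """Return the node and all of its ancestors, node first, root last."""
    path = []
    current = tax_id
    while current is not None:
        path.append(current)
        current = PARENTS[current]
    return path


def parents(tax_id):
    """Return the ancestors only (assumed here to exclude the node itself)."""
    return lineage(tax_id)[1:]


def lca(id1, id2):
    """Return the first node in id2's lineage that is also in id1's lineage."""
    ancestors = set(lineage(id1))
    for candidate in lineage(id2):
        if candidate in ancestors:
            return candidate
    return None


print(lineage("562"))
print(lca("562", "622"))
```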

tax.prune(keep: List[str], remove: List[str]) -> Taxonomy

Return a copy of the taxonomy containing:

  • if keep is provided, only the nodes in keep and their parents
  • if remove is provided, all of the nodes except those in remove and their children
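
The two pruning rules above can be sketched in plain Python. This is an illustration on a hypothetical child-to-parent table, not the library's implementation; in particular, it reads "their children" as all nodes below a removed node, which is an assumption.

```python
# Toy taxonomy: child id -> parent id (None for the root).
PARENTS = {"1": None, "2": "1", "561": "2", "562": "561", "620": "2", "622": "620"}


def has_ancestor_in(parent_of, tax_id, blocked):
    """Return True if tax_id or any of its ancestors is in blocked."""
    current = tax_id
    while current is not None:
        if current in blocked:
            return True
        current = parent_of[current]
    return False


def prune(parent_of, keep=None, remove=None):
    """Return a new child -> parent table with the keep/remove rules applied."""
    kept = set(parent_of)
    if keep:
        # Keep the listed nodes plus everything on their path to the root.
        wanted = set()
        for tax_id in keep:
            current = tax_id
            while current is not None and current not in wanted:
                wanted.add(current)
                current = parent_of[current]
        kept &= wanted
    if remove:
        # Drop the listed nodes and everything below them.
        blocked = set(remove)
        kept = {t for t in kept if not has_ancestor_in(parent_of, t, blocked)}
    return {t: p for t, p in parent_of.items() if t in kept}


print(sorted(prune(PARENTS, keep=["562"])))
print(sorted(prune(PARENTS, remove=["620"])))
```

The original table is left untouched, mirroring the fact that prune returns a copy.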

tax.remove_node(tax_id: str)

Remove the node from the tree, re-attaching parents as needed: only a single node is removed.
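
A rough sketch of single-node removal on a hypothetical child-to-parent table follows. It reads "re-attaching parents as needed" as giving the removed node's children the removed node's parent, which is an interpretation rather than documented behavior. Refusing to remove the root is also a choice of this sketch.

```python
def remove_node(parent_of, tax_id):
    """Remove one node in place and re-parent its children to its parent."""
    new_parent = parent_of[tax_id]
    if new_parent is None:
        raise ValueError("this sketch does not support removing the root")
    del parent_of[tax_id]
    for child, parent in parent_of.items():
        if parent == tax_id:
            # Only values change here, so iterating over items() is safe.
            parent_of[child] = new_parent


# Toy taxonomy: child id -> parent id (None for the root).
tree = {"1": None, "2": "1", "561": "2", "562": "561"}
remove_node(tree, "561")
print(tree)
```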

tax.add_node(parent_tax_id: str, new_tax_id: str, /, name: str, rank: str)

Add a new node to the tree at the parent provided.

tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)

Edit properties on a taxonomy node.

tax.internal_index(tax_id: str)

Returns the internal integer index used by some applications. For the JSON node-link format, this is the positional index of each node in the nodes array.
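
For the node-link case, the notion of a positional index can be illustrated with a small mapping built from a nodes array. The document layout below (a "nodes" list of objects with an "id" key) is assumed for illustration and is not taken from the library's schema.

```python
# Hypothetical node-link style document; only the "nodes" array matters here.
node_link = {
    "nodes": [
        {"id": "1", "name": "root"},
        {"id": "2", "name": "Bacteria"},
        {"id": "562", "name": "Escherichia coli"},
    ],
}

# Map each tax id to its position in the nodes array.
index_of = {node["id"]: position for position, node in enumerate(node_link["nodes"])}

print(index_of["562"])
```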

Exceptions

Only one exception is raised intentionally by the library: TaxonomyError. If you get a pyo3_runtime.PanicException (or anything with pyo3 in its name), this is a bug in the underlying Rust library; please open an issue.

Development

Rust

There is a test suite runnable with cargo test. To test the Python bindings you need to use the additional python_test feature: cargo test --features python_test.

Python

To work on the Python library on a macOS/Unix system (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh

# finally, install the library in the local virtualenv
maturin develop --features python

# or using pip
pip install .

Building binary wheels and pushing to PyPI

There is a GitHub workflow that will build Python wheels for macOS (x86 and ARM) and Ubuntu (x86). To run it, create a new release.

To build wheels locally, run:

uv venv --python 3.11
source .venv/bin/activate
uv build

# or, using maturin by itself:

maturin build --features python --release --strip

Other Taxonomy Libraries

There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:

ETE Toolkit (http://etetoolkit.org/) A Python taxonomy library

Taxize (https://ropensci.github.io/taxize-book/) An R toolkit for working with taxonomic data
