Routines for loading, saving, and manipulating taxonomic trees

Taxonomy

This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.

This library was initially developed as a component of One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It can be used as is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add last common ancestor, traversal, and other methods to a downstream package's taxonomy implementation.

The library ships with a number of features:

  • Common support for taxonomy handling across Rust and Python
  • Fast, with low(er) memory usage
  • NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
  • Easily extensible (in Rust) to support other formats and operations

Installation

Rust

This library can be added to an existing Cargo.toml file and installed straight from crates.io.

Python

You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:

pip install taxonomy

Python Usage

The Python taxonomy API can open and manipulate all of the formats from the Rust library. Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies inter-operation between different taxonomy systems.

Loading a taxonomy

A taxonomy can be loaded from a variety of sources.

  1. Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.

  2. Taxonomy.from_ncbi(ncbi_folder: str): loads a Taxonomy from a pair of NCBI dump files. The folder needs to contain the individual files from the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).

  3. Taxonomy.from_json(value: str, /, json_pointer: str): loads a Taxonomy from a JSON-encoded string. The format can be either of the tree or node_link_data types and is detected automatically (more details on both formats are in the documentation). If json_pointer is specified, the JSON is traversed to that sub-object before being parsed as a taxonomy.

  4. Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string. (Experimental.)

  5. Taxonomy.from_gtdb(value: str): loads a Taxonomy from a GTDB-encoded string. (Experimental.)
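
To make the json_pointer behaviour concrete, here is a minimal standard-library sketch of what "traversing to a sub-object" can look like. It assumes RFC 6901 pointer syntax ("/key/subkey"), which this README does not state; the resolve_pointer helper and the payload layout are hypothetical and are not part of the taxonomy API.

```python
import json


def resolve_pointer(document, pointer):
    """Follow an RFC 6901-style pointer such as "/results/taxonomy"."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("pointer must be empty or start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # RFC 6901 escapes: "~1" stands for "/", "~0" stands for "~".
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical payload: the taxonomy is nested under "results" -> "taxonomy".
payload = json.loads('{"results": {"taxonomy": {"nodes": [], "links": []}}}')
sub_object = resolve_pointer(payload, "/results/taxonomy")
```

In this sketch, sub_object is the part of the document that would then be parsed as a taxonomy.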

Exporting a taxonomy

The following assumes that the taxonomy has been instantiated as a variable named tax.

  1. tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.
  2. tax.to_json_tree(): exports a Taxonomy as a JSON-encoded byte string in the tree format.
  3. tax.to_json_node_links(): exports a Taxonomy as a JSON-encoded byte string in the node-link format.
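
Because the exporters return byte strings rather than str, downstream code usually decodes them first. The sketch below uses a stand-in bytes value in place of a real export; the field names in it are hypothetical, and UTF-8 encoding is an assumption.

```python
import json

# Stand-in for the bytes returned by an exporter such as tax.to_json_tree().
# The field names here are hypothetical, not the library's documented schema.
raw = b'{"id": "1", "name": "root", "children": []}'

text = raw.decode("utf-8")  # bytes -> str (assumes UTF-8)
data = json.loads(text)     # str -> Python dict
```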

Using a taxonomy

The following assumes that the taxonomy has been instantiated as a variable named tax. Note that TaxonomyNode is a class with the following schema:

class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str

Note that the tax_id parameters passed to the functions described below are strings. For NCBI taxonomies, for example, they are integers in quotes: 562 -> "562". If you loaded a taxonomy from JSON and the file contained additional data, you can access it via indexing, for example node["readcount"].

tax.clone() -> Taxonomy

Return a new taxonomy, equivalent to a deep copy.

tax.root -> TaxonomyNode

The root node of the taxonomy.

tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]

Return the immediate parent TaxonomyNode of the node with the given id.

If at_rank is provided, scan all the nodes in the node's lineage and return the ancestor at that rank.

Examples:

parent = tax.parent("612")
parent = tax.parent("612", at_rank="species")
# `parent` will be `None` if we can't find the parent
parent = tax.parent("unknown")
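
The at_rank lookup can be pictured as a walk up the lineage. The following is a pure-Python sketch on a toy dictionary, not the library's implementation; the ids and ranks are made up, and this version skips the starting node itself, which may differ from how the library treats it.

```python
from typing import Optional

# Toy taxonomy: id -> (parent id, rank). The ids and ranks are made up.
NODES = {
    "1": (None, "root"),
    "10": ("1", "genus"),
    "100": ("10", "species"),
    "1000": ("100", "strain"),
}


def parent(tax_id: str, at_rank: Optional[str] = None) -> Optional[str]:
    """Return the immediate parent id, or the nearest ancestor with the given rank."""
    if tax_id not in NODES:
        return None
    current = NODES[tax_id][0]
    if at_rank is None:
        return current
    # Walk towards the root until an ancestor with the requested rank is found.
    while current is not None:
        if NODES[current][1] == at_rank:
            return current
        current = NODES[current][0]
    return None
```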

tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])

Same as parent, but also returns the distance, as a (TaxonomyNode, float) tuple.

tax.node(tax_id: str) -> Optional[TaxonomyNode]

Returns the node with that id, or None if it is not found. You can also use indexing, tax["some_id"], but this raises an exception if the node is not found.

tax.find_all_by_name(name: str) -> List[TaxonomyNode]

Returns all the nodes with that name. In NCBI, it only accounts for scientific names and not synonyms.

tax.children(tax_id: str) -> List[TaxonomyNode]

Returns the direct children of the given tax id.

tax.descendants(tax_id: str) -> List[TaxonomyNode]

Returns all nodes below the given tax id. Equivalent to running tax.children recursively on the initial result of tax.children(tax_id).
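
The relationship between children and descendants can be sketched with a toy child map. This is only an illustration of the traversal described above; the data and the ordering of the result are assumptions, not the library's behaviour.

```python
from typing import Dict, List

# Toy tree: id -> direct children. The ids are made up.
CHILDREN: Dict[str, List[str]] = {
    "1": ["10", "20"],
    "10": ["100", "101"],
    "20": [],
    "100": [],
    "101": [],
}


def children(tax_id: str) -> List[str]:
    """Direct children only."""
    return CHILDREN.get(tax_id, [])


def descendants(tax_id: str) -> List[str]:
    """Every node below tax_id, collected by repeatedly expanding children."""
    found: List[str] = []
    stack = list(reversed(children(tax_id)))
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(reversed(children(current)))
    return found
```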

tax.lineage(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, including itself.

tax.parents(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, excluding itself.
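
The difference between lineage and parents is only whether the starting node is included. A small sketch on a toy parent map (made-up ids; the child-to-root ordering is an assumption of this sketch):

```python
from typing import Dict, List, Optional

# Toy tree: id -> parent id (None for the root). The ids are made up.
PARENT: Dict[str, Optional[str]] = {"1": None, "10": "1", "100": "10"}


def lineage(tax_id: str) -> List[str]:
    """The node itself followed by each ancestor up to the root."""
    path: List[str] = []
    current: Optional[str] = tax_id
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def parents(tax_id: str) -> List[str]:
    """Same walk as lineage, without the starting node."""
    return lineage(tax_id)[1:]
```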

tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]

Returns the lowest common ancestor of the two given nodes.
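
One common way to compute a lowest common ancestor is to collect the ancestors of the first node and then walk up from the second until one of them is hit. The sketch below shows that approach on a toy parent map; it is not necessarily how the library computes it, and the ids are made up.

```python
from typing import Dict, Optional

# Toy tree: id -> parent id (None for the root). The ids are made up.
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "10": "1",
    "20": "1",
    "100": "10",
    "101": "10",
}


def lca(id1: str, id2: str) -> Optional[str]:
    """Return the deepest node that is an ancestor of (or equal to) both ids."""
    if id1 not in PARENT or id2 not in PARENT:
        return None
    # Collect id1 and all of its ancestors.
    seen = set()
    current: Optional[str] = id1
    while current is not None:
        seen.add(current)
        current = PARENT[current]
    # Walk up from id2 until we reach a node already seen.
    current = id2
    while current is not None:
        if current in seen:
            return current
        current = PARENT[current]
    return None
```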

tax.prune(keep: List[str], remove: List[str]) -> Taxonomy

Return a copy of the taxonomy containing:

  • if keep is provided, only the nodes in keep and their parents
  • if remove is provided, all of the nodes except those in remove and their children
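
The two pruning rules can be sketched on a toy parent map as follows. This is an illustration of the semantics described above under the assumption that "parents" means all ancestors and "children" means all descendants; the prune function here is a stand-in, not the library's implementation.

```python
from typing import Dict, Iterable, Optional

# Toy tree: id -> parent id (None for the root). The ids are made up.
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "10": "1",
    "20": "1",
    "100": "10",
    "101": "10",
}


def prune(
    parent_map: Dict[str, Optional[str]],
    keep: Optional[Iterable[str]] = None,
    remove: Optional[Iterable[str]] = None,
) -> Dict[str, Optional[str]]:
    """Return a pruned copy of parent_map; the input is left untouched."""
    kept = set(parent_map)
    if keep:
        # Keep only the listed nodes plus everything on their path to the root.
        kept = set()
        for tax_id in keep:
            current: Optional[str] = tax_id
            while current is not None:
                kept.add(current)
                current = parent_map[current]
    if remove:
        removed = set(remove)

        def is_removed(tax_id: str) -> bool:
            # A node goes away if it, or any of its ancestors, is in `remove`.
            current: Optional[str] = tax_id
            while current is not None:
                if current in removed:
                    return True
                current = parent_map[current]
            return False

        kept = {tax_id for tax_id in kept if not is_removed(tax_id)}
    return {t: p for t, p in parent_map.items() if t in kept}
```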

tax.remove_node(tax_id: str)

Remove the node from the tree, re-attaching parents as needed: only a single node is removed.
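
A sketch of what removing a single node can look like on a toy parent map: the node disappears and its direct children are attached to its former parent. This reading of "re-attaching" is an assumption, the function is a stand-in rather than the library's code, and removing the root is not handled.

```python
from typing import Dict, Optional


def remove_node(parent_map: Dict[str, Optional[str]], tax_id: str) -> None:
    """Remove one node in place and attach its children to its former parent."""
    new_parent = parent_map.pop(tax_id)
    for child, parent in parent_map.items():
        if parent == tax_id:
            # Only values change here, so iterating over the dict stays safe.
            parent_map[child] = new_parent


# Toy tree: id -> parent id (None for the root). The ids are made up.
tree: Dict[str, Optional[str]] = {"1": None, "10": "1", "100": "10", "101": "10"}
remove_node(tree, "10")
```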

tax.add_node(parent_tax_id: str, new_tax_id: str, /, name: str, rank: str)

Add a new node to the tree at the parent provided.

tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)

Edit properties on a taxonomy node.

tax.internal_index(tax_id: str)

Return the internal integer index used by some applications. For the JSON node-link format, this is the positional index of each node in the nodes array.

Exceptions

Only one exception is raised intentionally by the library: TaxonomyError. If you get a pyo3_runtime.PanicException (or anything with pyo3 in its name), this is a bug in the underlying Rust library; please open an issue.

Development

Rust

There is a test suite runnable with cargo test. To test the Python bindings, you need to use the additional python_test feature: cargo test --features python_test.

Python

To work on the Python library on a macOS/Unix system (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh

# then install the library in the local virtualenv
maturin develop --features python

# or using pip
pip install .

Building binary wheels and pushing to PyPI

There is a GitHub workflow that builds Python wheels for macOS (x86 and ARM) and Ubuntu (x86). To run it, create a new release.

To build wheels locally, run:

uv venv --python 3.11
source .venv/bin/activate
uv build

# or, using maturin by itself:

maturin build --features python --release --strip

Other Taxonomy Libraries

There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:

ETE Toolkit (http://etetoolkit.org/) A Python taxonomy library

Taxize (https://ropensci.github.io/taxize-book/) An R toolkit for working with taxonomic data
