Routines for loading, saving, and manipulating taxonomic trees

Taxonomy

This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.

This library was initially developed as a component of One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It can be used as is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add last common ancestor, traversal, and similar methods to a downstream package's own taxonomy implementation.

The library ships with a number of features:

  • Common support for taxonomy handling across Rust and Python
  • Fast and low(er) memory usage
  • NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
  • Easily extensible (in Rust) to support other formats and operations

Installation

Rust

This library can be added to an existing Cargo.toml file and installed straight from crates.io.

Python

You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:

pip install taxonomy

Python Usage

The Python taxonomy API can open and manipulate all of the formats from the Rust library. Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies inter-operation between different taxonomy systems.

Loading a taxonomy

A taxonomy can be loaded from a variety of sources.

  1. Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.

  2. Taxonomy.from_ncbi(ncbi_folder: str): loads a Taxonomy from a pair of NCBI dump files. The folder needs to contain the individual files from the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).

  3. Taxonomy.from_json(value: str, /, json_pointer: str): loads a Taxonomy from a JSON-encoded string. The format can be of either the tree or the node_link_data type and is detected automatically (more details on both formats are in the documentation). If json_pointer is specified, the JSON is traversed to that sub-object before being parsed as a taxonomy.

  4. Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string. (Experimental.)

  5. Taxonomy.from_gtdb(value: str): loads a Taxonomy from a GTDB-encoded string. (Experimental.)
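
To make the json_pointer behaviour more concrete, here is a minimal plain-Python sketch of walking a JSON document to a sub-object before handing it to a parser. The resolve_pointer helper and the sample payload (including its field names) are hypothetical and only illustrate the idea; the library's own pointer handling may differ in its details.

```python
import json


def resolve_pointer(document, pointer):
    """Follow a '/'-separated JSON pointer (RFC 6901 style) into parsed JSON."""
    if pointer == "":
        return document  # an empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Undo RFC 6901 escaping: "~1" stands for "/", "~0" stands for "~"
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical payload in which the taxonomy is nested under other keys
raw = json.dumps(
    {"results": {"taxonomy": {"id": "1", "name": "root", "children": []}}}
)
subtree = resolve_pointer(json.loads(raw), "/results/taxonomy")
print(subtree["name"])  # prints: root
```

With the library itself, the pointer would be passed as the json_pointer argument of Taxonomy.from_json rather than resolved by hand.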

Exporting a taxonomy

The following assumes that the taxonomy has been instantiated as a variable named tax.

  1. tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.
  2. tax.to_json_tree(): exports a Taxonomy as a JSON-encoded byte string in the tree format.
  3. tax.to_json_node_links(): exports a Taxonomy as a JSON-encoded byte string in the node links format.

Using a taxonomy

The following assumes that the taxonomy has been instantiated as a variable named tax. Note that TaxonomyNode is a class with the following schema:

class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str

Note that the tax_id parameters of the functions described below are strings. For NCBI taxonomies, this means passing the integer ID as a quoted string: 562 -> "562". If you loaded a taxonomy from JSON and your file contained additional data, you can access it by indexing the node, for example node["readcount"].

tax.clone() -> Taxonomy

Return a new taxonomy, equivalent to a deep copy.

tax.root -> TaxonomyNode

Points to the root of the taxonomy.

tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]

Returns the immediate parent TaxonomyNode of the node with the given id.

If at_rank is provided, scans all the nodes in the node's lineage and returns the ancestor at that rank.

Examples:

parent = tax.parent("612")
parent = tax.parent("612", at_rank="species")
# `parent` will be `None` if we can't find the parent
parent = tax.parent("unknown")
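
The at_rank lookup can be pictured as a walk up the lineage. The sketch below is a plain-Python illustration on a hypothetical toy mapping, not the library's implementation: the NODES data, the ranks in it, and the choice to return ids instead of TaxonomyNode objects are all assumptions made to keep the example self-contained. It also assumes the search starts at the immediate parent rather than at the node itself.

```python
from typing import Optional

# Hypothetical toy taxonomy: tax_id -> (parent_id, rank)
NODES = {
    "1": (None, "no rank"),
    "561": ("1", "genus"),
    "562": ("561", "species"),
    "83333": ("562", "strain"),
}


def parent(tax_id: str, at_rank: Optional[str] = None) -> Optional[str]:
    """Return the immediate parent id, or the first ancestor at `at_rank`."""
    if tax_id not in NODES:
        return None
    current = NODES[tax_id][0]
    if at_rank is None:
        return current
    # Walk up the lineage until an ancestor with the requested rank is found
    while current is not None:
        if NODES[current][1] == at_rank:
            return current
        current = NODES[current][0]
    return None


print(parent("83333"))                   # prints: 562
print(parent("83333", at_rank="genus"))  # prints: 561
print(parent("unknown"))                 # prints: None
```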

tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])

Same as parent, but also returns the distance, as a (TaxonomyNode, float) tuple.

tax.node(tax_id: str) -> Optional[TaxonomyNode]

Returns the node with that id, or None if it is not found. You can also use indexing, tax["some_id"], but this raises an exception if the node is not found.

tax.find_all_by_name(name: str) -> List[TaxonomyNode]

Returns all the nodes with that name. In NCBI, it only accounts for scientific names and not synonyms.

tax.children(tax_id: str) -> List[TaxonomyNode]

Returns all nodes directly below the given tax id.

tax.descendants(tax_id: str) -> List[TaxonomyNode]

Returns all nodes below the given tax id. Equivalent to running tax.children recursively on the initial result of tax.children(tax_id).
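
The relationship between children and descendants can be sketched in plain Python as follows. The CHILDREN mapping is a hypothetical toy tree, the functions return ids rather than TaxonomyNode objects, and the order of the returned descendants is an artifact of this sketch; none of this is taken from the library's implementation.

```python
# Hypothetical toy tree: tax_id -> list of direct child ids
CHILDREN = {
    "1": ["2", "3"],
    "2": ["4", "5"],
    "3": [],
    "4": ["6"],
    "5": [],
    "6": [],
}


def children(tax_id):
    """Return the ids directly below `tax_id`."""
    return list(CHILDREN.get(tax_id, []))


def descendants(tax_id):
    """Return every id below `tax_id` by expanding children repeatedly."""
    result = []
    stack = children(tax_id)
    while stack:
        node = stack.pop()
        result.append(node)
        stack.extend(children(node))  # queue the next level down
    return result


print(sorted(children("1")))     # prints: ['2', '3']
print(sorted(descendants("2")))  # prints: ['4', '5', '6']
```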

tax.lineage(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, including itself.

tax.parents(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id.
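
The only difference between lineage and parents is whether the starting node is included. A minimal plain-Python sketch on a hypothetical PARENT mapping shows this; returning ids and ordering the result from the node up to the root are assumptions of the sketch, not documented behaviour of the library.

```python
# Hypothetical toy taxonomy: tax_id -> parent id (None for the root)
PARENT = {"1": None, "561": "1", "562": "561"}


def lineage(tax_id):
    """Return `tax_id` followed by all of its ancestors up to the root."""
    path = []
    current = tax_id
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def parents(tax_id):
    """Return the ancestors of `tax_id`, excluding the node itself."""
    return lineage(tax_id)[1:]


print(lineage("562"))  # prints: ['562', '561', '1']
print(parents("562"))  # prints: ['561', '1']
```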

tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]

Returns the lowest common ancestor for the 2 given nodes.
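
One common way to compute a lowest common ancestor is to collect the ancestors of the first node and then walk up from the second node until one of them is hit. The sketch below shows that approach in plain Python on a hypothetical PARENT mapping; it is an illustration of the concept and makes no claim about how the library computes it.

```python
# Hypothetical toy taxonomy: tax_id -> parent id (None for the root)
PARENT = {"1": None, "2": "1", "3": "1", "4": "2", "5": "2", "6": "3"}


def lca(id1, id2):
    """Return the deepest node that is an ancestor of (or equal to) both ids."""
    if id1 not in PARENT or id2 not in PARENT:
        return None
    # Collect id1 and all of its ancestors
    ancestors = set()
    current = id1
    while current is not None:
        ancestors.add(current)
        current = PARENT[current]
    # Walk up from id2; the first shared node is the lowest common ancestor
    current = id2
    while current is not None:
        if current in ancestors:
            return current
        current = PARENT[current]
    return None


print(lca("4", "5"))  # prints: 2
print(lca("4", "6"))  # prints: 1
```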

tax.prune(keep: List[str], remove: List[str]) -> Taxonomy

Return a copy of the taxonomy containing:

  • if keep is provided, only the nodes in keep and their parents
  • if remove is provided, all of the nodes except those in remove and their children
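
The two pruning rules above can be sketched in plain Python on a hypothetical parent mapping. This illustration reads "parents" as all ancestors and "children" as all descendants, which is an assumption, and it works on a plain dict rather than a Taxonomy object.

```python
# Hypothetical toy taxonomy: tax_id -> parent id (None for the root)
PARENT = {"1": None, "2": "1", "3": "1", "4": "2", "5": "2", "6": "3"}


def prune(parent_map, keep=None, remove=None):
    """Return a pruned copy of a {tax_id: parent_id} mapping."""
    kept = set(parent_map)
    if keep:
        # Collect every node in `keep` plus all of its ancestors
        wanted = set()
        for tax_id in keep:
            current = tax_id
            while current is not None and current not in wanted:
                wanted.add(current)
                current = parent_map[current]
        kept &= wanted
    if remove:
        removed = set(remove)
        # Drop a node when it, or any of its ancestors, is listed in `remove`
        for tax_id in parent_map:
            current = tax_id
            while current is not None:
                if current in removed:
                    kept.discard(tax_id)
                    break
                current = parent_map[current]
    return {t: p for t, p in parent_map.items() if t in kept}


print(prune(PARENT, keep=["4"]))    # prints: {'1': None, '2': '1', '4': '2'}
print(prune(PARENT, remove=["2"]))  # prints: {'1': None, '3': '1', '6': '3'}
```

The input mapping is left untouched, mirroring the fact that tax.prune returns a copy.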

tax.remove_node(tax_id: str)

Remove the node from the tree, re-attaching parents as needed: only a single node is removed.
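
A plain-Python sketch of this re-attachment, on a hypothetical parent mapping: the children of the removed node are attached to the removed node's own parent. This illustrates one reading of the behaviour described above. Unlike tax.remove_node, the sketch returns a new dict instead of modifying the taxonomy in place, and refusing to remove the root is a simplification of the sketch.

```python
# Hypothetical toy taxonomy: tax_id -> parent id (None for the root)
PARENT = {"1": None, "2": "1", "3": "1", "4": "2", "5": "2", "6": "3"}


def remove_node(parent_map, tax_id):
    """Return a copy without `tax_id`, re-attaching its children one level up."""
    if tax_id not in parent_map:
        raise KeyError(tax_id)
    new_parent = parent_map[tax_id]
    if new_parent is None:
        raise ValueError("this sketch does not support removing the root")
    result = {}
    for node, parent in parent_map.items():
        if node == tax_id:
            continue  # only this single node is dropped
        # Children of the removed node now point at its former parent
        result[node] = new_parent if parent == tax_id else parent
    return result


print(remove_node(PARENT, "2"))
# prints: {'1': None, '3': '1', '4': '1', '5': '1', '6': '3'}
```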

tax.add_node(parent_tax_id: str, new_tax_id: str, /, name: str, rank: str)

Add a new node to the tree at the parent provided.

tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)

Edit properties on a taxonomy node.

tax.internal_index(tax_id: str)

Return internal integer index used by some applications. For the JSON node-link format, this is the positional index of each node in the nodes array.
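
For the node-link case, the positional index can be pictured with a small plain-Python sketch. The document layout used here (a "nodes" array of objects with an "id" field and a "links" array of source/target positions) is an assumption made for illustration; consult the format documentation for the exact schema.

```python
# Hypothetical node-link document; the layout and field names are assumptions
document = {
    "nodes": [
        {"id": "1", "name": "root"},
        {"id": "561", "name": "Escherichia"},
        {"id": "562", "name": "Escherichia coli"},
    ],
    "links": [{"source": 1, "target": 0}, {"source": 2, "target": 1}],
}

# Map each tax id to its position in the nodes array
positions = {node["id"]: index for index, node in enumerate(document["nodes"])}


def internal_index(tax_id):
    """Return the positional index of `tax_id` in the nodes array."""
    return positions[tax_id]


print(internal_index("562"))  # prints: 2
```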

Exceptions

Only one exception is raised intentionally by the library: TaxonomyError. If you get a pyo3_runtime.PanicException (or anything with pyo3 in its name), this is a bug in the underlying Rust library; please open an issue.

Development

Rust

There is a test suite runnable with cargo test. To test the Python bindings, you need to enable the additional python_test feature: cargo test --features python_test.

Python

To work on the Python library on a macOS/Unix system (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh

# finally, install the library in the local virtualenv
maturin develop --features python

# or using pip
pip install .

Building binary wheels and pushing to PyPI

There is a GitHub workflow that builds Python wheels for macOS (x86 and ARM) and Ubuntu (x86). To run it, create a new release.

To build wheels locally, run:

uv venv --python 3.11
source .venv/bin/activate
uv build

# or, using maturin by itself:

maturin build --features python --release --strip

Other Taxonomy Libraries

There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:

ETE Toolkit (http://etetoolkit.org/) A Python taxonomy library

Taxize (https://ropensci.github.io/taxize-book/) An R toolkit for working with taxonomic data
