Routines for loading, saving, and manipulating taxonomic trees

Taxonomy

This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.

This library was initially developed as a component of One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It can be used as is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add last common ancestor, traversal, and similar methods to a downstream package's taxonomy implementation.

The library ships with a number of features:

  • Common support for taxonomy handling across Rust and Python
  • Fast, with low(er) memory usage
  • NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
  • Easily extensible (in Rust) to support other formats and operations

Installation

Rust

This library can be added to an existing Cargo.toml file and installed straight from crates.io.

Python

You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:

pip install taxonomy

Python Usage

The Python taxonomy API can open and manipulate all of the formats from the Rust library. Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies inter-operation between different taxonomy systems.
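As a minimal sketch of the string-ID convention described above (plain Python, independent of the library), integer NCBI IDs can be normalized to strings before being passed to any lookup. The helper name `to_tax_id` is hypothetical and not part of the taxonomy API.

```python
from typing import Union


def to_tax_id(raw: Union[int, str]) -> str:
    """Normalize an NCBI-style ID to the string form used for lookups."""
    # Hypothetical helper: booleans are ints in Python, so reject them explicitly.
    if isinstance(raw, bool):
        raise TypeError("tax IDs must be int or str, not bool")
    # str() handles ints; strip() tolerates stray whitespace in string input.
    return str(raw).strip()


ids = [to_tax_id(562), to_tax_id("562"), to_tax_id(" 1236 ")]
print(ids)  # ['562', '562', '1236']
```

Normalizing once at the boundary means the rest of the code only ever deals with string IDs, whichever taxonomy system they came from.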

Loading a taxonomy

A taxonomy can be loaded from a variety of sources:

  1. Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.

  2. Taxonomy.from_ncbi(ncbi_folder: str): loads a Taxonomy from a pair of NCBI dump files. The folder needs to contain the individual files in the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).

  3. Taxonomy.from_json(value: str, /, json_pointer: str): loads a Taxonomy from a JSON-encoded string. The format can be either of the tree or node_link_data types and will be automatically detected (more details on both formats are in the documentation). If json_pointer is specified, the JSON will be traversed to that sub-object before being parsed as a taxonomy.

  4. Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string. Experimental.

  5. Taxonomy.from_gtdb(value: str): loads a Taxonomy from a GTDB-encoded string. Experimental.
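To illustrate what traversing to a sub-object with json_pointer means for from_json, here is a rough stand-alone sketch in plain Python. It assumes RFC 6901-style pointer syntax, which the text above does not confirm, and the field names in the sample document (`id`, `name`, `rank`, `children`) are assumptions about the tree format. `resolve_pointer` is a hypothetical helper, not a library function.

```python
import json
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Walk a parsed JSON document along a '/'-separated pointer."""
    if pointer == "":
        return document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Undo the two escape sequences defined by RFC 6901.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# A taxonomy nested inside a larger document (illustrative data only).
payload = json.dumps(
    {
        "sample": "S1",
        "results": {
            "taxonomy": {
                "id": "1",
                "name": "root",
                "rank": "no rank",
                "children": [
                    {"id": "2", "name": "Bacteria", "rank": "superkingdom", "children": []}
                ],
            }
        },
    }
)

subtree = resolve_pointer(json.loads(payload), "/results/taxonomy")
print(subtree["name"], [child["id"] for child in subtree["children"]])  # root ['2']
```

Only the object found at the pointer would then be parsed as a taxonomy; the surrounding keys such as `sample` are ignored.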

Exporting a taxonomy

The methods below assume that the taxonomy has been instantiated as a variable named tax.

  1. tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.
  2. tax.to_json_tree(): exports a Taxonomy as a JSON-encoded byte string in a tree format.
  3. tax.to_json_node_links(): exports a Taxonomy as a JSON-encoded byte string in a node links format.
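Because the export methods above return byte strings rather than str, the result usually needs decoding before further use. The sketch below uses a hard-coded stand-in for the returned bytes. The field names and the UTF-8 encoding are assumptions, not confirmed by the text.

```python
import json

# Stand-in for the bytes returned by tax.to_json_tree(); field names are assumptions.
exported = b'{"id": "1", "name": "root", "rank": "no rank", "children": []}'

# Decode to text (assuming UTF-8) and parse into Python objects.
text = exported.decode("utf-8")
tree = json.loads(text)

print(type(exported).__name__, type(text).__name__, tree["name"])  # bytes str root
```

When writing an export to disk, opening the file in binary mode (`"wb"`) avoids the decoding step entirely.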

Using a taxonomy

The methods below assume that the taxonomy has been instantiated as a variable named tax. TaxonomyNode is a class with the following schema:

class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str

Note that the tax_id parameters passed to the functions described below are strings. For NCBI taxonomies, this means quoting the integer ID: 562 -> "562". If you loaded a taxonomy from JSON and your file contained additional data, you can access it by indexing a node, for example node["readcount"].

tax.clone() -> Taxonomy

Return a new taxonomy, equivalent to a deep copy.

tax.root -> TaxonomyNode

Points to the root of the taxonomy.

tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]

Return the immediate parent TaxonomyNode of the node with the given id.

If at_rank is provided, scan all the nodes in the node's lineage and return the ancestor at that rank.

Examples:

# Immediate parent of node "612"
parent = tax.parent("612")
# Ancestor of node "612" at the species rank
parent = tax.parent("612", at_rank="species")
# `parent` will be `None` if we can't find the parent
parent = tax.parent("unknown")
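The at_rank behaviour can be pictured as a walk up the lineage until a node with the requested rank is found. The following is a plain-Python sketch over a toy, simplified slice of the E. coli lineage, not the library's implementation. It starts the scan at the immediate parent; whether the library also considers the node itself is not stated above. Ranks and the shortened lineage are illustrative.

```python
from typing import Dict, Optional, Tuple

# Toy data: tax_id -> (parent_id, rank). Simplified, for illustration only.
NODES: Dict[str, Tuple[Optional[str], str]] = {
    "1": (None, "no rank"),
    "2": ("1", "superkingdom"),
    "1236": ("2", "class"),
    "543": ("1236", "family"),
    "561": ("543", "genus"),
    "562": ("561", "species"),
}


def parent(tax_id: str, at_rank: Optional[str] = None) -> Optional[str]:
    """Return the immediate parent, or the closest ancestor with the given rank."""
    if tax_id not in NODES:
        return None
    current = NODES[tax_id][0]
    if at_rank is None:
        return current
    # Climb towards the root until a node with the requested rank appears.
    while current is not None:
        if NODES[current][1] == at_rank:
            return current
        current = NODES[current][0]
    return None


print(parent("562"))                    # 561
print(parent("562", at_rank="family"))  # 543
print(parent("unknown"))                # None
```

Returning None both for an unknown ID and for a rank that is absent from the lineage mirrors the Optional return type in the signature above.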

tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])

Same as parent, but also returns the distance, as a (TaxonomyNode, float) tuple.

tax.node(tax_id: str) -> Optional[TaxonomyNode]

Returns the node with the given id, or None if not found. You can also use indexing, tax["some_id"], but this raises an exception if the node is not found.

tax.find_all_by_name(name: str) -> List[TaxonomyNode]

Returns all the nodes with that name. In NCBI, it only accounts for scientific names and not synonyms.

tax.children(tax_id: str) -> List[TaxonomyNode]

Returns the direct children of the given tax id.

tax.descendants(tax_id: str) -> List[TaxonomyNode]

Returns all nodes below the given tax id. Equivalent to running tax.children recursively on the initial result of tax.children(tax_id).
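The relationship between children and descendants described above can be sketched in plain Python over a toy child map. This is an illustration of the recursion only, not the library's code, and the ordering of the library's results is not specified here.

```python
from typing import Dict, List

# Toy data: tax_id -> direct children. Simplified, for illustration only.
CHILDREN: Dict[str, List[str]] = {
    "1": ["2"],
    "2": ["1236"],
    "1236": ["543"],
    "543": ["561", "620"],
    "561": ["562"],
    "620": ["623"],
    "562": [],
    "623": [],
}


def children(tax_id: str) -> List[str]:
    """Direct children only."""
    return list(CHILDREN.get(tax_id, []))


def descendants(tax_id: str) -> List[str]:
    """Every node below tax_id, found by applying children() recursively."""
    found: List[str] = []
    for child in children(tax_id):
        found.append(child)
        found.extend(descendants(child))
    return found


print(children("543"))     # ['561', '620']
print(descendants("543"))  # ['561', '562', '620', '623']
```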

tax.lineage(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, including itself.

tax.parents(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id.

tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]

Returns the lowest common ancestor of the two given nodes.
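One way to picture how lineage and lca relate: the lowest common ancestor is the first node in one lineage that also appears in the other. The sketch below shows this on a toy parent map in plain Python. It is one possible approach, not necessarily how the library computes it.

```python
from typing import Dict, List, Optional

# Toy data: tax_id -> parent_id. Simplified, for illustration only.
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "2": "1",
    "1236": "2",
    "543": "1236",
    "561": "543",
    "562": "561",
    "620": "543",
    "623": "620",
}


def lineage(tax_id: str) -> List[str]:
    """The node itself followed by all of its ancestors up to the root."""
    path: List[str] = []
    current: Optional[str] = tax_id
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def lca(id1: str, id2: str) -> Optional[str]:
    """First node on id2's lineage that is also on id1's lineage."""
    ancestors = set(lineage(id1))
    for candidate in lineage(id2):
        if candidate in ancestors:
            return candidate
    return None


print(lineage("562"))     # ['562', '561', '543', '1236', '2', '1']
print(lca("562", "623"))  # 543
print(lca("562", "561"))  # 561
```

Because a lineage includes the node itself, the LCA of a node and one of its own ancestors is that ancestor.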

tax.prune(keep: List[str], remove: List[str]) -> Taxonomy

Return a copy of the taxonomy containing:

  • only the nodes in keep and their parents if provided
  • all of the nodes except those in remove and their children if provided
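The two pruning modes above can be sketched on a toy parent map in plain Python. This is an illustration of the described semantics, not the library's implementation. It reads "parents" as all ancestors and "children" as all descendants, which is an assumption, and the names `prune_keep` and `prune_remove` are hypothetical.

```python
from typing import Dict, List, Optional

# Toy data: tax_id -> parent_id. Simplified, for illustration only.
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "2": "1",
    "543": "2",
    "561": "543",
    "562": "561",
    "620": "543",
    "623": "620",
}


def lineage(tax_id: str) -> List[str]:
    """The node itself followed by all of its ancestors up to the root."""
    path: List[str] = []
    current: Optional[str] = tax_id
    while current is not None:
        path.append(current)
        current = PARENT[current]
    return path


def prune_keep(keep: List[str]) -> Dict[str, Optional[str]]:
    """Keep only the listed nodes plus everything above them."""
    kept = set()
    for tax_id in keep:
        kept.update(lineage(tax_id))
    return {node: parent for node, parent in PARENT.items() if node in kept}


def prune_remove(remove: List[str]) -> Dict[str, Optional[str]]:
    """Drop the listed nodes plus everything below them."""
    removed = set(remove)
    # A node goes away when it, or any of its ancestors, is marked for removal.
    return {
        node: parent
        for node, parent in PARENT.items()
        if removed.isdisjoint(lineage(node))
    }


print(sorted(prune_keep(["562"])))    # ['1', '2', '543', '561', '562']
print(sorted(prune_remove(["620"])))  # ['1', '2', '543', '561', '562']
```

Both functions return a new mapping and leave the original untouched, in line with prune returning a copy of the taxonomy.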

tax.remove_node(tax_id: str)

Remove the node from the tree, re-attaching parents as needed. Only that single node is removed.
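The re-attachment can be pictured as pointing the removed node's children at the removed node's own parent. The plain-Python sketch below works on a toy parent map and is not the library's implementation. Unlike tax.remove_node as described, it returns a new mapping instead of editing in place.

```python
from typing import Dict, Optional

# Toy data: tax_id -> parent_id. Simplified, for illustration only.
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "562": "561",
}


def remove_node(parent_of: Dict[str, Optional[str]], tax_id: str) -> Dict[str, Optional[str]]:
    """Remove one node and re-attach its children to its parent."""
    new_parent = parent_of[tax_id]
    result: Dict[str, Optional[str]] = {}
    for node, parent in parent_of.items():
        if node == tax_id:
            continue  # the removed node itself is dropped
        # Children of the removed node move up one level; others are unchanged.
        result[node] = new_parent if parent == tax_id else parent
    return result


without_genus = remove_node(PARENT, "561")
print(without_genus)  # {'1': None, '543': '1', '562': '543'}
```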

tax.add_node(parent_tax_id: str, new_tax_id: str, /, name: str, rank: str)

Add a new node to the tree at the parent provided.

tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)

Edit properties on a taxonomy node.

tax.internal_index(tax_id: str)

Return the internal integer index used by some applications. For the JSON node-link format, this is the positional index of each node in the nodes array.
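For the node-link case, a positional index is just the node's offset in the nodes array. The sketch below builds that lookup in plain Python. The document shape (`nodes`, `links`, `source`, `target`) and the child-to-parent link direction are assumptions about the node_link_data format, and the data is illustrative.

```python
from typing import Dict

# Illustrative node-link document; field names and link direction are assumptions.
node_link = {
    "nodes": [
        {"id": "1", "name": "root"},
        {"id": "2", "name": "Bacteria"},
        {"id": "562", "name": "Escherichia coli"},
    ],
    "links": [
        {"source": 1, "target": 0},
        {"source": 2, "target": 1},
    ],
}

# Map each tax ID to its position in the nodes array.
index_of: Dict[str, int] = {
    node["id"]: position for position, node in enumerate(node_link["nodes"])
}

print(index_of["562"])  # 2

# Links refer to nodes by position, so they can be translated back to tax IDs.
edges = [
    (node_link["nodes"][link["source"]]["id"], node_link["nodes"][link["target"]]["id"])
    for link in node_link["links"]
]
print(edges)  # [('2', '1'), ('562', '2')]
```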

Exceptions

Only one exception is raised intentionally by the library: TaxonomyError. If you get a pyo3_runtime.PanicException (or anything with pyo3 in its name), this is a bug in the underlying Rust library; please open an issue.

Development

Rust

There is a test suite runnable with cargo test. To test the Python bindings, you need to use the additional python_test feature: cargo test --features python_test.

Python

To work on the Python library on a macOS/Unix system (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh

# finally, install the library in the local virtualenv
maturin develop --features python

# or using pip
pip install .

Building binary wheels and pushing to PyPI

There is a GitHub workflow that builds Python wheels for macOS (x86 and ARM) and Ubuntu (x86). To run it, create a new release.

To build wheels locally, run:

uv venv --python 3.11
source .venv/bin/activate
uv build

# or, using maturin by itself:

maturin build --features python --release --strip

Other Taxonomy Libraries

There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:

ETE Toolkit (http://etetoolkit.org/): a Python taxonomy library

Taxize (https://ropensci.github.io/taxize-book/): an R toolkit for working with taxonomic data
