Routines for loading, saving, and manipulating taxonomic trees

Taxonomy

This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.

This library was initially developed as a component of One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It can be used as is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add last common ancestor, traversal, and other methods to a downstream package's taxonomy implementation.

The library ships with a number of features:

  • Common support for taxonomy handling across Rust and Python
  • Fast, with low(er) memory usage
  • NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
  • Easily extensible (in Rust) to support other formats and operations

Installation

Rust

This library can be added to an existing Cargo.toml file and installed straight from crates.io.

Python

You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:

pip install taxonomy

Python Usage

The Python taxonomy API can open and manipulate all of the formats from the Rust library. Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies inter-operation between different taxonomy systems.

Loading a taxonomy

A taxonomy can be loaded from a variety of sources:

  1. Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.

  2. Taxonomy.from_ncbi(ncbi_folder: str): loads a Taxonomy from a pair of NCBI dump files. The folder needs to contain the individual files in the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).

  3. Taxonomy.from_json(value: str, /, json_pointer: str): loads a Taxonomy from a JSON-encoded string. The format can be either of the tree or node_link_data types and is detected automatically. If json_pointer is specified, the JSON will be traversed to that sub-object before being parsed as a taxonomy.

  4. Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string. This loader is experimental.
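To illustrate what "traversed to that sub-object" means for from_json, here is a minimal pure-Python sketch of pointer resolution. It assumes the pointer follows RFC 6901 syntax (for example "/data/taxonomy"), which this document does not state explicitly; resolve_pointer and the sample payload are hypothetical and are not part of the library.

```python
import json
from typing import Any


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Follow an RFC 6901 style pointer such as "/data/taxonomy"."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a JSON pointer must be empty or start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Undo the two escape sequences defined by RFC 6901 ("~1" first).
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Hypothetical payload: the taxonomy is nested below other metadata.
raw = json.dumps(
    {"meta": {"version": 1}, "data": {"taxonomy": {"id": "1", "children": []}}}
)
sub_object = resolve_pointer(json.loads(raw), "/data/taxonomy")
print(sub_object)
```

With such a pointer, only the nested object would be handed to the taxonomy parser, so the surrounding metadata does not need to be stripped beforehand.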

Exporting a taxonomy

The following methods assume that the taxonomy has been instantiated as a variable named tax.

  1. tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.
  2. tax.to_json_tree(): exports a Taxonomy as a JSON-encoded byte string in the tree format.
  3. tax.to_json_node_links(): exports a Taxonomy as a JSON-encoded byte string in the node links format.

Using a taxonomy

The following methods assume that the taxonomy has been instantiated as a variable named tax. Note that TaxonomyNode is a class with the following schema:

class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str

Note that the tax_id parameters of the functions described below are strings. For NCBI taxonomies, this means quoting the integer IDs: 562 -> "562". If you loaded a taxonomy from JSON and your file contained additional data, you can access it by indexing a node, for example node["readcount"].
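As a small illustration of the quoting rule, the helper below converts integer IDs to their string form before they are passed to any taxonomy function. The name as_tax_id is hypothetical and not part of the library; stripping surrounding whitespace is an extra assumption of this sketch.

```python
from typing import Union


def as_tax_id(raw: Union[int, str]) -> str:
    """Hypothetical helper: return a taxonomy ID in string form."""
    # NCBI IDs are integers upstream; other systems may already use strings.
    # Surrounding whitespace is stripped so that " 562 " and 562 agree.
    return str(raw).strip()


print(as_tax_id(562))
print(as_tax_id(" 562 "))
```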

tax.root -> TaxonomyNode

Points to the root of the taxonomy

tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]

Returns the immediate parent TaxonomyNode of the node with the given id.

If at_rank is provided, scans all the nodes in the node's lineage and returns the ancestor at that rank.

Examples:

parent = tax.parent("612")
parent = tax.parent("612", at_rank="species")
# `parent` will be `None` if we can't find the parent
parent = tax.parent("unknown")
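The at_rank scan can be pictured with a plain dictionary of parent pointers. This is a conceptual sketch, not the library's implementation: the NODES table and parent_at_rank are hypothetical, and the sketch assumes the scan starts at the node's parent rather than at the node itself.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy data: tax_id -> (parent_id, rank). Not real NCBI records.
NODES: Dict[str, Tuple[Optional[str], str]] = {
    "1": (None, "no rank"),
    "10": ("1", "genus"),
    "100": ("10", "species"),
    "1000": ("100", "strain"),
}


def parent_at_rank(tax_id: str, at_rank: Optional[str] = None) -> Optional[str]:
    """Return the immediate parent ID, or the closest ancestor ID with the given rank."""
    if tax_id not in NODES:
        return None  # unknown IDs have no parent
    current = NODES[tax_id][0]
    if at_rank is None:
        return current
    # Walk up the lineage until an ancestor with the requested rank is found.
    while current is not None:
        parent_id, rank = NODES[current]
        if rank == at_rank:
            return current
        current = parent_id
    return None


print(parent_at_rank("1000"))
print(parent_at_rank("1000", at_rank="genus"))
print(parent_at_rank("unknown"))
```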

tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])

Same as parent, but also returns the distance to that parent, as a (TaxonomyNode, float) tuple. Both values are None if the parent cannot be found.

tax.node(tax_id: str) -> Optional[TaxonomyNode]

Returns the node with that id, or None if it is not found. You can also use indexing, as in tax["some_id"], but this raises an exception if the node is not found.

tax.find_by_name(name: str) -> Optional[TaxonomyNode]

Returns the node with that name, or None if it is not found. For NCBI taxonomies, only scientific names are considered, not synonyms.

tax.children(tax_id: str) -> List[TaxonomyNode]

Returns all nodes below the given tax id.

tax.lineage(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, including itself.

tax.parents(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, excluding the node itself.
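The difference between lineage and parents is easiest to see on a toy tree. The sketch below works on IDs in a hypothetical PARENT dictionary instead of TaxonomyNode objects, and it assumes the result is ordered from the node up to the root, which this document does not specify.

```python
from typing import Dict, List, Optional

# Hypothetical toy tree: tax_id -> parent_id (None marks the root).
PARENT: Dict[str, Optional[str]] = {"1": None, "10": "1", "100": "10"}


def lineage(tax_id: str) -> List[str]:
    """IDs from the node itself up to the root."""
    chain: List[str] = []
    current: Optional[str] = tax_id
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def parents(tax_id: str) -> List[str]:
    """Same walk as lineage, without the starting node."""
    return lineage(tax_id)[1:]


print(lineage("100"))
print(parents("100"))
```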

tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]

Returns the lowest common ancestor of the two given nodes.
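One common way to compute a lowest common ancestor is to collect every ancestor of the first node and then walk up from the second node until one of them is reached. The sketch below does this on a hypothetical PARENT dictionary; it illustrates the concept and makes no claim about how the library implements lca.

```python
from typing import Dict, Optional

# Hypothetical toy tree: tax_id -> parent_id (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "10": "1",
    "11": "1",
    "100": "10",
    "101": "10",
}


def lca(id1: str, id2: str) -> Optional[str]:
    """Return the deepest node that is an ancestor of (or equal to) both IDs."""
    seen = set()
    current: Optional[str] = id1
    while current is not None:  # record id1 and all of its ancestors
        seen.add(current)
        current = PARENT[current]
    current = id2
    while current is not None:  # first hit while walking up from id2 is the LCA
        if current in seen:
            return current
        current = PARENT[current]
    return None


print(lca("100", "101"))
print(lca("100", "11"))
```

In this sketch a node counts as its own ancestor, so lca("100", "10") yields "10".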

tax.prune(keep: List[str], remove: List[str]) -> Taxonomy

Returns a copy of the taxonomy containing:

  • if keep is provided, only the nodes in keep and their parents
  • if remove is provided, all of the nodes except those in remove and their children
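The two pruning rules can be sketched on a dictionary of parent pointers. This is a conceptual illustration under stated assumptions, not the library's code: PARENT, with_ancestors, and this prune function are hypothetical, "parents" is read as all ancestors, and "children" is read as all descendants.

```python
from typing import Dict, Iterable, List, Optional, Set

ParentMap = Dict[str, Optional[str]]

# Hypothetical toy tree: tax_id -> parent_id (None marks the root).
PARENT: ParentMap = {
    "1": None,
    "10": "1",
    "11": "1",
    "100": "10",
    "101": "10",
    "110": "11",
}


def with_ancestors(parent: ParentMap, tax_ids: Iterable[str]) -> Set[str]:
    """Return the given IDs together with all of their ancestors."""
    selected: Set[str] = set()
    for tax_id in tax_ids:
        current: Optional[str] = tax_id
        while current is not None and current not in selected:
            selected.add(current)
            current = parent[current]
    return selected


def prune(
    parent: ParentMap,
    keep: Optional[List[str]] = None,
    remove: Optional[List[str]] = None,
) -> ParentMap:
    """Return a pruned copy; the input mapping is left untouched."""
    kept = with_ancestors(parent, keep) if keep else set(parent)
    if remove:
        doomed = set(remove)
        # A node is dropped if it, or any of its ancestors, is listed in remove.
        kept = {t for t in kept if not (with_ancestors(parent, [t]) & doomed)}
    return {t: p for t, p in parent.items() if t in kept}


print(prune(PARENT, keep=["100"]))
print(prune(PARENT, remove=["10"]))
```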

tax.remove_node(tax_id: str)

Removes the node from the tree, re-attaching parents as needed. Only a single node is removed.
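A plausible reading of "re-attaching parents as needed" is that the children of the removed node are attached to the removed node's own parent. The sketch below shows that reading on a hypothetical dictionary of parent pointers; both the interpretation and the refusal to remove the root are assumptions of this sketch, not documented library behaviour.

```python
from typing import Dict, Optional


def remove_node(parent: Dict[str, Optional[str]], tax_id: str) -> None:
    """Remove one node in place and attach its children to its former parent."""
    if parent[tax_id] is None:
        # Assumption of this sketch: the root cannot be removed.
        raise ValueError("cannot remove the root node")
    new_parent = parent.pop(tax_id)
    for child, current_parent in list(parent.items()):
        if current_parent == tax_id:
            parent[child] = new_parent


# Hypothetical toy tree: tax_id -> parent_id (None marks the root).
tree: Dict[str, Optional[str]] = {"1": None, "10": "1", "100": "10", "101": "10"}
remove_node(tree, "10")
print(tree)
```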

tax.add_node(parent_tax_id: str, new_tax_id: str)

Add a new node to the tree at the parent provided.

tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)

Edit properties on a taxonomy node.

Exceptions

Only one exception is raised intentionally by the library: TaxonomyError. If you get a pyo3_runtime.PanicException (or anything with pyo3 in its name), this is a bug in the underlying Rust library; please open an issue.

Development

Rust

There is a test suite runnable with cargo test. To test the Python bindings, you need to use the additional python_test feature: cargo test --features python_test.

Python

To work on the Python library on a Mac OS X/Unix system (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh

# finally, install the library in the local virtualenv
maturin develop --cargo-extra-args="--features=python"

# or using pip
pip install .

Building binary wheels and pushing to PyPI

# The Mac build requires switching through a few different python versions
maturin build --cargo-extra-args="--features=python" --release --strip

# The linux build is automated through cross-compiling in a docker image
docker run --rm -v $(pwd):/io ghcr.io/pyo3/maturin build --cargo-extra-args="--features=python" --release --strip
twine upload target/wheels/*

Other Taxonomy Libraries

There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:

ETE Toolkit (http://etetoolkit.org/): a Python taxonomy library

Taxize (https://ropensci.github.io/taxize-book/): an R toolkit for working with taxonomic data
