
Routines for loading, saving, and manipulating taxonomic trees


Taxonomy


This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.

This library was initially developed as a component of One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It can be used as is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add last common ancestor, traversal, and other methods to a downstream package's own taxonomy implementation.

The library ships with a number of features:

  • Common support for taxonomy handling across Rust and Python
  • Fast, with low(er) memory usage
  • NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
  • Easily extensible (in Rust) to support other formats and operations

Installation

Rust

This library can be added to an existing Cargo.toml file and installed straight from crates.io.

Python

You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:

pip install taxonomy

Python Usage

The Python taxonomy API can open and manipulate all of the formats from the Rust library. Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies inter-operation between different taxonomy systems.

Loading a taxonomy

A Taxonomy can be loaded from a variety of sources.

  1. Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.

  2. Taxonomy.from_ncbi(nodes_path: str, names_path: str): loads a Taxonomy from a pair of NCBI dump files. The paths specified are to the individual files in the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).

  3. Taxonomy.from_json(value: str, /, path: List[str]): loads a Taxonomy from a JSON-encoded string. The format can either be of the tree or node_link_data types and will be automatically detected. If path is specified, the JSON will be traversed to that sub-object before being parsed as a taxonomy.

  4. Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string. This loader is experimental.
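The path argument of from_json is described as walking the JSON down to a sub-object before parsing. A minimal sketch of that idea in plain Python follows. It is not the library's implementation: the helper name select_subobject, the wrapper keys "results" and "taxonomy", and the node fields in the payload are all assumptions made for illustration.

```python
import json
from typing import Any, List


def select_subobject(value: str, path: List[str]) -> Any:
    """Parse a JSON string and walk down `path`, one key per step."""
    obj = json.loads(value)
    for key in path:
        obj = obj[key]  # raises KeyError if a key along the path is missing
    return obj


# Hypothetical payload: the taxonomy is nested two levels deep.
payload = json.dumps(
    {"results": {"taxonomy": {"id": "1", "name": "root", "children": []}}}
)

subtree = select_subobject(payload, ["results", "taxonomy"])
print(subtree["name"])
```

With an empty path the whole document is returned unchanged, which mirrors the case where path is not specified.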

Exporting a taxonomy

The methods below assume that the taxonomy has been instantiated as a variable named tax.

  1. tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.

  2. tax.to_json(/, as_node_link_data: bool): exports a Taxonomy as a JSON-encoded byte string. By default, the JSON format is a tree format unless the as_node_link_data parameter is set to True.
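Since both exporters are described as returning byte strings, the result usually needs to be written in binary mode or decoded before it is handled as text. The sketch below shows both steps with a hand-written stand-in value. The bytes literal is not real library output, and the UTF-8 encoding and the field names inside it are assumptions.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for the bytes that tax.to_json() is described as returning.
# This literal is hand-written for illustration, not real library output.
exported = b'{"id": "1", "name": "root", "children": []}'

# Bytes can be written to disk unchanged by using a binary write.
out_path = Path(tempfile.mkdtemp()) / "taxonomy.json"
out_path.write_bytes(exported)

# To inspect the result as text, decode first (UTF-8 is assumed here).
tree = json.loads(exported.decode("utf-8"))
print(tree["name"])
```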

Using a taxonomy

The methods below assume that the taxonomy has been instantiated as a variable named tax. TaxonomyNode is a class with the following schema:

class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str

Note that the tax_id parameters of the functions described below are strings. For NCBI taxonomies they must be quoted integers: 562 -> "562". In that case, passing something that can't be converted to a number will raise an exception, even if the documentation below does not mention it.
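When IDs arrive as integers from other tools, it can help to normalize them once before calling into the taxonomy. The helper below is a hypothetical convenience, not part of the library. It raises ValueError for non-numeric input, which is a stand-in for whatever exception the library itself raises.

```python
from typing import Union


def ncbi_tax_id(value: Union[int, str]) -> str:
    """Return an NCBI tax ID in its string form, e.g. 562 -> "562"."""
    text = str(value).strip()
    # Only plain ASCII digits are accepted as a numeric NCBI ID.
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a numeric NCBI tax ID: {value!r}")
    return text


print(ncbi_tax_id(562))
```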

tax.root -> TaxonomyNode

Points to the root of the taxonomy

tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]

Returns the immediate parent TaxonomyNode of the node with the given id.

If at_rank is provided, scans all the nodes in the node's lineage and returns the parent at that rank.

Examples:

parent = tax.parent("612")
parent = tax.parent("612", at_rank="species")
# `parent` will be `None` if we can't find the parent
parent = tax.parent("unknown")
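To make the at_rank behavior concrete, here is a plain-Python sketch of the lookup over a toy node table. It is not the library's code: the Node tuple only mirrors the TaxonomyNode schema above, the four nodes are toy data, and letting the starting node itself match at_rank is an assumption based on the lineage being described as including the node.

```python
from typing import Dict, NamedTuple, Optional


class Node(NamedTuple):
    id: str
    name: str
    parent: Optional[str]
    rank: str


# Toy data: a small slice of an NCBI-style lineage.
NODES: Dict[str, Node] = {
    node.id: node
    for node in (
        Node("1", "root", None, "no rank"),
        Node("543", "Enterobacteriaceae", "1", "family"),
        Node("561", "Escherichia", "543", "genus"),
        Node("562", "Escherichia coli", "561", "species"),
    )
}


def step_up(nodes: Dict[str, Node], node: Node) -> Optional[Node]:
    """Return the node one level up, or None when `node` is the root."""
    return nodes.get(node.parent) if node.parent is not None else None


def parent(
    nodes: Dict[str, Node], tax_id: str, at_rank: Optional[str] = None
) -> Optional[Node]:
    node = nodes.get(tax_id)
    if node is None:
        return None  # unknown ID
    if at_rank is None:
        return step_up(nodes, node)
    # Walk the lineage (starting at the node itself) until the rank matches.
    current: Optional[Node] = node
    while current is not None:
        if current.rank == at_rank:
            return current
        current = step_up(nodes, current)
    return None


print(parent(NODES, "562"))
print(parent(NODES, "562", at_rank="family"))
```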

tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])

Same as parent, but also returns the distance, as a (TaxonomyNode, float) tuple; per the signature above, either element may be None.

tax.node(tax_id: str) -> Optional[TaxonomyNode]

Returns the node with that id, or None if not found. You can also use indexing, tax["some_id"], but this raises an exception if the node is not found.

tax.find_by_name(name: str) -> Optional[TaxonomyNode]

Returns the node with that name, or None if not found. For NCBI taxonomies, only scientific names are considered, not synonyms.

tax.children(tax_id: str) -> List[TaxonomyNode]

Returns all nodes below the given tax id.

tax.lineage(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, including itself.

tax.parents(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id.
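The difference between lineage and parents can be sketched over a simple child-to-parent map. This is an illustration only, with toy IDs. The ordering (from the node up to the root) and the reading that parents is the lineage without the node itself are assumptions.

```python
from typing import Dict, List, Optional

# Toy child -> parent map; the root has no parent.
PARENT_OF: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "562": "561",
}


def lineage(parent_of: Dict[str, Optional[str]], tax_id: str) -> List[str]:
    """All IDs from `tax_id` up to the root, including `tax_id` itself."""
    chain: List[str] = []
    current: Optional[str] = tax_id
    while current is not None:
        chain.append(current)
        current = parent_of[current]  # KeyError for an unknown ID
    return chain


def parents(parent_of: Dict[str, Optional[str]], tax_id: str) -> List[str]:
    """Same walk as lineage, but without the starting node."""
    return lineage(parent_of, tax_id)[1:]


print(lineage(PARENT_OF, "562"))
print(parents(PARENT_OF, "562"))
```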

tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]

Returns the lowest common ancestor of the two given nodes.
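One common way to compute a lowest common ancestor is to collect the ancestors of the first node and then walk up from the second until one of them is hit. The sketch below uses that approach on a toy child-to-parent map. It is not necessarily how the library computes it, and the IDs are toy data.

```python
from typing import Dict, Optional

# Toy child -> parent map with two branches under "543".
PARENT_OF: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "562": "561",
    "590": "543",
    "28901": "590",
}


def lca(parent_of: Dict[str, Optional[str]], id1: str, id2: str) -> Optional[str]:
    # Collect id1 and every node above it.
    ancestors = set()
    current: Optional[str] = id1
    while current is not None:
        ancestors.add(current)
        current = parent_of[current]
    # Walk up from id2; the first shared node is the lowest common ancestor.
    current = id2
    while current is not None:
        if current in ancestors:
            return current
        current = parent_of[current]
    return None


print(lca(PARENT_OF, "562", "28901"))
```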

tax.prune(keep: List[str], remove: List[str]) -> Taxonomy

Returns a copy of the taxonomy containing:

  • if keep is provided, only the nodes in keep and their parents
  • if remove is provided, all of the nodes except those in remove and their children
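The two pruning modes can be sketched over a toy child-to-parent map. This is an illustration of the rules as stated, not the library's implementation. Reading "parents" as all ancestors and "children" as all descendants is an assumption, as is the behavior when both arguments are given together.

```python
from typing import Dict, Iterable, Optional

PARENT_OF: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "562": "561",
    "590": "543",
    "28901": "590",
}


def prune(
    parent_of: Dict[str, Optional[str]],
    keep: Optional[Iterable[str]] = None,
    remove: Optional[Iterable[str]] = None,
) -> Dict[str, Optional[str]]:
    kept = set(parent_of)
    if keep is not None:
        # Keep only the listed nodes plus everything above them.
        wanted = set()
        for tax_id in keep:
            current: Optional[str] = tax_id
            while current is not None and current not in wanted:
                wanted.add(current)
                current = parent_of[current]
        kept &= wanted
    if remove is not None:
        doomed = set(remove)

        def under_removed(tax_id: str) -> bool:
            # True if the node or any node above it is being removed.
            current: Optional[str] = tax_id
            while current is not None:
                if current in doomed:
                    return True
                current = parent_of[current]
            return False

        kept = {tax_id for tax_id in kept if not under_removed(tax_id)}
    # Return a new map; the input is left untouched.
    return {t: p for t, p in parent_of.items() if t in kept}


print(sorted(prune(PARENT_OF, keep=["562"])))
print(sorted(prune(PARENT_OF, remove=["590"])))
```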

tax.remove_node(tax_id: str)

Remove the node from the tree, re-attaching parents as needed: only a single node is removed.
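Removing a single node while keeping the rest of the tree connected can be sketched as follows: the removed node's children are re-attached to its parent. This is a toy illustration over a child-to-parent map, not the library's code, and refusing to remove the root is an assumption of the sketch.

```python
from typing import Dict, Optional


def remove_node(parent_of: Dict[str, Optional[str]], tax_id: str) -> None:
    """Remove one node in place and re-attach its children to its parent."""
    if parent_of[tax_id] is None:  # KeyError for an unknown ID
        raise ValueError("this sketch does not remove the root")
    new_parent = parent_of.pop(tax_id)
    for child, parent in list(parent_of.items()):
        if parent == tax_id:
            parent_of[child] = new_parent


tree: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "562": "561",
}
remove_node(tree, "561")
print(tree)
```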

tax.add_node(parent_tax_id: str, new_tax_id: str)

Add a new node to the tree at the parent provided.

tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)

Edit properties on a taxonomy node.

Exceptions

Only one exception is raised intentionally by the library: TaxonomyError. If you get a pyo3_runtime.PanicException (or anything with pyo3 in its name), this is a bug in the underlying Rust library; please open an issue.

Development

Rust

There is a test suite runnable with cargo test. To test the Python bindings you need to use the additional python_test feature: cargo test --features python_test.

Python

To work on the Python library on a Mac OS X/Unix system (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh
rustup default nightly

# finally, install the library in the local virtualenv
maturin develop --cargo-extra-args="--features=python"

# or using pip
pip install .

Building binary wheels and pushing to PyPI

# The Mac build requires switching through a few different python versions
maturin build --cargo-extra-args="--features=python" --release --strip

# The linux build is automated through cross-compiling in a docker image
docker run --rm -v $(pwd):/io konstin2/maturin:master build --cargo-extra-args="--features=python" --release --strip
twine upload target/wheels/*

Other Taxonomy Libraries

There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:

ETE Toolkit (http://etetoolkit.org/): a Python taxonomy library

Taxize (https://ropensci.github.io/taxize-book/): an R toolkit for working with taxonomic data


