
Routines for loading, saving, and manipulating taxonomic trees


Taxonomy


This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.

This library was initially developed as a component of One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It is designed so that it can be used as-is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add lowest common ancestor, traversal, and other methods to a downstream package's own taxonomy implementation.

The library ships with a number of features:

  • Common support for taxonomy handling across Rust and Python
  • Fast and low(er) memory usage
  • NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
  • Easily extensible (in Rust) to support other formats and operations

Installation

Rust

This library can be added to an existing Cargo.toml file and installed straight from crates.io.

Python

You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:

pip install taxonomy

Python Usage

The Python taxonomy API can open and manipulate all of the formats supported by the Rust library. Note that taxonomy IDs in NCBI format are integers, but they are converted to strings on import. We find that working with "string taxonomy IDs" greatly simplifies interoperation between different taxonomy systems.

Loading a taxonomy

A taxonomy can be loaded from a variety of sources.

  1. Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.

  2. Taxonomy.from_ncbi(nodes_path: str, names_path: str): loads a Taxonomy from a pair of NCBI dump files. The paths specified are to the individual files in the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).

  3. Taxonomy.from_json(value: str, /, path: List[str]): loads a Taxonomy from a JSON-encoded string. The format can either be of the tree or node_link_data types and will be automatically detected. If path is specified, the JSON will be traversed to that sub-object before being parsed as a taxonomy.

  4. Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string (experimental).
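The NCBI loader reads two pipe-delimited dump files. To illustrate that format and the string-ID convention described above, here is a minimal plain-Python sketch. It is not the library's parser. It assumes the standard NCBI layout, with fields separated by a tab, pipe, tab sequence and each row ending in a tab and a pipe, and it keeps only the first few columns. The helper names are hypothetical and the sample rows are abbreviated toy data.

```python
from typing import Dict, Iterable, List, Tuple


def split_dmp_line(line: str) -> List[str]:
    """Split one NCBI .dmp row into its fields (hypothetical helper)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # drop the trailing "\t|" row terminator
    return line.split("\t|\t")


def parse_nodes(lines: Iterable[str]) -> Dict[str, Tuple[str, str]]:
    """Map tax_id -> (parent_tax_id, rank); ids stay strings, e.g. "562"."""
    nodes = {}
    for line in lines:
        tax_id, parent_id, rank = split_dmp_line(line)[:3]
        nodes[tax_id] = (parent_id, rank)
    return nodes


def parse_scientific_names(lines: Iterable[str]) -> Dict[str, str]:
    """Map tax_id -> scientific name, skipping synonyms and other name classes."""
    names = {}
    for line in lines:
        tax_id, name, _unique_name, name_class = split_dmp_line(line)[:4]
        if name_class == "scientific name":
            names[tax_id] = name
    return names


# Abbreviated toy rows; real dumps have more columns and a much deeper lineage.
# NCBI marks the root ("1") as its own parent.
nodes_dmp = [
    "1\t|\t1\t|\tno rank\t|\n",
    "561\t|\t1\t|\tgenus\t|\n",
    "562\t|\t561\t|\tspecies\t|\n",
]
names_dmp = [
    "561\t|\tEscherichia\t|\t\t|\tscientific name\t|\n",
    "562\t|\tEscherichia coli\t|\t\t|\tscientific name\t|\n",
    "562\t|\tBacillus coli\t|\t\t|\tsynonym\t|\n",
]

nodes = parse_nodes(nodes_dmp)
names = parse_scientific_names(names_dmp)
print(nodes["562"], names["562"])
```

Because the ids are never converted with int(), "562" stays a string end to end, matching the convention the Python bindings use.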

Exporting a taxonomy

The examples below assume that the taxonomy has been instantiated as a variable named tax.

  1. tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.

  2. tax.to_json(/, as_node_link_data: bool): exports a Taxonomy as a JSON-encoded byte string. By default, the JSON format is a tree format unless the as_node_link_data parameter is set to True.
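Newick encodes a tree as nested parentheses: each internal node is written as its comma-separated children in parentheses followed by its own label, and the whole string ends with a semicolon. The sketch below serializes a toy child-to-parent map that way. It only illustrates the format and is not the library's exporter: it uses ids as labels, omits branch lengths, returns str rather than bytes, and the function name is hypothetical.

```python
from typing import Dict, List, Optional


def to_newick(parents: Dict[str, Optional[str]]) -> str:
    """Serialize a child -> parent map (root maps to None) as Newick."""
    roots = [node for node, parent in parents.items() if parent is None]
    if len(roots) != 1:
        raise ValueError("expected exactly one root")
    children: Dict[str, List[str]] = {}
    for node, parent in parents.items():
        if parent is not None:
            children.setdefault(parent, []).append(node)

    def encode(node: str) -> str:
        kids = children.get(node, [])
        if not kids:
            return node  # leaf: just the label
        # internal node: (child1,child2,...)label
        return "(" + ",".join(encode(kid) for kid in kids) + ")" + node

    return encode(roots[0]) + ";"


toy = {"1": None, "2": "1", "3": "1", "4": "2"}
print(to_newick(toy))  # ((4)2,3)1;
```

The recursion keeps the sketch short; an exporter for very deep taxonomies would more likely iterate to stay clear of Python's recursion limit.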

Using a taxonomy

The examples below assume that the taxonomy has been instantiated as a variable named tax. Note that TaxonomyNode is a class with the following schema:

from typing import Optional

class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str

Note that the tax_id parameters of the functions described below are strings, but for NCBI taxonomies they must be quoted integers (e.g. 562 becomes "562"). In that case, passing a value that can't be converted to an integer raises an exception, even where the documentation below does not mention it.

tax.root -> TaxonomyNode

Points to the root of the taxonomy.

tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]

Returns the immediate parent TaxonomyNode of the node with id tax_id.

If at_rank is provided, the node's lineage is scanned instead and the ancestor at that rank is returned.

Examples:

parent = tax.parent("612")
parent = tax.parent("612", at_rank="species")
# `parent` will be None if the parent can't be found
parent = tax.parent("unknown")

tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])

Same as parent, but also returns the distance to the returned node, as a (TaxonomyNode, float) tuple.
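To make the at_rank and distance semantics concrete, here is a plain-Python sketch over a toy dict rather than the library itself. It assumes each node stores its parent, the distance to that parent, and its rank, and that with at_rank the distances along the walk are summed; the data layout and the summing are both assumptions of this sketch. The search starts at the node's parent, and the library may treat a node that already has the requested rank differently.

```python
from typing import Dict, Optional, Tuple

# Hypothetical toy data: tax_id -> (parent_id, distance_to_parent, rank)
TOY: Dict[str, Tuple[Optional[str], float, str]] = {
    "1": (None, 0.0, "no rank"),
    "2": ("1", 1.0, "genus"),
    "3": ("2", 2.0, "species"),
    "4": ("3", 0.5, "strain"),
}


def parent_with_distance(
    tax: Dict[str, Tuple[Optional[str], float, str]],
    tax_id: str,
    at_rank: Optional[str] = None,
) -> Tuple[Optional[str], Optional[float]]:
    """Return (ancestor_id, summed distance), or (None, None) if not found."""
    total = 0.0
    current = tax_id
    while True:
        parent_id, dist, _rank = tax[current]
        if parent_id is None:  # reached the root without a match
            return None, None
        total += dist
        # Without at_rank, the immediate parent is the answer.
        if at_rank is None or tax[parent_id][2] == at_rank:
            return parent_id, total
        current = parent_id


def parent(tax, tax_id, at_rank=None) -> Optional[str]:
    """Same lookup, dropping the distance."""
    return parent_with_distance(tax, tax_id, at_rank)[0]


print(parent_with_distance(TOY, "4"))                   # ('3', 0.5)
print(parent_with_distance(TOY, "4", at_rank="genus"))  # ('2', 2.5)
print(parent(TOY, "1"))                                 # None
```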

tax.node(tax_id: str) -> Optional[TaxonomyNode]

Returns the node with that id, or None if it is not found. Indexing (tax["some_id"]) also works, but raises an exception if the node is not found.

tax.find_by_name(name: str) -> Optional[TaxonomyNode]

Returns the node with that name, or None if it is not found. For NCBI taxonomies, only scientific names are matched, not synonyms.
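The difference between node(), indexing, and find_by_name can be sketched with a small hypothetical lookup class. It is not part of the taxonomy package: it returns ids instead of TaxonomyNode objects and raises KeyError where the library raises its own exception. Following the NCBI note above, only rows whose name class is "scientific name" enter the name index.

```python
from typing import Iterable, Optional, Tuple


class NameIndex:
    """Hypothetical helper mimicking the lookup contracts described above."""

    def __init__(self, rows: Iterable[Tuple[str, str, str]]):
        # rows are (tax_id, name, name_class), like columns of NCBI names.dmp
        self._ids = set()
        self._by_name = {}
        for tax_id, name, name_class in rows:
            self._ids.add(tax_id)
            if name_class == "scientific name":  # synonyms are not indexed
                self._by_name[name] = tax_id

    def node(self, tax_id: str) -> Optional[str]:
        # Lenient lookup: None for unknown ids
        return tax_id if tax_id in self._ids else None

    def __getitem__(self, tax_id: str) -> str:
        # Strict lookup: unknown ids raise
        if tax_id not in self._ids:
            raise KeyError(tax_id)
        return tax_id

    def find_by_name(self, name: str) -> Optional[str]:
        return self._by_name.get(name)


index = NameIndex([
    ("562", "Escherichia coli", "scientific name"),
    ("562", "Bacillus coli", "synonym"),
])
print(index.find_by_name("Escherichia coli"))  # 562
print(index.find_by_name("Bacillus coli"))     # None
```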

tax.children(tax_id: str) -> List[TaxonomyNode]

Returns all nodes below the given tax id.

tax.lineage(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, including the node itself.

tax.parents(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id.
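The only difference between lineage and parents is whether the starting node is included. Here is a minimal sketch over a toy child-to-parent dict in which the root maps to None. Ordering the result from the node up to the root is an assumption of this sketch, and the helper names are hypothetical.

```python
from typing import Dict, List, Optional


def lineage(parents: Dict[str, Optional[str]], tax_id: str) -> List[str]:
    """The node itself followed by every ancestor up to the root."""
    path = []
    current: Optional[str] = tax_id
    while current is not None:
        path.append(current)
        current = parents[current]
    return path


def ancestors(parents: Dict[str, Optional[str]], tax_id: str) -> List[str]:
    """Like lineage(), minus the starting node (cf. tax.parents)."""
    return lineage(parents, tax_id)[1:]


toy = {"1": None, "2": "1", "3": "2"}
print(lineage(toy, "3"))    # ['3', '2', '1']
print(ancestors(toy, "3"))  # ['2', '1']
```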

tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]

Returns the lowest common ancestor of the two given nodes.
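One straightforward way to find a lowest common ancestor is to record the first node and all of its ancestors, then walk up from the second node until one of them is reached. The sketch below does that on a toy child-to-parent dict. It illustrates the concept only and is not necessarily how the library computes lca.

```python
from typing import Dict, Optional


def lca(parents: Dict[str, Optional[str]], id1: str, id2: str) -> Optional[str]:
    """Lowest common ancestor of id1 and id2 (a node counts as its own ancestor)."""
    seen = set()
    current: Optional[str] = id1
    while current is not None:  # record id1 and all of its ancestors
        seen.add(current)
        current = parents[current]
    current = id2
    while current is not None:  # first hit walking up from id2 is the LCA
        if current in seen:
            return current
        current = parents[current]
    return None  # the two nodes share no ancestor


# Toy tree: 1 has children 2 and 5; 2 has children 3 and 4.
toy = {"1": None, "2": "1", "3": "2", "4": "2", "5": "1"}
print(lca(toy, "3", "4"))  # 2
print(lca(toy, "3", "5"))  # 1
```

This costs time proportional to the depth of the two nodes, which is usually small for biological taxonomies.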

tax.prune(keep: List[str], remove: List[str]) -> Taxonomy

Returns a copy of the taxonomy containing:

  • only the nodes in keep and their ancestors, if keep is provided
  • all of the nodes except those in remove and their descendants, if remove is provided
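The two prune modes can be sketched on a toy child-to-parent dict as follows. This is an illustration, not the library's implementation. In particular, applying keep before remove when both are given is an assumption of the sketch.

```python
from typing import Dict, Iterable, Optional


def prune(
    parents: Dict[str, Optional[str]],
    keep: Optional[Iterable[str]] = None,
    remove: Optional[Iterable[str]] = None,
) -> Dict[str, Optional[str]]:
    """Return a pruned copy of a child -> parent map (root maps to None)."""
    result = dict(parents)
    if keep is not None:
        wanted = set()
        for tax_id in keep:
            current: Optional[str] = tax_id
            # keep the node and every ancestor up to the root
            while current is not None and current not in wanted:
                wanted.add(current)
                current = parents[current]
        result = {k: v for k, v in result.items() if k in wanted}
    if remove is not None:
        doomed = set(remove)

        def under_removed(tax_id: str) -> bool:
            # True if the node or any of its ancestors is being removed
            current: Optional[str] = tax_id
            while current is not None:
                if current in doomed:
                    return True
                current = parents[current]
            return False

        result = {k: v for k, v in result.items() if not under_removed(k)}
    return result


toy = {"1": None, "2": "1", "3": "2", "4": "2", "5": "1"}
print(sorted(prune(toy, keep=["3"])))    # ['1', '2', '3']
print(sorted(prune(toy, remove=["2"])))  # ['1', '5']
```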

tax.remove_node(tax_id: str)

Removes the node from the tree and re-attaches its children to the removed node's parent, so only that single node is removed.
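Removing a single node amounts to re-pointing each of its children at the removed node's parent. Here is a sketch on a toy child-to-parent dict. Unlike the library method, it returns a new dict instead of editing in place. Refusing to remove the root is a choice made by this sketch.

```python
from typing import Dict, Optional


def remove_node(parents: Dict[str, Optional[str]], tax_id: str) -> Dict[str, Optional[str]]:
    """Drop tax_id and re-attach its children to its parent."""
    new_parent = parents[tax_id]
    if new_parent is None:
        raise ValueError("this sketch does not remove the root")
    return {
        node: (new_parent if parent == tax_id else parent)
        for node, parent in parents.items()
        if node != tax_id
    }


toy = {"1": None, "2": "1", "3": "2", "4": "2"}
print(remove_node(toy, "2"))  # {'1': None, '3': '1', '4': '1'}
```

Contrast this with prune(remove=...), which drops the node's whole subtree.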

tax.add_node(parent_tax_id: str, new_tax_id: str)

Adds a new node with id new_tax_id to the tree as a child of parent_tax_id.

tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)

Edits the properties of a taxonomy node.

Exceptions

Only one exception is raised intentionally by the library: TaxonomyError. If you get a pyo3_runtime.PanicException (or any exception with pyo3 in its name), it is a bug in the underlying Rust library; please open an issue.

Development

Rust

There is a test suite runnable with cargo test. To test the Python bindings, enable the additional python_test feature: cargo test --features python_test.

Python

To work on the Python library on a Mac OS X/Unix system (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh
rustup default nightly

# install maturin, the build tool used below
pip install maturin

# finally, install the library in the local virtualenv
maturin develop --cargo-extra-args="--features=python"

# or using pip
pip install .

Building binary wheels and pushing to PyPI

# The Mac build requires switching through a few different python versions
maturin build --cargo-extra-args="--features=python" --release --strip

# The linux build is automated through cross-compiling in a docker image
docker run --rm -v $(pwd):/io konstin2/maturin:master build --cargo-extra-args="--features=python" --release --strip
twine upload target/wheels/*

Other Taxonomy Libraries

There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:

ETE Toolkit (http://etetoolkit.org/) A Python taxonomy library

Taxize (https://ropensci.github.io/taxize-book/) An R toolkit for working with taxonomic data

Download files

The source distribution and the x86-64 wheels were uploaded via twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.3 (not via Trusted Publishing). No upload metadata is listed for the ARM64 wheels.

Source Distribution

taxonomy-0.7.0.tar.gz (82.7 kB)
  • SHA256: b8b1165c881768e662dd080e97332b34a00613c24046bc1516ec84cf45434a46
  • MD5: ff84e4e280fb493169c8a5debd711263
  • BLAKE2b-256: cfe2900891954bd9d92d2c3d6e27fb0b0c354ffbba90aba00eac3fb6af0ac859

Built Distributions

taxonomy-0.7.0-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (316.7 kB; CPython 3.9, manylinux: glibc 2.17+ ARM64)
  • SHA256: 4cb2296cf9a733d500c09cb2a4bdc17c873db46f4b613dfad4c9134943b5e549
  • MD5: 9fd0641e3467986f459dd8578e2aa519
  • BLAKE2b-256: c0a9c97628b3de64c11172854617145aa475d05e1fd1722581e0ebf771eca62e

taxonomy-0.7.0-cp38-cp38-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (316.7 kB; CPython 3.8, manylinux: glibc 2.17+ ARM64)
  • SHA256: 28419bcc2cdac2678fc2d8c6b510053100468b413923a280f5181f381ad1bc84
  • MD5: d17bb79e7b82c1aa2c69e7c7c5fdf9dd
  • BLAKE2b-256: 92b4b188589eb3d8d8502db098e52f7cff310d817cf38d5da3ad02c77cd13ba4

taxonomy-0.7.0-cp38-cp38-manylinux1_x86_64.whl (343.9 kB; CPython 3.8, manylinux1 x86-64)
  • SHA256: 6c7f9787f92abaed92f95bd0e5b78365976e48e732745dde8831420faed9a8a7
  • MD5: fc28ef0b4fea727c680c70926c3af761
  • BLAKE2b-256: 026e3aeccce75b103acba6e8e218ac9fbdd336e14877dd90d56671587bd708b6

taxonomy-0.7.0-cp38-cp38-macosx_10_7_x86_64.whl (308.7 kB; CPython 3.8, macOS 10.7+ x86-64)
  • SHA256: 3b448c6093174b7a0f8ba5be088e1adfdba8a622156a0e001bb7166e06fda7db
  • MD5: 68a139d5be7e5ae7792e3404e695d98f
  • BLAKE2b-256: 0a31cbacc8cfa8799227e81191e2cc8d98349a09541ee3562cd8354cc87781c6

taxonomy-0.7.0-cp37-cp37m-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (316.7 kB; CPython 3.7m, manylinux: glibc 2.17+ ARM64)
  • SHA256: 7bd56f5986f9a2a45da3fba1272461fa20f12d140a0acd147de0e4794a433c71
  • MD5: 2d03113c93149d2136f53cbd3425b156
  • BLAKE2b-256: be08de0a3b8a84f3dabfa7b4daf23033f0e1a4fb91757c776007448f46185673

taxonomy-0.7.0-cp37-cp37m-manylinux1_x86_64.whl (343.9 kB; CPython 3.7m, manylinux1 x86-64)
  • SHA256: 273ea3a156f1e560cb1158a25a2e689321404b1b77be2c2e226e561eb31401b0
  • MD5: 5eb491c5e9c7d9f78027295d0961e558
  • BLAKE2b-256: c9f198d433eacfad9679789ef3f9f7ffa44f26a31286b56a8452136d0a5944af

taxonomy-0.7.0-cp37-cp37m-macosx_10_7_x86_64.whl (308.7 kB; CPython 3.7m, macOS 10.7+ x86-64)
  • SHA256: f36c3bb91b270b68b832907964982ab2c8809ed53979757bcc96f2e52fb4b516
  • MD5: f782f08f43f090167723cbfbcb24a2b6
  • BLAKE2b-256: 5fc403f2535a69b17c85ce3206bf9b24d28b6348c4987a7ea76c1c43642bcee5

taxonomy-0.7.0-cp36-cp36m-manylinux1_x86_64.whl (344.4 kB; CPython 3.6m, manylinux1 x86-64)
  • SHA256: f7c23da2522b819f66245053ec98088d2c6b962b06469dcbf6a0dc326b56dfac
  • MD5: 09d1b741a64d5d7162522e6bb93d2c75
  • BLAKE2b-256: 9b252bf9fa4d0ac906e2ac6b1c4df4fb8cfa0a7aa69fb23edd850f17203810c8

taxonomy-0.7.0-cp35-cp35m-manylinux1_x86_64.whl (344.3 kB; CPython 3.5m, manylinux1 x86-64)
  • SHA256: c84f31a1de6e6f4f241d5f80c750f862addb9cea8482a01b840e016ac8b6ff47
  • MD5: 8e22114db54490a46777f45ebf3f287d
  • BLAKE2b-256: a86acf630950d89ecc6919d0246964445ce64c40b12f5e21bccbe57f21ae4943
