Routines for loading, saving, and manipulating taxonomic trees
Taxonomy
This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.
This library was initially developed as a component of One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It can be used as is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add last common ancestor, traversal, and other methods to a downstream package's own taxonomy implementation.
The library ships with a number of features:
- Common support for taxonomy handling across Rust and Python
- Fast, with low(er) memory usage
- NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
- Easily extensible (in Rust) to support other formats and operations
Installation
Rust
This library can be added to an existing Cargo.toml file and installed straight from crates.io.
Python
You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:
pip install taxonomy
Python Usage
The Python taxonomy API can open and manipulate all of the formats from the Rust library. Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies inter-operation between different taxonomy systems.
Loading a taxonomy
A taxonomy can be loaded from a variety of sources.
- Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.
- Taxonomy.from_ncbi(nodes_path: str, names_path: str): loads a Taxonomy from a pair of NCBI dump files. The paths specified are to the individual files in the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).
- Taxonomy.from_json(value: str, /, path: List[str]): loads a Taxonomy from a JSON-encoded string. The format can be either of the tree or node_link_data types and will be automatically detected. If path is specified, the JSON will be traversed to that sub-object before being parsed as a taxonomy.
- Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string. Experimental.
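The path behaviour described for from_json can be pictured as walking into nested JSON objects one key at a time. The sketch below is only an illustration of that idea in plain Python, not the library's code: the descend helper, the sample document, and its field names are all made up for the example.

```python
import json
from typing import Any, List


def descend(document: Any, path: List[str]) -> Any:
    """Walk into nested JSON objects, one key per path element (hypothetical helper)."""
    current = document
    for key in path:
        if not isinstance(current, dict) or key not in current:
            raise KeyError(f"path element {key!r} not found")
        current = current[key]
    return current


# Made-up document in which the taxonomy is nested under two keys.
raw = json.dumps({
    "metadata": {"source": "example"},
    "results": {"taxonomy": {"id": "1", "name": "root", "children": []}},
})

# Only the sub-object reached by the path would then be parsed as a taxonomy.
sub_object = descend(json.loads(raw), ["results", "taxonomy"])
print(sub_object["name"])  # root
```

An empty path returns the whole document, which matches the case where no path is given.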
Exporting a taxonomy
The following assumes that the taxonomy has been instantiated as a variable named tax.
- tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.
- tax.to_json(/, as_node_link_data: bool): exports a Taxonomy as a JSON-encoded byte string. By default, the JSON format is a tree format unless the as_node_link_data parameter is set to True.
Using a taxonomy
The following assumes that the taxonomy has been instantiated as a variable named tax. Note that TaxonomyNode is a class with the following schema:
class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str
Note that the tax_id parameters passed to the functions described below are strings, but in some cases, NCBI for example, they must be quoted integers: 562 -> "562". In that case, passing something that can't be converted to a number will raise an exception even if the documentation below does not mention it.
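As a rough illustration of the "quoted integer" convention, the sketch below uses a stand-in dataclass with the same four fields as TaxonomyNode and a hypothetical ncbi_tax_id helper. Neither is part of the library, and the ValueError here only stands in for whatever exception the library itself raises.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class NodeRecord:
    """Stand-in with the same four fields as TaxonomyNode (not the real class)."""
    id: str
    name: str
    parent: Optional[str]
    rank: str


def ncbi_tax_id(value: Union[int, str]) -> str:
    """Return an NCBI-style tax ID as a string of digits, or raise ValueError."""
    text = str(value).strip()
    if not text.isdigit():
        raise ValueError(f"not an NCBI-style tax ID: {value!r}")
    return text


# 562 is the NCBI tax ID for Escherichia coli; its parent genus is 561.
ecoli = NodeRecord(id=ncbi_tax_id(562), name="Escherichia coli", parent="561", rank="species")
print(ecoli.id)  # 562
```

Normalizing IDs once at the boundary of your own code keeps every later call working with strings only.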
tax.root -> TaxonomyNode
Points to the root of the taxonomy
tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]
Returns the immediate parent TaxonomyNode of the node with the given id.
If at_rank is provided, scans all the nodes in the node's lineage and returns the parent at that rank.
Examples:
parent = tax.parent("612")
parent = tax.parent("612", at_rank="species")
# parent will be `None` if we can't find the parent
parent = tax.parent("unknown")
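To make the at_rank behaviour concrete, here is a small plain-Python model of a lineage scan. It is a sketch under assumptions, not the library's implementation: the NODES table and its IDs are invented, and the scan takes the description literally by starting at the node itself, which may differ from what the library does.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical toy data: tax_id -> (parent_id, rank). These are not real NCBI IDs.
NODES: Dict[str, Tuple[Optional[str], str]] = {
    "root": (None, "no rank"),
    "g1": ("root", "genus"),
    "s1": ("g1", "species"),
    "st1": ("s1", "strain"),
}


def lineage(tax_id: str) -> List[str]:
    """IDs from the node itself up to the root."""
    chain: List[str] = []
    current: Optional[str] = tax_id
    while current is not None:
        chain.append(current)
        current = NODES[current][0]
    return chain


def parent(tax_id: str, at_rank: Optional[str] = None) -> Optional[str]:
    """Immediate parent, or the first node in the lineage with the given rank."""
    if tax_id not in NODES:
        return None
    if at_rank is None:
        return NODES[tax_id][0]
    for candidate in lineage(tax_id):
        if NODES[candidate][1] == at_rank:
            return candidate
    return None


print(parent("st1"))                    # s1
print(parent("st1", at_rank="genus"))   # g1
print(parent("st1", at_rank="family"))  # None
```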
tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])
Same as parent but returns the distance in addition, as a (TaxonomyNode, float) tuple.
tax.node(tax_id: str) -> Optional[TaxonomyNode]
Returns the node at that id. Returns None if not found.
You can also use indexing to accomplish that: tax["some_id"], but this will raise an exception if the node is not found.
tax.find_by_name(name: str) -> Optional[TaxonomyNode]
Returns the node with that name. Returns None if not found.
For NCBI taxonomies, only scientific names are considered, not synonyms.
tax.children(tax_id: str) -> List[TaxonomyNode]
Returns all nodes below the given tax id.
tax.lineage(tax_id: str) -> List[TaxonomyNode]
Returns all nodes above the given tax id, including itself.
tax.parents(tax_id: str) -> List[TaxonomyNode]
Returns all nodes above the given tax id.
tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]
Returns the lowest common ancestor for the 2 given nodes.
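The relationship between lineage, parents, and lca can be sketched with a plain parent map. This is an illustrative model only: the PARENT_OF table is invented, the functions return IDs rather than TaxonomyNode objects, and the node-first ordering of the lists is an assumption rather than documented library behaviour.

```python
from typing import Dict, List, Optional

# Hypothetical toy tree: tax_id -> parent_id.
PARENT_OF: Dict[str, Optional[str]] = {
    "root": None,
    "a": "root",
    "b": "a",
    "c": "a",
    "d": "root",
}


def lineage(tax_id: str) -> List[str]:
    """The node itself followed by every ancestor up to the root."""
    chain: List[str] = []
    current: Optional[str] = tax_id
    while current is not None:
        chain.append(current)
        current = PARENT_OF[current]
    return chain


def parents(tax_id: str) -> List[str]:
    """Same as lineage, but without the node itself."""
    return lineage(tax_id)[1:]


def lca(id1: str, id2: str) -> Optional[str]:
    """First node on id2's lineage that also appears on id1's lineage."""
    ancestors = set(lineage(id1))
    for candidate in lineage(id2):
        if candidate in ancestors:
            return candidate
    return None


print(lineage("b"))   # ['b', 'a', 'root']
print(parents("b"))   # ['a', 'root']
print(lca("b", "c"))  # a
print(lca("b", "d"))  # root
```

Because a lineage includes the node itself, the lowest common ancestor of a node and one of its own ancestors is that ancestor.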
tax.prune(keep: List[str], remove: List[str]) -> Taxonomy
Return a copy of the taxonomy containing:
- only the nodes in keep and their parents, if provided
- all of the nodes except those in remove and their children, if provided
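The two pruning modes can be modelled on a plain parent map as follows. This is a sketch, not the library's code: the table and both function names are hypothetical, and reading "children" as "all descendants" is an assumption.

```python
from typing import Dict, Iterable, Optional, Set

ParentMap = Dict[str, Optional[str]]

# Hypothetical toy tree: tax_id -> parent_id.
PARENT_OF: ParentMap = {
    "root": None, "a": "root", "b": "a", "c": "a", "d": "root", "e": "d",
}


def prune_keep(parent_of: ParentMap, keep: Iterable[str]) -> ParentMap:
    """Copy containing only the kept nodes and all of their ancestors."""
    kept: Set[str] = set()
    for tax_id in keep:
        current: Optional[str] = tax_id
        # Stop early once we reach an ancestor that is already kept.
        while current is not None and current not in kept:
            kept.add(current)
            current = parent_of[current]
    return {node: par for node, par in parent_of.items() if node in kept}


def prune_remove(parent_of: ParentMap, remove: Iterable[str]) -> ParentMap:
    """Copy without the removed nodes and everything beneath them."""
    doomed = set(remove)

    def under_removed(tax_id: str) -> bool:
        current: Optional[str] = tax_id
        while current is not None:
            if current in doomed:
                return True
            current = parent_of[current]
        return False

    return {node: par for node, par in parent_of.items() if not under_removed(node)}


print(prune_keep(PARENT_OF, ["b"]))    # {'root': None, 'a': 'root', 'b': 'a'}
print(prune_remove(PARENT_OF, ["a"]))  # {'root': None, 'd': 'root', 'e': 'd'}
```

Both functions build a new dictionary and leave the input untouched, mirroring the "return a copy" wording above.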
tax.remove_node(tax_id: str)
Remove the node from the tree, re-attaching parents as needed: only a single node is removed.
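One way to read "re-attaching parents as needed" is that the children of the removed node are attached to that node's own parent. The sketch below models that reading on a plain parent map. The function and data are hypothetical, and refusing to remove the root is an assumption of this sketch, not documented library behaviour.

```python
from typing import Dict, Optional

ParentMap = Dict[str, Optional[str]]


def remove_node(parent_of: ParentMap, tax_id: str) -> ParentMap:
    """Copy of the tree without tax_id; its children move up to its parent."""
    if tax_id not in parent_of:
        raise KeyError(tax_id)
    new_parent = parent_of[tax_id]
    if new_parent is None:
        raise ValueError("this sketch does not support removing the root")
    return {
        node: (new_parent if par == tax_id else par)
        for node, par in parent_of.items()
        if node != tax_id
    }


tree: ParentMap = {"root": None, "a": "root", "b": "a", "c": "a"}
print(remove_node(tree, "a"))  # {'root': None, 'b': 'root', 'c': 'root'}
```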
tax.add_node(parent_tax_id: str, new_tax_id: str)
Add a new node to the tree at the parent provided.
tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)
Edit properties on a taxonomy node.
Exceptions
Only one exception is raised intentionally by the library: TaxonomyError.
If you get a pyo3_runtime.PanicException (or anything with pyo3 in its name), this is a bug in the underlying Rust library. Please open an issue.
Development
Rust
There is a test suite runnable with cargo test. To test the Python bindings, you need to use the additional python_test feature: cargo test --features python_test.
Python
To work on the Python library on a Mac OS X/Unix system (requires Python 3):
# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh
rustup default nightly
# finally, install the library in the local virtualenv
maturin develop --cargo-extra-args="--features=python"
# or using pip
pip install .
Building binary wheels and pushing to PyPI
# The Mac build requires switching through a few different python versions
maturin build --cargo-extra-args="--features=python" --release --strip
# The linux build is automated through cross-compiling in a docker image
docker run --rm -v $(pwd):/io konstin2/maturin:master build --cargo-extra-args="--features=python" --release --strip
twine upload target/wheels/*
Other Taxonomy Libraries
There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:
- ETE Toolkit (http://etetoolkit.org/): a Python taxonomy library
- Taxize (https://ropensci.github.io/taxize-book/): an R toolkit for working with taxonomic data