Routines for loading, saving, and manipulating taxonomic trees

Taxonomy

This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.

This library was initially developed as a component of One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It can be used as is with a number of taxonomic formats, or the Taxonomy trait it provides can be used to add last common ancestor, traversal, and similar methods to a downstream package's own taxonomy implementation.

The library ships with a number of features:

  • Common support for taxonomy handling across Rust and Python
  • Fast, with low(er) memory usage
  • NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
  • Easily extensible (in Rust) to support other formats and operations

Installation

Rust

This library can be added to an existing Cargo.toml file and installed straight from crates.io.

Python

You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:

pip install taxonomy

Python Usage

The Python taxonomy API can open and manipulate all of the formats from the Rust library. Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies inter-operation between different taxonomy systems.

Loading a taxonomy

A taxonomy can be loaded from a variety of sources:

  1. Taxonomy.from_newick(value: str): loads a Taxonomy from a Newick-encoded string.

  2. Taxonomy.from_ncbi(nodes_path: str, names_path: str): loads a Taxonomy from a pair of NCBI dump files. The paths specified are to the individual files in the NCBI taxonomy directory (e.g. nodes.dmp and names.dmp).

  3. Taxonomy.from_json(value: str, /, path: List[str]): loads a Taxonomy from a JSON-encoded string. The format can either be of the tree or node_link_data types and will be automatically detected. If path is specified, the JSON will be traversed to that sub-object before being parsed as a taxonomy.

  4. Taxonomy.from_phyloxml(value: str): loads a Taxonomy from a PhyloXML-encoded string. This loader is experimental.
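The path argument of from_json can be pictured as a walk through nested JSON objects. The snippet below is only a sketch of that idea in plain Python, not the library's code: select_subobject is a hypothetical helper, and the wrapped document (including the keys inside the innermost object) is an invented placeholder.

```python
import json
from typing import Any, List


def select_subobject(document: str, path: List[str]) -> Any:
    """Parse `document` and descend one key per entry of `path`."""
    node = json.loads(document)
    for key in path:
        node = node[key]  # raises KeyError if the path does not exist
    return node


# Hypothetical payload: the taxonomy is nested under two wrapper keys.
wrapped = json.dumps({"results": {"taxonomy": {"id": "1", "children": []}}})

# Only this sub-object would then be parsed as a taxonomy.
sub = select_subobject(wrapped, ["results", "taxonomy"])
```

With the library itself, the equivalent step would presumably be a single call that passes the same list as path.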

Exporting a taxonomy

The following assumes that the taxonomy has been instantiated as a variable named tax.

  1. tax.to_newick(): exports a Taxonomy as a Newick-encoded byte string.

  2. tax.to_json(/, as_node_link_data: bool): exports a Taxonomy as a JSON-encoded byte string. By default, the JSON format is a tree format unless the as_node_link_data parameter is set to True.

Using a taxonomy

The following assumes that the taxonomy has been instantiated as a variable named tax. Note that TaxonomyNode is a class with the following schema:

class TaxonomyNode:
    id: str
    name: str
    parent: Optional[str]
    rank: str

Note that the tax_id parameters of the functions described below are strings. For an NCBI taxonomy they must be integers in string form, e.g. 562 -> "562". In that case, passing a value that cannot be converted to a number raises an exception, even where the documentation below does not mention it.
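When IDs arrive as integers from other tools, it can help to normalize them before calling into the taxonomy. The helper below is a hypothetical convenience function, not part of the library; it only mirrors the rule described above for NCBI-style IDs.

```python
from typing import Union


def to_ncbi_tax_id(value: Union[int, str]) -> str:
    """Return `value` as a string tax ID, rejecting non-numeric input."""
    text = str(value).strip()
    if not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a numeric NCBI tax ID: {value!r}")
    return text


ecoli_id = to_ncbi_tax_id(562)      # integer from another tool
same_id = to_ncbi_tax_id(" 562 ")   # string with stray whitespace
```

Validating up front turns a malformed ID into a plain ValueError at the call site rather than an exception raised later inside a taxonomy method.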

tax.root -> TaxonomyNode

Points to the root of the taxonomy

tax.parent(tax_id: str, /, at_rank: str) -> Optional[TaxonomyNode]

Returns the immediate parent TaxonomyNode of the node with the given id.

If at_rank is provided, scans the nodes in the node's lineage and returns the ancestor node at that rank.

Examples:

parent = tax.parent("612")
parent = tax.parent("612", at_rank="species")
# `parent` will be `None` if we can't find the parent
parent = tax.parent("unknown")

tax.parent_with_distance(tax_id: str, /, at_rank: str) -> (Optional[TaxonomyNode], Optional[float])

Same as parent, but also returns the distance, as a (TaxonomyNode, float) tuple. Both elements are None if the parent cannot be found.
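The at_rank lookup amounts to walking up the lineage until a node of the requested rank is found. The sketch below illustrates this on a toy dictionary; it is not the library's implementation. It returns IDs rather than TaxonomyNode objects, starts the scan at the immediate parent, and assumes the distance is the sum of the branch lengths walked.

```python
from typing import Dict, Optional, Tuple

# Toy taxonomy: tax_id -> (parent_id, rank, distance_to_parent).
NODES: Dict[str, Tuple[Optional[str], str, float]] = {
    "1": (None, "no rank", 0.0),
    "543": ("1", "family", 1.0),
    "561": ("543", "genus", 1.0),
    "562": ("561", "species", 1.0),
}


def parent_with_distance(
    tax_id: str, at_rank: Optional[str] = None
) -> Tuple[Optional[str], Optional[float]]:
    """Return (ancestor_id, distance), or (None, None) if there is none."""
    if tax_id not in NODES:
        return None, None
    distance = 0.0
    current = tax_id
    while True:
        parent_id, _, step = NODES[current]
        if parent_id is None:  # reached the root without a match
            return None, None
        distance += step
        if at_rank is None or NODES[parent_id][1] == at_rank:
            return parent_id, distance
        current = parent_id


direct = parent_with_distance("562")
family = parent_with_distance("562", at_rank="family")
missing = parent_with_distance("unknown")
```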

tax.node(tax_id: str) -> Optional[TaxonomyNode]

Returns the node with that id, or None if it is not found. Indexing (tax["some_id"]) does the same, but raises an exception if the node is not found.

tax.find_by_name(name: str) -> Optional[TaxonomyNode]

Returns the node with that name. Returns None if not found. In NCBI, it only accounts for scientific names and not synonyms.

tax.children(tax_id: str) -> List[TaxonomyNode]

Returns all nodes below the given tax id.

tax.lineage(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id, including itself.

tax.parents(tax_id: str) -> List[TaxonomyNode]

Returns all nodes above the given tax id.
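The difference between lineage and parents is easiest to see on a small example. The sketch below uses a plain child-to-parent dictionary rather than the library. The ordering from the node up to the root, and reading parents as the lineage without the node itself, are assumptions based on the descriptions above.

```python
from typing import Dict, List, Optional

# Toy taxonomy: tax_id -> parent_id (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "562": "561",
}


def lineage(tax_id: str) -> List[str]:
    """The node itself followed by each ancestor up to the root."""
    chain: List[str] = []
    current: Optional[str] = tax_id
    while current is not None:
        chain.append(current)
        current = PARENT[current]
    return chain


def parents(tax_id: str) -> List[str]:
    """Like lineage(), but without the starting node."""
    return lineage(tax_id)[1:]


full = lineage("562")
above = parents("562")
```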

tax.lca(id1: str, id2: str) -> Optional[TaxonomyNode]

Returns the lowest common ancestor of the two given nodes.
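One common way to compute a lowest common ancestor is to collect the ancestors of the first node and then walk up from the second until one of them is hit. The sketch below shows that approach on a toy dictionary. It is an illustration of the concept, not necessarily how the library computes it, and it returns an ID rather than a TaxonomyNode.

```python
from typing import Dict, Optional

# Toy taxonomy: tax_id -> parent_id (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "570": "543",
    "562": "561",
    "564": "561",
}


def lca(id1: str, id2: str) -> Optional[str]:
    """Return the deepest node that is an ancestor of (or equal to) both."""
    seen = set()
    current: Optional[str] = id1
    while current is not None:  # record id1 and all of its ancestors
        seen.add(current)
        current = PARENT[current]
    current = id2
    while current is not None:  # first hit walking up from id2 is the LCA
        if current in seen:
            return current
        current = PARENT[current]
    return None


siblings = lca("562", "564")
cousins = lca("562", "570")
```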

tax.prune(keep: List[str], remove: List[str]) -> Taxonomy

Return a copy of the taxonomy containing:

  • only the nodes in keep and their parents if provided
  • all of the nodes except those in remove and their children if provided
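The two pruning rules can be sketched on a child-to-parent dictionary. This is a plain-Python illustration of the semantics listed above, not the library's code. The function signature with optional arguments and the behavior when both lists are given are assumptions.

```python
from typing import Dict, Iterable, Optional

# Toy taxonomy: tax_id -> parent_id (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "570": "543",
    "562": "561",
    "564": "561",
}


def prune(
    parent: Dict[str, Optional[str]],
    keep: Optional[Iterable[str]] = None,
    remove: Optional[Iterable[str]] = None,
) -> Dict[str, Optional[str]]:
    """Return a pruned copy of `parent`; the input is left untouched."""
    kept = set(parent)
    if keep:
        wanted = set()
        for tax_id in keep:  # each kept node plus all of its ancestors
            current: Optional[str] = tax_id
            while current is not None and current not in wanted:
                wanted.add(current)
                current = parent[current]
        kept &= wanted
    if remove:
        removed = set(remove)

        def under_removed(tax_id: str) -> bool:
            # True if the node or any of its ancestors is marked for removal.
            current: Optional[str] = tax_id
            while current is not None:
                if current in removed:
                    return True
                current = parent[current]
            return False

        kept = {t for t in kept if not under_removed(t)}
    return {t: p for t, p in parent.items() if t in kept}


only_ecoli = prune(PARENT, keep=["562"])
no_genus = prune(PARENT, remove=["561"])
```

Here only_ecoli retains "562" and its three ancestors, while no_genus drops "561" together with "562" and "564".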

tax.remove_node(tax_id: str)

Remove the node from the tree, re-attaching parents as needed: only a single node is removed.
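Removing a single node while keeping its subtree can be sketched as follows. This assumes "re-attaching" means the removed node's children are attached to its former parent. Unlike the method above, the hypothetical helper below returns a copy instead of editing in place, and it refuses to remove the root, which is an assumption of the sketch rather than documented behavior.

```python
from typing import Dict, Optional

# Toy taxonomy: tax_id -> parent_id (None marks the root).
PARENT: Dict[str, Optional[str]] = {
    "1": None,
    "543": "1",
    "561": "543",
    "570": "543",
    "562": "561",
    "564": "561",
}


def remove_node(
    parent: Dict[str, Optional[str]], tax_id: str
) -> Dict[str, Optional[str]]:
    """Return a copy without `tax_id`, its children moved up one level."""
    new_parent = parent[tax_id]
    if new_parent is None:
        raise ValueError("the root cannot be removed in this sketch")
    result: Dict[str, Optional[str]] = {}
    for child, old_parent in parent.items():
        if child == tax_id:
            continue  # drop only this one node
        result[child] = new_parent if old_parent == tax_id else old_parent
    return result


without_genus = remove_node(PARENT, "561")
```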

tax.add_node(parent_tax_id: str, new_tax_id: str)

Add a new node to the tree at the parent provided.

tax.edit_node(tax_id: str, /, name: str, rank: str, parent_id: str, parent_dist: float)

Edit properties on a taxonomy node.

Exceptions

Only one exception is raised intentionally by the library: TaxonomyError. If you get a pyo3_runtime.PanicException (or anything with pyo3 in its name), this is a bug in the underlying Rust library; please open an issue.

Development

Rust

There is a test suite runnable with cargo test. To test the Python bindings, you need to use the additional python_test feature: cargo test --features python_test.

Python

To work on the Python library on a Mac OS X/Unix system (requires Python 3):

# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh
rustup default nightly

# finally, install the library in the local virtualenv
maturin develop --cargo-extra-args="--features=python"

# or using pip
pip install .

Building binary wheels and pushing to PyPI

# The Mac build requires switching through a few different python versions
maturin build --cargo-extra-args="--features=python" --release --strip

# The linux build is automated through cross-compiling in a docker image
docker run --rm -v $(pwd):/io konstin2/maturin:master build --cargo-extra-args="--features=python" --release --strip
twine upload target/wheels/*

Other Taxonomy Libraries

There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:

ETE Toolkit (http://etetoolkit.org/) A Python taxonomy library

Taxize (https://ropensci.github.io/taxize-book/) An R toolkit for working with taxonomic data
