Load, save, and manipulate taxonomic trees
# Taxonomy
[![PyPI version](https://badge.fury.io/py/taxonomy.svg)](https://pypi.org/project/taxonomy/)
[![Crates version](https://img.shields.io/crates/v/taxonomy.svg)](https://crates.io/crates/taxonomy)
[![CircleCI](https://circleci.com/gh/onecodex/taxonomy.svg?style=shield)](https://circleci.com/gh/onecodex/taxonomy)
This is a Rust library for reading, writing, and editing biological taxonomies. There are associated Python bindings for accessing most of the functionality from Python.
This library was developed initially as a component in One Codex's metagenomic classification pipeline before being refactored out, expanded, and open-sourced. It is designed such that it can be used *as is* with a number of taxonomic formats *or* the Taxonomy trait it provides can be used to add last common ancestor, traversal, etc. methods to a downstream package's taxonomy implementation.
The library ships with a number of features:
- [X] Common support for taxonomy handling across Rust and Python
- [X] Fast and low(er) memory usage
- [X] NCBI taxonomy, JSON ("tree" and "node_link_data" formats), Newick, and PhyloXML support
- [X] Easily extensible (in Rust) to support other formats and operations
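To make the "last common ancestor" operation mentioned above concrete, here is a minimal pure-Python sketch. It does not use this library and is not its implementation: the `lineage` and `last_common_ancestor` helpers and the child-to-parent dictionary are assumptions made for illustration only.

```python
def lineage(parents, node):
    """Return the path from `node` up to the root, starting with `node` itself."""
    path = [node]
    while path[-1] in parents:
        path.append(parents[path[-1]])
    return path


def last_common_ancestor(parents, a, b):
    """Return the deepest node shared by the lineages of `a` and `b`, or None."""
    ancestors_of_a = set(lineage(parents, a))
    # Walk upward from `b`; the first node also above `a` is the deepest shared one.
    for node in lineage(parents, b):
        if node in ancestors_of_a:
            return node
    return None


# Hypothetical child -> parent map for the tree (A,(B,C)D)E
parents = {'A': 'E', 'B': 'D', 'C': 'D', 'D': 'E'}
lca_of_siblings = last_common_ancestor(parents, 'B', 'C')
lca_across_branches = last_common_ancestor(parents, 'A', 'B')
```

A node counts as its own ancestor in this sketch, so the last common ancestor of `B` and `D` is `D`.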
# Python Usage
The Python taxonomy API can open and manipulate all of the formats from the Rust library:
```python
from taxonomy import Taxonomy
tax = Taxonomy.from_newick('(A,(B,C)D)E;')
assert tax.parent('A') == 'E'
assert tax.parent('B') == 'D'
```
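For readers unfamiliar with Newick, the sketch below shows how the parent relationships asserted above follow from the string `(A,(B,C)D)E;`. It is a simplified illustration in plain Python, not the parser this library uses: the `newick_parents` function is hypothetical, and it assumes labelled nodes with no branch lengths, quoting, or comments.

```python
def newick_parents(newick):
    """Return a {child: parent} map for a simple, fully labelled Newick string."""
    text = newick.strip().rstrip(';')
    parents = {}
    stack = [[]]   # one list of sibling labels per open parenthesis
    label = ''     # characters of the label currently being read
    pending = []   # children of the group that just closed, awaiting their parent's label

    def finish_node():
        nonlocal label, pending
        name = label.strip()
        for child in pending:
            parents[child] = name
        stack[-1].append(name)
        label, pending = '', []

    for ch in text:
        if ch == '(':
            stack.append([])
        elif ch == ',':
            finish_node()
        elif ch == ')':
            finish_node()
            # The label that follows ')' names the parent of this group.
            pending = stack.pop()
        else:
            label += ch
    finish_node()  # the root label comes last
    return parents


parent_map = newick_parents('(A,(B,C)D)E;')
```

In Newick, the label written after a closing parenthesis names the parent of everything inside it, which is why `D` is the parent of `B` and `C`, and `E` is the root.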
If you have the NCBI taxonomy locally ([found on their FTP](ftp://ftp.ncbi.nlm.nih.gov/pub/taxonomy/taxdump.tar.gz)), you can use that too:
```python
ncbi_tax = Taxonomy.from_ncbi('./nodes.dmp', './names.dmp')
assert ncbi_tax.name('562') == 'Escherichia coli'
assert ncbi_tax.rank('562') == 'species'
```
Note that Taxonomy IDs in NCBI format are integers, but they're converted to strings on import. We find working with "string taxonomy IDs" greatly simplifies interoperation between different taxonomy systems.
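If your own data stores NCBI IDs as integers, you may want to normalize them before looking them up. The helper below is a hypothetical convenience, not part of this library's API, and assumes IDs arrive as either `int` or `str`.

```python
def normalize_tax_id(tax_id):
    """Coerce an integer or string taxonomy ID to a whitespace-stripped string."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(tax_id, bool) or not isinstance(tax_id, (int, str)):
        raise TypeError(f"unsupported taxonomy ID type: {type(tax_id).__name__}")
    return str(tax_id).strip()


from_int = normalize_tax_id(562)
from_str = normalize_tax_id(' 562 ')
```

Both calls yield the same string key, so integer IDs from another source can be compared directly with IDs from a non-NCBI taxonomy.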
# Installation
## Rust
This library can be added to an existing Cargo.toml file and installed straight from crates.io.
## Python
You can install the Python bindings directly from PyPI (binaries are only built for select architectures) with:
```bash
pip install taxonomy
```
# Development
## Rust
The test suite can be run with `cargo test`.
## Python
To work on the Python library on a Mac OS X/Unix system (requires Python 3):
```bash
# you need the nightly version of Rust installed
curl https://sh.rustup.rs -sSf | sh
rustup default nightly
# finally, install the library
./setup.py install # (or ./setup.py develop)
```
# Other Taxonomy Libraries
There are taxonomic toolkits for other programming languages that offer different features and provided some inspiration for this library:
- [ETE Toolkit](http://etetoolkit.org/): a Python taxonomy library
- [Taxize](https://ropensci.github.io/taxize-book/): an R toolkit for working with taxonomic data