taxopy

A Python package for obtaining complete lineages and the lowest common ancestor (LCA) from a set of taxonomic identifiers.

Installation

There are two ways to install taxopy:

  • Using pip:
pip install taxopy
  • Using conda:
conda install -c conda-forge -c bioconda taxopy

Usage

import taxopy

First, load NCBI's taxonomic information into a TaxDb object. Calling TaxDb() without arguments downloads the data from NCBI's servers:

taxdb = taxopy.TaxDb()
# You can also use your own set of taxonomy files:
taxdb = taxopy.TaxDb(nodes_dmp="taxdb/nodes.dmp", names_dmp="taxdb/names.dmp", keep_files=True)

The TaxDb object stores the name, rank and parent-child relationships of each taxonomic identifier:

print(taxdb.taxid2name['2'])
print(taxdb.taxid2parent['2'])
print(taxdb.taxid2rank['2'])
Bacteria
131567
superkingdom
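
The parent mapping alone is enough to rebuild a lineage: follow each identifier to its parent until the root is reached. The sketch below illustrates the idea with a small hand-written dictionary standing in for taxdb.taxid2parent. The helper walk_lineage is hypothetical and not part of taxopy, and it assumes that the root (taxid 1) is listed as its own parent, as in NCBI's nodes.dmp.

```python
# Toy stand-in for taxdb.taxid2parent (string keys, as in the examples above).
# Assumption: the root (taxid 1) is its own parent, as in NCBI's nodes.dmp.
toy_taxid2parent = {
    "1": "1",        # root
    "131567": "1",   # cellular organisms
    "2": "131567",   # Bacteria
}


def walk_lineage(taxid, taxid2parent):
    """Return the taxids from `taxid` up to the root (hypothetical helper)."""
    lineage = [taxid]
    # Stop once a taxon is its own parent, which marks the root.
    while taxid2parent[taxid] != taxid:
        taxid = taxid2parent[taxid]
        lineage.append(taxid)
    return lineage


print(walk_lineage("2", toy_taxid2parent))
# ['2', '131567', '1']
```

In practice you would not need this helper, since Taxon objects already expose the full lineage.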

To get information about a given taxon, create a Taxon object from its taxonomic identifier:

saccharomyces = taxopy.Taxon('4930', taxdb)
human = taxopy.Taxon('9606', taxdb)
gorilla = taxopy.Taxon('9593', taxdb)
lagomorpha = taxopy.Taxon('9975', taxdb)

Each Taxon object stores a variety of information, such as the rank, identifier and name of the input taxon, and the identifiers and names of all the parent taxa:

print(lagomorpha.rank)
print(lagomorpha.name)
print(lagomorpha.name_lineage)
print(lagomorpha.rank_name_dictionary)
order
Lagomorpha
['Lagomorpha', 'Glires', 'Euarchontoglires', 'Boreoeutheria', 'Eutheria', 'Theria', 'Mammalia', 'Amniota', 'Tetrapoda', 'Dipnotetrapodomorpha', 'Sarcopterygii', 'Euteleostomi', 'Teleostomi', 'Gnathostomata', 'Vertebrata', 'Craniata', 'Chordata', 'Deuterostomia', 'Bilateria', 'Eumetazoa', 'Metazoa', 'Opisthokonta', 'Eukaryota', 'cellular organisms', 'root']
{'order': 'Lagomorpha', 'clade': 'Opisthokonta', 'superorder': 'Euarchontoglires', 'class': 'Mammalia', 'superclass': 'Sarcopterygii', 'subphylum': 'Craniata', 'phylum': 'Chordata', 'kingdom': 'Metazoa', 'superkingdom': 'Eukaryota'}

You can get the lowest common ancestor of a list of taxa using the find_lca function:

human_lagomorpha_lca = taxopy.find_lca([human, lagomorpha], taxdb)
print(human_lagomorpha_lca.name)
Euarchontoglires
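
Conceptually, the lowest common ancestor is the most specific taxon that appears in every input lineage. The following sketch shows that idea on plain lists of names ordered from the taxon itself up to the root, like name_lineage. It is an illustration only and not necessarily how find_lca is implemented: toy_lca is a hypothetical helper, and the human lineage is abbreviated by hand.

```python
# Lineages run from the most specific taxon towards the root.
# Both lists are shortened by hand for illustration.
human_lineage = ["Homo sapiens", "Homo", "Homininae", "Primates",
                 "Euarchontoglires", "Boreoeutheria", "Eutheria"]
lagomorpha_lineage = ["Lagomorpha", "Glires",
                      "Euarchontoglires", "Boreoeutheria", "Eutheria"]


def toy_lca(lineages):
    """Return the most specific name shared by all lineages (hypothetical helper)."""
    # Names that occur in every lineage.
    shared = set(lineages[0]).intersection(*lineages[1:])
    # The first shared name in a specific-to-root list is the deepest one.
    for name in lineages[0]:
        if name in shared:
            return name
    return None


print(toy_lca([human_lineage, lagomorpha_lineage]))
# Euarchontoglires
```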

You can also use the find_majority_vote function to find the most specific taxon that is shared by more than half of the lineages of a list of taxa:

majority_vote = taxopy.find_majority_vote([human, gorilla, lagomorpha], taxdb)
print(majority_vote.name)
Homininae

The find_majority_vote function lets you control its stringency through the fraction parameter. For instance, if you set fraction to 0.75, the resulting taxon is shared by more than 75% of the input lineages. By default, fraction is 0.5.

majority_vote = taxopy.find_majority_vote([human, gorilla, lagomorpha], taxdb, fraction=0.75)
print(majority_vote.name)
Euarchontoglires

You can also assign weights to each input lineage:

majority_vote = taxopy.find_majority_vote([saccharomyces, human, gorilla, lagomorpha], taxdb)
weighted_majority_vote = taxopy.find_majority_vote([saccharomyces, human, gorilla, lagomorpha], taxdb, weights=[3, 1, 1, 1])
print(majority_vote.name)
print(weighted_majority_vote.name)
Euarchontoglires
Opisthokonta
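
To see why these results differ, it helps to do the arithmetic by hand. With weights of 3, 1, 1 and 1, Saccharomyces holds 3 of 6 votes and the three mammals hold the other 3, so neither side exceeds 50% and the vote falls back to the taxon they all share. The sketch below reproduces the results shown above on hand-abbreviated lineages. It is an assumption-laden illustration, not taxopy's implementation: toy_majority_vote is a hypothetical helper that only mirrors the fraction and weights parameters described above.

```python
# Hand-abbreviated lineages, ordered from the taxon itself to the root.
human = ["Homo sapiens", "Homo", "Homininae", "Primates",
         "Euarchontoglires", "Opisthokonta", "Eukaryota", "root"]
gorilla = ["Gorilla gorilla", "Gorilla", "Homininae", "Primates",
           "Euarchontoglires", "Opisthokonta", "Eukaryota", "root"]
lagomorpha = ["Lagomorpha", "Glires",
              "Euarchontoglires", "Opisthokonta", "Eukaryota", "root"]
saccharomyces = ["Saccharomyces", "Saccharomycetaceae", "Fungi",
                 "Opisthokonta", "Eukaryota", "root"]


def toy_majority_vote(lineages, weights=None, fraction=0.5):
    """Deepest taxon whose weight share exceeds `fraction` (hypothetical helper)."""
    if weights is None:
        weights = [1] * len(lineages)
    total = sum(weights)
    # Each lineage adds its weight to every taxon it contains.
    support = {}
    for lineage, weight in zip(lineages, weights):
        for taxon in lineage:
            support[taxon] = support.get(taxon, 0) + weight
    # Among taxa above the threshold, keep the one farthest from the root.
    best, best_depth = None, -1
    for lineage in lineages:
        for index, taxon in enumerate(lineage):
            depth = len(lineage) - index
            if support[taxon] / total > fraction and depth > best_depth:
                best, best_depth = taxon, depth
    return best


taxa = [saccharomyces, human, gorilla, lagomorpha]
print(toy_majority_vote([human, gorilla, lagomorpha]))                 # Homininae
print(toy_majority_vote([human, gorilla, lagomorpha], fraction=0.75))  # Euarchontoglires
print(toy_majority_vote(taxa))                                         # Euarchontoglires
print(toy_majority_vote(taxa, weights=[3, 1, 1, 1]))                   # Opisthokonta
```

The comparison is strict, so a taxon supported by exactly half of the total weight does not pass the default threshold.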

Acknowledgements

Some of the code used in taxopy was taken from the CAT/BAT tool for taxonomic classification of contigs and metagenome-assembled genomes.
