Skip to main content

A Python package for obtaining complete lineages and the lowest common ancestor (LCA) from a set of taxonomic identifiers.

Project description

taxopy

A Python package for manipulating NCBI-formatted taxonomic databases. Allows you to obtain complete lineages, determine lowest common ancestors (LCAs), get taxa names from their taxids, and more!

Installation

There are two ways to install taxopy:

  • Using pip:
pip install taxopy
  • Using conda:
conda install -c conda-forge -c bioconda taxopy

Usage

import taxopy

First you need to download taxonomic information from NCBI's servers and put this data into a TaxDb object:

taxdb = taxopy.TaxDb()
# You can also use your own set of taxonomy files:
taxdb = taxopy.TaxDb(nodes_dmp="taxdb/nodes.dmp", names_dmp="taxdb/names.dmp")
# If you want to support legacy taxonomic identifiers (that were merged to other identifier), you also need to provide a `merged.dmp` file. This is not necessary if the data is being downloaded from NCBI.
taxdb = taxopy.TaxDb(nodes_dmp="taxdb/nodes.dmp", names_dmp="taxdb/names.dmp", merged_dmp="taxdb/merged.dmp", keep_files=True)

The TaxDb object stores the name, rank and parent-child relationships of each taxonomic identifier:

print(taxdb.taxid2name[2])
print(taxdb.taxid2parent[2])
print(taxdb.taxid2rank[2])
Bacteria
131567
superkingdom

If you want to retrieve the new taxonomic identifier of a legacy identifier you can use the `` attribute:

print(taxdb.oldtaxid2newtaxid[260])
143224

To get information of a given taxon you can create a Taxon object using its taxonomic identifier:

saccharomyces = taxopy.Taxon(4930, taxdb)
human = taxopy.Taxon(9606, taxdb)
gorilla = taxopy.Taxon(9593, taxdb)
lagomorpha = taxopy.Taxon(9975, taxdb)

Each Taxon object stores a variety of information, such as the rank, identifier and name of the input taxon, and the identifiers and names of all the parent taxa:

print(lagomorpha.rank)
print(lagomorpha.name)
print(lagomorpha.name_lineage)
print(lagomorpha.rank_name_dictionary)
order
Lagomorpha
['Lagomorpha', 'Glires', 'Euarchontoglires', 'Boreoeutheria', 'Eutheria', 'Theria', 'Mammalia', 'Amniota', 'Tetrapoda', 'Dipnotetrapodomorpha', 'Sarcopterygii', 'Euteleostomi', 'Teleostomi', 'Gnathostomata', 'Vertebrata', 'Craniata', 'Chordata', 'Deuterostomia', 'Bilateria', 'Eumetazoa', 'Metazoa', 'Opisthokonta', 'Eukaryota', 'cellular organisms', 'root']
{'order': 'Lagomorpha', 'clade': 'Opisthokonta', 'superorder': 'Euarchontoglires', 'class': 'Mammalia', 'superclass': 'Sarcopterygii', 'subphylum': 'Craniata', 'phylum': 'Chordata', 'kingdom': 'Metazoa', 'superkingdom': 'Eukaryota'}

LCA and majority vote

You can get the lowest common ancestor of a list of taxa using the find_lca function:

human_lagomorpha_lca = taxopy.find_lca([human, lagomorpha], taxdb)
print(human_lagomorpha_lca.name)
Euarchontoglires

You may also use the find_majority_vote to discover the most specific taxon that is shared by more than half of the lineages of a list of taxa:

majority_vote = taxopy.find_majority_vote([human, gorilla, lagomorpha], taxdb)
print(majority_vote.name)
Homininae

The find_majority_vote function allows you to control its stringency via the fraction parameter. For instance, if you would set fraction to 0.75 the resulting taxon would be shared by more than 75% of the input lineages. By default, fraction is 0.5.

majority_vote = taxopy.find_majority_vote([human, gorilla, lagomorpha], taxdb, fraction=0.75)
print(majority_vote.name)
Euarchontoglires

You can also assign weights to each input lineage:

majority_vote = taxopy.find_majority_vote([saccharomyces, human, gorilla, lagomorpha], taxdb)
weighted_majority_vote = taxopy.find_majority_vote([saccharomyces, human, gorilla, lagomorpha], taxdb, weights=[3, 1, 1, 1])
print(majority_vote.name)
print(weighted_majority_vote.name)
Euarchontoglires
Opisthokonta

To check the level of agreement between the taxa that were aggregated using find_majority_vote and the output taxon, you can check the agreement attribute.

print(majority_vote.agreement)
print(weighted_majority_vote.agreement)
0.75
1.0

Taxid from name

If you only have the name of a taxon, you can get its corresponding taxid using the taxid_from_name function:

taxid = taxopy.taxid_from_name('Homininae', taxdb)
print(taxid)
[207598]

This function returns a list of all taxonomic identifiers associated with the input name. In the case of homonyms, the list will contain multiple taxonomic identifiers:

taxid = taxopy.taxid_from_name('Aotus', taxdb)
print(taxid)
[9504, 114498]

Acknowledgements

Some of the code used in taxopy was taken from the CAT/BAT tool for taxonomic classification of contigs and metagenome-assembled genomes.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

taxopy-0.9.2.tar.gz (21.4 kB view details)

Uploaded Source

Built Distribution

taxopy-0.9.2-py3-none-any.whl (22.1 kB view details)

Uploaded Python 3

File details

Details for the file taxopy-0.9.2.tar.gz.

File metadata

  • Download URL: taxopy-0.9.2.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.2

File hashes

Hashes for taxopy-0.9.2.tar.gz
Algorithm Hash digest
SHA256 415d091fb7e797c8373fd3edadfdc2066937509267f1cb3e5e460cae72cb2edd
MD5 9e9911952315ae44717dfeb8d0bdc3c0
BLAKE2b-256 a07231750a8e8ae3ccc83a6311b8275628195d43a2e89577cbd90008be43a624

See more details on using hashes here.

File details

Details for the file taxopy-0.9.2-py3-none-any.whl.

File metadata

  • Download URL: taxopy-0.9.2-py3-none-any.whl
  • Upload date:
  • Size: 22.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.0.1 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.2

File hashes

Hashes for taxopy-0.9.2-py3-none-any.whl
Algorithm Hash digest
SHA256 6cbebb2e21781637efbfeaa4e90cd5784ec1609cf72c8df308c1be966f4e44da
MD5 2d1af95c42ce5a632c7d1245c4326362
BLAKE2b-256 588cea2e5b30bd6b08b2ec807abf57495ba864983c450f01a0cb3b89f780ec0e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page