MRIGlobal's taxonomy-related operators

Project description

MRItaxonomy - MRIGlobal's Taxonomy Library

A collection of convenient taxonomy-related operations that interface with NCBI data

Installation

pip install MRItaxonomy

Import the whole package, or only the submodules you need

import MRItaxonomy

or

from MRItaxonomy import NCBI_fetch
from MRItaxonomy import accession2taxid
from MRItaxonomy import nnID
from MRItaxonomy import nucDL
from MRItaxonomy import protacc2taxid
from MRItaxonomy import slidingwindow
from MRItaxonomy import taxid
from MRItaxonomy import taxid2name

NCBI_fetch

Functions to initially set up and later update the data pulled from NCBI. The initialize() command runs automatically the first time the databases are accessed, so it does not have to be called by hand or repeated.

from MRItaxonomy import NCBI_fetch
NCBI_fetch.initialize()

To update, use the update() command to re-pull the latest data from NCBI. Files that have not changed are not downloaded again.

NCBI_fetch.update()
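
One way to picture the "no re-download if nothing changed" behavior is a checksum comparison. The sketch below is an illustration only, not how NCBI_fetch is implemented: file_md5 and needs_update are hypothetical helper names, and it assumes the published MD5 has already been obtained (NCBI places .md5 files next to its taxonomy dumps). The demo uses a throwaway local file so it runs without network access.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_update(local_path, published_md5):
    """True if the local copy is missing or differs from the published checksum."""
    if not os.path.exists(local_path):
        return True
    return file_md5(local_path) != published_md5.strip().lower()


# Demo with a temporary stand-in file (hypothetical name and contents).
with tempfile.TemporaryDirectory() as workdir:
    dump = os.path.join(workdir, "taxdump.tar.gz")
    missing = needs_update(dump, "0" * 32)  # no local copy yet
    with open(dump, "wb") as handle:
        handle.write(b"example bytes")
    unchanged = needs_update(dump, hashlib.md5(b"example bytes").hexdigest())
    changed = needs_update(dump, hashlib.md5(b"newer bytes").hexdigest())
```

Only the case where the checksums differ (or the file is missing) would trigger a download in this scheme.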

accession2taxid

Contains a method that loads the accession-to-taxid mapping, and a function that returns the taxid associated with a given nucleotide accession

from MRItaxonomy import accession2taxid
accession2taxid.load_trie()    # optional: runs automatically the first time get_taxid() is called
accession2taxid.get_taxid(accession)
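
The load-on-first-use pattern described above can be sketched as follows. This is a simplified stand-in, not the library's code: it uses a plain dict instead of a trie, the sample rows are made up for the demo, and returning None for an unknown accession is an assumption of this sketch. The column layout follows NCBI's tab-separated accession2taxid files (accession, accession.version, taxid, gi).

```python
import io

# Made-up rows in the NCBI accession2taxid layout (tab-separated, with header).
SAMPLE = (
    "accession\taccession.version\ttaxid\tgi\n"
    "XX000001\tXX000001.1\t111\t1\n"
    "XX000002\tXX000002.2\t222\t2\n"
)

_table = None  # filled on the first lookup


def load_table(handle):
    """Parse accession2taxid rows into {accession: taxid}."""
    table = {}
    next(handle)  # skip the header row
    for line in handle:
        accession, versioned, taxid, _gi = line.rstrip("\n").split("\t")
        table[accession] = int(taxid)
        table[versioned] = int(taxid)  # allow lookups with or without version
    return table


def get_taxid(accession):
    """Return the taxid for an accession, loading the table on first use."""
    global _table
    if _table is None:
        _table = load_table(io.StringIO(SAMPLE))
    return _table.get(accession)
```

A trie trades some lookup simplicity for a much smaller memory footprint when many accessions share prefixes, which is presumably why the library loads one.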

nnID

Contains a method that takes a taxon and returns a list of its near-neighbor taxa

from MRItaxonomy import nnID
nnID.get_id(taxon)

nucDL

Contains methods that access NCBI's FTP site and download nucleotide records. Takes a taxid, a thread count (for parallel downloads), a working directory, and a database choice of either genbank or refseq

from MRItaxonomy import nucDL
nucDL.dl(tax, threads, path, db='genbank')    # db is either 'genbank' or 'refseq'

protacc2taxid

Works the same way as accession2taxid, but with protein accessions instead of nucleotide accessions

from MRItaxonomy import protacc2taxid
protacc2taxid.load_dataframe()    # optional: runs automatically the first time get_taxid() is called
protacc2taxid.get_taxid(prot_accession)

slidingwindow

This module slides a window across each nucleotide record in a folder and writes out window-sized reads along the length of the record. The suffix used for each chunked output file can be specified (default=.fna)

from MRItaxonomy import slidingwindow
slidingwindow.reads_generation(path_of_fasta_folder, window_size=150, extension='.fna')
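
The windowing idea itself can be sketched on a single in-memory sequence. This is an illustration under stated assumptions, not the module's implementation: sliding_windows and to_fasta are hypothetical names, the step parameter and the read naming scheme are assumptions (the source does not say how far the window advances or how reads are labeled), and trailing partial windows are simply dropped.

```python
def sliding_windows(sequence, window_size=150, step=1):
    """Yield (start, read) pairs for every full window along the sequence."""
    for start in range(0, len(sequence) - window_size + 1, step):
        yield start, sequence[start:start + window_size]


def to_fasta(record_id, sequence, window_size=150, step=1):
    """Format the windows of one record as FASTA text."""
    lines = []
    for start, read in sliding_windows(sequence, window_size, step):
        lines.append(f">{record_id}_{start}")  # hypothetical read naming
        lines.append(read)
    return "\n".join(lines)


# Toy sequence with a small window so the result is easy to inspect.
reads = list(sliding_windows("ACGTACGT", window_size=5, step=1))
# reads == [(0, "ACGTA"), (1, "CGTAC"), (2, "GTACG"), (3, "TACGT")]
```

A sequence shorter than the window produces no reads at all in this sketch.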

taxid

This module handles operations on the taxonomic tree, using the NCBI nodes.dmp and merged.dmp files

from MRItaxonomy import taxid
taxid.load_dbs()    # optional: runs automatically the first time another MRItaxonomy.taxid function needs the databases

taxid.get_parent(tax_id)    # returns the parent taxid for the given taxid

taxid.get_rank(tax_id)    # returns the rank of the given taxid (superkingdom, kingdom, phylum, class, order, family, genus, species)

taxid.getnodeatrank(tax_id, selected_rank)    # returns the taxid of the ancestor at the selected rank (superkingdom, kingdom, phylum, class, order, family, genus, species) for the given taxid

taxid.get_merge(tax_id)    # if the taxid is in the nodes.dmp database, returns the taxid. otherwise, if it is in the merged.dmp database, returns the associated merged.dmp entry. if neither is true, returns 0.
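
The operations above amount to lookups and an upward walk over parent links. The sketch below shows that idea on a toy tree and is not the library's implementation: the taxids in the sample data are made up, only the first three nodes.dmp columns (tax_id, parent tax_id, rank) are used, and returning 0 when no ancestor has the requested rank is an assumption of this sketch.

```python
# Made-up rows in the NCBI .dmp layout (fields separated by "\t|\t").
SAMPLE_NODES = (
    "1\t|\t1\t|\tno rank\t|\n"
    "10\t|\t1\t|\tsuperkingdom\t|\n"
    "20\t|\t10\t|\tgenus\t|\n"
    "30\t|\t20\t|\tspecies\t|\n"
)
SAMPLE_MERGED = "31\t|\t30\t|\n"  # old taxid 31 was merged into 30


def parse_dmp(text):
    """Split .dmp text into rows of stripped fields."""
    return [
        [field.strip() for field in line.split("\t|")]
        for line in text.splitlines()
        if line.strip()
    ]


PARENT, RANK = {}, {}
for tax_id, parent_id, rank, *_ in parse_dmp(SAMPLE_NODES):
    PARENT[int(tax_id)] = int(parent_id)
    RANK[int(tax_id)] = rank
MERGED = {int(old): int(new) for old, new, *_ in parse_dmp(SAMPLE_MERGED)}


def get_merge(tax_id):
    """Return tax_id if current, its replacement if merged, else 0."""
    if tax_id in PARENT:
        return tax_id
    return MERGED.get(tax_id, 0)


def get_parent(tax_id):
    return PARENT[tax_id]


def get_rank(tax_id):
    return RANK[tax_id]


def node_at_rank(tax_id, wanted_rank):
    """Walk parent links upward until a node with the wanted rank is found."""
    current = get_merge(tax_id)
    while current in PARENT:
        if RANK[current] == wanted_rank:
            return current
        parent = PARENT[current]
        if parent == current:  # the root is its own parent
            break
        current = parent
    return 0
```

Resolving merged taxids before the walk means stale identifiers from older data still land on the right branch of the tree.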

taxid2name

This module takes a taxid and returns the associated scientific name

from MRItaxonomy import taxid2name
taxid2name.get_name(tax_id)
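
For context, NCBI's names.dmp lists several names per taxid, each tagged with a name class, and the scientific name is the row whose class is "scientific name". The sketch below shows that filtering step. It assumes names.dmp is the underlying source (the text above does not say so), and load_scientific_names is a hypothetical helper, not part of MRItaxonomy.

```python
# Two rows in the names.dmp layout: tax_id | name_txt | unique name | name class
SAMPLE_NAMES = (
    "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
    "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n"
)


def load_scientific_names(text):
    """Build {taxid: scientific name}, ignoring synonyms and common names."""
    names = {}
    for line in text.splitlines():
        fields = [field.strip() for field in line.split("\t|")]
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        tax_id, name, _unique, name_class = fields[:4]
        if name_class == "scientific name":
            names[int(tax_id)] = name
    return names


NAMES = load_scientific_names(SAMPLE_NAMES)


def get_name(tax_id):
    """Return the scientific name for a taxid, or None if unknown."""
    return NAMES.get(tax_id)
```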

Download files

Source Distribution

MRItaxonomy-1.1.1.tar.gz (20.8 kB view details)

Uploaded Source

Built Distribution

MRItaxonomy-1.1.1-py3-none-any.whl (24.2 kB view details)

Uploaded Python 3

File details

Details for the file MRItaxonomy-1.1.1.tar.gz.

File metadata

  • Download URL: MRItaxonomy-1.1.1.tar.gz
  • Upload date:
  • Size: 20.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10

File hashes

Hashes for MRItaxonomy-1.1.1.tar.gz
Algorithm Hash digest
SHA256 b7977beb44a9a396fdbedc20614c393b2428d54d55c2607e47f47d917914b1b5
MD5 15c805f659159ce177cdbff3efb64021
BLAKE2b-256 412a3c42dd237910685193d6aa01415da60dc2ba1a0371148f957e21b8099da5

File details

Details for the file MRItaxonomy-1.1.1-py3-none-any.whl.

File metadata

  • Download URL: MRItaxonomy-1.1.1-py3-none-any.whl
  • Upload date:
  • Size: 24.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.10

File hashes

Hashes for MRItaxonomy-1.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a88eb9751371a0b029558dec046979bce258bff3cc51618ebb8e6f718545fefb
MD5 aa9e3ff6960fd66f1139c550acc2e4f3
BLAKE2b-256 f0be890850f3f72889e04225c34a88e8787279882aed796b9c75f62a5600f944
