Toolbox to manage NCBI taxonomical data

ncbi-taxonomist is still under development and has to be considered unstable.

Synopsis

ncbi-taxonomist handles and manages phylogenetic data from NCBI. It can:

  • map between taxids and names
  • resolve lineages
  • store obtained taxa and their data locally in a SQLite database
  • group taxa into user defined groups (locally)

ncbi-taxonomist has several simple operations, e.g. map or import, which work together using pipes. For example, to populate a database, map fetches data from Entrez and prints it to STDOUT, while import reads STDIN and populates a local database. Taxonomic information is obtained from Entrez, but a predownloaded database can be imported as well.

Its true strength is finding and linking related metadata from other Entrez databases, e.g. fetching data for metagenomic data sets for a specific or a diverse group of organisms. It can also store phylogenetic groups, e.g. all taxa for a specific project, in a local database.

ncbi-taxonomist uses several operations to manage taxonomic information.

Status

The basic operations are working and unlikely to change in the near future. Metadata querying is still in development.

Containers

WIP

Operations

map

Collect taxonomic information from the Entrez phylogeny server or by loading a downloaded NCBI Taxonomy database into a local database. Returns taxonomic nodes in JSON.

Map taxid and names remotely

src/ncbi-taxonomist.py map -t 2 -n human -r

Map taxid and names from local database

src/ncbi-taxonomist.py map -t 2 -n human -db taxa.db

Map sequence accessions

src/ncbi-taxonomist.py map -a MH842226.1 NQWZ01000003.1 -r

Format specific fields from map output as CSV using jq

src/ncbi-taxonomist.py map -t 2 -n human -r | \
jq -r '[.taxon_id, .names.scientific_name] | @csv'
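
If jq is not available, the same extraction can be sketched in a few lines of Python. This is a minimal sketch, assuming that map prints one JSON object per line and that each object carries the taxon_id and names.scientific_name fields used in the jq filter above; the function name map_output_to_csv is made up for illustration and is not part of ncbi-taxonomist.

```python
import csv
import io
import json


def map_output_to_csv(lines):
    """Turn line-delimited JSON taxa into CSV rows of (taxon_id, scientific_name)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        taxon = json.loads(line)
        # Field names are assumed from the jq filter; missing keys become empty cells.
        name = taxon.get("names", {}).get("scientific_name", "")
        writer.writerow([taxon.get("taxon_id", ""), name])
    return buffer.getvalue()
```

To use it in a pipe, pass sys.stdin as lines and write the returned string to sys.stdout. Unlike jq's @csv, which wraps every string in double quotes, csv.writer only quotes fields that need it.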

import

Importing stores taxa in a local SQLite database. The taxa are fetched remotely.

Import taxa

src/ncbi-taxonomist.py map -t 2 -n human -r  | \
src/ncbi-taxonomist.py import -db testdb.sql
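
To illustrate the idea behind this pipeline, the following sketch reads line-delimited JSON taxa and stores them in SQLite. It is not the schema or the code used by ncbi-taxonomist: the table taxa, its two columns, and the function store_taxa are hypothetical, and the taxon_id and names.scientific_name fields are assumed from the jq example for map.

```python
import json
import sqlite3


def store_taxa(connection, lines):
    """Insert line-delimited JSON taxa into a hypothetical `taxa` table."""
    # Hypothetical schema, for illustration only.
    connection.execute(
        "CREATE TABLE IF NOT EXISTS taxa ("
        "taxon_id INTEGER PRIMARY KEY, scientific_name TEXT)"
    )
    rows = []
    for line in lines:
        if not line.strip():
            continue  # ignore empty lines in the input stream
        taxon = json.loads(line)
        rows.append((taxon["taxon_id"], taxon.get("names", {}).get("scientific_name")))
    # INSERT OR REPLACE keeps repeated imports of the same taxid idempotent.
    with connection:
        connection.executemany("INSERT OR REPLACE INTO taxa VALUES (?, ?)", rows)
    return len(rows)
```

Called with sqlite3.connect("some.db") and sys.stdin, this would play the role of the reading end of the pipe; keying on the taxid means that importing the same taxon twice does not create duplicates.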

Resolve

Resolves lineages for names and taxids. The result is a JSON array with the taxa defining the lineage in ascending order. This guarantees that the query is the first element in the array.

Further extraction can be done via a script reading the JSON arrays line by line or via other tools, e.g. jq [REF]

Resolve and format via jq

src/ncbi_taxonomist/ncbi-taxonomist.py resolve  -n man  -db testdb2.sql |  \
jq -r  '[.[] |  .names.scientific_name ]| @tsv'
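
A script reading the JSON arrays line by line could look like the sketch below. It assumes that resolve prints one JSON array per line and that every taxon in the array has the names.scientific_name field used in the jq filter above; the function names are made up for illustration.

```python
import json


def lineage_names(line):
    """Return the scientific names in one resolved lineage (a JSON array of taxa)."""
    lineage = json.loads(line)
    # The query is the first element, followed by its ancestors in ascending order.
    return [taxon.get("names", {}).get("scientific_name", "") for taxon in lineage]


def lineages_to_tsv(lines):
    """Format each lineage as one tab-separated row of scientific names."""
    rows = []
    for line in lines:
        if line.strip():  # skip empty lines
            rows.append("\t".join(lineage_names(line)))
    return "\n".join(rows)
```

Passing sys.stdin to lineages_to_tsv and printing the result gives roughly the same output as the jq call, without jq's escaping of special characters.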

Resolve accessions remotely

src/ncbi-taxonomist.py map  -a MH842226.1 NQWZ01000003.1 -r | \
src/ncbi-taxonomist.py resolve -m -r

Extract

Extract nodes from a specified superkingdom and subtree WIP

Group WIP

Collect NCBI taxids into a group for later use

Output

JSON is returned because I have no clue what you want to do with the result. This allows you to write quick parsers for your needs.
