Toolbox to manage NCBI taxonomical data
ncbi-taxonomist is still under development and should be considered unstable.
Synopsis
ncbi-taxonomist handles and manages phylogenetic data from NCBI. It can:
- map between taxids and names
- resolve lineages
- store obtained taxa and their data locally in a SQLite database
- group taxa into user defined groups (locally)
ncbi-taxonomist offers several simple operations, e.g. map or import, which work together through pipes. For example, to populate a database, map fetches data from Entrez and prints it to STDOUT, while import reads STDIN and stores the taxa in a local database. Taxonomic information is obtained from Entrez, but a pre-downloaded database can be imported as well.
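The same pipe-based workflow can be driven from a script. The following is a minimal sketch, not part of ncbi-taxonomist: `run_pipeline` is a hypothetical helper that connects the STDOUT of one command to the STDIN of another using Python's standard `subprocess` module. The command lines in the trailing comment are the ones used elsewhere in this README.

```python
import subprocess


def run_pipeline(producer_cmd, consumer_cmd):
    """Run `producer_cmd | consumer_cmd` and return the consumer's STDOUT as text.

    Both arguments are argument lists, e.g. ["src/ncbi-taxonomist.py", "map", "-t", "2"].
    """
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        consumer_cmd, stdin=producer.stdout, stdout=subprocess.PIPE, text=True
    )
    # Close our copy of the pipe so the producer notices if the consumer exits early.
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    if producer.returncode != 0 or consumer.returncode != 0:
        raise RuntimeError(
            f"pipeline failed (producer={producer.returncode}, "
            f"consumer={consumer.returncode})"
        )
    return output


# Hypothetical usage, assuming the script is executable from the current directory:
# run_pipeline(
#     ["src/ncbi-taxonomist.py", "map", "-t", "2", "-n", "human", "-r"],
#     ["src/ncbi-taxonomist.py", "import", "-db", "testdb.sql"],
# )
```

A plain shell pipe does the same job; the helper is only useful when the pipeline is one step of a larger Python script.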
Its true strength is finding and linking related metadata from other Entrez databases, e.g. fetching data for metagenomic data sets for a specific or a diverse group of organisms. It can store phylogenetic groups, e.g. all taxa for a specific project, in a local database.
ncbi-taxonomist uses several operations to manage taxonomic information.
Status
The basic operations are working and unlikely to change in the near future. Metadata querying is still in development.
Containers
WIP
Operations
map
Collects taxonomic information from the Entrez phylogeny server or from a downloaded NCBI Taxonomy database that has been loaded into a local database. Returns taxonomic nodes in JSON.
Map taxid and names remotely
src/ncbi-taxonomist.py map -t 2 -n human -r
Map taxid and names from local database
src/ncbi-taxonomist.py map -t 2 -n human -db taxa.db
Map sequence accessions
src/ncbi-taxonomist.py map -a MH842226.1 NQWZ01000003.1 -r
Format specific fields from map output as CSV using jq
src/ncbi-taxonomist.py map -t 2 -n human -r | \
jq -r '[.taxon_id, .names.scientific_name] | @csv'
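If jq is not available, the same extraction can be sketched in Python. This example assumes that map prints one JSON object per line and that each object carries the `taxon_id` and `names.scientific_name` fields used in the jq filter above; the function name `map_output_to_csv` is made up for illustration.

```python
import csv
import io
import json


def map_output_to_csv(lines):
    """Convert line-delimited JSON taxa into CSV rows: taxon_id, scientific_name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between records
        taxon = json.loads(line)
        # Assumed layout: {"taxon_id": ..., "names": {"scientific_name": ...}}
        names = taxon.get("names") or {}
        writer.writerow([taxon.get("taxon_id"), names.get("scientific_name")])
    return buffer.getvalue()
```

When the script is used at the end of a pipe, passing `sys.stdin` as `lines` processes the output of map as it arrives. The quoting differs slightly from jq's `@csv`, which quotes every string, whereas the `csv` module quotes only when needed.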
import
Importing stores taxa in a local SQLite database. The taxa are fetched remotely.
Import taxa
src/ncbi-taxonomist.py map -t 2 -n human -r | \
src/ncbi-taxonomist.py import -db testdb.sql
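Because the local database is a regular SQLite file, it can be inspected with Python's standard `sqlite3` module after an import. The schema is not described in this README, so the sketch below makes no assumption about it and only lists the tables that exist; `list_tables` is a hypothetical helper name.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path, sorted by name."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist yet.
    connection = sqlite3.connect(db_path)
    try:
        rows = connection.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        connection.close()
    return [name for (name,) in rows]
```

Calling `list_tables("testdb.sql")` after the import above shows which tables were created and is a reasonable starting point before writing queries against them.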
resolve
Resolves lineages for names and taxids. The result is a JSON array with the taxa defining the lineage in ascending order. This guarantees that the query is the first element in the array.
Further extraction can be done via a script reading JSON arrays line-by-line or via other tools, e.g. jq [REF].
Resolve and format via jq
src/ncbi_taxonomist/ncbi-taxonomist.py resolve -n man -db testdb2.sql | \
jq -r '[.[] | .names.scientific_name ]| @tsv'
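A script reading the JSON arrays line-by-line could look like the following sketch. It assumes that resolve prints one JSON array per line and that every taxon in the array has a `names.scientific_name` field, as in the jq filter above; the function names are illustrative only.

```python
import json


def lineage_names(line):
    """Return the scientific names of one resolved lineage, query taxon first."""
    lineage = json.loads(line)
    # Assumed layout of each element: {"names": {"scientific_name": ...}, ...}
    return [taxon["names"]["scientific_name"] for taxon in lineage]


def lineages_to_tsv(lines):
    """Turn resolve output (one JSON array per line) into tab-separated rows."""
    rows = []
    for line in lines:
        if line.strip():  # ignore empty lines
            rows.append("\t".join(lineage_names(line)))
    return "\n".join(rows)
```

Since the lineage is in ascending order, the first column of each row is the queried taxon and the last column is the highest rank that was resolved.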
Resolve accessions remotely
src/ncbi-taxonomist.py map -a MH842226.1 NQWZ01000003.1 -r | \
src/ncbi-taxonomist.py resolve -m -r
extract WIP
Extract nodes from a specified superkingdom and subtree.
group WIP
Collect NCBI taxids into a group for later use.
Output
JSON is returned because I have no clue what you want to do with the result. This allows you to write quick parsers for your needs.
Source Distribution
Hashes for ncbi-taxonomist-0.3.0.dev128.tar.gz
Algorithm | Hash digest
---|---
SHA256 | cf06511d9c2edd70e9472d2591e7c0cf383759b0a4307950eb970a09b4d4fbd5
MD5 | 4a4945b611c59a6a2bad520ac0dc65f3
BLAKE2b-256 | 1d059c3a31bbeb19ddceaddf8e9556c07271cc0e19965b54263e7ce69f5d7b77
Built Distribution
Hashes for ncbi_taxonomist-0.3.0.dev128-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a96553cba3cd82bcaadc2d9fa7ed77c7b049d40fc09e7e6691e29d092fc03735
MD5 | 2ad6cfbaa4f12f872e7d1a252057b6ed
BLAKE2b-256 | 212d51a7c494a30cad0e94fa0a00f77fc6c70029aff1083771c35541bab05c9b