Toolbox to manage NCBI taxonomical data

Status

ncbi-taxonomist is still under development. The basic operations are working and unlikely to change in the near future. Metadata querying is still in development.

Synopsis

ncbi-taxonomist handles and manages phylogenetic data from NCBI. It can:

map between taxids and names
resolve lineages
store obtained taxa and their data locally in a SQLite database
group taxa into user-defined groups (locally)

ncbi-taxonomist has several simple operations, e.g. map or import, which work together via pipes. For example, to populate a database, map fetches data from Entrez and prints it to STDOUT, while import reads STDIN to populate a local database. Taxonomic information is obtained from Entrez, but a predownloaded database can be imported as well.
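The same pipe-based workflow can also be driven from a script. The following is a minimal Python sketch, not part of ncbi-taxonomist itself: run_pipeline is a hypothetical helper, the two commands are the map and import calls shown in this README, and testdb.sql is only an example file name.

```python
import subprocess

# Commands as shown in this README; testdb.sql is an example file name.
MAP_CMD = ["ncbi-taxonomist", "map", "-t", "2", "-n", "human", "-r"]
IMPORT_CMD = ["ncbi-taxonomist", "import", "-db", "testdb.sql"]


def run_pipeline(commands):
    """Run commands like a shell pipe (cmd1 | cmd2 | ...).

    Hypothetical helper: returns whatever the last command writes to STDOUT.
    """
    procs = []
    upstream = None
    for cmd in commands:
        proc = subprocess.Popen(cmd, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            upstream.close()  # the child process owns the read end now
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for proc in procs[:-1]:
        proc.wait()
    return output.decode()


# Equivalent of:
#   ncbi-taxonomist map -t 2 -n human -r | ncbi-taxonomist import -db testdb.sql
# run_pipeline([MAP_CMD, IMPORT_CMD])
```

The call is left commented out because it needs ncbi-taxonomist installed and network access to Entrez.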

Its true strength is finding and linking related metadata from other Entrez databases, e.g. fetching data for metagenomic data sets for a specific or a diverse group of organisms. It can store phylogenetic groups, e.g. all taxa for a specific project, in a local database.

Install

$ pip install ncbi-taxonomist --user

This fetches and installs ncbi-taxonomist and its required dependencies (see below) for the current user; no root privileges are required.

Dependencies

ncbi-taxonomist has two dependencies:

entrezpy: to handle remote requests to NCBI's Entrez databases
taxonompy: to parse taxonomic XML files from NCBI
- https://gitlab.com/ncbipy/taxonompy.git
- https://pypi.org/project/taxonompy/

Both libraries are maintained by me and rely solely on the Python standard library, so ncbi-taxonomist is less prone to dependency hell.

Usage

ncbi-taxonomist <command> <options>

Available commands

Call ncbi-taxonomist without any command or option to get an overview of the available commands.

$ ncbi-taxonomist

Command help

To get the usage for a specific command, use:

$ ncbi-taxonomist <command> -h

For example, to see available options for the command map, use:

$ ncbi-taxonomist map -h

Output

ncbi-taxonomist uses JSON as its main output format because I have no clue what you want to do with the result. JSON makes it easy to use or write quick formatters or viewers that read the data directly from STDIN.
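As a sketch of such a quick viewer, the Python snippet below reads one JSON document per line from a stream and writes each one back indented. It assumes line-delimited JSON output, as suggested by the jq example in this section, which handles each JSON output line separately. The names read_json_lines, pretty_print and view.py are hypothetical.

```python
import json
import sys


def read_json_lines(stream):
    """Yield one parsed JSON document per non-empty line of stream."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def pretty_print(stream, out=sys.stdout):
    """Write every JSON document from stream to out, indented for reading."""
    for doc in read_json_lines(stream):
        out.write(json.dumps(doc, indent=2, sort_keys=True) + "\n")


# In a script (hypothetically view.py), finish with:
#     pretty_print(sys.stdin)
# and use it as:
#     ncbi-taxonomist map -t 2 -n human -r | python view.py
```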

jq is an excellent tool to filter and manipulate JSON data.

An example of how to extract attributes from the JSON output of the ncbi-taxonomist map command:

ncbi-taxonomist map -t 2 -n human -r | # ncbi-taxonomist map step
jq -r '[.taxon.taxon_id,.taxon.rank, ( .taxon.names | to_entries[] | select(.value=="scientific_name").key) ]  |   @csv'
       ^  ^               ^          ^              ^              ^                                        ^  ^   ^
       |  |               |          |            jq pipe        jq pipe                                    |  |   |
       |  +-------+-------+          +----------------------------------------------------------------------+  jq  |
       |  Add taxon_id and                Extract scientific_name from taxon names and add to array         | pipe |
       |  rank attribute to                                                                                 |     jq csv output
       |         array                                                                                      |     from array
       |                                                                                                    |
       `- create array for csv output for each JSON output line---------------------------------------------+

For more jq help, please refer to the jq documentation.

Commands

map

Maps taxonomic information from the Entrez phylogeny server or from a local database, e.g. one loaded from a downloaded NCBI Taxonomy database. Returns taxonomic nodes in JSON.

Map taxid and names remotely

ncbi-taxonomist map -t 2 -n human -r

Map taxid and names from local database

ncbi-taxonomist map -t 2 -n human -db taxa.db

Map sequence accessions

ncbi-taxonomist map -a MH842226.1 NQWZ01000003.1 -r

Format specific fields from map output to CSV using `jq`

Extract taxid, rank, and scientific name from map JSON output using jq:

ncbi-taxonomist map -t 2 -n human -r | \
jq -r '[.taxon.taxon_id,.taxon.rank,(.taxon.names|to_entries[]|select(.value=="scientific_name").key)]|@csv'
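Where jq is not available, the same extraction can be done in a small script. The Python sketch below mirrors the jq filter above. The record layout (taxon.taxon_id, taxon.rank, and taxon.names as a mapping from name to name type) is an assumption inferred from that filter, not a documented schema. The function names and the sample record, including its "common_name" label, are hypothetical.

```python
import csv
import io
import json


def map_record_to_rows(record):
    """Yield [taxid, rank, scientific name] rows for one map output record."""
    taxon = record["taxon"]
    for name, name_type in taxon["names"].items():
        if name_type == "scientific_name":
            yield [taxon["taxon_id"], taxon["rank"], name]


def map_lines_to_csv(lines):
    """Convert line-delimited map JSON into CSV text (strings quoted, like jq @csv)."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_NONNUMERIC, lineterminator="\n")
    for line in lines:
        if line.strip():
            writer.writerows(map_record_to_rows(json.loads(line)))
    return out.getvalue()


# Made-up record that only illustrates the assumed layout.
sample = (
    '{"taxon": {"taxon_id": 9606, "rank": "species", '
    '"names": {"Homo sapiens": "scientific_name", "human": "common_name"}}}'
)
print(map_lines_to_csv([sample]), end="")  # 9606,"species","Homo sapiens"
```

For real data, pass the lines read from STDIN instead of the sample record.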

import

Importing stores taxa in a local SQLite database. The taxa are fetched remotely.

Import taxa

ncbi-taxonomist map -t 2 -n human -r  | ncbi-taxonomist import -db testdb.sql
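Because the result is an ordinary SQLite file, it can be inspected with any SQLite client. As a hedged sketch, the Python snippet below lists the tables in a database file using only the standard sqlite3 module. It makes no assumption about the schema ncbi-taxonomist creates; list_tables is a hypothetical helper, and testdb.sql is the file name from the command above.

```python
import sqlite3


def list_tables(db_path):
    """Return the names of all tables in the SQLite file at db_path."""
    conn = sqlite3.connect(db_path)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


# Example (file name taken from the import command above):
# print(list_tables("testdb.sql"))
```

Note that sqlite3.connect creates an empty database if the file does not exist yet, in which case the list is simply empty.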

Resolve

Resolves lineages for names and taxids. The result is a JSON array with the taxa defining the lineage in ascending order. This guarantees that the query is the first element in the array.

Further extraction can be done via a script reading JSON arrays line-by-line or via other tools, e.g. jq [REF]
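A minimal sketch of such a script in Python follows. It assumes one JSON array per line, with each taxon carrying its scientific name under names.scientific_name, which is the layout implied by the jq filter used in this section rather than a documented schema. The function names and the sample lineage are hypothetical.

```python
import json


def lineage_names(line):
    """Return the scientific names of one lineage, query first."""
    return [taxon["names"]["scientific_name"] for taxon in json.loads(line)]


def lineages_to_tsv(lines):
    """Yield one tab-separated row of scientific names per lineage line."""
    for line in lines:
        if line.strip():
            yield "\t".join(lineage_names(line))


# Made-up, shortened lineage that only illustrates the assumed layout.
sample = json.dumps([
    {"names": {"scientific_name": "Homo sapiens"}},
    {"names": {"scientific_name": "Homo"}},
    {"names": {"scientific_name": "Hominidae"}},
])
for row in lineages_to_tsv([sample]):
    print(row)
```

Since the lineage is in ascending order, lineage_names(line)[0] is the query taxon and the last element is the highest rank returned.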

Resolve and format via `jq`

ncbi-taxonomist resolve -n man -db testdb2.sql | \
jq -r '[.[] | .names.scientific_name] | @tsv'

Resolve accessions remotely

ncbi-taxonomist map  -a MH842226.1 NQWZ01000003.1 -r | ncbi-taxonomist resolve -m -r

Extract

Extract nodes from a specified superkingdom and subtree WIP

Group WIP

Collect NCBI taxids into a group for later use

Release history

1.2.1: Nov 14, 2020
1.2.0: Oct 22, 2020
1.1.2: Jun 30, 2020
1.1.1.1: Jun 30, 2020
1.1.0: May 27, 2020
1.0.2: May 25, 2020
1.0.0: May 25, 2020
0.6.3.dev233 (pre-release): May 14, 2020
0.6.2.dev231 (pre-release): May 14, 2020
0.5.0.dev170 (pre-release, this version): Jan 28, 2020
0.3.0.dev128 (pre-release): Jan 17, 2020

Download files

Source Distribution

ncbi-taxonomist-0.5.0.dev170.tar.gz (23.3 kB), uploaded Jan 28, 2020

Algorithm	Hash digest
SHA256	`0231dda62650f7b5a301bc782e8ff23f6729006868ca503d32f793bc0ca93f58`
MD5	`045ce4c95b826cebd61acd1669fc2327`
BLAKE2b-256	`121b5ca896a59de75edf0dc0d2f2e2d8d29283268651bfe13384bf3d32880f7e`

Built Distribution

ncbi_taxonomist-0.5.0.dev170-py3-none-any.whl (53.9 kB, Python 3), uploaded Jan 28, 2020

Algorithm	Hash digest
SHA256	`73a59299a2030130e8f49f18146f76c63bc6421a749c1a245f95c2373eeab3b8`
MD5	`aec3ef2660db595193e8d9efaeb1748d`
BLAKE2b-256	`7c381e59749d17cbaaf5b228d564b7b921415ede156801c461f4aaaa7cc90df2`
