Toolbox to manage NCBI taxonomical data

Synopsis

ncbi-taxonomist handles and manages phylogenetic data from NCBI's Entrez Taxonomy database. It can:

  • fetch information from Entrez' Taxonomy database
  • map between taxids and names
  • resolve lineages for taxids, names, and accessions
  • store obtained taxa and their data locally in a SQLite database
  • group taxa into user defined groups (locally)
  • extract specific ranks or lineages from subtrees

ncbi-taxonomist offers several simple operations, e.g. map or import, which work together through pipes. For example, to populate a database, collect fetches data from Entrez and prints it to STDOUT, while import reads STDIN and populates a local database. Taxonomic information is obtained from Entrez.
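The collect-to-import pipe can also be driven from a script. The following is a minimal Python sketch, not part of ncbi-taxonomist: the helper names build_pipeline and run_pipeline are hypothetical, and the only flags used are -t and --database as they appear in the examples of this README. It assumes ncbi-taxonomist is installed and on the PATH.

```python
import subprocess


def build_pipeline(taxids, database):
    """Return the argument lists for 'collect' and 'import' (hypothetical helper)."""
    collect_cmd = ["ncbi-taxonomist", "collect", "-t", ",".join(str(t) for t in taxids)]
    import_cmd = ["ncbi-taxonomist", "import", "--database", database]
    return collect_cmd, import_cmd


def run_pipeline(taxids, database):
    """Pipe the STDOUT of collect into the STDIN of import and return import's output."""
    collect_cmd, import_cmd = build_pipeline(taxids, database)
    collector = subprocess.Popen(collect_cmd, stdout=subprocess.PIPE)
    importer = subprocess.Popen(
        import_cmd, stdin=collector.stdout, stdout=subprocess.PIPE, text=True
    )
    collector.stdout.close()  # lets collect notice if import exits early
    output, _ = importer.communicate()
    collector.wait()
    return output


# Only builds the commands; call run_pipeline() to actually query Entrez.
print(build_pipeline([562, 10508], "taxa.db"))
```

Building the commands separately from running them keeps the sketch testable without network access.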

Install

$ pip install ncbi-taxonomist --user

This will fetch and install ncbi-taxonomist and its required dependencies (see below) for the current user (no root required).

Containers

ncbi-taxonomist is available as Docker and Singularity images, both including jq to manipulate its JSON output.

Docker

The Docker image is available on its GitLab Docker registry and can be pulled using docker pull registry.gitlab.com/janpb/ncbi-taxonomist/ncbi-taxonomist:latest.

Singularity

The Singularity image is available in the Singularity library and can be pulled using singularity pull library://jpb/ncbi-taxonomist/ncbi-taxonomist.

Dependencies

ncbi-taxonomist has one dependency: entrezpy.

entrezpy is developed and maintained by myself in collaboration with Prof. Edward C. Holmes at The University of Sydney. It relies solely on the Python standard library, so ncbi-taxonomist is less prone to dependency hell.

Documentation

The documentation and further examples for ncbi-taxonomist can be found on Read the Docs.

Usage

ncbi-taxonomist <command> <options>

Available commands

Call ncbi-taxonomist without any command or option to get an overview of the available commands.

$ ncbi-taxonomist

Command help

To get the usage for a specific command, use:

$ ncbi-taxonomist <command> -h

For example, to see available options for the command map, use:

$ ncbi-taxonomist map -h

Output

ncbi-taxonomist uses line-based JSON and XML as output because I have no clue what you want to do with the result. Parsing JSON is simpler than XML (IMHO) and most programming languages have a JSON parser in their standard library. This makes it easy to write quick formatters or viewers that read data directly from STDIN.

XML output is provided for convenience, but it can only be used in the last command of a pipe, not between piped ncbi-taxonomist commands.
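As an illustration of such a quick formatter, here is a minimal Python sketch. It assumes the field names (query, taxon, taxid, rank, name) seen in the map examples of this README; the function name format_mapping is hypothetical and not part of ncbi-taxonomist.

```python
import json


def format_mapping(lines):
    """Yield one tab-separated row (query, taxid, rank, name) per taxon result."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        result = json.loads(line)
        taxon = result.get("taxon")
        if taxon is None:
            continue  # e.g. accession results carry no "taxon" object
        yield "\t".join(
            [str(result["query"]), str(taxon["taxid"]), taxon["rank"], taxon["name"]]
        )


# One result line copied from the map example in this README.
sample = [
    '{"mode":"mapping","query":"10508","cast":"taxon","taxon":{"taxid":10508,'
    '"rank":"family","names":{"Adenoviridae":"scientific_name"},'
    '"parentid":2732559,"name":"Adenoviridae"}}'
]
for row in format_mapping(sample):
    print(row)
```

To use it in a pipe, pass sys.stdin instead of sample, since each result arrives as one JSON document per line.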

In addition, jq is an excellent tool to filter and manipulate JSON data. An example of how to extract attributes from the JSON output of the ncbi-taxonomist map command can be found in the documentation.

For more jq help, please refer to the jq documentation.

Basic Commands

Examples of how to use the basic commands. To get an overview, run ncbi-taxonomist without any argument:

$: ncbi-taxonomist

Use -h to get the usage for each command, e.g.:

$: ncbi-taxonomist map -h

map

Map taxonomic information from the Entrez taxonomy server or from a local database, e.g. one loaded from a downloaded NCBI Taxonomy database. Returns taxonomic nodes in JSON.

Map taxids and names

$: ncbi-taxonomist map --taxids 562, 10508 --names 'Homo sapiens', 'black willow'
{"mode":"mapping","query":"black willow","cast":"taxon","taxon":{"taxid":75714,"rank":"species","names":{"Salix nigra":"scientific_name","black willow":"CommonName"},"parentid":40685,"name":"Salix nigra"}}
{"mode":"mapping","query":"Homo sapiens","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"}}
{"mode":"mapping","query":"10508","cast":"taxon","taxon":{"taxid":10508,"rank":"family","names":{"Adenoviridae":"scientific_name"},"parentid":2732559,"name":"Adenoviridae"}}
{"mode":"mapping","query":"562","cast":"taxon","taxon":{"taxid":562,"rank":"species","names":{"Escherichia coli":"scientific_name","Bacillus coli":"Synonym","Bacterium coli":"Synonym","Bacterium coli commune":"Synonym","Enterococcus coli":"Synonym","E. coli":"CommonName","Escherichia sp. 3_2_53FAA":"Includes","Escherichia sp. MAR":"Includes","bacterium 10a":"Includes","bacterium E3":"Includes","Escherichia/Shigella coli":"EquivalentName","ATCC 11775":"type material","ATCC:11775":"type material","BCCM/LMG:2092":"type material","CCUG 24":"type material","CCUG 29300":"type material","CCUG:24":"type material","CCUG:29300":"type material","CIP 54.8":"type material","CIP:54.8":"type material","DSM 30083":"type material","DSM:30083":"type material","IAM 12119":"type material","IAM:12119":"type material","JCM 1649":"type material","JCM:1649":"type material","LMG 2092":"type material","LMG:2092":"type material","NBRC 102203":"type material","NBRC:102203":"type material","NCCB 54008":"type material","NCCB:54008":"type material","NCTC 9001":"type material","NCTC:9001":"type material","personal::U5/41":"type material","strain U5/41":"type material"},"parentid":561,"name":"Escherichia coli"}}

$: ncbi-taxonomist map --xml --taxids 562
<mapping><query cast="taxon">562</query><taxon><taxid>562</taxid><rank>species</rank><name>Escherichia coli</name><parentid>561</parentid><names><name type="scientific_name">Escherichia coli</name><name type="Synonym">Bacillus coli</name><name type="Synonym">Bacterium coli</name><name type="Synonym">Bacterium coli commune</name><name type="Synonym">Enterococcus coli</name><name type="CommonName">E. coli</name><name type="Includes">Escherichia sp. 3_2_53FAA</name><name type="Includes">Escherichia sp. MAR</name><name type="Includes">bacterium 10a</name><name type="Includes">bacterium E3</name><name type="EquivalentName">Escherichia/Shigella coli</name><name type="type material">ATCC 11775</name><name type="type material">ATCC:11775</name><name type="type material">BCCM/LMG:2092</name><name type="type material">CCUG 24</name><name type="type material">CCUG 29300</name><name type="type material">CCUG:24</name><name type="type material">CCUG:29300</name><name type="type material">CIP 54.8</name><name type="type material">CIP:54.8</name><name type="type material">DSM 30083</name><name type="type material">DSM:30083</name><name type="type material">IAM 12119</name><name type="type material">IAM:12119</name><name type="type material">JCM 1649</name><name type="type material">JCM:1649</name><name type="type material">LMG 2092</name><name type="type material">LMG:2092</name><name type="type material">NBRC 102203</name><name type="type material">NBRC:102203</name><name type="type material">NCCB 54008</name><name type="type material">NCCB:54008</name><name type="type material">NCTC 9001</name><name type="type material">NCTC:9001</name><name type="type material">personal::U5/41</name><name type="type material">strain U5/41</name></names></taxon></mapping>
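In the JSON output of map, the names object maps every known name of a taxon to its name type. A small Python sketch of how these could be regrouped by type follows; the function name names_by_type is hypothetical, and the sample line is the Homo sapiens result from the JSON example above.

```python
import json
from collections import defaultdict


def names_by_type(mapping_line):
    """Group the names of one mapped taxon by their name type."""
    result = json.loads(mapping_line)
    grouped = defaultdict(list)
    for name, name_type in result["taxon"]["names"].items():
        grouped[name_type].append(name)
    return dict(grouped)


# Result line for the query 'Homo sapiens', copied from the map example.
sample_line = (
    '{"mode":"mapping","query":"Homo sapiens","cast":"taxon","taxon":{"taxid":9606,'
    '"rank":"species","names":{"Homo sapiens":"scientific_name",'
    '"human":"GenbankCommonName","man":"CommonName"},"parentid":9605,'
    '"name":"Homo sapiens"}}'
)
print(names_by_type(sample_line))
```

This makes it easy to pick, for example, all synonyms or the common names of a taxon.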

Map accessions

$: ncbi-taxonomist map --accessions MH842226.1 NQWZ01000003 --entrezdb nucleotide
{"mode":"mapping","query":"NQWZ01000003","cast":"accs","accession":{"taxid":45151,"accessions":{"accessionversion":"NQWZ01000003.1","caption":"NQWZ01000003","extra":"gi|1391950314|gb|NQWZ01000003.1||gnl|WGS:NQWZ01|ARCrossB10_scaffold_00003"},"db":"nucleotide","uid":1391950314}}
{"mode":"mapping","query":"MH842226.1","cast":"accs","accession":{"taxid":122929,"accessions":{"accessionversion":"MH842226.1","caption":"MH842226","extra":"gi|1476663987|gb|MH842226.1|"},"db":"nucleotide","uid":1476663987}}

$: ncbi-taxonomist map --xml --accessions AGA95798 --entrezdb protein
<mapping><query cast="accession">AGA95798</query><accession><taxid>10407</taxid><uid>431983379</uid><database>protein</database><accessions><accessionversion>AGA95798.1</accessionversion><caption>AGA95798</caption><extra>gi|431983379|gb|AGA95798.1|</extra></accessions></accession></mapping>
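The JSON accession results can be reduced to a simple accession-to-taxid lookup. The following Python sketch assumes the keys shown in the JSON example above (cast, query, accession, taxid); the function name accession_taxids is hypothetical.

```python
import json


def accession_taxids(lines):
    """Map each queried accession to the taxid reported for it."""
    taxids = {}
    for line in lines:
        result = json.loads(line)
        if result.get("cast") != "accs":
            continue  # skip results that are not accession mappings
        taxids[result["query"]] = result["accession"]["taxid"]
    return taxids


# Result line for MH842226.1, copied from the map example.
sample_lines = [
    '{"mode":"mapping","query":"MH842226.1","cast":"accs","accession":{"taxid":122929,'
    '"accessions":{"accessionversion":"MH842226.1","caption":"MH842226",'
    '"extra":"gi|1476663987|gb|MH842226.1|"},"db":"nucleotide","uid":1476663987}}'
]
print(accession_taxids(sample_lines))
```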

Resolve lineages for names, taxids, and accessions.

Resolve names and taxids remotely:

$: ncbi-taxonomist resolve -t 562, 10508 -n human, 'black willow'
{"mode":"resolve","query":"black willow","cast":"taxon","taxon":{"taxid":75714,"rank":"species","names":{"Salix nigra":"scientific_name","black willow":"CommonName"},"parentid":40685,"name":"Salix nigra"},"lineage":[{"taxid":75714,"rank":"species","names":{"Salix nigra":"scientific_name","black willow":"CommonName"},"parentid":40685,"name":"Salix nigra"},{"taxid":40685,"rank":"genus","names":{"Salix":"scientific_name"},"parentid":238069,"name":"Salix"},{"taxid":238069,"rank":"tribe","names":{"Saliceae":"scientific_name"},"parentid":3688,"name":"Saliceae"},{"taxid":3688,"rank":"family","names":{"Salicaceae":"scientific_name"},"parentid":3646,"name":"Salicaceae"},{"taxid":3646,"rank":"order","names":{"Malpighiales":"scientific_name"},"parentid":91835,"name":"Malpighiales"},{"taxid":91835,"rank":"clade","names":{"fabids":"scientific_name"},"parentid":71275,"name":"fabids"},{"taxid":71275,"rank":"clade","names":{"rosids":"scientific_name"},"parentid":1437201,"name":"rosids"},{"taxid":1437201,"rank":"clade","names":{"Pentapetalae":"scientific_name"},"parentid":91827,"name":"Pentapetalae"},{"taxid":91827,"rank":"clade","names":{"Gunneridae":"scientific_name"},"parentid":71240,"name":"Gunneridae"},{"taxid":71240,"rank":"clade","names":{"eudicotyledons":"scientific_name"},"parentid":1437183,"name":"eudicotyledons"},{"taxid":1437183,"rank":"clade","names":{"Mesangiospermae":"scientific_name"},"parentid":3398,"name":"Mesangiospermae"},{"taxid":3398,"rank":"class","names":{"Magnoliopsida":"scientific_name"},"parentid":58024,"name":"Magnoliopsida"},{"taxid":58024,"rank":"clade","names":{"Spermatophyta":"scientific_name"},"parentid":78536,"name":"Spermatophyta"},{"taxid":78536,"rank":"clade","names":{"Euphyllophyta":"scientific_name"},"parentid":58023,"name":"Euphyllophyta"},{"taxid":58023,"rank":"clade","names":{"Tracheophyta":"scientific_name"},"parentid":3193,"name":"Tracheophyta"},{"taxid":3193,"rank":"clade","names":{"Embryophyta":"scientific_name"},"parentid":131221,"name"
:"Embryophyta"},{"taxid":131221,"rank":"subphylum","names":{"Streptophytina":"scientific_name"},"parentid":35493,"name":"Streptophytina"},{"taxid":35493,"rank":"phylum","names":{"Streptophyta":"scientific_name"},"parentid":33090,"name":"Streptophyta"},{"taxid":33090,"rank":"kingdom","names":{"Viridiplantae":"scientific_name"},"parentid":2759,"name":"Viridiplantae"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"human","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},"lineage":[{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"pa
rentid":1338369,"name":"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"},{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"},{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"},{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"562","cast":"taxon","taxon":{"taxid":562,"rank":"species","names":{"Escherichia coli":"scientific_name","Bacillus coli":"Synonym","Bacterium coli":"Synonym","Bacterium coli commune":"Synonym","Enterococcus coli":"Synonym","E. coli":"CommonName","Escherichia sp. 3_2_53FAA":"Includes","Escherichia sp. MAR":"Includes","bacterium 10a":"Includes","bacterium E3":"Includes","Escherichia/Shigella coli":"EquivalentName","ATCC 11775":"type material","ATCC:11775":"type material","BCCM/LMG:2092":"type material","CCUG 24":"type material","CCUG 29300":"type material","CCUG:24":"type material","CCUG:29300":"type material","CIP 54.8":"type material","CIP:54.8":"type material","DSM 30083":"type material","DSM:30083":"type material","IAM 12119":"type material","IAM:12119":"type material","JCM 1649":"type material","JCM:1649":"type material","LMG 2092":"type material","LMG:2092":"type material","NBRC 102203":"type material","NBRC:102203":"type material","NCCB 54008":"type material","NCCB:54008":"type material","NCTC 9001":"type material","NCTC:9001":"type material","personal::U5/41":"type material","strain U5/41":"type material"},"parentid":561,"name":"Escherichia coli"},"lineage":[{"taxid":562,"rank":"species","names":{"Escherichia coli":"scientific_name","Bacillus coli":"Synonym","Bacterium coli":"Synonym","Bacterium coli commune":"Synonym","Enterococcus coli":"Synonym","E. coli":"CommonName","Escherichia sp. 3_2_53FAA":"Includes","Escherichia sp. 
MAR":"Includes","bacterium 10a":"Includes","bacterium E3":"Includes","Escherichia/Shigella coli":"EquivalentName","ATCC 11775":"type material","ATCC:11775":"type material","BCCM/LMG:2092":"type material","CCUG 24":"type material","CCUG 29300":"type material","CCUG:24":"type material","CCUG:29300":"type material","CIP 54.8":"type material","CIP:54.8":"type material","DSM 30083":"type material","DSM:30083":"type material","IAM 12119":"type material","IAM:12119":"type material","JCM 1649":"type material","JCM:1649":"type material","LMG 2092":"type material","LMG:2092":"type material","NBRC 102203":"type material","NBRC:102203":"type material","NCCB 54008":"type material","NCCB:54008":"type material","NCTC 9001":"type material","NCTC:9001":"type material","personal::U5/41":"type material","strain U5/41":"type material"},"parentid":561,"name":"Escherichia coli"},{"taxid":561,"rank":"genus","names":{"Escherichia":"scientific_name"},"parentid":543,"name":"Escherichia"},{"taxid":543,"rank":"family","names":{"Enterobacteriaceae":"scientific_name"},"parentid":91347,"name":"Enterobacteriaceae"},{"taxid":91347,"rank":"order","names":{"Enterobacterales":"scientific_name"},"parentid":1236,"name":"Enterobacterales"},{"taxid":1236,"rank":"class","names":{"Gammaproteobacteria":"scientific_name"},"parentid":1224,"name":"Gammaproteobacteria"},{"taxid":1224,"rank":"phylum","names":{"Proteobacteria":"scientific_name"},"parentid":2,"name":"Proteobacteria"},{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"10508","cast":"taxon","taxon":{"taxid":10508,"rank":"family","names":{"Adenoviridae":"scientific_name"},"parentid":2732559,"name":"Adenoviridae"},"lineage":[{"taxid":10508,"rank":"family","names":{"Adenoviridae":"scientific_name"},"parentid":2732559,"name":"Adenoviridae"},{"taxid":2732559,"rank":"order","names":{"Rowavirales":"scientific_name"},"parentid":2732529,"name":"Rowavirales"},{"taxid":2732529,"rank":"class","names":{"Tectiliviricetes":"scientific_name"},"parentid":2732008,"name":"Tectiliviricetes"},{"taxid":2732008,"rank":"phylum","names":{"Preplasmiviricota":"scientific_name"},"parentid":2732005,"name":"Preplasmiviricota"},{"taxid":2732005,"rank":"kingdom","names":{"Bamfordvirae":"scientific_name"},"parentid":2732004,"name":"Bamfordvirae"},{"taxid":2732004,"rank":"clade","names":{"Varidnaviria":"scientific_name"},"parentid":10239,"name":"Varidnaviria"},{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}]}

$: ncbi-taxonomist resolve -x -t 562
<resolve><query value="562" cast="taxid"><taxon><taxid>562</taxid><rank>species</rank><name>Escherichia coli</name><parentid>561</parentid><names><name type="scientific_name">Escherichia coli</name><name type="Synonym">Bacillus coli</name><name type="Synonym">Bacterium coli</name><name type="Synonym">Bacterium coli commune</name><name type="Synonym">Enterococcus coli</name><name type="CommonName">E. coli</name><name type="Includes">Escherichia sp. 3_2_53FAA</name><name type="Includes">Escherichia sp. MAR</name><name type="Includes">bacterium 10a</name><name type="Includes">bacterium E3</name><name type="EquivalentName">Escherichia/Shigella coli</name><name type="type material">ATCC 11775</name><name type="type material">ATCC:11775</name><name type="type material">BCCM/LMG:2092</name><name type="type material">CCUG 24</name><name type="type material">CCUG 29300</name><name type="type material">CCUG:24</name><name type="type material">CCUG:29300</name><name type="type material">CIP 54.8</name><name type="type material">CIP:54.8</name><name type="type material">DSM 30083</name><name type="type material">DSM:30083</name><name type="type material">IAM 12119</name><name type="type material">IAM:12119</name><name type="type material">JCM 1649</name><name type="type material">JCM:1649</name><name type="type material">LMG 2092</name><name type="type material">LMG:2092</name><name type="type material">NBRC 102203</name><name type="type material">NBRC:102203</name><name type="type material">NCCB 54008</name><name type="type material">NCCB:54008</name><name type="type material">NCTC 9001</name><name type="type material">NCTC:9001</name><name type="type material">personal::U5/41</name><name type="type material">strain U5/41</name></names></taxon></query><lineage><taxon><taxid>562</taxid><rank>species</rank><name>Escherichia coli</name><parentid>561</parentid><names><name type="scientific_name">Escherichia coli</name><name type="Synonym">Bacillus coli</name><name 
type="Synonym">Bacterium coli</name><name type="Synonym">Bacterium coli commune</name><name type="Synonym">Enterococcus coli</name><name type="CommonName">E. coli</name><name type="Includes">Escherichia sp. 3_2_53FAA</name><name type="Includes">Escherichia sp. MAR</name><name type="Includes">bacterium 10a</name><name type="Includes">bacterium E3</name><name type="EquivalentName">Escherichia/Shigella coli</name><name type="type material">ATCC 11775</name><name type="type material">ATCC:11775</name><name type="type material">BCCM/LMG:2092</name><name type="type material">CCUG 24</name><name type="type material">CCUG 29300</name><name type="type material">CCUG:24</name><name type="type material">CCUG:29300</name><name type="type material">CIP 54.8</name><name type="type material">CIP:54.8</name><name type="type material">DSM 30083</name><name type="type material">DSM:30083</name><name type="type material">IAM 12119</name><name type="type material">IAM:12119</name><name type="type material">JCM 1649</name><name type="type material">JCM:1649</name><name type="type material">LMG 2092</name><name type="type material">LMG:2092</name><name type="type material">NBRC 102203</name><name type="type material">NBRC:102203</name><name type="type material">NCCB 54008</name><name type="type material">NCCB:54008</name><name type="type material">NCTC 9001</name><name type="type material">NCTC:9001</name><name type="type material">personal::U5/41</name><name type="type material">strain U5/41</name></names></taxon><taxon><taxid>561</taxid><rank>genus</rank><name>Escherichia</name><parentid>543</parentid><names><name type="scientific_name">Escherichia</name></names></taxon><taxon><taxid>543</taxid><rank>family</rank><name>Enterobacteriaceae</name><parentid>91347</parentid><names><name type="scientific_name">Enterobacteriaceae</name></names></taxon><taxon><taxid>91347</taxid><rank>order</rank><name>Enterobacterales</name><parentid>1236</parentid><names><name 
type="scientific_name">Enterobacterales</name></names></taxon><taxon><taxid>1236</taxid><rank>class</rank><name>Gammaproteobacteria</name><parentid>1224</parentid><names><name type="scientific_name">Gammaproteobacteria</name></names></taxon><taxon><taxid>1224</taxid><rank>phylum</rank><name>Proteobacteria</name><parentid>2</parentid><names><name type="scientific_name">Proteobacteria</name></names></taxon><taxon><taxid>2</taxid><rank>superkingdom</rank><name>Bacteria</name><parentid>131567</parentid><names><name type="scientific_name">Bacteria</name></names></taxon><taxon><taxid>131567</taxid><rank>no rank</rank><name>cellular organisms</name><parentid>None</parentid><names><name type="scientific_name">cellular organisms</name></names></taxon></lineage></resolve>
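In the JSON output of resolve shown earlier in this section, the lineage array runs from the queried taxon up to the root. As a sketch of how a viewer could turn it into a readable lineage string, consider the following Python snippet. The function name lineage_string is hypothetical, and the sample is the Adenoviridae result reduced to the fields used here.

```python
import json


def lineage_string(resolve_line, sep="; "):
    """Join the lineage names of one resolve result, root first."""
    result = json.loads(resolve_line)
    # The lineage starts at the queried taxon, so reverse it to begin at the root.
    names = [node["name"] for node in reversed(result["lineage"])]
    return sep.join(names)


# Adenoviridae (taxid 10508) lineage, without the per-node "names" objects.
sample_line = (
    '{"mode":"resolve","query":"10508","cast":"taxon","lineage":['
    '{"taxid":10508,"rank":"family","parentid":2732559,"name":"Adenoviridae"},'
    '{"taxid":2732559,"rank":"order","parentid":2732529,"name":"Rowavirales"},'
    '{"taxid":2732529,"rank":"class","parentid":2732008,"name":"Tectiliviricetes"},'
    '{"taxid":2732008,"rank":"phylum","parentid":2732005,"name":"Preplasmiviricota"},'
    '{"taxid":2732005,"rank":"kingdom","parentid":2732004,"name":"Bamfordvirae"},'
    '{"taxid":2732004,"rank":"clade","parentid":10239,"name":"Varidnaviria"},'
    '{"taxid":10239,"rank":"superkingdom","parentid":null,"name":"Viruses"}]}'
)
print(lineage_string(sample_line))
```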

Resolve accessions

Accessions have to be mapped before they can be resolved.

$: ncbi-taxonomist map -a MH842226.1 NQWZ01000003.1 | ncbi-taxonomist resolve --mapping
{"mode":"resolve","query":"MH842226.1","cast":"accs","accs":{"taxid":122929,"accessions":{"accessionversion":"MH842226.1","caption":"MH842226","extra":"gi|1476663987|gb|MH842226.1|"},"db":"nucleotide","uid":1476663987},"lineage":[{"taxid":122929,"rank":"clade","names":{"Norovirus GII":"scientific_name","Norovirus genogroup 2":"EquivalentName","Norovirus genogroup II":"EquivalentName","Norwalk-like virus genogroup 2":"EquivalentName","Norwalk-like viruses genogroup 2":"EquivalentName","human calicivirus genogroup 2":"EquivalentName"},"parentid":11983,"name":"Norovirus GII"},{"taxid":11983,"rank":"species","names":{"Norwalk virus":"scientific_name"},"parentid":142786,"name":"Norwalk virus"},{"taxid":142786,"rank":"genus","names":{"Norovirus":"scientific_name"},"parentid":11974,"name":"Norovirus"},{"taxid":11974,"rank":"family","names":{"Caliciviridae":"scientific_name"},"parentid":464095,"name":"Caliciviridae"},{"taxid":464095,"rank":"order","names":{"Picornavirales":"scientific_name"},"parentid":2732506,"name":"Picornavirales"},{"taxid":2732506,"rank":"class","names":{"Pisoniviricetes":"scientific_name"},"parentid":2732408,"name":"Pisoniviricetes"},{"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},"parentid":2732396,"name":"Pisuviricota"},{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"},{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"},{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}]}
{"mode":"resolve","query":"NQWZ01000003.1","cast":"accs","accs":{"taxid":45151,"accessions":{"accessionversion":"NQWZ01000003.1","caption":"NQWZ01000003","extra":"gi|1391950314|gb|NQWZ01000003.1||gnl|WGS:NQWZ01|ARCrossB10_scaffold_00003"},"db":"nucleotide","uid":1391950314},"lineage":[{"taxid":45151,"rank":"species","names":{"Pyrenophora tritici-repentis":"scientific_name","Drechslera tritici-repentis":"Synonym","Pyrenophora triticirepentis":"Synonym","Pyrenophora sp. CBS 259.59":"Includes","Pyrenophora sp. MUCL 18687":"Includes"},"parentid":5027,"name":"Pyrenophora tritici-repentis"},{"taxid":5027,"rank":"genus","names":{"Pyrenophora":"scientific_name"},"parentid":28556,"name":"Pyrenophora"},{"taxid":28556,"rank":"family","names":{"Pleosporaceae":"scientific_name"},"parentid":715340,"name":"Pleosporaceae"},{"taxid":715340,"rank":"suborder","names":{"Pleosporineae":"scientific_name"},"parentid":92860,"name":"Pleosporineae"},{"taxid":92860,"rank":"order","names":{"Pleosporales":"scientific_name"},"parentid":451868,"name":"Pleosporales"},{"taxid":451868,"rank":"subclass","names":{"Pleosporomycetidae":"scientific_name"},"parentid":147541,"name":"Pleosporomycetidae"},{"taxid":147541,"rank":"class","names":{"Dothideomycetes":"scientific_name"},"parentid":715962,"name":"Dothideomycetes"},{"taxid":715962,"rank":"clade","names":{"dothideomyceta":"scientific_name"},"parentid":716546,"name":"dothideomyceta"},{"taxid":716546,"rank":"clade","names":{"leotiomyceta":"scientific_name"},"parentid":147538,"name":"leotiomyceta"},{"taxid":147538,"rank":"subphylum","names":{"Pezizomycotina":"scientific_name"},"parentid":716545,"name":"Pezizomycotina"},{"taxid":716545,"rank":"clade","names":{"saccharomyceta":"scientific_name"},"parentid":4890,"name":"saccharomyceta"},{"taxid":4890,"rank":"phylum","names":{"Ascomycota":"scientific_name"},"parentid":451864,"name":"Ascomycota"},{"taxid":451864,"rank":"subkingdom","names":{"Dikarya":"scientific_name"},"parentid":4751,"name":"Dikarya"},{"tax
id":4751,"rank":"kingdom","names":{"Fungi":"scientific_name"},"parentid":33154,"name":"Fungi"},{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
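A common follow-up is to look up a single rank, such as the family, for each resolved accession. The Python sketch below does this on the JSON output; the function name taxon_at_rank is hypothetical, and the sample is the MH842226.1 result trimmed to its first four lineage nodes and to the fields used here.

```python
import json


def taxon_at_rank(resolve_line, rank):
    """Return the name of the first lineage node with the given rank, or None."""
    result = json.loads(resolve_line)
    for node in result["lineage"]:
        if node["rank"] == rank:
            return node["name"]
    return None  # the lineage has no node of that rank


# Trimmed resolve result for accession MH842226.1.
sample_line = (
    '{"mode":"resolve","query":"MH842226.1","cast":"accs",'
    '"accs":{"taxid":122929,"db":"nucleotide","uid":1476663987},'
    '"lineage":['
    '{"taxid":122929,"rank":"clade","parentid":11983,"name":"Norovirus GII"},'
    '{"taxid":11983,"rank":"species","parentid":142786,"name":"Norwalk virus"},'
    '{"taxid":142786,"rank":"genus","parentid":11974,"name":"Norovirus"},'
    '{"taxid":11974,"rank":"family","parentid":464095,"name":"Caliciviridae"}]}'
)
print(taxon_at_rank(sample_line, "family"))
```

Ranks such as clade can occur several times in one lineage, so the sketch returns only the node closest to the queried taxon.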

$: ncbi-taxonomist map -a MH842226.1 NQWZ01000003.1 | ncbi-taxonomist resolve -x -m
<resolve><query value="MH842226.1" cast="accession"><accession><taxid>122929</taxid><uid>1476663987</uid><database>nucleotide</database><accessions><accessionversion>MH842226.1</accessionversion><caption>MH842226</caption><extra>gi|1476663987|gb|MH842226.1|</extra></accessions></accession></query><lineage><taxon><taxid>122929</taxid><rank>clade</rank><name>Norovirus GII</name><parentid>11983</parentid><names><name type="scientific_name">Norovirus GII</name><name type="EquivalentName">Norovirus genogroup 2</name><name type="EquivalentName">Norovirus genogroup II</name><name type="EquivalentName">Norwalk-like virus genogroup 2</name><name type="EquivalentName">Norwalk-like viruses genogroup 2</name><name type="EquivalentName">human calicivirus genogroup 2</name></names></taxon><taxon><taxid>11983</taxid><rank>species</rank><name>Norwalk virus</name><parentid>142786</parentid><names><name type="scientific_name">Norwalk virus</name></names></taxon><taxon><taxid>142786</taxid><rank>genus</rank><name>Norovirus</name><parentid>11974</parentid><names><name type="scientific_name">Norovirus</name></names></taxon><taxon><taxid>11974</taxid><rank>family</rank><name>Caliciviridae</name><parentid>464095</parentid><names><name type="scientific_name">Caliciviridae</name></names></taxon><taxon><taxid>464095</taxid><rank>order</rank><name>Picornavirales</name><parentid>2732506</parentid><names><name type="scientific_name">Picornavirales</name></names></taxon><taxon><taxid>2732506</taxid><rank>class</rank><name>Pisoniviricetes</name><parentid>2732408</parentid><names><name type="scientific_name">Pisoniviricetes</name></names></taxon><taxon><taxid>2732408</taxid><rank>phylum</rank><name>Pisuviricota</name><parentid>2732396</parentid><names><name type="scientific_name">Pisuviricota</name></names></taxon><taxon><taxid>2732396</taxid><rank>kingdom</rank><name>Orthornavirae</name><parentid>2559587</parentid><names><name 
type="scientific_name">Orthornavirae</name></names></taxon><taxon><taxid>2559587</taxid><rank>clade</rank><name>Riboviria</name><parentid>10239</parentid><names><name type="scientific_name">Riboviria</name></names></taxon><taxon><taxid>10239</taxid><rank>superkingdom</rank><name>Viruses</name><parentid>None</parentid><names><name type="scientific_name">Viruses</name></names></taxon></lineage></resolve>
<resolve><query value="NQWZ01000003.1" cast="accession"><accession><taxid>45151</taxid><uid>1391950314</uid><database>nucleotide</database><accessions><accessionversion>NQWZ01000003.1</accessionversion><caption>NQWZ01000003</caption><extra>gi|1391950314|gb|NQWZ01000003.1||gnl|WGS:NQWZ01|ARCrossB10_scaffold_00003</extra></accessions></accession></query><lineage><taxon><taxid>45151</taxid><rank>species</rank><name>Pyrenophora tritici-repentis</name><parentid>5027</parentid><names><name type="scientific_name">Pyrenophora tritici-repentis</name><name type="Synonym">Drechslera tritici-repentis</name><name type="Synonym">Pyrenophora triticirepentis</name><name type="Includes">Pyrenophora sp. CBS 259.59</name><name type="Includes">Pyrenophora sp. MUCL 18687</name></names></taxon><taxon><taxid>5027</taxid><rank>genus</rank><name>Pyrenophora</name><parentid>28556</parentid><names><name type="scientific_name">Pyrenophora</name></names></taxon><taxon><taxid>28556</taxid><rank>family</rank><name>Pleosporaceae</name><parentid>715340</parentid><names><name type="scientific_name">Pleosporaceae</name></names></taxon><taxon><taxid>715340</taxid><rank>suborder</rank><name>Pleosporineae</name><parentid>92860</parentid><names><name type="scientific_name">Pleosporineae</name></names></taxon><taxon><taxid>92860</taxid><rank>order</rank><name>Pleosporales</name><parentid>451868</parentid><names><name type="scientific_name">Pleosporales</name></names></taxon><taxon><taxid>451868</taxid><rank>subclass</rank><name>Pleosporomycetidae</name><parentid>147541</parentid><names><name type="scientific_name">Pleosporomycetidae</name></names></taxon><taxon><taxid>147541</taxid><rank>class</rank><name>Dothideomycetes</name><parentid>715962</parentid><names><name type="scientific_name">Dothideomycetes</name></names></taxon><taxon><taxid>715962</taxid><rank>clade</rank><name>dothideomyceta</name><parentid>716546</parentid><names><name 
type="scientific_name">dothideomyceta</name></names></taxon><taxon><taxid>716546</taxid><rank>clade</rank><name>leotiomyceta</name><parentid>147538</parentid><names><name type="scientific_name">leotiomyceta</name></names></taxon><taxon><taxid>147538</taxid><rank>subphylum</rank><name>Pezizomycotina</name><parentid>716545</parentid><names><name type="scientific_name">Pezizomycotina</name></names></taxon><taxon><taxid>716545</taxid><rank>clade</rank><name>saccharomyceta</name><parentid>4890</parentid><names><name type="scientific_name">saccharomyceta</name></names></taxon><taxon><taxid>4890</taxid><rank>phylum</rank><name>Ascomycota</name><parentid>451864</parentid><names><name type="scientific_name">Ascomycota</name></names></taxon><taxon><taxid>451864</taxid><rank>subkingdom</rank><name>Dikarya</name><parentid>4751</parentid><names><name type="scientific_name">Dikarya</name></names></taxon><taxon><taxid>4751</taxid><rank>kingdom</rank><name>Fungi</name><parentid>33154</parentid><names><name type="scientific_name">Fungi</name></names></taxon><taxon><taxid>33154</taxid><rank>clade</rank><name>Opisthokonta</name><parentid>2759</parentid><names><name type="scientific_name">Opisthokonta</name></names></taxon><taxon><taxid>2759</taxid><rank>superkingdom</rank><name>Eukaryota</name><parentid>131567</parentid><names><name type="scientific_name">Eukaryota</name></names></taxon><taxon><taxid>131567</taxid><rank>no rank</rank><name>cellular organisms</name><parentid>None</parentid><names><name type="scientific_name">cellular organisms</name></names></taxon></lineage></resolve>
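The XML output can be read with the Python standard library as well. The sketch below assumes the element layout seen in the resolve XML examples above (a lineage element holding taxon elements with taxid, rank, name and parentid children) and that a missing parent is written as the text None. The function name xml_lineage is hypothetical, and the sample is a heavily trimmed version of the NQWZ01000003.1 result.

```python
import xml.etree.ElementTree as ET


def xml_lineage(resolve_xml):
    """Extract the lineage of one <resolve> document as a list of dicts."""
    root = ET.fromstring(resolve_xml)
    lineage = []
    for taxon in root.findall("./lineage/taxon"):
        parent = taxon.findtext("parentid")
        lineage.append(
            {
                "taxid": int(taxon.findtext("taxid")),
                "rank": taxon.findtext("rank"),
                "name": taxon.findtext("name"),  # direct child, not <names>/<name>
                "parentid": None if parent == "None" else int(parent),
            }
        )
    return lineage


# Trimmed sample: only the last two lineage nodes are kept.
RESOLVE_XML = (
    '<resolve><query value="NQWZ01000003.1" cast="accession"><accession>'
    '<taxid>45151</taxid><uid>1391950314</uid><database>nucleotide</database>'
    '</accession></query><lineage>'
    '<taxon><taxid>2759</taxid><rank>superkingdom</rank><name>Eukaryota</name>'
    '<parentid>131567</parentid><names><name type="scientific_name">Eukaryota</name>'
    '</names></taxon>'
    '<taxon><taxid>131567</taxid><rank>no rank</rank><name>cellular organisms</name>'
    '<parentid>None</parentid><names><name type="scientific_name">cellular organisms'
    '</name></names></taxon></lineage></resolve>'
)
for node in xml_lineage(RESOLVE_XML):
    print(node)
```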

Collect taxa remotely

Collects taxids and names remotely from Entrez, mainly to gather taxonomic data for storage in a local database. This command does not query a local database.

Collect taxid and names remotely

$ ncbi-taxonomist collect -t 562,10508  -n man, 'Influenza B virus (B/Acre/121609/2012)'
{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}
{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"}
{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"}
{"taxid":2497569,"rank":"phylum","names":{"Negarnaviricota":"scientific_name"},"parentid":2732396,"name":"Negarnaviricota"}
{"taxid":2497571,"rank":"subphylum","names":{"Polyploviricotina":"scientific_name"},"parentid":2497569,"name":"Polyploviricotina"}
{"taxid":2497577,"rank":"class","names":{"Insthoviricetes":"scientific_name"},"parentid":2497571,"name":"Insthoviricetes"}
{"taxid":2499411,"rank":"order","names":{"Articulavirales":"scientific_name"},"parentid":2497577,"name":"Articulavirales"}
{"taxid":11308,"rank":"family","names":{"Orthomyxoviridae":"scientific_name"},"parentid":2499411,"name":"Orthomyxoviridae"}
{"taxid":197912,"rank":"genus","names":{"Betainfluenzavirus":"scientific_name"},"parentid":11308,"name":"Betainfluenzavirus"}
{"taxid":11520,"rank":"species","names":{"Influenza B virus":"scientific_name"},"parentid":197912,"name":"Influenza B virus"}
#cut
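
Because collect prints one JSON object per line, its output is easy to post-process. The following is a minimal Python sketch, not part of ncbi-taxonomist, that indexes a few of the lines shown above by taxid and follows the parentid links upward. The helper names index_taxa and walk_up are made up for this illustration.

```python
import json

# Three lines copied from the collect output above.
collect_output = """\
{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}
{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"}
{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"}
"""


def index_taxa(lines):
    """Parse one JSON taxon per line and index the records by taxid."""
    taxa = {}
    for line in lines:
        line = line.strip()
        # Skip blank lines and markers such as "#cut".
        if not line or line.startswith("#"):
            continue
        taxon = json.loads(line)
        taxa[taxon["taxid"]] = taxon
    return taxa


def walk_up(taxa, taxid):
    """Follow parentid links from taxid towards the root, collecting names."""
    names = []
    while taxid is not None and taxid in taxa:
        names.append(taxa[taxid]["name"])
        taxid = taxa[taxid]["parentid"]
    return names


taxa = index_taxa(collect_output.splitlines())
print(walk_up(taxa, 2732396))
```

In a pipe, the same function could read sys.stdin instead of the sample string.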

Piping ncbi-taxonomist

An example showing the individual steps and outputs to collect remote taxonomies, store them in a local database, and resolve the lineages. The last step shows how to use pipes to create a small taxonomic pipeline.

1. Collect taxa information remotely and import into local database taxa.db

$ ncbi-taxonomist collect -t 562,10508 -n man, 'Influenza B virus (B/Acre/121609/2012)' | ncbi-taxonomist import --database taxa.db
{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}
{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"}
{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"}
{"taxid":2497569,"rank":"phylum","names":{"Negarnaviricota":"scientific_name"},"parentid":2732396,"name":"Negarnaviricota"}
{"taxid":2497571,"rank":"subphylum","names":{"Polyploviricotina":"scientific_name"},"parentid":2497569,"name":"Polyploviricotina"}
{"taxid":2497577,"rank":"class","names":{"Insthoviricetes":"scientific_name"},"parentid":2497571,"name":"Insthoviricetes"}
#cut

2. Test local database:

$ sqlite3 -line taxa.db 'SELECT * FROM taxa; SELECT * FROM names;'

3. Resolve lineages from local database:

$ ncbi-taxonomist resolve -t 562,10508 -n man, 'Influenza B virus' -db taxa.db
{"mode":"resolve","query":"Influenza B virus","cast":"taxon","taxon":{"taxid":11520,"rank":"species","names":{"Influenza B virus":"scientific_name"},"parentid":197912,"name":"Influenza B virus"},"lineage":[{"taxid":11520,"rank":"species","names":{"Influenza B virus":"scientific_name"},"parentid":197912,"name":"Influenza B virus"},{"taxid":197912,"rank":"genus","names":{"Betainfluenzavirus":"scientific_name"},"parentid":11308,"name":"Betainfluenzavirus"},{"taxid":11308,"rank":"family","names":{"Orthomyxoviridae":"scientific_name"},"parentid":2499411,"name":"Orthomyxoviridae"},{"taxid":2499411,"rank":"order","names":{"Articulavirales":"scientific_name"},"parentid":2497577,"name":"Articulavirales"},{"taxid":2497577,"rank":"class","names":{"Insthoviricetes":"scientific_name"},"parentid":2497571,"name":"Insthoviricetes"},{"taxid":2497571,"rank":"subphylum","names":{"Polyploviricotina":"scientific_name"},"parentid":2497569,"name":"Polyploviricotina"},{"taxid":2497569,"rank":"phylum","names":{"Negarnaviricota":"scientific_name"},"parentid":2732396,"name":"Negarnaviricota"},{"taxid":2732396,"rank":"kingdom","names":{"Orthornavirae":"scientific_name"},"parentid":2559587,"name":"Orthornavirae"},{"taxid":2559587,"rank":"clade","names":{"Riboviria":"scientific_name"},"parentid":10239,"name":"Riboviria"},{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}]}
{"mode":"resolve","query":"man","cast":"taxon","taxon":{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},"lineage":[{"taxid":9606,"rank":"species","names":{"Homo sapiens":"scientific_name","human":"GenbankCommonName","man":"CommonName"},"parentid":9605,"name":"Homo sapiens"},{"taxid":9605,"rank":"genus","names":{"Homo":"scientific_name"},"parentid":207598,"name":"Homo"},{"taxid":207598,"rank":"subfamily","names":{"Homininae":"scientific_name"},"parentid":9604,"name":"Homininae"},{"taxid":9604,"rank":"family","names":{"Hominidae":"scientific_name"},"parentid":314295,"name":"Hominidae"},{"taxid":314295,"rank":"superfamily","names":{"Hominoidea":"scientific_name"},"parentid":9526,"name":"Hominoidea"},{"taxid":9526,"rank":"parvorder","names":{"Catarrhini":"scientific_name"},"parentid":314293,"name":"Catarrhini"},{"taxid":314293,"rank":"infraorder","names":{"Simiiformes":"scientific_name"},"parentid":376913,"name":"Simiiformes"},{"taxid":376913,"rank":"suborder","names":{"Haplorrhini":"scientific_name"},"parentid":9443,"name":"Haplorrhini"},{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"},{"taxid":33511,"rank":"clade","names":{"Deuterostomia":"scientific_name"},"parentid":33213,"name":"Deuterostomia"},{"taxid":33213,"rank":"clade","names":{"Bilateria":"scientific_name"},"parentid":6072,"name":"Bilateria"},{"taxid":6072,"rank":"clade","names":{"Eumetazoa":"scientific_name"},"parentid":33208,"name":"Eumetazoa"},{"taxid":33208,"rank":"kingdom","names":{"Metazoa":"scientific_name"},"parentid":33154,"name":"Metazoa"},{"taxid":33154,"rank":"clade","names":{"Opisthokonta":"scientific_name"},"parentid":2759,"name":"Opisthokonta"},{"taxid":2759,"rank":"superkingdom","names":{"Eukaryota":"scientific_name"},"parentid":131567,"name":"Eukaryota"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"562","cast":"taxon","taxon":{"taxid":562,"rank":"species","names":{"Escherichia coli":"scientific_name","Bacillus coli":"Synonym","Bacterium coli":"Synonym","Bacterium coli commune":"Synonym","Enterococcus coli":"Synonym","E. coli":"CommonName","Escherichia sp. 3_2_53FAA":"Includes","Escherichia sp. MAR":"Includes","bacterium 10a":"Includes","bacterium E3":"Includes","Escherichia/Shigella coli":"EquivalentName","ATCC 11775":"type material","ATCC:11775":"type material","BCCM/LMG:2092":"type material","CCUG 24":"type material","CCUG 29300":"type material","CCUG:24":"type material","CCUG:29300":"type material","CIP 54.8":"type material","CIP:54.8":"type material","DSM 30083":"type material","DSM:30083":"type material","IAM 12119":"type material","IAM:12119":"type material","JCM 1649":"type material","JCM:1649":"type material","LMG 2092":"type material","LMG:2092":"type material","NBRC 102203":"type material","NBRC:102203":"type material","NCCB 54008":"type material","NCCB:54008":"type material","NCTC 9001":"type material","NCTC:9001":"type material","personal::U5/41":"type material","strain U5/41":"type material"},"parentid":561,"name":"Escherichia coli"},"lineage":[{"taxid":562,"rank":"species","names":{"Escherichia coli":"scientific_name","Bacillus coli":"Synonym","Bacterium coli":"Synonym","Bacterium coli commune":"Synonym","Enterococcus coli":"Synonym","E. coli":"CommonName","Escherichia sp. 3_2_53FAA":"Includes","Escherichia sp. MAR":"Includes","bacterium 10a":"Includes","bacterium E3":"Includes","Escherichia/Shigella coli":"EquivalentName","ATCC 11775":"type material","ATCC:11775":"type material","BCCM/LMG:2092":"type material","CCUG 24":"type material","CCUG 29300":"type material","CCUG:24":"type material","CCUG:29300":"type material","CIP 54.8":"type material","CIP:54.8":"type material","DSM 30083":"type material","DSM:30083":"type material","IAM 12119":"type material","IAM:12119":"type material","JCM 1649":"type material","JCM:1649":"type material","LMG 2092":"type material","LMG:2092":"type material","NBRC 102203":"type material","NBRC:102203":"type material","NCCB 54008":"type material","NCCB:54008":"type material","NCTC 9001":"type material","NCTC:9001":"type material","personal::U5/41":"type material","strain U5/41":"type material"},"parentid":561,"name":"Escherichia coli"},{"taxid":561,"rank":"genus","names":{"Escherichia":"scientific_name"},"parentid":543,"name":"Escherichia"},{"taxid":543,"rank":"family","names":{"Enterobacteriaceae":"scientific_name"},"parentid":91347,"name":"Enterobacteriaceae"},{"taxid":91347,"rank":"order","names":{"Enterobacterales":"scientific_name"},"parentid":1236,"name":"Enterobacterales"},{"taxid":1236,"rank":"class","names":{"Gammaproteobacteria":"scientific_name"},"parentid":1224,"name":"Gammaproteobacteria"},{"taxid":1224,"rank":"phylum","names":{"Proteobacteria":"scientific_name"},"parentid":2,"name":"Proteobacteria"},{"taxid":2,"rank":"superkingdom","names":{"Bacteria":"scientific_name"},"parentid":131567,"name":"Bacteria"},{"taxid":131567,"rank":"no rank","names":{"cellular organisms":"scientific_name"},"parentid":null,"name":"cellular organisms"}]}
{"mode":"resolve","query":"10508","cast":"taxon","taxon":{"taxid":10508,"rank":"family","names":{"Adenoviridae":"scientific_name"},"parentid":2732559,"name":"Adenoviridae"},"lineage":[{"taxid":10508,"rank":"family","names":{"Adenoviridae":"scientific_name"},"parentid":2732559,"name":"Adenoviridae"},{"taxid":2732559,"rank":"order","names":{"Rowavirales":"scientific_name"},"parentid":2732529,"name":"Rowavirales"},{"taxid":2732529,"rank":"class","names":{"Tectiliviricetes":"scientific_name"},"parentid":2732008,"name":"Tectiliviricetes"},{"taxid":2732008,"rank":"phylum","names":{"Preplasmiviricota":"scientific_name"},"parentid":2732005,"name":"Preplasmiviricota"},{"taxid":2732005,"rank":"kingdom","names":{"Bamfordvirae":"scientific_name"},"parentid":2732004,"name":"Bamfordvirae"},{"taxid":2732004,"rank":"clade","names":{"Varidnaviria":"scientific_name"},"parentid":10239,"name":"Varidnaviria"},{"taxid":10239,"rank":"superkingdom","names":{"Viruses":"scientific_name"},"parentid":null,"name":"Viruses"}]}

4. All in one go and report in XML:

$ ncbi-taxonomist collect -t 562,10508 -n man, 'Influenza B virus (B/Acre/121609/2012)' | ncbi-taxonomist import -db taxa.db | ncbi-taxonomist resolve -x
<resolve><query value="1334390" cast="taxid"><taxon><taxid>1334390</taxid><rank>no rank</rank><name>Influenza B virus (B/Acre/121609/2012)</name><parentid>11520</parentid><names><name type="scientific_name">Influenza B virus (B/Acre/121609/2012)</name></names></taxon></query><lineage><taxon><taxid>1334390</taxid><rank>no rank</rank><name>Influenza B virus (B/Acre/121609/2012)</name><parentid>11520</parentid><names><name type="scientific_name">Influenza B virus (B/Acre/121609/2012)</name></names></taxon><taxon><taxid>11520</taxid><rank>species</rank><name>Influenza B virus</name><parentid>197912</parentid><names><name type="scientific_name">Influenza B virus</name></names></taxon><taxon><taxid>197912</taxid><rank>genus</rank><name>Betainfluenzavirus</name><parentid>11308</parentid><names><name type="scientific_name">Betainfluenzavirus</name></names></taxon><taxon><taxid>11308</taxid><rank>family</rank><name>Orthomyxoviridae</name><parentid>2499411</parentid><names><name type="scientific_name">Orthomyxoviridae</name></names></taxon><taxon><taxid>2499411</taxid><rank>order</rank><name>Articulavirales</name><parentid>2497577</parentid><names><name type="scientific_name">Articulavirales</name></names></taxon><taxon><taxid>2497577</taxid><rank>class</rank><name>Insthoviricetes</name><parentid>2497571</parentid><names><name type="scientific_name">Insthoviricetes</name></names></taxon><taxon><taxid>2497571</taxid><rank>subphylum</rank><name>Polyploviricotina</name><parentid>2497569</parentid><names><name type="scientific_name">Polyploviricotina</name></names></taxon><taxon><taxid>2497569</taxid><rank>phylum</rank><name>Negarnaviricota</name><parentid>2732396</parentid><names><name type="scientific_name">Negarnaviricota</name></names></taxon><taxon><taxid>2732396</taxid><rank>kingdom</rank><name>Orthornavirae</name><parentid>2559587</parentid><names><name 
type="scientific_name">Orthornavirae</name></names></taxon><taxon><taxid>2559587</taxid><rank>clade</rank><name>Riboviria</name><parentid>10239</parentid><names><name type="scientific_name">Riboviria</name></names></taxon><taxon><taxid>10239</taxid><rank>superkingdom</rank><name>Viruses</name><parentid>None</parentid><names><name type="scientific_name">Viruses</name></names></taxon></lineage></resolve>
<resolve><query value="9606" cast="taxid"><taxon><taxid>9606</taxid><rank>species</rank><name>Homo sapiens</name><parentid>9605</parentid><names><name type="scientific_name">Homo sapiens</name><name type="GenbankCommonName">human</name><name type="CommonName">man</name></names></taxon></query><lineage><taxon><taxid>9606</taxid><rank>species</rank><name>Homo sapiens</name><parentid>9605</parentid><names><name type="scientific_name">Homo sapiens</name><name type="GenbankCommonName">human</name><name type="CommonName">man</name></names></taxon><taxon><taxid>9605</taxid><rank>genus</rank><name>Homo</name><parentid>207598</parentid><names><name type="scientific_name">Homo</name></names></taxon><taxon><taxid>207598</taxid><rank>subfamily</rank><name>Homininae</name><parentid>9604</parentid><names><name type="scientific_name">Homininae</name></names></taxon><taxon><taxid>9604</taxid><rank>family</rank><name>Hominidae</name><parentid>314295</parentid><names><name type="scientific_name">Hominidae</name></names></taxon><taxon><taxid>314295</taxid><rank>superfamily</rank><name>Hominoidea</name><parentid>9526</parentid><names><name type="scientific_name">Hominoidea</name></names></taxon><taxon><taxid>9526</taxid><rank>parvorder</rank><name>Catarrhini</name><parentid>314293</parentid><names><name type="scientific_name">Catarrhini</name></names></taxon><taxon><taxid>314293</taxid><rank>infraorder</rank><name>Simiiformes</name><parentid>376913</parentid><names><name type="scientific_name">Simiiformes</name></names></taxon><taxon><taxid>376913</taxid><rank>suborder</rank><name>Haplorrhini</name><parentid>9443</parentid><names><name type="scientific_name">Haplorrhini</name></names></taxon><taxon><taxid>9443</taxid><rank>order</rank><name>Primates</name><parentid>314146</parentid><names><name type="scientific_name">Primates</name></names></taxon><taxon><taxid>314146</taxid><rank>superorder</rank><name>Euarchontoglires</name><parentid>1437010</parentid><names><name 
type="scientific_name">Euarchontoglires</name></names></taxon><taxon><taxid>1437010</taxid><rank>clade</rank><name>Boreoeutheria</name><parentid>9347</parentid><names><name type="scientific_name">Boreoeutheria</name></names></taxon><taxon><taxid>9347</taxid><rank>clade</rank><name>Eutheria</name><parentid>32525</parentid><names><name type="scientific_name">Eutheria</name></names></taxon><taxon><taxid>32525</taxid><rank>clade</rank><name>Theria</name><parentid>40674</parentid><names><name type="scientific_name">Theria</name></names></taxon><taxon><taxid>40674</taxid><rank>class</rank><name>Mammalia</name><parentid>32524</parentid><names><name type="scientific_name">Mammalia</name></names></taxon><taxon><taxid>32524</taxid><rank>clade</rank><name>Amniota</name><parentid>32523</parentid><names><name type="scientific_name">Amniota</name></names></taxon><taxon><taxid>32523</taxid><rank>clade</rank><name>Tetrapoda</name><parentid>1338369</parentid><names><name type="scientific_name">Tetrapoda</name></names></taxon><taxon><taxid>1338369</taxid><rank>clade</rank><name>Dipnotetrapodomorpha</name><parentid>8287</parentid><names><name type="scientific_name">Dipnotetrapodomorpha</name></names></taxon><taxon><taxid>8287</taxid><rank>superclass</rank><name>Sarcopterygii</name><parentid>117571</parentid><names><name type="scientific_name">Sarcopterygii</name></names></taxon><taxon><taxid>117571</taxid><rank>clade</rank><name>Euteleostomi</name><parentid>117570</parentid><names><name type="scientific_name">Euteleostomi</name></names></taxon><taxon><taxid>117570</taxid><rank>clade</rank><name>Teleostomi</name><parentid>7776</parentid><names><name type="scientific_name">Teleostomi</name></names></taxon><taxon><taxid>7776</taxid><rank>clade</rank><name>Gnathostomata</name><parentid>7742</parentid><names><name type="scientific_name">Gnathostomata</name></names></taxon><taxon><taxid>7742</taxid><rank>clade</rank><name>Vertebrata</name><parentid>89593</parentid><names><name 
type="scientific_name">Vertebrata</name></names></taxon><taxon><taxid>89593</taxid><rank>subphylum</rank><name>Craniata</name><parentid>7711</parentid><names><name type="scientific_name">Craniata</name></names></taxon><taxon><taxid>7711</taxid><rank>phylum</rank><name>Chordata</name><parentid>33511</parentid><names><name type="scientific_name">Chordata</name></names></taxon><taxon><taxid>33511</taxid><rank>clade</rank><name>Deuterostomia</name><parentid>33213</parentid><names><name type="scientific_name">Deuterostomia</name></names></taxon><taxon><taxid>33213</taxid><rank>clade</rank><name>Bilateria</name><parentid>6072</parentid><names><name type="scientific_name">Bilateria</name></names></taxon><taxon><taxid>6072</taxid><rank>clade</rank><name>Eumetazoa</name><parentid>33208</parentid><names><name type="scientific_name">Eumetazoa</name></names></taxon><taxon><taxid>33208</taxid><rank>kingdom</rank><name>Metazoa</name><parentid>33154</parentid><names><name type="scientific_name">Metazoa</name></names></taxon><taxon><taxid>33154</taxid><rank>clade</rank><name>Opisthokonta</name><parentid>2759</parentid><names><name type="scientific_name">Opisthokonta</name></names></taxon><taxon><taxid>2759</taxid><rank>superkingdom</rank><name>Eukaryota</name><parentid>131567</parentid><names><name type="scientific_name">Eukaryota</name></names></taxon><taxon><taxid>131567</taxid><rank>no rank</rank><name>cellular organisms</name><parentid>None</parentid><names><name type="scientific_name">cellular organisms</name></names></taxon></lineage></resolve>
<resolve><query value="562" cast="taxid"><taxon><taxid>562</taxid><rank>species</rank><name>Escherichia coli</name><parentid>561</parentid><names><name type="scientific_name">Escherichia coli</name><name type="Synonym">Bacillus coli</name><name type="Synonym">Bacterium coli</name><name type="Synonym">Bacterium coli commune</name><name type="Synonym">Enterococcus coli</name><name type="CommonName">E. coli</name><name type="Includes">Escherichia sp. 3_2_53FAA</name><name type="Includes">Escherichia sp. MAR</name><name type="Includes">bacterium 10a</name><name type="Includes">bacterium E3</name><name type="EquivalentName">Escherichia/Shigella coli</name><name type="type material">ATCC 11775</name><name type="type material">ATCC:11775</name><name type="type material">BCCM/LMG:2092</name><name type="type material">CCUG 24</name><name type="type material">CCUG 29300</name><name type="type material">CCUG:24</name><name type="type material">CCUG:29300</name><name type="type material">CIP 54.8</name><name type="type material">CIP:54.8</name><name type="type material">DSM 30083</name><name type="type material">DSM:30083</name><name type="type material">IAM 12119</name><name type="type material">IAM:12119</name><name type="type material">JCM 1649</name><name type="type material">JCM:1649</name><name type="type material">LMG 2092</name><name type="type material">LMG:2092</name><name type="type material">NBRC 102203</name><name type="type material">NBRC:102203</name><name type="type material">NCCB 54008</name><name type="type material">NCCB:54008</name><name type="type material">NCTC 9001</name><name type="type material">NCTC:9001</name><name type="type material">personal::U5/41</name><name type="type material">strain U5/41</name></names></taxon></query><lineage><taxon><taxid>562</taxid><rank>species</rank><name>Escherichia coli</name><parentid>561</parentid><names><name type="scientific_name">Escherichia coli</name><name type="Synonym">Bacillus coli</name><name 
type="Synonym">Bacterium coli</name><name type="Synonym">Bacterium coli commune</name><name type="Synonym">Enterococcus coli</name><name type="CommonName">E. coli</name><name type="Includes">Escherichia sp. 3_2_53FAA</name><name type="Includes">Escherichia sp. MAR</name><name type="Includes">bacterium 10a</name><name type="Includes">bacterium E3</name><name type="EquivalentName">Escherichia/Shigella coli</name><name type="type material">ATCC 11775</name><name type="type material">ATCC:11775</name><name type="type material">BCCM/LMG:2092</name><name type="type material">CCUG 24</name><name type="type material">CCUG 29300</name><name type="type material">CCUG:24</name><name type="type material">CCUG:29300</name><name type="type material">CIP 54.8</name><name type="type material">CIP:54.8</name><name type="type material">DSM 30083</name><name type="type material">DSM:30083</name><name type="type material">IAM 12119</name><name type="type material">IAM:12119</name><name type="type material">JCM 1649</name><name type="type material">JCM:1649</name><name type="type material">LMG 2092</name><name type="type material">LMG:2092</name><name type="type material">NBRC 102203</name><name type="type material">NBRC:102203</name><name type="type material">NCCB 54008</name><name type="type material">NCCB:54008</name><name type="type material">NCTC 9001</name><name type="type material">NCTC:9001</name><name type="type material">personal::U5/41</name><name type="type material">strain U5/41</name></names></taxon><taxon><taxid>561</taxid><rank>genus</rank><name>Escherichia</name><parentid>543</parentid><names><name type="scientific_name">Escherichia</name></names></taxon><taxon><taxid>543</taxid><rank>family</rank><name>Enterobacteriaceae</name><parentid>91347</parentid><names><name type="scientific_name">Enterobacteriaceae</name></names></taxon><taxon><taxid>91347</taxid><rank>order</rank><name>Enterobacterales</name><parentid>1236</parentid><names><name 
type="scientific_name">Enterobacterales</name></names></taxon><taxon><taxid>1236</taxid><rank>class</rank><name>Gammaproteobacteria</name><parentid>1224</parentid><names><name type="scientific_name">Gammaproteobacteria</name></names></taxon><taxon><taxid>1224</taxid><rank>phylum</rank><name>Proteobacteria</name><parentid>2</parentid><names><name type="scientific_name">Proteobacteria</name></names></taxon><taxon><taxid>2</taxid><rank>superkingdom</rank><name>Bacteria</name><parentid>131567</parentid><names><name type="scientific_name">Bacteria</name></names></taxon><taxon><taxid>131567</taxid><rank>no rank</rank><name>cellular organisms</name><parentid>None</parentid><names><name type="scientific_name">cellular organisms</name></names></taxon></lineage></resolve>
<resolve><query value="10508" cast="taxid"><taxon><taxid>10508</taxid><rank>family</rank><name>Adenoviridae</name><parentid>2732559</parentid><names><name type="scientific_name">Adenoviridae</name></names></taxon></query><lineage><taxon><taxid>10508</taxid><rank>family</rank><name>Adenoviridae</name><parentid>2732559</parentid><names><name type="scientific_name">Adenoviridae</name></names></taxon><taxon><taxid>2732559</taxid><rank>order</rank><name>Rowavirales</name><parentid>2732529</parentid><names><name type="scientific_name">Rowavirales</name></names></taxon><taxon><taxid>2732529</taxid><rank>class</rank><name>Tectiliviricetes</name><parentid>2732008</parentid><names><name type="scientific_name">Tectiliviricetes</name></names></taxon><taxon><taxid>2732008</taxid><rank>phylum</rank><name>Preplasmiviricota</name><parentid>2732005</parentid><names><name type="scientific_name">Preplasmiviricota</name></names></taxon><taxon><taxid>2732005</taxid><rank>kingdom</rank><name>Bamfordvirae</name><parentid>2732004</parentid><names><name type="scientific_name">Bamfordvirae</name></names></taxon><taxon><taxid>2732004</taxid><rank>clade</rank><name>Varidnaviria</name><parentid>10239</parentid><names><name type="scientific_name">Varidnaviria</name></names></taxon><taxon><taxid>10239</taxid><rank>superkingdom</rank><name>Viruses</name><parentid>None</parentid><names><name type="scientific_name">Viruses</name></names></taxon></lineage></resolve>

Please note: this will return taxids, not names, but the lineages are identical. To keep the names, reuse -t 562,10508 -n man, 'Influenza B virus (B/Acre/121609/2012)' from the collect step in the last resolve step.
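
The JSON records printed by resolve in step 3 can be condensed into a readable lineage with a few lines of Python. This is a minimal sketch, not part of ncbi-taxonomist. The sample is abridged from the Adenoviridae record above, and the helper names lineage_string and rank_lookup are made up for this illustration.

```python
import json

# Abridged from the Adenoviridae record above: the "taxon" object and the
# per-taxon "names" and "parentid" fields are dropped for brevity.
record_line = (
    '{"mode":"resolve","query":"10508","cast":"taxon","lineage":['
    '{"taxid":10508,"rank":"family","name":"Adenoviridae"},'
    '{"taxid":2732559,"rank":"order","name":"Rowavirales"},'
    '{"taxid":2732529,"rank":"class","name":"Tectiliviricetes"},'
    '{"taxid":2732008,"rank":"phylum","name":"Preplasmiviricota"},'
    '{"taxid":2732005,"rank":"kingdom","name":"Bamfordvirae"},'
    '{"taxid":2732004,"rank":"clade","name":"Varidnaviria"},'
    '{"taxid":10239,"rank":"superkingdom","name":"Viruses"}]}'
)


def lineage_string(record, sep="; "):
    """Join lineage names root-first; the JSON lists them query-first."""
    return sep.join(taxon["name"] for taxon in reversed(record["lineage"]))


def rank_lookup(record):
    """Map rank to name. Ranks such as "clade" can repeat within a lineage;
    the occurrence closest to the query is kept."""
    ranks = {}
    for taxon in record["lineage"]:
        ranks.setdefault(taxon["rank"], taxon["name"])
    return ranks


record = json.loads(record_line)
print(lineage_string(record))
print(rank_lookup(record)["order"])
```

Keeping only the first occurrence of a rank is a design choice for this sketch. Lineages such as the one for Homo sapiens contain many "clade" entries, so a plain dictionary cannot hold them all.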

Subtree to extract lineages and ranks

Subtree can extract whole or partial lineages from any taxid within its lineage. This command currently works only with a local database. This example creates a local database to demonstrate the command.

Creating a local database:

$ ncbi-taxonomist collect -t 142786 9606 | ncbi-taxonomist import -db test.db

Obtaining subtrees

$ ncbi-taxonomist subtree -db test.db -t 142786 9606 --lrank order --hrank phylum
{"mode":"subtree","query":9606,"subtree":[{"taxid":9443,"rank":"order","names":{"Primates":"scientific_name"},"parentid":314146,"name":"Primates"},{"taxid":314146,"rank":"superorder","names":{"Euarchontoglires":"scientific_name"},"parentid":1437010,"name":"Euarchontoglires"},{"taxid":1437010,"rank":"clade","names":{"Boreoeutheria":"scientific_name"},"parentid":9347,"name":"Boreoeutheria"},{"taxid":9347,"rank":"clade","names":{"Eutheria":"scientific_name"},"parentid":32525,"name":"Eutheria"},{"taxid":32525,"rank":"clade","names":{"Theria":"scientific_name"},"parentid":40674,"name":"Theria"},{"taxid":40674,"rank":"class","names":{"Mammalia":"scientific_name"},"parentid":32524,"name":"Mammalia"},{"taxid":32524,"rank":"clade","names":{"Amniota":"scientific_name"},"parentid":32523,"name":"Amniota"},{"taxid":32523,"rank":"clade","names":{"Tetrapoda":"scientific_name"},"parentid":1338369,"name":"Tetrapoda"},{"taxid":1338369,"rank":"clade","names":{"Dipnotetrapodomorpha":"scientific_name"},"parentid":8287,"name":"Dipnotetrapodomorpha"},{"taxid":8287,"rank":"superclass","names":{"Sarcopterygii":"scientific_name"},"parentid":117571,"name":"Sarcopterygii"},{"taxid":117571,"rank":"clade","names":{"Euteleostomi":"scientific_name"},"parentid":117570,"name":"Euteleostomi"},{"taxid":117570,"rank":"clade","names":{"Teleostomi":"scientific_name"},"parentid":7776,"name":"Teleostomi"},{"taxid":7776,"rank":"clade","names":{"Gnathostomata":"scientific_name"},"parentid":7742,"name":"Gnathostomata"},{"taxid":7742,"rank":"clade","names":{"Vertebrata":"scientific_name"},"parentid":89593,"name":"Vertebrata"},{"taxid":89593,"rank":"subphylum","names":{"Craniata":"scientific_name"},"parentid":7711,"name":"Craniata"},{"taxid":7711,"rank":"phylum","names":{"Chordata":"scientific_name"},"parentid":33511,"name":"Chordata"}]}
{"mode":"subtree","query":142786,"subtree":[{"taxid":464095,"rank":"order","names":{"Picornavirales":"scientific_name"},"parentid":2732506,"name":"Picornavirales"},{"taxid":2732506,"rank":"class","names":{"Pisoniviricetes":"scientific_name"},"parentid":2732408,"name":"Pisoniviricetes"},{"taxid":2732408,"rank":"phylum","names":{"Pisuviricota":"scientific_name"},"parentid":2732396,"name":"Pisuviricota"}]}

$ ncbi-taxonomist subtree -x -db test.db -t 142786 9606 --lrank order --hrank phylum
<subtree><query value="9606" cast="taxid" /><tree><taxon><taxid>9443</taxid><rank>order</rank><name>Primates</name><parentid>314146</parentid><names><name type="scientific_name">Primates</name></names></taxon><taxon><taxid>314146</taxid><rank>superorder</rank><name>Euarchontoglires</name><parentid>1437010</parentid><names><name type="scientific_name">Euarchontoglires</name></names></taxon><taxon><taxid>1437010</taxid><rank>clade</rank><name>Boreoeutheria</name><parentid>9347</parentid><names><name type="scientific_name">Boreoeutheria</name></names></taxon><taxon><taxid>9347</taxid><rank>clade</rank><name>Eutheria</name><parentid>32525</parentid><names><name type="scientific_name">Eutheria</name></names></taxon><taxon><taxid>32525</taxid><rank>clade</rank><name>Theria</name><parentid>40674</parentid><names><name type="scientific_name">Theria</name></names></taxon><taxon><taxid>40674</taxid><rank>class</rank><name>Mammalia</name><parentid>32524</parentid><names><name type="scientific_name">Mammalia</name></names></taxon><taxon><taxid>32524</taxid><rank>clade</rank><name>Amniota</name><parentid>32523</parentid><names><name type="scientific_name">Amniota</name></names></taxon><taxon><taxid>32523</taxid><rank>clade</rank><name>Tetrapoda</name><parentid>1338369</parentid><names><name type="scientific_name">Tetrapoda</name></names></taxon><taxon><taxid>1338369</taxid><rank>clade</rank><name>Dipnotetrapodomorpha</name><parentid>8287</parentid><names><name type="scientific_name">Dipnotetrapodomorpha</name></names></taxon><taxon><taxid>8287</taxid><rank>superclass</rank><name>Sarcopterygii</name><parentid>117571</parentid><names><name type="scientific_name">Sarcopterygii</name></names></taxon><taxon><taxid>117571</taxid><rank>clade</rank><name>Euteleostomi</name><parentid>117570</parentid><names><name type="scientific_name">Euteleostomi</name></names></taxon><taxon><taxid>117570</taxid><rank>clade</rank><name>Teleostomi</name><parentid>7776</parentid><names><name 
type="scientific_name">Teleostomi</name></names></taxon><taxon><taxid>7776</taxid><rank>clade</rank><name>Gnathostomata</name><parentid>7742</parentid><names><name type="scientific_name">Gnathostomata</name></names></taxon><taxon><taxid>7742</taxid><rank>clade</rank><name>Vertebrata</name><parentid>89593</parentid><names><name type="scientific_name">Vertebrata</name></names></taxon><taxon><taxid>89593</taxid><rank>subphylum</rank><name>Craniata</name><parentid>7711</parentid><names><name type="scientific_name">Craniata</name></names></taxon><taxon><taxid>7711</taxid><rank>phylum</rank><name>Chordata</name><parentid>33511</parentid><names><name type="scientific_name">Chordata</name></names></taxon></tree></subtree>
<subtree><query value="142786" cast="taxid" /><tree><taxon><taxid>464095</taxid><rank>order</rank><name>Picornavirales</name><parentid>2732506</parentid><names><name type="scientific_name">Picornavirales</name></names></taxon><taxon><taxid>2732506</taxid><rank>class</rank><name>Pisoniviricetes</name><parentid>2732408</parentid><names><name type="scientific_name">Pisoniviricetes</name></names></taxon><taxon><taxid>2732408</taxid><rank>phylum</rank><name>Pisuviricota</name><parentid>2732396</parentid><names><name type="scientific_name">Pisuviricota</name></names></taxon></tree></subtree>
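Each subtree in the output above is a self-contained XML element, so it can be post-processed with a standard XML parser. The following is a minimal sketch using only the Python standard library. It assumes each `<subtree>` arrives as one complete XML string shaped like the output shown; the function name `lineage_from_subtree` is illustrative and not part of ncbi-taxonomist.

```python
import xml.etree.ElementTree as ET

# Sample copied from the subtree output above (query taxid 142786).
SAMPLE = (
    '<subtree><query value="142786" cast="taxid" /><tree>'
    '<taxon><taxid>464095</taxid><rank>order</rank><name>Picornavirales</name>'
    '<parentid>2732506</parentid><names>'
    '<name type="scientific_name">Picornavirales</name></names></taxon>'
    '<taxon><taxid>2732506</taxid><rank>class</rank><name>Pisoniviricetes</name>'
    '<parentid>2732408</parentid><names>'
    '<name type="scientific_name">Pisoniviricetes</name></names></taxon>'
    '<taxon><taxid>2732408</taxid><rank>phylum</rank><name>Pisuviricota</name>'
    '<parentid>2732396</parentid><names>'
    '<name type="scientific_name">Pisuviricota</name></names></taxon>'
    '</tree></subtree>'
)


def lineage_from_subtree(xml_text):
    """Return (query value, [(taxid, rank, name), ...]) for one <subtree> element."""
    root = ET.fromstring(xml_text)
    query = root.find("query").get("value")
    taxa = []
    for taxon in root.find("tree").findall("taxon"):
        # findtext("name") only matches the direct <name> child,
        # not the nested <names><name> entries.
        taxa.append(
            (int(taxon.findtext("taxid")), taxon.findtext("rank"), taxon.findtext("name"))
        )
    return query, taxa


query, taxa = lineage_from_subtree(SAMPLE)
print("query:", query)
for taxid, rank, name in taxa:
    print(taxid, rank, name, sep="\t")
```

The taxa are returned in document order, which in the output above runs from the lowest requested rank towards the root.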

Group

Groups organize taxa into non-taxonomic sets, e.g. the taxa used in an experiment. The example collects two species by their common names, stores them in a local database, and adds them to the group 'tree'.

$ ncbi-taxonomist collect -n 'Black willow' 'Black hickory' | \
  ncbi-taxonomist import -db taxa.db                        | \
  ncbi-taxonomist group --add tree -db taxa.db

Retrieve a group

A group can be retrieved as a list of taxids, processed (e.g. with jq), and passed on to other ncbi-taxonomist commands.

$ ncbi-taxonomist group --get tree -db taxa.db  | \
  jq '.taxa[]'                                  | \
  ncbi-taxonomist map -t -db taxa.db
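Where jq is not available, the `.taxa[]` step can be approximated in Python. This is a sketch under assumptions: the only thing known about the group output here is what the jq filter implies, namely one JSON object per line with a `taxa` array. The helper name `taxids_from_group_lines` and the taxids in the sample are placeholders, not real ncbi-taxonomist output.

```python
import json


def taxids_from_group_lines(lines):
    """Yield every entry of the "taxa" array of each JSON line, like jq '.taxa[]'."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group = json.loads(line)
        # Like the jq filter, this fails if the "taxa" key is missing.
        for taxid in group["taxa"]:
            yield taxid


# Hypothetical input: only the "taxa" key is implied by the jq filter;
# the taxids are placeholders.
sample_lines = ['{"taxa": [111, 222]}', "", '{"taxa": [333]}']
taxids = list(taxids_from_group_lines(sample_lines))
for taxid in taxids:
    print(taxid)
```

In a real pipeline, `sample_lines` would be replaced by `sys.stdin`, and the printed taxids piped into `ncbi-taxonomist map -t` as in the example above.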

Empty result examples

map -edb bioproject -a PRJNA604394,PRJNA604390 -> PRJNA604390 has no species data

Format specific fields from map output as CSV and TSV using jq

Extract taxid, rank, and scientific name from map JSON output using jq:

$ ncbi-taxonomist map -t 2 -n human -r | \
  jq -r '[.taxon.taxid,.taxon.rank,(.taxon.names|to_entries[]|select(.value=="scientific_name").key)]|@csv'
9606,"species","Homo sapiens"
2,"superkingdom","Bacteria"

For TSV output, substitute @csv with @tsv in the jq command:

9606    species Homo sapiens
2       superkingdom    Bacteria
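The same extraction can be sketched in Python for cases where the rows need further processing. The record layout is inferred from the jq filter above: a `taxon` object with `taxid`, `rank`, and a `names` object mapping each name to its type. The sample records and the `common_name` type label are assumptions for illustration, and `taxon_row` and `format_rows` are hypothetical helper names.

```python
import csv
import io
import json


def taxon_row(record):
    """Build [taxid, rank, scientific name(s)] like the jq array constructor above."""
    taxon = record["taxon"]
    scientific = [
        name for name, kind in taxon["names"].items() if kind == "scientific_name"
    ]
    return [taxon["taxid"], taxon["rank"], *scientific]


def format_rows(json_lines, delimiter=","):
    """Format one row per JSON line; CSV quotes strings, TSV leaves simple values bare."""
    quoting = csv.QUOTE_NONNUMERIC if delimiter == "," else csv.QUOTE_MINIMAL
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, quoting=quoting, lineterminator="\n")
    for line in json_lines:
        if line.strip():
            writer.writerow(taxon_row(json.loads(line)))
    return buffer.getvalue()


# Assumed sample records; "common_name" is a made-up type label.
sample_lines = [
    '{"taxon": {"taxid": 9606, "rank": "species", '
    '"names": {"Homo sapiens": "scientific_name", "human": "common_name"}}}',
    '{"taxon": {"taxid": 2, "rank": "superkingdom", '
    '"names": {"Bacteria": "scientific_name"}}}',
]

csv_text = format_rows(sample_lines, delimiter=",")
tsv_text = format_rows(sample_lines, delimiter="\t")
print(csv_text, end="")
print(tsv_text, end="")
```

Unlike jq's `@tsv`, the `csv` module quotes a field that itself contains a tab instead of escaping it, so the two only agree for simple values such as those shown here.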
