Locally query the NCBI taxonomy
Taxadb2
Taxadb2 is an application to locally query the NCBI taxonomy. Taxadb2 is written in Python and accesses its database using the peewee library.
Taxadb2 is a fork of https://github.com/HadrienG/taxadb that handles the merged.dmp NCBI taxonomy file to deal with updated taxIDs.
- the built-in support for MySQL and PostgreSQL was kept as it is
- merged.dmp support was added
In brief Taxadb2:
- is a small tool to query the NCBI taxonomy.
- is written in Python >= 3.10.
- has built-in support for SQLite, MySQL and PostgreSQL.
- has available pre-built SQLite databases.
- has comprehensive API documentation.
Installation
Taxadb2 requires Python >= 3.10. To install taxadb2 with SQLite support, type the following in your terminal:
pip3 install taxadb2
If you wish to use MySQL or PostgreSQL, please refer to the full documentation.
Usage
Querying the Database
First, make sure you have built the database.
Below you can find basic examples. For more complete examples, please refer to the API documentation.
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> dbname = "taxadb2/test/test_db.sqlite"
>>> ncbi = {
...     'taxid': TaxID(dbtype='sqlite', dbname=dbname),
...     'names': SciName(dbtype='sqlite', dbname=dbname),
...     'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
... }
>>> taxid2name = ncbi['taxid'].sci_name(2)
>>> print(taxid2name)
Bacteria
>>> lineage = ncbi['taxid'].lineage_name(17)
>>> print(lineage[:5])
['Methylophilus methylotrophus', 'Methylophilus', 'Methylophilaceae', 'Nitrosomonadales', 'Betaproteobacteria']
>>> lineage = ncbi['taxid'].lineage_name(17, reverse=True)
>>> print(lineage[:5])
['cellular organisms', 'Bacteria', 'Pseudomonadati', 'Pseudomonadota', 'Betaproteobacteria']
>>> ncbi['taxid'].has_parent(17, 'Bacteria')
True
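To make the lineage queries above more concrete, here is a minimal sketch of how a lineage can be built by following parent links up to the root. It is not taxadb2's implementation: the functions toy_lineage and toy_has_parent are hypothetical, the tree is trimmed to a few ranks, and every taxID other than 17, 1224 and 2 is a made-up placeholder.

```python
# Toy taxonomy: taxid -> (parent taxid, scientific name).
# Only 17, 1224 and 2 are taxIDs quoted in this README; the other IDs are
# made-up placeholders, and most intermediate ranks are left out.
TOY_NODES = {
    1: (1, "root"),
    2: (1, "Bacteria"),
    1224: (2, "Pseudomonadota"),
    900001: (1224, "Betaproteobacteria"),
    900002: (900001, "Methylophilus"),
    17: (900002, "Methylophilus methylotrophus"),
}


def toy_lineage(taxid, reverse=False):
    """Return names from `taxid` up to (but excluding) the root."""
    names = []
    while True:
        parent, name = TOY_NODES[taxid]
        if parent == taxid:  # the root is its own parent
            break
        names.append(name)
        taxid = parent
    return names[::-1] if reverse else names


def toy_has_parent(taxid, parent_name):
    """True if `parent_name` is the name of an ancestor of `taxid`."""
    return parent_name in toy_lineage(taxid)[1:]


print(toy_lineage(17))
print(toy_lineage(17, reverse=True))
print(toy_has_parent(17, "Bacteria"))
```

With reverse=True the same list is returned from the highest rank down to the queried taxon, which matches the ordering shown in the two lineage examples above.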
Get the taxid from a scientific name.
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> dbname = "taxadb2/test/test_db.sqlite"
>>> ncbi = {
...     'taxid': TaxID(dbtype='sqlite', dbname=dbname),
...     'names': SciName(dbtype='sqlite', dbname=dbname),
...     'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
... }
>>> name2taxid = ncbi['names'].taxid('Pseudomonadota')
>>> print(name2taxid)
1224
Automatic detection of old taxIDs imported from merged.dmp.
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> dbname = "taxadb2/test/test_db.sqlite"
>>> ncbi = {
...     'taxid': TaxID(dbtype='sqlite', dbname=dbname),
...     'names': SciName(dbtype='sqlite', dbname=dbname),
...     'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
... }
>>> taxid2name = ncbi['taxid'].sci_name(30)
TaxID 30 is deprecated, using 29 instead.
>>> print(taxid2name)
Myxococcales
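The idea behind the deprecated-taxID lookup can be sketched in a few lines. The example below is an illustration only, not taxadb2's code: load_merged and resolve_taxid are hypothetical helpers, and the sample data is an in-memory string in the merged.dmp layout (old taxID, tab, pipe, tab, new taxID, tab, pipe). The 30 to 29 pair comes from the example above; the second row is invented.

```python
import io

# Two rows in the merged.dmp layout: old_tax_id <tab>|<tab> new_tax_id <tab>|
# The 30 -> 29 pair mirrors the example above; the second row is made up.
SAMPLE_MERGED_DMP = "30\t|\t29\t|\n999999\t|\t17\t|\n"


def load_merged(handle):
    """Parse merged.dmp-style rows into an {old_taxid: new_taxid} dict."""
    merged = {}
    for line in handle:
        fields = [f.strip() for f in line.split("|")]
        if len(fields) >= 2 and fields[0] and fields[1]:
            merged[int(fields[0])] = int(fields[1])
    return merged


def resolve_taxid(taxid, merged):
    """Return the current taxID, printing a notice when `taxid` was merged."""
    current = merged.get(taxid, taxid)
    if current != taxid:
        print(f"TaxID {taxid} is deprecated, using {current} instead.")
    return current


merged = load_merged(io.StringIO(SAMPLE_MERGED_DMP))
print(resolve_taxid(30, merged))
print(resolve_taxid(2, merged))
```

A taxID that is absent from the mapping is returned unchanged, so current taxIDs pass through without a notice.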
Get the taxonomic information for accession number(s).
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> dbname = "taxadb2/test/test_db.sqlite"
>>> ncbi = {
...     'taxid': TaxID(dbtype='sqlite', dbname=dbname),
...     'names': SciName(dbtype='sqlite', dbname=dbname),
...     'accessionid': AccessionID(dbtype='sqlite', dbname=dbname)
... }
>>> my_accessions = ['A01460']
>>> taxids = ncbi['accessionid'].taxid(my_accessions)
>>> taxids
<generator object AccessionID.taxid at 0x103e21bd0>
>>> for ti in taxids:
...     print(ti)
...
('A01460', 17)
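Because the accession lookup returns a generator, the results can only be iterated once. The sketch below shows one way to collect such results into a dictionary. It uses a stand-in generator, fake_taxid_lookup, which is hypothetical and only mimics the (accession, taxid) tuples shown above; it does not query a database.

```python
def fake_taxid_lookup(accessions):
    """Stand-in generator yielding (accession, taxid) tuples."""
    known = {"A01460": 17}  # pair taken from the example above
    for acc in accessions:
        if acc in known:
            yield acc, known[acc]


pairs = fake_taxid_lookup(["A01460"])
acc2taxid = dict(pairs)  # consumes the generator
leftover = list(pairs)   # the generator is now exhausted
print(acc2taxid)
print(leftover)
```

Collecting the tuples into a dict keeps the mapping available for repeated lookups after the generator has been consumed.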
You can also use a configuration file to set the database connection parameters automatically when the objects are created. Either pass the config parameter when creating an object:
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> config_path = "taxadb2/test/taxadb2.cfg"
>>> ncbi = {
...     'taxid': TaxID(config=config_path),
...     'names': SciName(config=config_path),
...     'accessionid': AccessionID(config=config_path)
... }
>>> ncbi['taxid'].sci_name(2)
Bacteria
>>> ...
or set the environment variable TAXADB2_CONFIG to point to the configuration file:
$ export TAXADB2_CONFIG='taxadb2/test/taxadb2.cfg'
>>> from taxadb2.taxid import TaxID
>>> from taxadb2.names import SciName
>>> from taxadb2.accessionid import AccessionID
>>> ncbi = {
...     'taxid': TaxID(),
...     'names': SciName(),
...     'accessionid': AccessionID()
... }
>>> ncbi['taxid'].sci_name(2)
Bacteria
>>> ...
Check documentation for more information.
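The two ways of supplying a configuration file can be pictured as a simple fallback: an explicit path wins, otherwise the environment variable is consulted. The sketch below illustrates that pattern with the standard library only. The helpers resolve_config_path and read_db_settings are hypothetical, and the section name DBSETTINGS and the keys dbtype and dbname are assumptions modelled on the constructor arguments used earlier, not a documented file format.

```python
import configparser
import os
import tempfile

ENV_VAR = "TAXADB2_CONFIG"  # name used in the export command above


def resolve_config_path(config=None):
    """Prefer an explicit path, otherwise fall back to the environment."""
    return config or os.environ.get(ENV_VAR)


def read_db_settings(path, section="DBSETTINGS"):
    """Read connection settings; section and key names are hypothetical."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return dict(parser[section])


# Write a throwaway config file to demonstrate both lookup routes.
with tempfile.NamedTemporaryFile("w", suffix=".cfg", delete=False) as fh:
    fh.write("[DBSETTINGS]\ndbtype = sqlite\ndbname = test_db.sqlite\n")
    cfg_path = fh.name

os.environ[ENV_VAR] = cfg_path
settings = read_db_settings(resolve_config_path())
print(settings)
os.remove(cfg_path)
```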
Creating the Database
Download data
The following command downloads the necessary files from the NCBI FTP server into the directory taxadb.
$ taxadb2 download --outdir taxadb --type taxa
Insert data
SQLite
$ taxadb2 create --division taxa --input taxadb --dbname taxadb.sqlite
You can then safely remove the downloaded files:
$ rm -r taxadb
You can safely rerun the same command: taxadb2 skips taxIDs and accessions that have already been inserted.
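One common way to make an insert step safe to rerun is to let the database ignore rows whose key already exists. The sketch below shows that idea with the standard sqlite3 module. It is an illustration under assumptions, not taxadb2's schema or code: the table taxa, its columns, and the helper insert_taxa are all hypothetical.

```python
import sqlite3

# In-memory database with a hypothetical table keyed by taxID.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE taxa (ncbi_taxid INTEGER PRIMARY KEY, tax_name TEXT)")

rows = [(2, "Bacteria"), (1224, "Pseudomonadota")]


def insert_taxa(conn, rows):
    """Insert rows, silently skipping taxIDs that are already present."""
    with conn:  # commit on success, roll back on error
        conn.executemany("INSERT OR IGNORE INTO taxa VALUES (?, ?)", rows)
    return conn.execute("SELECT COUNT(*) FROM taxa").fetchone()[0]


first = insert_taxa(conn, rows)
second = insert_taxa(conn, rows)  # rerun: nothing new is added
print(first, second)
```

The row count is the same after both calls, which is the behaviour a rerun of the create command relies on.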
Tests
Note: the tests rely on the pytest module (pip install pytest).
To run the tests, go to the root directory of this project (cd /path/to/taxadb2) and run
$ pytest -v
This command runs the tests against an SQLite test database called test_db.sqlite, located in the taxadb2/test directory.
It is also possible to run only the tests related to accessionid or taxid, as follows:
$ pytest -m 'taxid'
$ pytest -m 'accessionid'
You can also use the configuration file taxadb2.ini, located in the root of the distribution, as follows. This file should contain the database connection settings:
$ pytest taxadb2/test --config='taxadb2.ini'
License
Code is under the MIT license.
Issues
Found a bug or have a question? Please open an issue.
Contributing
Thought of a new feature that you'd like us to implement? Open an issue, or fork the repository and submit a pull request.
Code of Conduct - Participation guidelines
This repository adheres to the Contributor Covenant code of conduct for any interactions you have within this project (see Code of Conduct).
See also the policy against sexualized discrimination, harassment and violence for the Max Planck Society Code-of-Conduct.
By contributing to this project, you agree to abide by its terms.