Integrated registry of biological databases and nomenclatures
Bioregistry
A community-driven integrative meta-registry of biological databases, ontologies, and other resources.
More information here.
⬇️ Download
The bioregistry database can be downloaded directly from here.
The manually curated portions of these data are available under the CC0 1.0 Universal License.
🙏 Contributing
There haven't been any external contributors yet, but if you want to get involved, you can make edits directly to the bioregistry.json file through the GitHub interface.
Things that would be helpful:
- For all entries, add a ["wikidata"]["database"] entry. Many ontologies and databases don't have a property in Wikidata because the process of adding a new property is incredibly cautious. However, anyone can add a database as a normal Wikidata item with a Q prefix. One example is UniPathway, whose Wikidata database item is Q85719315. If there's no database item on Wikidata, you can even make one! Note: don't mix this up with a paper describing the resource, Q35631060. If you see there's a paper, you can add it under the ["wikidata"]["paper"] key.
- Add a ["homepage"] entry for any entry that doesn't have an external reference.
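As a rough illustration of the first kind of edit, the sketch below sets a ["wikidata"]["database"] value on an in-memory dictionary. It assumes that bioregistry.json is a JSON object keyed by prefix and that "unipathway" is the key for UniPathway; both are assumptions made for the example, and the helper add_wikidata_database() is hypothetical, not part of the package.

```python
import json


def add_wikidata_database(registry: dict, prefix: str, qid: str) -> dict:
    """Set ["wikidata"]["database"] for one existing entry, keeping its other keys."""
    entry = registry[prefix]  # raises KeyError if the prefix is not registered
    entry.setdefault("wikidata", {})["database"] = qid
    return registry


# Hypothetical excerpt; the real file has many more entries and fields.
registry = {"unipathway": {"wikidata": {"paper": "Q35631060"}}}
add_wikidata_database(registry, "unipathway", "Q85719315")

# Sorted keys and a fixed indent keep diffs small when the file is rewritten.
print(json.dumps(registry, indent=2, sort_keys=True))
```

The same change can of course be typed by hand in the GitHub editor; a script like this is only useful when adding many items at once.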
A full list of curation to-dos is automatically generated as a web page here. This page also has a more in-depth tutorial on how to contribute.
🚀 Installation
The Bioregistry can be installed from PyPI with:
$ pip install bioregistry
It can be installed in development mode for local curation with:
$ git clone https://github.com/bioregistry/bioregistry.git
$ cd bioregistry
$ pip install -e .
💪 Usage
The Bioregistry can be used to normalize prefixes across MIRIAM and all the (very plentiful) variants that pop up in ontologies in OBO Foundry and the OLS with the normalize_prefix() function.
import bioregistry
# This works for synonym prefixes, like:
assert 'ncbitaxon' == bioregistry.normalize_prefix('taxonomy')
# This works for common mistaken prefixes, like:
assert 'pubchem.compound' == bioregistry.normalize_prefix('pubchem')
# This works for prefixes that are often written many ways, like:
assert 'eccode' == bioregistry.normalize_prefix('ec-code')
assert 'eccode' == bioregistry.normalize_prefix('EC_CODE')
# If a prefix is not registered, it gives back `None`
assert bioregistry.normalize_prefix('not a real key') is None
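The lookups above suggest that matching ignores case and separators and consults a table of synonyms. The following is a minimal sketch of that idea, not the library's actual implementation: the table SYNONYMS, the helper _norm(), and the function toy_normalize_prefix() are all hypothetical and cover only the prefixes used in the examples.

```python
from typing import Optional

# Hypothetical toy table: canonical prefix -> known synonyms.
SYNONYMS = {
    "ncbitaxon": ["taxonomy"],
    "pubchem.compound": ["pubchem"],
    "eccode": ["ec-code"],
}


def _norm(s: str) -> str:
    """Lowercase and drop separators so 'EC_CODE' and 'ec-code' compare equal."""
    return "".join(c for c in s.lower() if c not in "-_. ")


# Reverse index from every normalized spelling to its canonical prefix.
INDEX = {}
for canonical, synonyms in SYNONYMS.items():
    for name in [canonical, *synonyms]:
        INDEX[_norm(name)] = canonical


def toy_normalize_prefix(prefix: str) -> Optional[str]:
    """Return the canonical prefix, or None if the spelling is unknown."""
    return INDEX.get(_norm(prefix))
```

Building the reverse index once keeps each lookup to a single dictionary access, which matters when normalizing the prefixes of many identifiers in a loop.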
The pattern for an entry in the Bioregistry can be looked up quickly with get_pattern() if it exists. It prefers the custom curated, then MIRIAM, then Wikidata pattern.
import bioregistry
assert '^GO:\\d{7}$' == bioregistry.get_pattern('go')
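To make the precedence rule concrete, here is a small sketch of how a pattern might be chosen and then used to validate an identifier. The entry layout (a top-level "pattern" key plus "miriam" and "wikidata" sub-dictionaries) and the helpers pick_pattern() and is_valid() are assumptions for illustration; only the GO pattern string comes from the example above.

```python
import re
from typing import Optional


def pick_pattern(entry: dict) -> Optional[str]:
    """Return the first available pattern: curated, then MIRIAM, then Wikidata."""
    candidates = (
        entry.get("pattern"),
        entry.get("miriam", {}).get("pattern"),
        entry.get("wikidata", {}).get("pattern"),
    )
    return next((p for p in candidates if p), None)


def is_valid(identifier: str, pattern: Optional[str]) -> bool:
    """Check an identifier against a pattern; no pattern means nothing to check."""
    return pattern is None or re.fullmatch(pattern, identifier) is not None


# Hypothetical entry that only carries a MIRIAM pattern.
go_entry = {"miriam": {"pattern": "^GO:\\d{7}$"}}
print(is_valid("GO:0032571", pick_pattern(go_entry)))  # True
print(is_valid("GO:32571", pick_pattern(go_entry)))    # False
```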
Entries in the Bioregistry can be checked for deprecation with the is_deprecated() function. MIRIAM and OBO Foundry often disagree; OBO Foundry takes precedence since it seems to be updated more often.
import bioregistry
assert bioregistry.is_deprecated('nmr')
assert not bioregistry.is_deprecated('efo')
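The precedence between the two sources can be pictured as checking OBO Foundry first and falling back to MIRIAM only when OBO Foundry says nothing. This is a sketch under assumptions: the keys "obofoundry", "miriam", and "deprecated" and the function toy_is_deprecated() are hypothetical and may not match the real data or code.

```python
def toy_is_deprecated(entry: dict) -> bool:
    """Return the first explicit deprecation flag, preferring OBO Foundry over MIRIAM."""
    for source in ("obofoundry", "miriam"):
        flag = entry.get(source, {}).get("deprecated")
        if flag is not None:
            return bool(flag)
    # Neither source has an opinion, so treat the entry as active.
    return False


# The sources disagree here; OBO Foundry wins.
conflicting = {"obofoundry": {"deprecated": True}, "miriam": {"deprecated": False}}
print(toy_is_deprecated(conflicting))  # True
```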
Entries in the Bioregistry can be looked up with the get() function.
import bioregistry
entry = bioregistry.get('taxonomy')
# there are lots of mysteries to discover in this dictionary!
The full Bioregistry can be read in a Python project using:
import bioregistry
registry = bioregistry.read_registry()
🕸️ Resolver App
After installing with the [web] extras, run the resolver CLI with
$ bioregistry web
to run a web app that functions like Identifiers.org, but backed by the Bioregistry. A public instance of this app is hosted by the INDRA Lab at https://bioregistry.io.
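For readers unfamiliar with Identifiers.org-style resolution, the idea is that a compact identifier (prefix:identifier) is appended to the resolver's base URL, and the service redirects to the right page. The sketch below only builds such a URL. The path layout base/prefix:identifier is an assumption about the app, and resolver_url() is a hypothetical helper; check the running instance for the routes it actually serves.

```python
BASE_URL = "https://bioregistry.io"


def resolver_url(curie: str, base_url: str = BASE_URL) -> str:
    """Build a resolver URL for a compact identifier such as 'go:0032571'."""
    prefix, sep, identifier = curie.partition(":")
    if not (sep and prefix and identifier):
        raise ValueError(f"not a compact identifier: {curie!r}")
    # Assumed route layout: <base>/<prefix>:<identifier>
    return f"{base_url}/{prefix}:{identifier}"


print(resolver_url("go:0032571"))
```

Passing base_url explicitly lets the same helper point at a locally running instance instead of the public one.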
♻️ Update
The database is automatically updated daily thanks to scheduled workflows in GitHub Actions. The workflow's configuration can be found here and the last run can be seen here. Further, a changelog can be reconstructed from the commits of the GitHub Actions bot.
If you want to manually update the database after installing in development mode, run the following:
$ bioregistry update
⚖️ License
The code in this repository is licensed under the MIT License.
📖 Citation
Hopefully there will be a paper describing this resource on bioRxiv sometime in 2021! Until then, you can use the Zenodo BibTeX or CSL.
💰 Funding
The development of the Bioregistry is funded by the DARPA Young Faculty Award W911NF2010255 (PI: Benjamin M. Gyori).