A decentralized database of synonyms for biomedical concepts and entities.
Biosynonyms
A decentralized database of synonyms for biomedical entities and concepts. This resource is meant to complement ontologies, databases, and other controlled vocabularies that provide synonyms. It's released under a permissive license (CC0), so the synonyms can be easily adopted by, or contributed back to, upstream resources.
Here's how to get the data:
```python
import biosynonyms

# Uses an internal data structure
positive_synonyms = biosynonyms.get_positive_synonyms()
negative_synonyms = biosynonyms.get_negative_synonyms()

# Get ready for use in NER with Gilda, only using positive synonyms
gilda_terms = biosynonyms.get_gilda_terms()
```
Synonyms
The data are also available directly as TSV files, so anyone can consume them from any programming language.
The `positives.tsv` file has the following columns:

- `text`: the synonym text itself
- `curie`: the compact uniform resource identifier (CURIE) for a biomedical entity or concept, standardized using the Bioregistry
- `name`: the standard name for the concept
- `scope`: the match type, written as a CURIE from the OBO in OWL (`oio`) controlled vocabulary, i.e., one of:
  - `oboInOwl:hasExactSynonym`
  - `oboInOwl:hasNarrowSynonym`
  - `oboInOwl:hasBroadSynonym`
  - `oboInOwl:hasRelatedSynonym`
  - `oboInOwl:hasSynonym` (use this if the scope is unknown)
- `type`: the synonym property type, written as a CURIE from the OBO Metadata Ontology (`omo`) controlled vocabulary, e.g., one of:
  - `OMO:0003000` (abbreviation)
  - `OMO:0003001` (ambiguous synonym)
  - `OMO:0003002` (dubious synonym)
  - `OMO:0003003` (layperson synonym)
  - `OMO:0003004` (plural form)
  - ...
- `references`: a comma-delimited list of CURIEs corresponding to publications that use the given synonym (ideally using highly actionable identifiers from semantic spaces like `pubmed`, `pmc`, `doi`)
- `contributor`: the ORCID identifier of the contributor
Here's an example of some rows in the synonyms table (with linkified CURIEs):
text | curie | scope | references | contributor |
---|---|---|---|---|
PI(3,4,5)P3 | CHEBI:16618 | oio:hasExactSynonym | pubmed:29623928, pubmed:20817957 | 0000-0003-4423-4370 |
phosphatidylinositol (3,4,5) P3 | CHEBI:16618 | oio:hasExactSynonym | pubmed:29695532 | 0000-0003-4423-4370 |
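Because the file is plain TSV, it can be read with nothing but the standard library. The following is a minimal sketch, not the package's own loader. It assumes the file starts with a header row using the column names described above, and it embeds the two example rows as an inline string rather than opening the real `positives.tsv`. `parse_positives` is a hypothetical helper name.

```python
import csv
import io

# Inline sample mirroring the example rows above. The real file may contain
# more columns (e.g. name, type) and a different column order.
SAMPLE_TSV = (
    "text\tcurie\tscope\treferences\tcontributor\n"
    "PI(3,4,5)P3\tCHEBI:16618\toio:hasExactSynonym\t"
    "pubmed:29623928, pubmed:20817957\t0000-0003-4423-4370\n"
    "phosphatidylinositol (3,4,5) P3\tCHEBI:16618\toio:hasExactSynonym\t"
    "pubmed:29695532\t0000-0003-4423-4370\n"
)


def parse_positives(handle):
    """Yield one dict per row, splitting `references` into a list of CURIEs."""
    for row in csv.DictReader(handle, delimiter="\t"):
        refs = row.get("references") or ""
        row["references"] = [r.strip() for r in refs.split(",") if r.strip()]
        yield row


rows = list(parse_positives(io.StringIO(SAMPLE_TSV)))
for row in rows:
    print(row["curie"], repr(row["text"]), row["references"])
```

To read an actual copy of the file, pass an open file handle instead of the `io.StringIO` wrapper.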
Incorrect Synonyms
The `negatives.tsv` file has the following columns for non-trivial examples of text strings that aren't synonyms. This file doesn't address the same issues as context-based disambiguation, but rather helps describe issues like incorrect substring matching:

- `text`: the non-synonym text itself
- `curie`: the compact uniform resource identifier (CURIE) for a biomedical entity or concept that does not match the given text, standardized using the Bioregistry
- `references`: same as for `positives.tsv`, illustrating documents where this string appears
- `contributor`: the ORCID identifier of the contributor
Here's an example of some rows in the negative synonyms table (with linkified CURIEs):
text | curie | references | contributor |
---|---|---|---|
PI(3,4,5)P3 | hgnc:22979 | pubmed:29623928, pubmed:20817957 | 0000-0003-4423-4370 |
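One way to use the negative rows is as a filter over candidate matches produced by some other matching step. The sketch below is an illustration only. The `candidates` list and the `drop_known_negatives` helper are hypothetical, and the comparison is on exact strings, so any CURIE or text normalization would have to happen beforehand.

```python
# (text, curie) pairs curated as incorrect, taken from the example row above.
negatives = {("PI(3,4,5)P3", "hgnc:22979")}

# Hypothetical candidate matches, e.g. produced by naive substring matching.
candidates = [
    ("PI(3,4,5)P3", "CHEBI:16618"),
    ("PI(3,4,5)P3", "hgnc:22979"),
]


def drop_known_negatives(pairs, negative_pairs):
    """Keep only the candidate pairs that are not curated as incorrect."""
    return [pair for pair in pairs if pair not in negative_pairs]


kept = drop_known_negatives(candidates, negatives)
print(kept)
```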
Known Limitations
It's hard to know which exact matches between different vocabularies could be used to deduplicate synonyms. Right now, this isn't covered, but some partial solutions already exist that could be adopted.
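To make the problem concrete, the sketch below shows how deduplication could look if a trusted table of exact matches were available. Everything in it is hypothetical: the `example.a` and `example.b` prefixes, the identifiers, and the `EXACT_MATCHES` mapping are placeholders rather than real curated mappings, and this is not something the package does today.

```python
# Placeholder mapping from a CURIE to the CURIE chosen as canonical.
EXACT_MATCHES = {"example.b:0002": "example.a:0001"}

# Placeholder synonym records as (text, curie) pairs.
synonyms = [
    ("foo", "example.a:0001"),
    ("foo", "example.b:0002"),  # same concept under another vocabulary
    ("bar", "example.b:0002"),
]


def deduplicate(records, exact_matches):
    """Rewrite each CURIE to its canonical form and drop repeated pairs."""
    seen = set()
    result = []
    for text, curie in records:
        key = (text, exact_matches.get(curie, curie))
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


deduped = deduplicate(synonyms, EXACT_MATCHES)
print(deduped)
```

The hard part in practice is deciding which mappings are reliable enough to populate such a table, which is the limitation described above.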
License
All data are available under the CC0 license. All code is available under the MIT license.