Biosynonyms

A decentralized database of synonyms for biomedical entities and concepts. This resource is meant to be complementary to ontologies, databases, and other controlled vocabularies that provide synonyms. It is released under a permissive license (CC0), so its synonyms can be easily adopted by, or contributed back to, upstream resources.

Here's how to get the data:

import biosynonyms

# Uses an internal data structure
positive_synonyms = biosynonyms.get_positive_synonyms()
negative_synonyms = biosynonyms.get_negative_synonyms()

# Get ready for use in NER with Gilda, only using positive synonyms
gilda_terms = biosynonyms.get_gilda_terms()

Synonyms

The data are also available directly as TSV files, so anyone can consume them from any programming language.

The positives.tsv has the following columns:

  1. text: the synonym text itself
  2. curie: the compact uniform resource identifier (CURIE) for a biomedical entity or concept, standardized using the Bioregistry
  3. name: the standard name for the concept
  4. scope: the match type, written as a CURIE from the OBO in OWL (oio) controlled vocabulary, i.e., one of:
    • oboInOwl:hasExactSynonym
    • oboInOwl:hasNarrowSynonym
    • oboInOwl:hasBroadSynonym
    • oboInOwl:hasRelatedSynonym
    • oboInOwl:hasSynonym (use this if the scope is unknown)
  5. type: the synonym property type, written as a CURIE from the OBO Metadata Ontology (omo) controlled vocabulary, e.g., one of:
    • OMO:0003000 (abbreviation)
    • OMO:0003001 (ambiguous synonym)
    • OMO:0003002 (dubious synonym)
    • OMO:0003003 (layperson synonym)
    • OMO:0003004 (plural form)
    • ...
  6. references: a comma-delimited list of CURIEs corresponding to publications that use the given synonym (ideally using highly actionable identifiers from semantic spaces like pubmed, pmc, doi)
  7. contributor: the ORCID identifier of the contributor
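
The column descriptions above can be turned into a small row check before contributing new synonyms. The following is only a sketch and is not part of the biosynonyms package: the function name check_row is hypothetical, the checks are deliberately loose (a CURIE is only tested for a prefix separator), and treating the "oio:" prefix as shorthand for "oboInOwl:" is an assumption based on the two spellings that appear in this document.

```python
import re

# Scope CURIEs as listed in the column description above.
ALLOWED_SCOPES = {
    "oboInOwl:hasExactSynonym",
    "oboInOwl:hasNarrowSynonym",
    "oboInOwl:hasBroadSynonym",
    "oboInOwl:hasRelatedSynonym",
    "oboInOwl:hasSynonym",
}

# ORCID identifiers are four dash-separated groups of four characters,
# where the last character may be a digit or "X".
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def check_row(row: dict) -> list:
    """Return a list of human-readable problems found in one row (hypothetical helper)."""
    problems = []
    if not row.get("text", "").strip():
        problems.append("text is empty")
    if ":" not in row.get("curie", ""):
        problems.append("curie has no prefix")

    scope = row.get("scope", "")
    # Assumption: "oio:" is an accepted shorthand for "oboInOwl:".
    if scope.startswith("oio:"):
        scope = scope.replace("oio:", "oboInOwl:", 1)
    if scope not in ALLOWED_SCOPES:
        problems.append(f"unknown scope: {row.get('scope', '')}")

    # References are a comma-delimited list of CURIEs; empty is allowed here.
    for reference in row.get("references", "").split(","):
        reference = reference.strip()
        if reference and ":" not in reference:
            problems.append(f"reference is not a CURIE: {reference}")

    if not ORCID_PATTERN.match(row.get("contributor", "")):
        problems.append("contributor is not an ORCID identifier")
    return problems


good_row = {
    "text": "PI(3,4,5)P3",
    "curie": "CHEBI:16618",
    "scope": "oio:hasExactSynonym",
    "references": "pubmed:29623928, pubmed:20817957",
    "contributor": "0000-0003-4423-4370",
}
print(check_row(good_row))
```

A check like this only catches formatting slips; it says nothing about whether the synonym itself is correct.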

Here's an example of some rows in the synonyms table (with linkified CURIEs):

| text | curie | scope | references | contributor |
|---|---|---|---|---|
| PI(3,4,5)P3 | CHEBI:16618 | oio:hasExactSynonym | pubmed:29623928, pubmed:20817957 | 0000-0003-4423-4370 |
| phosphatidylinositol (3,4,5) P3 | CHEBI:16618 | oio:hasExactSynonym | pubmed:29695532 | 0000-0003-4423-4370 |
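
Because the data are plain TSV, they can be read without the biosynonyms package. The sketch below uses only the Python standard library and groups synonyms by CURIE. It is an illustration under assumptions: the inline sample reproduces the two example rows above with the same five columns shown in the example (the real positives.tsv may carry more columns), and load_synonyms is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the example rows above, tab-separated, so the sketch runs
# without downloading anything. Replace with open("positives.tsv") for real data.
SAMPLE_TSV = (
    "text\tcurie\tscope\treferences\tcontributor\n"
    "PI(3,4,5)P3\tCHEBI:16618\toio:hasExactSynonym\t"
    "pubmed:29623928, pubmed:20817957\t0000-0003-4423-4370\n"
    "phosphatidylinositol (3,4,5) P3\tCHEBI:16618\toio:hasExactSynonym\t"
    "pubmed:29695532\t0000-0003-4423-4370\n"
)


def load_synonyms(handle):
    """Group rows by CURIE and split the comma-delimited references column."""
    index = defaultdict(list)
    # QUOTE_NONE keeps any quote characters in synonym text literal.
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        row["references"] = [
            reference.strip()
            for reference in row["references"].split(",")
            if reference.strip()
        ]
        index[row["curie"]].append(row)
    return dict(index)


synonyms_by_curie = load_synonyms(io.StringIO(SAMPLE_TSV))
for curie, rows in synonyms_by_curie.items():
    print(curie, [row["text"] for row in rows])
```

Reading by header name rather than by column position keeps the sketch working if columns are added or reordered.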

Incorrect Synonyms

The negatives.tsv file lists non-trivial examples of text strings that are not synonyms for a given entity. It does not address the same issues as context-based disambiguation; rather, it helps describe issues like incorrect sub-string matching. It has the following columns:

  1. text: the non-synonym text itself
  2. curie: the compact uniform resource identifier (CURIE) for a biomedical entity or concept that the text does not match, standardized using the Bioregistry
  3. references: same as for positives.tsv, illustrating documents where this string appears
  4. contributor: the ORCID identifier of the contributor

Here's an example of some rows in the negative synonyms table (with linkified CURIEs):

| text | curie | references | contributor |
|---|---|---|---|
| PI(3,4,5)P3 | hgnc:22979 | pubmed:29623928, pubmed:20817957 | 0000-0003-4423-4370 |
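
One way negative synonyms might be used is as a filter on the output of a string matcher. The sketch below is an assumption about usage, not something the package is documented to do: filter_matches is a hypothetical helper, and the candidate list stands in for whatever (text, CURIE) pairs an upstream matcher proposes.

```python
# (text, CURIE) pairs known to be wrong, taken from the example row above.
NEGATIVES = {("PI(3,4,5)P3", "hgnc:22979")}


def filter_matches(candidates, negatives):
    """Drop candidate (text, CURIE) pairs that are recorded as negative synonyms."""
    return [pair for pair in candidates if pair not in negatives]


# Hypothetical matcher output: the same text proposed for two different entities.
candidates = [
    ("PI(3,4,5)P3", "CHEBI:16618"),
    ("PI(3,4,5)P3", "hgnc:22979"),
]
kept = filter_matches(candidates, NEGATIVES)
print(kept)
```

The comparison here is exact and case-sensitive; a real pipeline would likely need to normalize text and CURIE prefixes first.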

Known Limitations

It's hard to know which exact matches between different vocabularies could be used to deduplicate synonyms. Deduplication isn't covered yet, though partial solutions already exist that could be adopted.

License

All data are available under the CC0 license. All code is available under the MIT license.
