A biomedical entity-linking package
BioEL: A comprehensive package for training, evaluating, and benchmarking biomedical entity linking models.
Installation
conda create -n bioel python=3.9
conda activate bioel
pip install -e .
Development Instructions
- Install as an editable package using pip, as shown above.
- Add any new dependencies to setup.py.
- Add tests to the tests/ directory.
Ontologies
Ontologies included in the package:
- UMLS: UMLS is licensed by the National Library of Medicine and requires a free account to download. You can sign up for an account at https://uts.nlm.nih.gov/uts/signup-login. Once your account has been approved, you can download the UMLS Metathesaurus at https://www.nlm.nih.gov/research/umls/licensedcontent/umlsknowledgesources.html.
- MeSH: MeSH is derived from UMLS.
- NCBI Gene (Entrez): The ontology can be downloaded at https://ftp.ncbi.nih.gov/gene/DATA/ (gene_info.gz).
- MEDIC (MErged DIsease voCabulary): It can be downloaded at https://ctdbase.org/downloads/
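For orientation, NCBI's gene_info.gz is a gzip-compressed, tab-separated table whose header line starts with "#tax_id". The sketch below is not part of the BioEL API; it builds a GeneID-to-names mapping using only the standard library. It assumes the header contains the GeneID, Symbol and Synonyms columns, that synonyms are separated by "|", and that "-" marks an empty field. The helper name load_gene_names is hypothetical.

```python
import gzip


def load_gene_names(path, tax_id=None):
    """Hypothetical helper: map each GeneID in gene_info.gz to its names.

    Assumes the NCBI layout: a tab-separated header starting with '#tax_id'
    that includes 'GeneID', 'Symbol' and 'Synonyms' columns.
    """
    names = {}
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        # '#tax_id\tGeneID\t...' -> ['tax_id', 'GeneID', ...]
        header = handle.readline().rstrip("\n").lstrip("#").split("\t")
        col = {name: i for i, name in enumerate(header)}
        for line in handle:
            row = line.rstrip("\n").split("\t")
            if tax_id is not None and row[col["tax_id"]] != str(tax_id):
                continue  # keep a single organism only
            gene_names = [row[col["Symbol"]]]
            synonyms = row[col["Synonyms"]]
            if synonyms != "-":  # '-' marks an empty field
                gene_names.extend(synonyms.split("|"))
            names[row[col["GeneID"]]] = gene_names
    return names
```

For example, load_gene_names("gene_info.gz", tax_id=9606) would keep only human genes (NCBI taxonomy ID 9606).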
Resolving abbreviations
As a preprocessing step, we resolve abbreviations in the text using Ab3P, an abbreviation detector designed for biomedical text. We ran abbreviation detection on the text of all documents in our benchmark and stored the results in a large dictionary in data/abbreviations.json. To reproduce our abbreviation detection and resolution pipeline, run the following:
from bioel.utils.solve_abbreviation.solve_abbreviation import create_abbrev
# output_dir : where to create abbreviations.json (example value below)
# all_dataset : datasets for which you want the abbreviations
#               (assumed here to be a list of dataset names)
output_dir = "data"
all_dataset = ["ncbi_disease"]
create_abbrev(output_dir, all_dataset)
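The internal layout of data/abbreviations.json is not described here. As a rough illustration of what "resolving" means, the sketch below assumes a hypothetical per-document dictionary mapping each short form detected by Ab3P (e.g. "MI") to its long form (e.g. "myocardial infarction"), and replaces whole-word occurrences of the short forms. The function expand_abbreviations is hypothetical and is not the package's resolution code.

```python
import re


def expand_abbreviations(text, abbrev_map):
    """Replace whole-word short forms in text with their long forms.

    abbrev_map is a hypothetical {short_form: long_form} dict for one
    document; the real structure of abbreviations.json may differ.
    """
    if not abbrev_map:
        return text
    # Try longer short forms first so that e.g. "IL-2" wins over "IL".
    alternatives = sorted(abbrev_map, key=len, reverse=True)
    # \b assumes short forms begin and end with word characters.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, alternatives)) + r")\b")
    return pattern.sub(lambda match: abbrev_map[match.group(1)], text)
```

For example, expand_abbreviations("Patients with MI were excluded.", {"MI": "myocardial infarction"}) returns "Patients with myocardial infarction were excluded.", while "EMI" is left untouched because "MI" is not a whole word there.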
Example usage
# Import modules
from bioel.model import BioEL_Model
from bioel.evaluate import Evaluate
# Load the model
# params : parameter file (example value below; look at data/params.json
#          for more information about the parameters)
params = "data/params.json"
krissbert = BioEL_Model.load_krissbert(
    name="krissbert", params_file=params,
)
krissbert.training()  # train
krissbert.inference()  # inference
# Evaluate the results
abbreviations_path = "data/abbreviations.json"
dataset_names = ["ncbi_disease"]
model_names = ["krissbert"]
# For each dataset, the result file produced by each model
path_to_result = {
    "ncbi_disease": {
        "krissbert": "results/ncbi_disease.json"
    }
}
evaluator = Evaluate(dataset_names, model_names, path_to_result, abbreviations_path)
evaluator.load_results()
evaluator.process_datasets()
evaluator.evaluate()
evaluator.plot_results()
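The metrics reported by Evaluate are not spelled out here. Entity-linking benchmarks commonly report top-k accuracy (recall@k): the fraction of mentions whose gold identifier appears among the first k ranked candidates. The sketch below computes that metric under the assumption that each result is a (gold_id, ranked_candidate_ids) pair; this input format is hypothetical, and the code illustrates the metric rather than the Evaluate implementation.

```python
def top_k_accuracy(results, k=1):
    """Fraction of mentions whose gold ID is among the top-k candidates.

    results: iterable of (gold_id, ranked_candidate_ids) pairs, a
    hypothetical format chosen for illustration.
    """
    results = list(results)
    if not results:
        return 0.0
    hits = sum(1 for gold, candidates in results if gold in candidates[:k])
    return hits / len(results)
```

With results = [("ID1", ["ID1", "ID2"]), ("ID3", ["ID4", "ID3"])], top-1 accuracy is 0.5 and top-2 accuracy is 1.0.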
ArboEL
ArboEL operates in two stages. First, train the biencoder (load_arboel_biencoder). Then use the candidates retrieved by the biencoder to train the crossencoder (load_arboel_crossencoder), and run evaluation with the crossencoder.
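To make the two-stage design concrete, here is a toy retrieve-then-rerank sketch in plain Python. Stage 1 mimics a biencoder: mentions and entities are embedded independently, and entities are ranked by the dot product of their embeddings. Stage 2 mimics a crossencoder: a scoring function looks at the mention and each candidate jointly, but only the top-k candidates from stage 1 are rescored. The vectors and the score function are made-up stand-ins; ArboEL's actual models and training procedure are not reproduced here.

```python
def biencoder_candidates(mention_vec, entity_vecs, k):
    """Stage 1: rank entities by dot product with the mention embedding."""
    scores = {
        entity_id: sum(m * e for m, e in zip(mention_vec, vec))
        for entity_id, vec in entity_vecs.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:k]


def crossencoder_rerank(mention, candidates, score_fn):
    """Stage 2: rescore only the retrieved candidates with a joint scorer."""
    return max(candidates, key=lambda entity_id: score_fn(mention, entity_id))
```

With entity_vecs = {"E1": [1.0, 0.0], "E2": [0.8, 0.6], "E3": [0.0, 1.0]} and mention_vec = [0.9, 0.4], stage 1 with k=2 returns ["E2", "E1"]; the crossencoder can then overturn that order, which is why it is trained on the biencoder's candidates rather than on the whole ontology.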
BioBART/BioGenEL
BioBART and BioGenEL share the same entity linking module:
- To fine-tune from BioBART, set the model_load_path parameter in the .json config file to GanjinZero/biobart-v2-large; the pretrained weights will then be loaded from HuggingFace.
- To fine-tune from BioGenEL's knowledge-base-guided pretrained weights, first download the weights from https://drive.google.com/file/d/1TqvQRau1WPYE9hKfemKZr-9ptE-7USAH/view?usp=sharing and then set the model_load_path parameter in the .json config file to the path where you stored them.
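If you prefer to switch between the two starting points from a script, the sketch below edits the config programmatically. It assumes the .json config is a JSON object with a top-level model_load_path key; the other keys and the config file name are not specified here, and set_model_load_path is a hypothetical helper, not part of BioEL.

```python
import json


def set_model_load_path(config_path, model_load_path):
    """Rewrite the model_load_path entry of a JSON config file in place."""
    with open(config_path, encoding="utf-8") as handle:
        config = json.load(handle)
    config["model_load_path"] = model_load_path
    with open(config_path, "w", encoding="utf-8") as handle:
        json.dump(config, handle, indent=4)
    return config
```

For example, set_model_load_path("my_config.json", "GanjinZero/biobart-v2-large") selects BioBART, while passing the local path of the downloaded BioGenEL weights selects BioGenEL (my_config.json is a placeholder name).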
Download files
Source Distribution
Built Distribution
File details
Details for the file bioel_package_pathology_dynamics-0.0.1.tar.gz.
File metadata
- Download URL: bioel_package_pathology_dynamics-0.0.1.tar.gz
- Upload date:
- Size: 171.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 27d1014a347fb2e58ac62c859dc836965bbb429255625763056341b13c55a557 |
| MD5 | 2f36d7a1fc1d0dd01f324113bea2a4e7 |
| BLAKE2b-256 | 93b6a30981145ed800bee8e8790fcef8c49675ede2c2edc7fb824270bbd8553c |
File details
Details for the file bioel_package_pathology_dynamics-0.0.1-py3-none-any.whl.
File metadata
- Download URL: bioel_package_pathology_dynamics-0.0.1-py3-none-any.whl
- Upload date:
- Size: 192.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.9.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 97ef5488477991b7e5f337e5e80155e1cb424e9cbc44db4cab448fe611e13111 |
| MD5 | 231bd4386cba01c81f75d282002ad855 |
| BLAKE2b-256 | a2ae694d00372938d2e5c17b0bf8356dde5580736de3e00135925c0da95d8330 |