A biomedical entity-linking package

Project description

BioEL: A comprehensive package for training, evaluating, and benchmarking biomedical entity linking models.

Installation

conda create -n bioel python=3.9
conda activate bioel
pip install -e .

Development Instructions

  1. Install as an editable package using pip, as shown above.
  2. Add any new dependencies to setup.py.
  3. Add tests to tests/ directory.

Ontologies

Ontologies included in the package:

Resolving abbreviations

As a preprocessing step, we resolve abbreviations in the text using Ab3P, an abbreviation detector built for biomedical text. We ran abbreviation detection on the text of all documents in our benchmark and stored the results in a large dictionary in data/abbreviations.json. To reproduce our abbreviation detection and resolution pipeline, run the following:

from bioel.utils.solve_abbreviation.solve_abbreviation import create_abbrev

# output_dir: path where abbreviations.json is created
# all_dataset: datasets for which you want the abbreviations
create_abbrev(output_dir, all_dataset)
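
To illustrate how resolved abbreviations might be applied downstream, the sketch below expands short forms in a piece of text. It assumes a mapping of the form {document_id: {short_form: long_form}}; the actual layout of abbreviations.json is not described here, so inspect the file before relying on this shape. The function name expand_abbreviations and the sample data are hypothetical and not part of the package.

```python
import re


def expand_abbreviations(text, abbrev_map):
    """Replace each whole-word short form in `text` with its long form."""
    if not abbrev_map:
        return text
    # Try longer short forms first so overlapping keys resolve predictably.
    keys = sorted(abbrev_map, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: abbrev_map[m.group(1)], text)


# Hypothetical structure: {document_id: {short_form: long_form}}
abbreviations = {"doc1": {"AD": "Alzheimer disease", "BRCA1": "breast cancer 1"}}
expanded = expand_abbreviations("AD risk and BRCA1 status", abbreviations["doc1"])
print(expanded)
```

Matching on word boundaries keeps a short form such as AD from being expanded inside a longer token such as ADHD.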

Example usage

# Import modules
from bioel.model import BioEL_Model
from bioel.evaluate import Evaluate

# Load the model. `params` must be defined beforehand;
# look at data/params.json for more information about the parameters.
krissbert = BioEL_Model.load_krissbert(
    name="krissbert",
    params_file=params,
)
krissbert.training()   # train
krissbert.inference()  # run inference

abbreviations_path = "data/abbreviations.json"
dataset_names = ["ncbi_disease"]
model_names = ["krissbert"]
path_to_result = {
    "ncbi_disease": {
        "krissbert": "results/ncbi_disease.json"
    }
}

# Results
evaluator = Evaluate(dataset_names, model_names, path_to_result, abbreviations_path)
evaluator.load_results()
evaluator.process_datasets()
evaluator.evaluate()
evaluator.plot_results()
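
For intuition about the kind of metric an entity-linking evaluation typically reports, here is a minimal sketch of recall@k. It is not the Evaluate class's implementation, and the input format (one ranked candidate list and one list of gold identifiers per mention) is an assumption made for illustration; the identifiers are made up.

```python
def recall_at_k(predictions, gold, k):
    """Fraction of mentions whose gold ID appears among the top-k candidates."""
    if not predictions:
        return 0.0
    hits = sum(
        1
        for candidates, gold_ids in zip(predictions, gold)
        if set(candidates[:k]) & set(gold_ids)
    )
    return hits / len(predictions)


# Hypothetical ranked candidates and gold identifiers for three mentions
ranked = [["D001", "D002"], ["D010", "D003"], ["D007"]]
gold = [["D001"], ["D003"], ["D999"]]

r1 = recall_at_k(ranked, gold, 1)
r2 = recall_at_k(ranked, gold, 2)
print(f"recall@1={r1:.3f} recall@2={r2:.3f}")
```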

ArboEL

ArboEL operates in two stages. First, train the biencoder (load_arboel_biencoder). Then use the candidate results from the biencoder to train the crossencoder (load_arboel_crossencoder) and perform evaluation with the crossencoder.
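
The two-stage pattern can be sketched in miniature as retrieve-then-rerank. The scoring functions below are toy stand-ins chosen so the example runs without any model; they are not ArboEL's biencoder or crossencoder, and every name in the block is hypothetical.

```python
def cheap_score(mention, name):
    """Stage 1 stand-in: number of shared tokens."""
    return len(set(mention.lower().split()) & set(name.lower().split()))


def careful_score(mention, name):
    """Stage 2 stand-in: Jaccard overlap of character trigrams."""
    def grams(s):
        s = s.lower()
        return {s[i:i + 3] for i in range(len(s) - 2)}

    a, b = grams(mention), grams(name)
    return len(a & b) / len(a | b) if a | b else 0.0


def link(mention, entities, top_k=2):
    # Stage 1: retrieve a short candidate list with the cheap scorer.
    candidates = sorted(
        entities, key=lambda e: cheap_score(mention, entities[e]), reverse=True
    )[:top_k]
    # Stage 2: rerank only those candidates with the costlier scorer.
    return max(candidates, key=lambda e: careful_score(mention, entities[e]))


entities = {"E1": "breast cancer", "E2": "lung cancer", "E3": "type 2 diabetes"}
best = link("breast cancers", entities)
print(best)
```

The point of the split is cost: the second scorer only ever sees the top_k candidates that survive the first stage.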

BioBART/BioGenEL

BioBART and BioGenEL share the same entity linking module:

  • To fine-tune from BioBART, set the model_load_path parameter in the .json config file to GanjinZero/biobart-v2-large; this loads the pretrained weights from HuggingFace.

  • To fine-tune from BioGenEL's knowledge-base-guided pretrained weights, first download them from this link: https://drive.google.com/file/d/1TqvQRau1WPYE9hKfemKZr-9ptE-7USAH/view?usp=sharing and then set the model_load_path parameter in the .json config file to the path where you stored the pretrained weights.
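
Either option comes down to editing one key in the config file. A minimal sketch of doing that programmatically is shown below; the config file name and the learning_rate key are placeholders invented for the demonstration, and only model_load_path comes from the instructions above.

```python
import json
import tempfile
from pathlib import Path


def set_model_load_path(config_path, model_load_path):
    """Set `model_load_path` in a JSON config, keeping all other keys."""
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["model_load_path"] = model_load_path
    path.write_text(json.dumps(config, indent=2))
    return config


# Demonstration on a throwaway file so no real config is modified.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "biobart_config.json"  # hypothetical file name
    cfg.write_text(json.dumps({"learning_rate": 1e-5}))  # placeholder key
    updated = set_model_load_path(cfg, "GanjinZero/biobart-v2-large")
    on_disk = json.loads(cfg.read_text())

print(updated["model_load_path"])
```

For the BioGenEL option, pass the local path of the downloaded weights instead of the HuggingFace model identifier.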
