A framework for evaluating link prediction models on heterogeneous biomedical graph data

Project description

OpenBioLink

OpenBioLink is a resource and evaluation framework for evaluating link prediction models on heterogeneous biomedical graph data. It contains benchmark datasets as well as the underlying scripts to create them and to evaluate a custom model on them.

Paper preprint on arXiv

Supplementary data

Installation

Pip

Install a PyTorch version suitable for your system (see https://pytorch.org/)
pip install openbiolink
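
After both steps, a quick check that the packages are importable can save time later. The following is a minimal sketch, not part of OpenBioLink; the helper name missing_packages is hypothetical, and "openbiolink" is assumed to be the import name because it is the module name used on the command line.

```python
import importlib.util


def missing_packages(names):
    """Return the subset of `names` that cannot be found on the import path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# "torch" is the import name of PyTorch; "openbiolink" is assumed to be the
# import name of this package (it matches the module used on the command line).
missing = missing_packages(["torch", "openbiolink"])
print("missing:", missing or "none")
```

An empty result means both packages were found in the active environment.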

Source

Clone the git repository or download the project
Create a new Python 3.7 or Python 3.6 virtual environment (note: under Windows, only Python 3.6 will work), e.g.: python3 -m venv my_venv
Activate the virtual environment
- Windows: my_venv\Scripts\activate
- Linux/Mac: source my_venv/bin/activate
Install a PyTorch version suitable for your system (see https://pytorch.org/)
Install the requirements stated in requirements.txt, e.g.: pip install -r requirements.txt

Benchmark Dataset

The OpenBioLink2020 Dataset is a highly challenging benchmark dataset containing over 5 million positive and negative edges. To provide a more realistic edge prediction scenario, the test set contains no trivially predictable inverse edges from the training set, and it covers all edge types.
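
To illustrate what removing inverse edges means, here is a minimal sketch. It assumes edges are (head, relation, tail) tuples and treats a test edge as trivially predictable when the training set connects the same two nodes in the opposite direction. This definition, the function name and the node identifiers are assumptions for illustration, not necessarily the procedure used to build OpenBioLink2020.

```python
def drop_inverse_edges(train, test):
    """Remove test triples whose node pair appears reversed in the training set."""
    # For every training edge head -> tail, remember the reversed pair (tail, head).
    reversed_train_pairs = {(tail, head) for head, _, tail in train}
    return [(h, r, t) for h, r, t in test if (h, t) not in reversed_train_pairs]


# Hypothetical identifiers and relation names, for illustration only.
train = [("GENE:1", "GENE_DIS", "DIS:1")]
test = [
    ("DIS:1", "DIS_GENE", "GENE:1"),  # inverse of the training edge, dropped
    ("GENE:2", "GENE_DIS", "DIS:1"),  # unrelated to the training edge, kept
]
print(drop_inverse_edges(train, test))
```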

Leaderboard

model	hits@10	hits@1	paper	code
TransE (Baseline)	0.0749	0.0125	(under review)	Code
TransR (Baseline)	0.0639	0.0096	(under review)	Code
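
For readers unfamiliar with the metric in the table, hits@k is the fraction of test triples for which the correct entity is ranked among the top k candidates. A minimal sketch, assuming 1-based ranks have already been computed; the function name and the example ranks are illustrative, not taken from the framework.

```python
def hits_at_k(ranks, k):
    """Fraction of 1-based ranks that are less than or equal to k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


ranks = [1, 3, 12, 50, 7]    # made-up ranks of the true entity for five test triples
print(hits_at_k(ranks, 10))  # 3 of the 5 ranks are <= 10
print(hits_at_k(ranks, 1))   # 1 of the 5 ranks is <= 1
```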

To make it possible to analyze the effect of data quality and of the directionality of the evaluation graph, further variants of OpenBioLink2020 are provided: directed and undirected, each with and without quality cutoff.
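
The difference between the directed and undirected settings can be sketched as follows. This is only an illustration of the concept, assuming (head, relation, tail) tuples; it is not the framework's implementation, and the function name is hypothetical.

```python
def to_undirected(edges):
    """Keep each node pair once per relation, regardless of edge direction."""
    seen = set()
    result = []
    for head, relation, tail in edges:
        # Sorting the two endpoints makes A -> B and B -> A share one key.
        key = (relation, *sorted((head, tail)))
        if key not in seen:
            seen.add(key)
            result.append((head, relation, tail))
    return result


edges = [("A", "r", "B"), ("B", "r", "A"), ("A", "s", "B")]
print(to_undirected(edges))
```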

Manual

The OpenBioLink framework consists of three parts, called actions:

graph creation
train-test split creation
training and evaluation

With the graph creation and the train-test split actions, custom datasets can be created to suit individual needs. The last action serves as an interface to train and evaluate link prediction models.

Calling via GUI

Calling the program without any parameters starts the GUI, which provides a convenient interface for defining the required parameters. In the last step, the corresponding command line options are displayed.

Calling via command line

From the folder src:
python -m openbiolink.openBioLink -p WORKING_DIR_PATH [-action] [--options] ...
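
When the framework is driven from a script, the call pattern above can be assembled as an argument list. This is a minimal sketch: build_command is a hypothetical helper, "my_working_dir" is a placeholder path, and passing the quality level as `--qual hq` is an assumption based on the option descriptions in the following sections.

```python
import sys


def build_command(working_dir, action, options=()):
    """Assemble the argument list for one OpenBioLink action (-g, -s or -e)."""
    if action not in {"-g", "-s", "-e"}:
        raise ValueError(f"unknown action: {action}")
    return [sys.executable, "-m", "openbiolink.openBioLink",
            "-p", working_dir, action, *options]


# Graph creation for an undirected graph with the high quality cutoff.
cmd = build_command("my_working_dir", "-g", ["--undir", "--qual", "hq"])
print(" ".join(cmd))
# To execute it, run from the folder src: subprocess.run(cmd, check=True)
```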

Action: Graph Creation

-g:
    --undir         Output graph should be undirected (default = directed)
    --qual          quality cutoff of the output-graph, options = [hq, mq, lq], (default = None -> all entries are used)
    --no_interact   Disables interactive mode - existing files will be replaced (default = interactive)
    --skip          Existing files will be skipped - in combination with --no_interact (default = replace)
    --no_dl         No download is being performed (e.g. when local data is used)
    --no_in         No input_files are created (e.g. when local data is used)
    --no_create     No graph is created (e.g. when only in-files should be created)
    --out_format [Format] [Sep]       Format of graph output, takes 2 arguments: list of file formats 
                                      [s= single file, m=multiple files] and list of separators 
                                      (e.g. t=tab, n=newline, or any other character) (default= s t)
    --no_qscore     The output files will contain no scores
    --dbs [Cls]     custom source databases selection to be used, full class name, options --> see doc
    --mes [Cls]     custom meta edges selection to be used, full class name, options --> see doc
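
The separator codes accepted by --out_format (t for tab, n for newline, any other character used literally) can be read as a small lookup. A minimal sketch of that convention; the function names are hypothetical and the code only mirrors the option description above, not the framework's own writer.

```python
SEPARATOR_CODES = {"t": "\t", "n": "\n"}


def resolve_separator(code):
    """Translate a separator code: 't' is tab, 'n' is newline, anything else is used as-is."""
    return SEPARATOR_CODES.get(code, code)


def format_edges(edges, sep_code="t"):
    """Render one edge per line, joining its fields with the resolved separator."""
    sep = resolve_separator(sep_code)
    return "\n".join(sep.join(str(field) for field in edge) for edge in edges)


print(format_edges([("A", "r", "B"), ("B", "s", "C")], sep_code=";"))
```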

Action: Train-Test Split Generation

-s
   --edges Path        Path to edges.csv file (required with action -s)
   --tn_edges Path     Path to true_negatives_edges.csv file (required with action -s)
   --nodes Path        Path to nodes.csv file (required with action -s)
   --tts_sep [Sep]     Separator of edge, tn-edge and nodes file (e.g. t=tab, n=newline, 
                       or any other character) (default=t)
   --mode rand|time    Mode of train-test-set split, options=[rand, time], (default=rand)
   --test_frac F       Fraction of test set as float (default= 0.2)
   --crossval          Multiple train-validation-sets are generated
   --val F             fraction of validation set as float (default= 0.2) or number of folds as int
   --tmo_edges Path    Path to edges.csv file of t-minus-one graph (required for --mode time)
   --tmo_tn_edges Path     Path to true_negatives_edges.csv file of t-minus-one graph (required for --mode time)
   --tmo_nodes Path        Path to nodes.csv file of t-minus-one graph (required for --mode time)
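
The two split modes can be sketched as follows. Both functions are illustrations under stated assumptions, not the framework's code: random_split assumes a plain shuffle controlled by a hypothetical seed parameter, and time_split assumes that the t-minus-one graph is used for training and that edges new in the current graph form the test set.

```python
import random


def random_split(edges, test_frac=0.2, seed=0):
    """Shuffle the edges and split them into (train, test) according to test_frac."""
    if not 0 < test_frac < 1:
        raise ValueError("test_frac must be between 0 and 1")
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_frac)
    return shuffled[n_test:], shuffled[:n_test]


def time_split(edges, tmo_edges):
    """Train on the t-minus-one edges; test on edges that only exist in the current graph."""
    old = set(tmo_edges)
    return list(tmo_edges), [edge for edge in edges if edge not in old]


train, test = random_split(range(10), test_frac=0.2)
print(len(train), len(test))
```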

Action: Training and Evaluation

-e
    --model_cls Cls         class of the model to be trained/evaluated (required with -e)
    --config Path           Path to the models config file
    --no_train              No training is being performed, trained model is provided via --trained_model
    --trained_model Path    Path to trained model (required with --no_train)
    --no_eval               No evaluation is being performed, only training
    --test Path             Path to test set file (required with -e)
    --train Path            Path to training set file
    --eval_nodes Path       Path to the nodes file (required for ranked triples if no corrupted triples 
                            file is provided and nodes cannot be taken from graph creation)
    --metrics [Metric]      list of evaluation metrics
    --ks [K]                k's for hits@k metric (integer list)
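
The nodes file is needed because rank-based metrics compare each true triple against corrupted triples built from candidate nodes. The sketch below shows the idea for tail corruption only. The score function stands in for a trained model and is purely hypothetical, ties are resolved optimistically, and the framework's actual ranking procedure may differ.

```python
def rank_of_true_tail(triple, nodes, score):
    """1-based rank of the true tail among all candidate tails (higher score is better)."""
    head, relation, tail = triple
    true_score = score(head, relation, tail)
    # Count corrupted triples (true tail replaced by another node) that score strictly higher.
    better = sum(
        1 for node in nodes
        if node != tail and score(head, relation, node) > true_score
    )
    return better + 1


# Toy scores keyed by tail node, standing in for a trained model.
toy_scores = {"A": 0.0, "B": 0.9, "C": 0.5, "D": 0.1}


def toy_score(head, relation, tail):
    return toy_scores[tail]


print(rank_of_true_tail(("A", "r", "C"), ["A", "B", "C", "D"], toy_score))
```

Ranks computed this way for every test triple are the input to hits@k for each k passed via --ks.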

Project details

Release history

version	release date
0.1.4	Sep 13, 2021
0.1.3	Jan 26, 2021
0.1.2	Jan 26, 2021
0.1.1 (this version)	Jan 23, 2020
0.1.0	Jan 13, 2020

Download files

Source Distribution

openbiolink-0.1.1.tar.gz (70.6 kB)

Uploaded Jan 23, 2020 Source

Built Distribution

openbiolink-0.1.1-py3-none-any.whl (208.9 kB)

Uploaded Jan 23, 2020 Python 3

Hashes for openbiolink-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`99bf28cf3b473575d886effc6ebed9938560561600e0875da6aa8275d3afb669`
MD5	`e55b735407e66580cc713264bfde1fe4`
BLAKE2b-256	`0f2b4dbf88110a58dd092ebd26dabf0e5889a12915bbd9c7ec5393262568aa1a`

Hashes for openbiolink-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`782f3cc4e623916c5fd825c0f50acc172b63d4cddce0fe474d22dcc33d9118d3`
MD5	`4a0e52ae5bb7bbd9f3b05fb107c362fc`
BLAKE2b-256	`0d145ab3677518ec6e5e95b90a6fc32092217bb277f4bfc9f26d062ac5563a0a`