
A Python toolkit for biological network learning evaluation

Project description


Open Biomedical Network Benchmark

The Open Biomedical Network Benchmark (OBNB) is a comprehensive resource for setting up benchmarking graph datasets using biomedical networks and gene annotations. Our goal is to accelerate the adoption of advanced graph machine learning techniques, such as graph neural networks and graph embeddings, in network biology, where they can provide novel insights into gene-function, gene-trait, and gene-disease associations derived from biological networks. To make this adoption convenient, OBNB also provides dataset objects compatible with popular graph deep learning frameworks, including PyTorch Geometric (PyG) and Deep Graph Library (DGL).

A comprehensive benchmarking study of a wide range of graph neural network and graph embedding methods on OBNB datasets can be found in our benchmarking repository, obnbench.

Package usage

Construct default datasets

We provide a high-level dataset constructor to help users easily set up a benchmarking graph dataset for a given combination of network and label. In particular, the dataset is set up with a 6/2/2 study-bias holdout split: the 60% most-studied genes, ranked by the number of associated PubMed publications, are used for training; the 20% least-studied genes are used for testing; and the remaining 20% are used for validation. For more customizable data loading and processing options, see the customized dataset construction section below.

from obnb.dataset import OpenBiomedNetBench
from obnb.util.version import get_available_data_versions

root = "datasets"  # save dataset and cache under the datasets/ directory
version = "current"  # use the last archived version
# Optionally, set version to the specific data version number
# Or, set version to "latest" to download the latest data from source and process it from scratch

# Download and process network/label data. Use the adjacency matrix as the ML feature
dataset = OpenBiomedNetBench(root=root, graph_name="BioGRID", label_name="DisGeNET",
                             version=version, graph_as_feature=True, use_dense_graph=True)

# Check the specific archive data version used
print(dataset.version)

# Check all available stable archive data versions
print(get_available_data_versions())

Users can also load the dataset directly as a PyG- or DGL-compatible dataset object (see below).

PyG dataset

from obnb.dataset import OpenBiomedNetBenchPyG
dataset = OpenBiomedNetBenchPyG(root, "BioGRID", "DisGeNET")

Note: requires installing PyG first (see installation instructions)
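Once constructed, the object can be used like a standard PyG dataset. Below is a minimal inspection sketch, assuming OpenBiomedNetBenchPyG follows the usual InMemoryDataset convention of holding a single graph; the indexing behavior and the num_classes accessor are assumptions, not guarantees from the obnb documentation.

data = dataset[0]           # assumed: the whole network as a single torch_geometric.data.Data object
print(data)                 # node features, edge_index, labels, and any split masks
print(dataset.num_classes)  # assumed: number of prediction tasks (label sets)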

DGL dataset

from obnb.dataset import OpenBiomedNetBenchDGL
dataset = OpenBiomedNetBenchDGL(root, "BioGRID", "DisGeNET")

Note: requires installing DGL first (see installation instructions)

Evaluating standard models

Evaluation of simple machine learning methods such as logistic regression and label propagation can be done easily using the trainer objects.

from obnb.model_trainer import SupervisedLearningTrainer, LabelPropagationTrainer

sl_trainer = SupervisedLearningTrainer()
lp_trainer = LabelPropagationTrainer()

Then, use the fit_and_eval method of the trainer to evaluate a given ML model over all tasks in a one-vs-rest setting.

from sklearn.linear_model import LogisticRegression
from obnb.model.label_propagation import OneHopPropagation

# Initialize models
sl_mdl = LogisticRegression(penalty="l2", solver="lbfgs")
lp_mdl = OneHopPropagation()

# Evaluate the models over all tasks
sl_results = sl_trainer.fit_and_eval(sl_mdl, dataset)
lp_results = lp_trainer.fit_and_eval(lp_mdl, dataset)

Evaluating GNN models

Training and evaluation of Graph Neural Network (GNN) models can be done in a very similar fashion.

from torch_geometric.nn import GCN
from obnb.model_trainer.gnn import SimpleGNNTrainer

# Use a trivial one-dimensional node feature by default
dataset = OpenBiomedNetBench(root=root, graph_name="BioGRID", label_name="DisGeNET", version=version)
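
# Number of prediction tasks (label sets), used as out_channels below; a hypothetical
# sketch assuming the processed label matrix is exposed as `dataset.y` with shape
# (num_genes, num_tasks); the attribute name is an assumption, not confirmed here
n_tasks = dataset.y.shape[1]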

# Train and evaluate a GCN
gcn_mdl = GCN(in_channels=1, hidden_channels=64, num_layers=5, out_channels=n_tasks)
gcn_trainer = SimpleGNNTrainer(device="cuda", metric_best="apop")
gcn_results = gcn_trainer.train(gcn_mdl, dataset)

Customized dataset construction

Load network and labels

from obnb import data

# Load processed BioGRID data from archive.
g = data.BioGRID(root, version=version)

# Load DisGeNET gene set collections.
lsc = data.DisGeNET(root, version=version)
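As a quick sanity check, the loaded network can be inspected through the same node_ids attribute used in the filtering step below:

# Number of genes in the processed BioGRID network
print(len(list(g.node_ids)))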

Setting up data and splits

from obnb.util.converter import GenePropertyConverter
from obnb.label.split import RatioHoldout

# Load the PubMed count gene property converter and use it to set up a study-bias holdout split (here 60/40)
pubmedcnt_converter = GenePropertyConverter(root, name="PubMedCount")
splitter = RatioHoldout(0.6, 0.4, ascending=False, property_converter=pubmedcnt_converter)

Filter labeled data based on network genes and splits

from obnb.label import filters

# Apply in-place filters to the labelset collection
lsc.iapply(
    filters.Compose(
        # Only use genes that are present in the network
        filters.EntityExistenceFilter(list(g.node_ids)),
        # Remove any labelsets with less than 50 network genes
        filters.LabelsetRangeFilterSize(min_val=50),
        # Make sure each split has at least 10 positive examples
        filters.LabelsetRangeFilterSplit(min_val=10, splitter=splitter),
    ),
)

Combine into dataset

from obnb import Dataset
dataset = Dataset(graph=g, feature=g.to_dense_graph().to_feature(), label=lsc, splitter=splitter)
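The assembled dataset can then be evaluated with the same trainers shown earlier, assuming the custom Dataset object is accepted by the trainers in the same way as the constructor-built one:

from sklearn.linear_model import LogisticRegression
from obnb.model_trainer import SupervisedLearningTrainer

sl_mdl = LogisticRegression(penalty="l2", solver="lbfgs")
sl_results = SupervisedLearningTrainer().fit_and_eval(sl_mdl, dataset)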

Installation

OBNB can be installed easily via pip from PyPI:

pip install obnb
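A quick way to confirm the installation is to import the package and print its version, assuming obnb exposes the conventional __version__ attribute:

import obnb
print(obnb.__version__)  # assumed: standard __version__ attribute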

Install with extension modules (optional)

OBNB provides interfaces with several other packages for network feature extraction, such as PecanPy and GraPE. To enable those extensions, install obnb with the ext extra option enabled:

pip install obnb[ext]

Install graph deep learning libraries (optional)

Follow installation instructions for PyG or DGL to set up the graph deep learning library of your choice.

Alternatively, we also provide an installation script that installs the graph deep-learning dependencies in a new conda environment named obnb:

git clone https://github.com/krishnanlab/obnb && cd obnb
source install.sh cu117  # other options are [cpu,cu118]

Download files

Download the file for your platform.

Source Distribution

obnb-0.1.0.tar.gz (117.8 kB)

Built Distribution

obnb-0.1.0-py3-none-any.whl (145.1 kB)

File details

Details for the file obnb-0.1.0.tar.gz.

File metadata

  • Download URL: obnb-0.1.0.tar.gz
  • Upload date:
  • Size: 117.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for obnb-0.1.0.tar.gz

  • SHA256: 4ed8c64fed201ab54f4342a5ab0ff54d00fb563c87e83294a839b85888a8a081
  • MD5: 1c68d476b23e6696d573d998ad01950c
  • BLAKE2b-256: 7c6e581f60b1b0acf07c3ba8b51f3357ae31e38374276ab62e0d87e5b28871cc


File details

Details for the file obnb-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: obnb-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 145.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.17

File hashes

Hashes for obnb-0.1.0-py3-none-any.whl

  • SHA256: a4789ad797a650dc958cf93beac1a9161a0001ad3c0b3d4860a6ec475b78b590
  • MD5: 2b92247d26674060725ec4579208817c
  • BLAKE2b-256: cb820668dd37ab8013defac84f6e7be8ad4d534d3bba34e6a0abc79e02c8f37e

