
A package to generate and interpret biologically informed neural networks.


Biologically Informed Neural Network (BINN)

License: MIT

BINN documentation is available here.

The BINN package allows you to create a sparse neural network from a pathway file and an input file. The examples presented in the notebooks use the Reactome pathway database and a proteomic dataset to generate the neural network. The package also allows you to train the network and interpret it using SHAP, and it provides plotting functions for generating Sankey plots. The article presenting the BINN can currently be found on bioRxiv.
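To make the idea of a pathway-constrained sparse layer concrete, here is a minimal pure-Python sketch. It assumes that sparsity is expressed as a binary mask multiplied element-wise with an otherwise dense weight matrix; the helpers `build_mask` and `masked_linear` and the toy proteins and pathways are hypothetical illustrations, not functions or data from the binn package.

```python
# Hypothetical sketch: a pathway-constrained ("sparse") linear layer.
# build_mask and masked_linear are illustrative names, not binn functions.


def build_mask(sources, targets, edges):
    """Return a len(targets) x len(sources) 0/1 mask.

    edges holds (source, target) pairs, e.g. (protein, pathway) or
    (child pathway, parent pathway). mask[i][j] is 1 only if
    sources[j] is connected to targets[i].
    """
    src_idx = {name: j for j, name in enumerate(sources)}
    tgt_idx = {name: i for i, name in enumerate(targets)}
    mask = [[0] * len(sources) for _ in targets]
    for source, target in edges:
        if source in src_idx and target in tgt_idx:
            mask[tgt_idx[target]][src_idx[source]] = 1
    return mask


def masked_linear(x, weights, bias, mask):
    """Compute y = (W * M) x + b for one sample x.

    W * M is the element-wise product, so weights whose mask entry is 0
    never contribute: only connections in the pathway mapping carry signal.
    """
    out = []
    for i, row in enumerate(weights):
        total = bias[i]
        for j, w in enumerate(row):
            total += w * mask[i][j] * x[j]
        out.append(total)
    return out


# Toy example: three proteins feeding two pathways.
proteins = ["P1", "P2", "P3"]
pathways = ["PW_A", "PW_B"]
edges = [("P1", "PW_A"), ("P2", "PW_A"), ("P3", "PW_B")]

mask = build_mask(proteins, pathways, edges)
y = masked_linear(
    x=[1.0, 2.0, 3.0],
    weights=[[0.5, 0.5, 10.0], [10.0, 10.0, 2.0]],
    bias=[0.0, 1.0],
    mask=mask,
)
print(mask)  # [[1, 1, 0], [0, 0, 1]]
print(y)     # [1.5, 7.0]
```

The large weights on masked connections (10.0) have no effect on the output, which is the point of the mask: the pathway mapping, not the optimizer, decides which connections exist.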


Usage

First, a network is created from the input data, the pathways file and the (optional) translation file. This network defines the connectivity used to create the sparse BINN.

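from binn import BINN, Network
import pandas as pd

# Input data: must contain a column with the feature names (here "Protein")
input_data = pd.read_csv("../data/test_data.tsv", sep="\t")
# Translation: maps input features (UniProt IDs) to pathways (Reactome IDs)
translation = pd.read_csv("../data/translation.tsv", sep="\t")
# Pathways: parent/child pairs that define the hidden-layer connectivity
pathways = pd.read_csv("../data/pathways.tsv", sep="\t")

network = Network(
    input_data=input_data,
    pathways=pathways,
    mapping=translation,
    verbose=True
)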

The BINN can then be generated from the network:

binn = BINN(
    pathways=network,
    n_layers=4,
    dropout=0.2,
    validate=False,
)
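Judging by the model printed further down, which has four hidden Linear layers before the output layer, `n_layers` sets the number of pathway layers. One plausible way to derive such layers from a parent/child hierarchy, shown here as a hypothetical sketch rather than the binn implementation, is to start from the pathways linked to the input features and repeatedly step up to their parents. The function `build_layers` and the toy hierarchy are illustrative assumptions.

```python
# Hypothetical sketch (not the binn implementation): derive n_layers hidden
# layers by repeatedly stepping up a parent/child pathway hierarchy.


def build_layers(edges, first_layer, n_layers):
    """Return a list of n_layers sorted lists of node IDs.

    edges: (parent, child) pairs, the same orientation as the pathways file.
    first_layer: pathway IDs linked directly to the input features.
    """
    parents_of = {}
    for parent, child in edges:
        parents_of.setdefault(child, set()).add(parent)

    layers = [sorted(set(first_layer))]
    while len(layers) < n_layers:
        next_layer = set()
        for node in layers[-1]:
            # A node without a parent is carried up unchanged, so that every
            # branch reaches the last layer (one possible design choice).
            next_layer |= parents_of.get(node, {node})
        layers.append(sorted(next_layer))
    return layers


edges = [
    ("Root", "A"), ("Root", "B"),
    ("A", "a1"), ("A", "a2"),
    ("B", "b1"),
]
for layer in build_layers(edges, ["a1", "a2", "b1"], n_layers=3):
    print(layer)
# ['a1', 'a2', 'b1']
# ['A', 'B']
# ['Root']
```

Carrying parentless nodes upward keeps the layers connected when branches of the hierarchy have different depths; other strategies, such as stopping a branch early, are equally possible.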

An sklearn wrapper is also available:

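from binn import BINNClassifier

# SuperLogger is used below but is not imported in this snippet;
# import it before running this example.
binn = BINNClassifier(
    pathways=network,
    n_layers=4,
    dropout=0.2,
    validate=True,
    epochs=10,
    threads=10,
    logger=SuperLogger("logs/test")
)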

This generates the following PyTorch Sequential model:

Sequential(
  (Layer_0): Linear(in_features=446, out_features=953, bias=True)
  (BatchNorm_0): BatchNorm1d(953, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_0): Dropout(p=0.2, inplace=False)
  (Tanh 0): Tanh()
  (Layer_1): Linear(in_features=953, out_features=455, bias=True)
  (BatchNorm_1): BatchNorm1d(455, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_1): Dropout(p=0.2, inplace=False)
  (Tanh 1): Tanh()
  (Layer_2): Linear(in_features=455, out_features=162, bias=True)
  (BatchNorm_2): BatchNorm1d(162, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_2): Dropout(p=0.2, inplace=False)
  (Tanh 2): Tanh()
  (Layer_3): Linear(in_features=162, out_features=28, bias=True)
  (BatchNorm_3): BatchNorm1d(28, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (Dropout_3): Dropout(p=0.2, inplace=False)
  (Tanh 3): Tanh()
  (Output layer): Linear(in_features=28, out_features=2, bias=True)
)
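As a worked check on the printout above, the sketch below counts the parameters each layer would have if it were fully dense, using the printed layer widths (446, 953, 455, 162, 28 and 2). Because connections absent from the pathway mapping are meant to be inactive in a BINN, these dense counts are an upper bound on the connections that actually carry signal. The memory figure assumes float32 weights (4 bytes per parameter).

```python
# Worked arithmetic for the printed model: parameter counts if every Linear
# layer were dense. Widths are copied from the printout above.
widths = [446, 953, 455, 162, 28, 2]


def dense_linear_params(widths):
    """Return (weights, biases) for each consecutive pair of layer widths."""
    return [(n_in * n_out, n_out) for n_in, n_out in zip(widths, widths[1:])]


per_layer = dense_linear_params(widths)
total_weights = sum(w for w, _ in per_layer)
total_biases = sum(b for _, b in per_layer)

# BatchNorm1d(affine=True) learns one scale and one shift per feature; it
# follows each hidden layer but not the output layer. Its running mean and
# variance are buffers, not trainable parameters, so they are not counted.
batchnorm_params = sum(2 * n for n in widths[1:-1])

total = total_weights + total_biases + batchnorm_params
megabytes = total * 4 / 1e6  # assuming float32, i.e. 4 bytes per parameter

print(per_layer[0])  # (425038, 953)
print(total)         # 941751
print(f"{megabytes:.2f} MB")  # 3.77 MB
```

Even as an upper bound, under 4 MB of weights is small, which fits the observation under Testing that small networks are not RAM-intensive.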

Example input

Test data: this file (a quantification matrix, or any other matrix containing the input column) should contain a column with the feature names. In this case, that column is "Protein".

PeptideSequence                     Protein (input column)
VDRDVAPGTLC(UniMod:4)DVAGWGIVNHAGR  P00746
VDRDVAPGTLC(UniMod:4)DVAGWGIVNHAGR  P00746
VDTVDPPYPR                          P04004
AVTEQGAELSNEER                      P27348
VDVIPVNLPGEHGQR                     P02751
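Because several peptides can belong to the same protein (P00746 appears twice in the example), the input features are the distinct values of the input column. The sketch below assumes a tab-separated file and uses a hypothetical helper, `input_features`, to collect those values in first-seen order with the standard-library `csv` module. How binn itself aggregates repeated features is not covered here.

```python
import csv
import io

# Hypothetical sketch: collect the distinct input features from the
# "Protein" column of a tab-separated file. The rows are the example above;
# for a real file, pass open(path, newline="") instead of io.StringIO.
EXAMPLE_TSV = (
    "PeptideSequence\tProtein\n"
    "VDRDVAPGTLC(UniMod:4)DVAGWGIVNHAGR\tP00746\n"
    "VDRDVAPGTLC(UniMod:4)DVAGWGIVNHAGR\tP00746\n"
    "VDTVDPPYPR\tP04004\n"
    "AVTEQGAELSNEER\tP27348\n"
    "VDVIPVNLPGEHGQR\tP02751\n"
)


def input_features(rows, column="Protein"):
    """Return the distinct values of `column`, in first-seen order."""
    reader = csv.DictReader(rows, delimiter="\t")
    seen = {}  # dicts preserve insertion order (Python 3.7+)
    for row in reader:
        seen.setdefault(row[column], None)
    return list(seen)


features = input_features(io.StringIO(EXAMPLE_TSV))
print(features)  # ['P00746', 'P04004', 'P27348', 'P02751']
```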

Pathways file: this file should contain the parent-child mapping used to create the connectivity between the hidden layers.

parent (target)  child (source)
R-BTA-109581     R-BTA-109606
R-BTA-109581     R-BTA-169911
R-BTA-109581     R-BTA-5357769
R-BTA-109581     R-BTA-75153
R-BTA-109582     R-BTA-140877
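The parent/child rows can be grouped into an adjacency list, and parents that never appear as children are the top-level pathways. Here is a minimal sketch using the example rows; `children_by_parent` and `root_pathways` are hypothetical names, and the roots found here apply only within this excerpt, not necessarily the full Reactome hierarchy.

```python
from collections import defaultdict

# Hypothetical sketch: group the (parent, child) rows of the pathways file
# into an adjacency list and find the top-level pathways within these rows.
PATHWAY_ROWS = [
    ("R-BTA-109581", "R-BTA-109606"),
    ("R-BTA-109581", "R-BTA-169911"),
    ("R-BTA-109581", "R-BTA-5357769"),
    ("R-BTA-109581", "R-BTA-75153"),
    ("R-BTA-109582", "R-BTA-140877"),
]


def children_by_parent(rows):
    """Return {parent: [children, ...]}, keeping the row order."""
    adjacency = defaultdict(list)
    for parent, child in rows:
        adjacency[parent].append(child)
    return dict(adjacency)


def root_pathways(rows):
    """Return the parents that never appear as a child (top-level nodes)."""
    children = {child for _, child in rows}
    return sorted({parent for parent, _ in rows} - children)


adjacency = children_by_parent(PATHWAY_ROWS)
print(len(adjacency["R-BTA-109581"]))  # 4
print(root_pathways(PATHWAY_ROWS))     # ['R-BTA-109581', 'R-BTA-109582']
```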

Translation file: this file is optional, but it is useful if the input features need to be translated before they can be mapped to the pathways in the hidden layers. In this case, it is used to map proteins (UniProt IDs) to pathways (Reactome IDs).

input (UniProt IDs)  translation (Reactome IDs)
A0A075B6P5           R-HSA-166663
A0A075B6P5           R-HSA-173623
A0A075B6P5           R-HSA-198933
A0A075B6P5           R-HSA-202733
A0A075B6P5           R-HSA-2029481
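Since one input can map to several pathways (A0A075B6P5 maps to five in the example), the translation is one-to-many. A hypothetical sketch of applying it follows; the helper `translate` is illustrative, and dropping features with no translation entry is a choice made for this sketch, not documented binn behavior.

```python
from collections import defaultdict

# Hypothetical sketch: apply the translation file, mapping each input
# feature (UniProt ID) to the Reactome pathways it belongs to.
TRANSLATION_ROWS = [
    ("A0A075B6P5", "R-HSA-166663"),
    ("A0A075B6P5", "R-HSA-173623"),
    ("A0A075B6P5", "R-HSA-198933"),
    ("A0A075B6P5", "R-HSA-202733"),
    ("A0A075B6P5", "R-HSA-2029481"),
]


def translate(features, rows):
    """Return {feature: [pathway IDs]} for features found in `rows`.

    In this sketch, features without a translation entry are dropped,
    because they would have no connection to the first hidden layer.
    """
    mapping = defaultdict(list)
    for source, target in rows:
        mapping[source].append(target)
    return {f: mapping[f] for f in features if f in mapping}


# P00746 has no row in this excerpt, so it is dropped.
result = translate(["A0A075B6P5", "P00746"], TRANSLATION_ROWS)
print(len(result["A0A075B6P5"]))  # 5
```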

Plotting

Plotting a subgraph starting from a node generates a pathway Sankey plot:

[Figure: Pathway sankey]

A complete Sankey plot may look like this:

[Figure: Complete sankey]
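A Sankey plot is drawn from a list of links between nodes. As a hypothetical sketch (not the binn plotting API), the links of the subgraph below a chosen node can be collected with a breadth-first search over a parent-to-children mapping. Each link is drawn from child to parent, the direction in which information flows from the input features towards the output. `subgraph_links` and the toy hierarchy are illustrative.

```python
from collections import deque

# Hypothetical sketch (not the binn plotting API): collect the Sankey links
# of the subgraph below a start node with a breadth-first search.


def subgraph_links(children_by_parent, start):
    """Return (source, target) links for every edge reachable from start.

    Links point from child to parent, the direction in which information
    flows from the input features towards the output.
    """
    links = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children_by_parent.get(node, []):
            links.append((child, node))
            if child not in seen:  # expand shared children only once
                seen.add(child)
                queue.append(child)
    return links


hierarchy = {"Root": ["A", "B"], "A": ["a1", "a2"], "B": ["b1"]}
print(subgraph_links(hierarchy, "A"))  # [('a1', 'A'), ('a2', 'A')]
```

Plotting libraries such as Plotly's `go.Sankey` expect integer node indices for link sources and targets, so the node names would be mapped to indices before plotting.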

Installation

The package can be installed and built from source with git.

git clone git@github.com:InfectionMedicineProteomics/BINN.git
pip install -e BINN/

Testing

The software has been tested on desktop machines running Windows 10 and Linux (Ubuntu). Small networks are not RAM-intensive, and all experiments have been run comfortably with 16 GB of RAM.

Contributors

Erik Hartman, Infection Medicine Proteomics, Lund University

Aaron Scott, Infection Medicine Proteomics, Lund University

Contact

Erik Hartman - erik.hartman@hotmail.com

Download files


Source Distribution

binn-0.0.2.tar.gz (20.1 kB)

Built Distribution

binn-0.0.2-py3-none-any.whl (19.8 kB)

File details

Details for the file binn-0.0.2.tar.gz.

File metadata

  • Download URL: binn-0.0.2.tar.gz
  • Size: 20.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for binn-0.0.2.tar.gz
Algorithm    Hash digest
SHA256       2b3d2d288511a9c166fff4c43b9528d21537fb762aacc9b3c916c215981d041f
MD5          3bdf8743d60d54ccdad2eba3d3309347
BLAKE2b-256  1bb9e009bc06a4646ad46e7d05fdc854e0a7d36874069035ac766729edf6591a


File details

Details for the file binn-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: binn-0.0.2-py3-none-any.whl
  • Size: 19.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for binn-0.0.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       fdb56ce6f4bf7ef2824b90a2c8f3e35483fa49ec3a0774028c81b91cc448fd38
MD5          90b99b5f8553a21cb7686e0d7dca86ed
BLAKE2b-256  43963edfb95c9a775ce94d61bf0fc468b0cb34f1f3dc33941293ac53b9dbed97

