Skip to main content

Framework for Interpretable Neural Networks

Project description

DOI Build

GenNet

Framework for Interpretable Neural Networks for Genetics

  1. What is GenNet?
  2. Getting started
  3. GenNet command line.
  4. Wiki

1. What is GenNet?

GenNet is a command line tool that can be used to create neural networks for (mainly) genetics. GenNet gives the opportunity to let you decide what should be connected to what. Any information that groups knowledge can therefore be used to define connections in the network. For example, gene annotations can be used to group genetic variants into genes, as seen in the first layer of the image. This creates meaningful and interpretable connections. When the network is trained the network learns which connections are important for the predicted phenotype and assigns these connections a higher weight. For more information about the framework and the interpretation read the paper:

GenNet framework: interpretable neural networks for phenotype prediction

The Gennet framework is based on tensorflow, click here for the custom layer.

2. Getting started

Follow the instructions below to get started.

[!TIP] Check the A to Z Colab tutorial for an overview on how to use GenNet with your own data!

Prerequisites:

  • GenNet can use CPU or GPU (which can be quite a bit faster for deeper networks) If you want to use cuda, please make sure you have the correct version of CUDA installed CUDA. GenNet has been tested for:

    • Python 3.5, CUDA 9.1, Tensorflow 1.12.0
    • Python 3.5, CUDA 10.0, Tensorflow 1.13.1
    • Python 3.5, CUDA 10.0, Tensorflow 2.0.0-beta1
    • Python 3.6-3.7, CUDA 10.1, Tensorflow 2.2.0 (currently default and recommended)
    • Python 3.* Tensorflow 2.2 to 2.5 CPU

Clone the repository

Open terminal. Navigate to the a place where you want to store the project. Clone the repository:

git clone https://github.com/arnovanhilten/GenNet

Install the virtual envionment

Navigate to the home folder and create a virtual environment

cd ~
python3 -m venv env_GenNet

This automatically installs the latest Tensorflow version for which GenNet has been tested. If you have an older version of CUDA install the appriopriate tensorflow-gpu by pip install tensorflow-gpu==1.13.1 (change 1.13.1 to your version).

Activate the environment

source ~/env_GenNet/bin/activate

Install the packages

pip3 install --upgrade pip
pip3 install -r requirements_GenNet.txt

GenNet is ready to use!

Navigate to the GenNet folder and use the following command to run the example:

python GenNet.py train -path ./examples/example_classification/ -ID 1

Check the wiki for more info!

3. GenNet command line.

Preparing the data

NOTE: In python indices start from zero

As seen in the overview the commmand line takes 3 inputs:

  1. genotype.h5 - a genotype matrix, each row is a sample/subject/patient, each column is a feature (i.e. genetic variant). The genotype file can be automatically generated from plink files and VCF files using python GenNet.py convert, use python GenNet.py convert --help for more options or check HASE wiki convert
  2. subject.csv - a .csv file with the following columns:
    • patient_id: am ID for each patient
    • labels: phenotype (with zeros and ones for classification and continuous values for regression)
    • genotype_row: The row in which the subject can be found in the genotype matrix (genotype.h5 file)
    • set: in which set the subject belongs (1 = training set, 2 = validation set, 3 = test, others= ignored)
  3. topology - This file describes the whole network: each row should be a "path" of the network, from input to output node. This file defines thus each connections in the network, giving you the freedom to design your network the way you want. In the GenNet framework we used biological knowledge such as gene annotations to do define meaningful connections, we included some helper functions to generate a topology file using Annovar. See the topoogy help for more information: python GenNet.py topology --help

Topology example:

layer0_node layer0_name layer1_node layer1_name layer2_node layer2_name
0 rs916977 0 HERC2 0 Ubiquitin mediated proteolysis
1 rs766173 1 BRCA2 1 Breast cancer
5 rs1799944 1 BRCA2 1 Breast cancer
6 rs4987047 1 BRCA2 1 Breast cancer
1276 SNP1276 612 UHMK1 2 Tyrosine metabolism

NOTE: It is important to name the column headers as shown in the table.

The first genetic variant in the genotypefile (row number zero!), named rs916977, is connected to the HERC2 node in the first layer. The HERC2 gene is node number zero. This node is conncted to the 'Ubiquitin mediated proteolysis' pathway which is the first node in the following layer. The next node is the end node which should not be included.

The second genetic variant 'rs766173' is connected to BRCA2 (node number 1 in the first layer), followed by the breast cancer pathway (node number 1 in the layer2), folowed by the end node.

The sixth(!) genetic variant 'rs1799944' is also connected to BRCA2 (whic was node number 1 in the first layer), followed by the breast cancer pathway (again node number 1 in the layer2), folowed by the end node.

All rows together describe all the connections in the network. Each layer should be described by a column layer#_node and a column layer#_name with # denoting the layer number.

Tip: Check the topology files in the examples folder.

Running GenNet

Open the command line and navigate to the GenNet folder. Start training by:

python GenNet.py train -path {/path/to/your/folder} -ID {experimment number}

For example:

python GenNet.py train -path ./examples/example_classification/ -ID 1

or

python GenNet.py train -path ./examples/example_regression/ -ID 2 -problem_type regression

Choose from: convert, topology, train, plot and interpret. For the options check the wiki or use:

python GenNet.py convert --help
python GenNet.py train --help
python GenNet.py plot --help
python GenNet.py topology --help
python GenNet.py interpret --help

GenNet output

After training your network it saved together with its results. Results include a text file with the performance, a .CSV file with all the connections and their weights, a .h5 with the best weights on the validation set and a plot of the training and validation loss. Using these files we can create visualizations to better understand the network.

The .CSV file with the weights can be used to create your own plot but python GenNet.py plot also has standard plots available. First we calculate the relative importance by multiplying all the weights between the output and each input. This can then be used to see the importance of each gene:

Manhattan plot

Or can be used in a Sunburt plot to get an overview of the whole network!

Sunburst plot

GenNet Interpretation module

GenNet offers a number of intepretation methods, to find important features and interacting features.

  • get_weight_scores: uses the weights to calculate the importance of each feature and node
  • DeepExplain: uses the gradient (see DeepExplain) to calculate the importance
  • RLIPP: uses logistic regression with signals to and from the node to calculate a measure of non-linearity for all nodes
  • NID: Finds interacting features based on the features with the strongest weights
  • DFIM: perturbs each input (or N inputs in the order of importance), and tracks which other features change importance to find interacting features
  • PathExplain: Uses the Expected Hessian to find interacting features

For more information use: python GenNet.py interpret --help

Jupyter notebook

The original jupyter notebooks can be found in the jupyter notebook folder. Navigate to the jupyter notebook folder and start with jupyter notebook. The notebooks are not updated but can be a useful source to understand the main code and/or to create .npz masks (to define connections between layers). For more freedom in designing your own networks you can define your network here and create masks using the notebooks.

More

A to Z tutorial in Google Colabs, try GenNet with a single click!

GenNet is also available on Superbio.ai!

Paper

The bioRxiv paper

All plots

Trained networks

Toy with the demo with the basic principles online!

(Deprecated) Jupyter notebooks

Contact

For questions or comments make an issue or mail to: a.vanhilten@erasmusmc.nl

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gennet_forked-2.0.0.4.tar.gz (96.0 kB view details)

Uploaded Source

Built Distribution

gennet_forked-2.0.0.4-py2.py3-none-any.whl (113.9 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file gennet_forked-2.0.0.4.tar.gz.

File metadata

  • Download URL: gennet_forked-2.0.0.4.tar.gz
  • Upload date:
  • Size: 96.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.0 CPython/3.10.14

File hashes

Hashes for gennet_forked-2.0.0.4.tar.gz
Algorithm Hash digest
SHA256 8df388fe511864e0434e81ea060d524fa8e03286ad2d4af9029fbdac5efc8824
MD5 1413a870af40b5c9941722ae5344b90b
BLAKE2b-256 97ab105a023b22237499f316b5773e4a62fcf6157e969eec3bb10fea8c940bd8

See more details on using hashes here.

File details

Details for the file gennet_forked-2.0.0.4-py2.py3-none-any.whl.

File metadata

File hashes

Hashes for gennet_forked-2.0.0.4-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 c91a9ba31c6e00f20bf2162de4379257dd32add837dc46217c2e698a5a81c116
MD5 a38893b1037f610b2b23c654b12d0d76
BLAKE2b-256 67299e627f449609ce798b16d1d68d9ff5c06b6c961acad8eb0721693dd05abd

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page