This project trains neural networks on compound-protein interactions and interprets the learned model by interactively showing the transformed chemical landscape and the visualized SAR for chemicals of interest.
VISAR: an interactive tool for dissecting chemical features learned by deep neural network QSAR models
Qingyang Ding, Siyu Hou, Songpeng Zu, Yonghui Zhang, Shao Li
Bioinformatics Division and Center for Synthetic and Systems Biology, TNLIST/Department of Automation, Tsinghua University, Beijing 100084, China
School of Pharmaceutical Science, Tsinghua University, Beijing 100084, China.
Please contact firstname.lastname@example.org if you have questions or suggestions.
Table of contents
Aims of this project
While many previous works focus on improving the predictive performance of models, few have looked into the trained model to check whether it is learning what is truly important, or linked what the model has learned back to useful insights.
Here we take a step forward in interpreting the features learned by deep neural network QSAR models and present VISAR, an interactive tool for visualizing the structure-activity relationship and the chemical activity landscape based on the learned features, thus providing deeper insight into the neural network 'black box'. For a learning task, VISAR first provides users with functions to build, train and test the deep neural network models.
The rationale of the VISAR workflow is shown in the schematic diagram below:
Starting from a series of trained weights of the neural network QSAR models, VISAR provides visualization tools for dissecting the learned chemical features on three levels: 1) on the macro-level, compounds with weighted features are clustered and form different chemical landscapes for different tasks; 2) on the meso-level, pharmacophoric features can be identified within each local cluster of chemicals on the landscape that share similar structure and similar activity; 3) on the micro-level, the SAR pattern is built for each compound and each task.
The VISAR workflow features:
- For a learning task, VISAR first provides users with functions to build, train and test the neural network models.
- The learned parameters of the models are then mapped back as weights of each atom and visualized as structure-activity relationship (SAR) patterns, showing the positively and negatively contributing substructures suggested by the trained model.
- VISAR takes the transformed features of the chemicals and builds activity landscapes, showing the correlation between the descriptor space after model training and the experimental activity space.
- With the interactive web application of VISAR, users can explore the chemical space and the SAR pattern of each chemical.
- Users can provide their own compounds of interest, predict target profiles with the trained model, and map the compounds onto the chemical space for further analysis.
- The clusters of chemicals on the landscape can then be analyzed for active pharmacophores.
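To make the idea of mapping learned parameters back to atoms concrete, here is a minimal sketch under stated assumptions. It assumes a fingerprint-based input in which every bit has a learned weight and every bit is known to be set by a list of atoms (RDKit's Morgan fingerprints expose such a mapping through `bitInfo`). The function names and the even-split rule are hypothetical illustrations and are not necessarily how VISAR computes its atom weights.

```python
def atom_weights_from_bits(bit_weights, bit_to_atoms, n_atoms):
    """Spread each fingerprint-bit weight evenly over the atoms that set the bit.

    bit_weights:  {bit index: learned weight}   (assumed input, hypothetical)
    bit_to_atoms: {bit index: [atom indices]}   (assumed input, hypothetical)
    """
    weights = [0.0] * n_atoms
    for bit, weight in bit_weights.items():
        atoms = bit_to_atoms.get(bit, [])
        if not atoms:
            continue  # bit is not present in this molecule
        share = weight / len(atoms)
        for atom in atoms:
            weights[atom] += share
    return weights


def split_contributors(weights, threshold=0.0):
    """Return (positive, negative) atom indices relative to a threshold."""
    positive = [i for i, w in enumerate(weights) if w > threshold]
    negative = [i for i, w in enumerate(weights) if w < -threshold]
    return positive, negative


# Toy molecule with three atoms and three active bits (made-up numbers).
demo_weights = atom_weights_from_bits(
    bit_weights={10: 0.6, 42: -0.4, 99: 0.2},
    bit_to_atoms={10: [0, 1], 42: [2], 99: [1, 2]},
    n_atoms=3,
)
demo_positive, demo_negative = split_contributors(demo_weights)
print("positive contributors:", demo_positive)
print("negative contributors:", demo_negative)
```

In a SAR plot, the positive and negative atom lists would typically be drawn in two contrasting colors on the 2D structure.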
We propose that VISAR can serve as a helpful workflow for training and interactively analyzing deep neural network QSAR models.
The training, testing and result processing pipeline is available as template Jupyter notebooks:
The general steps for interactive analysis are:
Set the location (including the prefix) of the pre-composed dataframes and the mode of your training. After you click the "Run" button on the upper panel, the whole interface updates according to your settings.
- For the single-task sample dataset, set "Prefix of the input data:" to 'VISAR_webapp/data/T107_rep2_50_' and "Mode of the model:" to 'ST', then click "Run";
- For the multitask sample dataset, set "Prefix of the input data:" to 'VISAR_webapp/data/output_' and "Mode of the model:" to 'RobustMT', then click "Run".
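Before clicking "Run", it can help to confirm that the prefix actually resolves to files on disk. The sketch below is a hypothetical helper, not part of VISAR: it only checks that the mode is one of the two named in this README and that at least one file starts with the given prefix. It makes no assumption about which file names the web app expects after the prefix.

```python
import glob

# The two modes named in this README; the web app may accept others.
KNOWN_MODES = ("ST", "RobustMT")


def check_settings(prefix, mode):
    """Return the sorted list of files starting with `prefix`, or raise."""
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {KNOWN_MODES}")
    # glob.escape keeps special characters in the prefix literal.
    matches = sorted(glob.glob(glob.escape(prefix) + "*"))
    if not matches:
        raise FileNotFoundError(f"no files start with prefix {prefix!r}")
    return matches


# Run from the root of the local VISAR copy to check the single-task sample.
try:
    for path in check_settings("VISAR_webapp/data/T107_rep2_50_", "ST"):
        print(path)
except (ValueError, FileNotFoundError) as err:
    print("settings problem:", err)
```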
Explore the activity profile of the chemical space on the left panel. Several controls allow interactive exploration:
- A. color options for the scatter plot, which render the points in different colors based on, e.g., the activity of the compounds;
- B. the number of bi-clusters, which corresponds to the arrangement of the heatmap on the bottom panel (by trying different bi-cluster numbers, users can get an idea of how the activity profile is distributed on the chemical landscape);
- E. information on a compound, shown when hovering the mouse over the scatter plot: its ID, batch ID and the color code of the bi-cluster it belongs to;
- F. information on a batch, shown when hovering the mouse over the heatmap: its ID and the color code of the bi-cluster it belongs to.
After selecting a batch or individual compounds on the left panel, view the chemical structures along with their SAR patterns on the right panel. There are two ways to select a batch: click directly on the heatmap, or use the drop-down list (C). To select compounds, use the tap mode of the scatter plot and click on the points. In RobustMT mode, each task gives its own SAR pattern for a compound, so selecting a SAR task (D) updates the SAR pattern of the compounds accordingly.
If a custom file is provided when the RUNKEY dataframe is generated, the predicted results for the custom compounds can be displayed by clicking the 'custom data' radio button. This function lets users quickly explore the possible key atoms of their compounds of interest and facilitates experiment design.
- For model training, a working environment with python=3.6 and CUDA 9.0 is recommended. The environment depends on: Deepchem, Rdkit, Tensorflow, Numpy, Pandas, Sklearn, Scipy. The VISAR python package can be installed with pip.
    ## Create an environment for model training which depends on deepchem
    conda create -n deepchem_visar python=3.6
    conda activate deepchem_visar
    ## Install packages
    pip install tensorflow-gpu==1.12.0
    conda install scipy  # also install six
    conda install matplotlib
    conda install pandas
    conda install seaborn
    conda install -c conda-forge scikit-learn  # also install joblib
    pip install deepchem==2.1.1.dev353
    pip install visar  # also install bokeh
    conda install -c rdkit rdkit  # Installs also numpy and pandas
    conda install jupyter  # Installs also ipykernel
    python -m ipykernel install --user --name deepchem_visar
- For using the VISAR web application to visualize trained models, the environment has fewer requirements, does not need a GPU, and can easily be set up on desktops or laptops running Windows or macOS with the Chrome browser. To use the interactive application, users first need to get a local copy of the VISAR repository, either by downloading it directly or with:
    git lfs install  # enable large file downloading
    git lfs clone https://github.com/Svvord/visar.git
The installation steps are as follows:
    ## Create an environment for visualization using VISAR web application
    conda create -n visar_viz python=3.6
    conda activate visar_viz
    ## install packages
    conda install -c conda-forge rdkit  # also install numpy and pandas
    conda install -c conda-forge scikit-learn  # also install scipy
    conda install matplotlib
    conda install bokeh
    conda install cairosvg
    # start the web app
    cd /path/of/visar
    bokeh serve --show VISAR_webapp
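If the web app fails to start, a quick way to see whether the visualization environment is complete is to test that each package can be found by the interpreter. The following is a minimal sketch using only the standard library; the helper name `missing_modules` is hypothetical, and the module list is assumed from the install commands in this README (note that scikit-learn is imported as `sklearn`).

```python
import importlib.util

# Import names for the packages installed in the visualization environment.
VIZ_MODULES = ["rdkit", "sklearn", "scipy", "numpy", "pandas",
               "matplotlib", "bokeh", "cairosvg"]


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(VIZ_MODULES)
if missing:
    print("missing packages:", ", ".join(missing))
else:
    print("all visualization packages found")
```

Run it inside the activated conda environment so that the check uses the same interpreter as `bokeh serve`.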