MolMap: An Efficient Convolutional Neural Network Based Molecular Deep Learning Tool

License: MIT

MolMap

MolMap is generated by the following steps:

  • Step1: Input structures
  • Step2: Feature extraction
  • Step3: Feature pairwise distance calculation --> cosine, correlation, jaccard
  • Step4: Feature 2D embedding --> umap, tsne, mds
  • Step5: Feature grid arrangement --> grid, scatter
  • Step6: Transform --> minmax, standard
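
The steps above can be sketched with NumPy alone. This is a minimal illustration under stated assumptions, not MolMap's implementation: the feature matrix is random toy data, classical MDS stands in for the embedding step (MolMap also offers umap and tsne), and the grid arrangement uses a naive sort-into-bands heuristic that is only a stand-in for whatever assignment MolMap performs internally. All function names here are hypothetical.

```python
import numpy as np

def correlation_distance(F):
    """Step 3: pairwise correlation distance between feature columns of F."""
    corr = np.corrcoef(F, rowvar=False)          # (n_features, n_features)
    return 1.0 - corr

def classical_mds(D, n_components=2):
    """Step 4 (stand-in): classical MDS embedding of a distance matrix."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]  # keep the largest ones
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))

def arrange_on_grid(coords, side):
    """Step 5 (naive stand-in): give every feature a unique grid cell.

    Features are sorted by x into column bands of `side` items each,
    then sorted by y inside each band to pick the row.
    """
    n = coords.shape[0]
    order_x = np.argsort(coords[:, 0], kind="stable")
    cells = np.full((n, 2), -1)
    for col in range(side):
        band = order_x[col * side:(col + 1) * side]
        band = band[np.argsort(coords[band, 1], kind="stable")]
        for row, feat in enumerate(band):
            cells[feat] = (row, col)
    return cells

def minmax_scale(F):
    """Step 6: scale each feature column to [0, 1]."""
    lo, hi = F.min(axis=0), F.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)       # avoid division by zero
    return (F - lo) / span

def to_fmap(x, cells, side):
    """Place one molecule's feature vector onto the 2D grid."""
    fmap = np.zeros((side, side))
    fmap[cells[:, 0], cells[:, 1]] = x
    return fmap

rng = np.random.default_rng(0)
F = rng.normal(size=(50, 16))                    # toy data: 50 molecules x 16 features
D = correlation_distance(F)
coords = classical_mds(D)
side = int(np.ceil(np.sqrt(F.shape[1])))
cells = arrange_on_grid(coords, side)
X = np.stack([to_fmap(x, cells, side) for x in minmax_scale(F)])
print(X.shape)  # (50, 4, 4)
```

The point of the sketch is the data flow: distances are computed between features (not between molecules), so the grid layout is fitted once and then reused to turn every molecule's feature vector into an image.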

MolMap Fmaps for compounds

[Figure: MolMap Fmaps for compounds]

Construction of the MolMap Objects


[Figure: construction of the MolMap objects]

The MolMapNet Architecture


[Figure: the MolMapNet architecture]

Installation


  1. Install rdkit and tmap first (create a molmap env):

conda create -c conda-forge -n molmap rdkit python=3.7
conda activate molmap
conda install -c tmap tmap
pip install molmap

  2. ChemBench (optional, if you wish to use the datasets and the split indices from this paper).

  3. If you have gcc problems when you install molmap, please install g++ first:

sudo apt-get install g++
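
After installation, a quick check along the following lines can confirm that the packages are visible to the active Python environment. This is a small sketch using only the standard library; it assumes the import names match the package names above (rdkit, tmap, molmap), and `missing_modules` is a hypothetical helper, not part of MolMap.

```python
import importlib.util

def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Assumed import names for the packages installed above
required = ["rdkit", "tmap", "molmap"]
missing = missing_modules(required)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are importable.")
```

If anything is reported missing, the usual cause is that the molmap conda environment is not the one currently activated.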

Out-of-the-Box Usage


import molmap
# Define your molmap
mp_name = './descriptor.mp'
mp = molmap.MolMap(ftype = 'descriptor', fmap_type = 'grid',
                   split_channels = True,   metric='cosine', var_thr=1e-4)
# Fit your molmap
mp.fit(method = 'umap', verbose = 2)
mp.save(mp_name) 
# Visualization of your molmap
mp.plot_scatter()
mp.plot_grid()
# Batch transform 
from molmap import dataset
data = dataset.load_ESOL()
smiles_list = data.x # list of smiles strings
X = mp.batch_transform(smiles_list,  scale = True, 
                       scale_method = 'minmax', n_jobs=8)
Y = data.y 
print(X.shape)
# Train on your data and test on the external test set
from molmap.model import RegressionEstimator
from sklearn.utils import shuffle 
import numpy as np
import pandas as pd
def Rdsplit(df, random_state = 888, split_size = (0.8, 0.1, 0.1)):
    """Randomly split the indices of df into train, valid and test sets.

    The test and valid sizes come from split_size[2] and split_size[1];
    the training set receives all remaining indices.
    """
    base_indices = np.arange(len(df))
    base_indices = shuffle(base_indices, random_state = random_state)
    nb_test = int(len(base_indices) * split_size[2])
    nb_val = int(len(base_indices) * split_size[1])
    test_idx = base_indices[0:nb_test]
    valid_idx = base_indices[(nb_test):(nb_test+nb_val)]
    train_idx = base_indices[(nb_test+nb_val):len(base_indices)]
    print(len(train_idx), len(valid_idx), len(test_idx))
    return train_idx, valid_idx, test_idx
# split your data
train_idx, valid_idx, test_idx = Rdsplit(data.x, random_state = 888)
trainX = X[train_idx]
trainY = Y[train_idx]
validX = X[valid_idx]
validY = Y[valid_idx]
testX = X[test_idx]
testY = Y[test_idx]

# fit your model
clf = RegressionEstimator(n_outputs=trainY.shape[1], 
                          fmap_shape1 = trainX.shape[1:], 
                          dense_layers = [128, 64], gpuid = 0) 
clf.fit(trainX, trainY, validX, validY)

# make prediction
testY_pred = clf.predict(testX)
rmse, r2 = clf._performance.evaluate(testX, testY)
print(rmse, r2)
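
For readers who want to sanity-check the reported numbers independently, the two regression metrics can be computed per output column with NumPy. This sketch uses the conventional definitions of RMSE and the coefficient of determination; it is an assumption that these match what `clf._performance.evaluate` returns, since MolMap may define R2 differently (for example as a squared correlation). The function names and toy values are hypothetical.

```python
import numpy as np

def rmse_per_task(y_true, y_pred):
    """Root mean squared error for each output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))

def r2_per_task(y_true, y_pred):
    """Coefficient of determination for each output column."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2, axis=0)
    ss_tot = np.sum((y_true - y_true.mean(axis=0)) ** 2, axis=0)
    return 1.0 - ss_res / ss_tot

# Toy single-task example with shape (n_samples, n_outputs)
y_true = np.array([[1.0], [2.0], [3.0], [4.0]])
y_pred = np.array([[1.5], [2.0], [2.5], [4.0]])
print(rmse_per_task(y_true, y_pred))  # approx. 0.354
print(r2_per_task(y_true, y_pred))    # approx. 0.9
```

With the arrays from the example above, the same functions could be applied to `testY` and `testY_pred`, which share the (n_samples, n_outputs) layout.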

Out-of-the-Box Performances


| Dataset | Task Metric | MoleculeNet (GCN Best Model) | Chemprop (D-MPNN model) | MolMapNet (MMNB model) |
|---|---|---|---|---|
| ESOL | RMSE | 0.580 (MPNN) | 0.555 | 0.575 |
| FreeSolv | RMSE | 1.150 (MPNN) | 1.075 | 1.155 |
| Lipop | RMSE | 0.655 (GC) | 0.555 | 0.625 |
| PDBbind-F | RMSE | 1.440 (GC) | 1.391 | 0.721 |
| PDBbind-C | RMSE | 1.920 (GC) | 2.173 | 0.931 |
| PDBbind-R | RMSE | 1.650 (GC) | 1.486 | 0.889 |
| BACE | ROC_AUC | 0.806 (Weave) | N.A. | 0.849 |
| HIV | ROC_AUC | 0.763 (GC) | 0.776 | 0.777 |
| PCBA | PRC_AUC | 0.136 (GC) | 0.335 | 0.276 |
| MUV | PRC_AUC | 0.109 (Weave) | 0.041 | 0.096 |
| ChEMBL | ROC_AUC | N.A. | 0.739 | 0.750 |
| Tox21 | ROC_AUC | 0.829 (GC) | 0.851 | 0.845 |
| SIDER | ROC_AUC | 0.638 (GC) | 0.676 | 0.68 |
| ClinTox | ROC_AUC | 0.832 (GC) | 0.864 | 0.888 |
| BBBP | ROC_AUC | 0.690 (Weave) | 0.738 | 0.739 |
