
MolMap: An Efficient Convolutional Neural Network Based Molecular Deep Learning Tool


License: MIT

MolMap

MolMap is generated by the following steps:

  • Step1: Input structures
  • Step2: Feature extraction
  • Step3: Feature pairwise distance calculation --> cosine, correlation, jaccard
  • Step4: Feature 2D embedding --> umap, tsne, mds
  • Step5: Feature grid arrangement --> grid, scatter
  • Step6: Transform --> minmax, standard
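
The steps above can be illustrated with a minimal, standard-library-only sketch. It is not MolMap's implementation: the function names are hypothetical, Step4 is skipped (the 2D coordinates are assumed to come from an embedding such as umap, tsne, or mds), and the grid arrangement uses a simple greedy nearest-free-cell rule chosen only for brevity.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_feature_distances(rows):
    """Step3 (sketch): distance between every pair of feature columns."""
    columns = list(zip(*rows))  # one tuple per feature, one entry per molecule
    return [[cosine_distance(a, b) for b in columns] for a in columns]


def assign_to_grid(points, side):
    """Step5 (sketch): give each 2D point its own cell in a side x side grid.

    Greedy assignment: each point takes the nearest cell that is still free.
    This is an illustrative stand-in, not the method MolMap uses.
    """
    if len(points) > side * side:
        raise ValueError("grid too small for the number of features")
    xs, ys = zip(*points)

    def rescale(value, lo, hi):
        # Map a coordinate into the range [0, side - 1].
        return 0.0 if hi == lo else (value - lo) / (hi - lo) * (side - 1)

    free = {(r, c) for r in range(side) for c in range(side)}
    placement = {}
    for index, (x, y) in enumerate(points):
        gx = rescale(x, min(xs), max(xs))
        gy = rescale(y, min(ys), max(ys))
        # Cells are (row, col); the cell itself breaks distance ties.
        cell = min(free, key=lambda rc: ((rc[0] - gy) ** 2 + (rc[1] - gx) ** 2, rc))
        free.remove(cell)
        placement[index] = cell
    return placement


def minmax_scale(values):
    """Step6 (sketch): rescale one feature's values to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Toy data: 3 molecules (rows) x 3 features (columns).
rows = [[1.0, 2.0, 0.0], [2.0, 4.0, 1.0], [3.0, 6.0, 0.0]]
distances = pairwise_feature_distances(rows)

# Assumed output of a 2D embedding for 4 features.
coords = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.1), (0.9, 0.2)]
placement = assign_to_grid(coords, side=2)

scaled = minmax_scale([2.0, 4.0, 6.0])
```

In the toy data the first two feature columns are proportional, so their cosine distance is close to zero and an embedding would place them near each other on the map.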

MolMap Fmaps for compounds


Construction of the MolMap Objects



The MolMapNet Architecture



Installation


  1. Install rdkit and tmap first (in a new molmap env), then install molmap:

conda create -c conda-forge -n molmap rdkit python=3.7
conda activate molmap
conda install -c tmap tmap
pip install molmap

  2. ChemBench (optional, if you wish to use the datasets and the split indices from the paper).

  3. If you run into gcc problems when installing molmap, install g++ first:

sudo apt-get install g++
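
After installation, a quick check like the following can confirm that the packages are visible to the active environment. This is a minimal sketch: `check_modules` is a hypothetical helper, and it assumes the import names match the package names used above (`rdkit`, `tmap`, `molmap`).

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to True if it can be located.

    find_spec only looks the module up; it does not import it.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Assumed import names for the packages installed above.
    for name, found in check_modules(["rdkit", "tmap", "molmap"]).items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```

If a module is reported as missing, the most likely cause is that the `molmap` conda environment is not the active one.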

Out-of-the-Box Usage



import molmap
# Define your molmap
mp_name = './descriptor.mp'
mp = molmap.MolMap(ftype = 'descriptor', fmap_type = 'grid',
                   split_channels = True,   metric='cosine', var_thr=1e-4)
# Fit your molmap
mp.fit(method = 'umap', verbose = 2)
mp.save(mp_name) 
# Visualization of your molmap
mp.plot_scatter()
mp.plot_grid()
# Batch transform 
from molmap import dataset
data = dataset.load_ESOL()
smiles_list = data.x # list of smiles strings
X = mp.batch_transform(smiles_list,  scale = True, 
                       scale_method = 'minmax', n_jobs=8)
Y = data.y 
print(X.shape)
# Train on your data and test on the external test set
from molmap.model import RegressionEstimator
from sklearn.utils import shuffle 
import numpy as np
import pandas as pd
def Rdsplit(df, random_state = 888, split_size = [0.8, 0.1, 0.1]):
    base_indices = np.arange(len(df)) 
    base_indices = shuffle(base_indices, random_state = random_state) 
    nb_test = int(len(base_indices) * split_size[2]) 
    nb_val = int(len(base_indices) * split_size[1]) 
    test_idx = base_indices[0:nb_test] 
    valid_idx = base_indices[(nb_test):(nb_test+nb_val)] 
    train_idx = base_indices[(nb_test+nb_val):len(base_indices)] 
    print(len(train_idx), len(valid_idx), len(test_idx)) 
    return train_idx, valid_idx, test_idx 
# split your data
train_idx, valid_idx, test_idx = Rdsplit(data.x, random_state = 888)
trainX = X[train_idx]
trainY = Y[train_idx]
validX = X[valid_idx]
validY = Y[valid_idx]
testX = X[test_idx]
testY = Y[test_idx]

# fit your model
clf = RegressionEstimator(n_outputs=trainY.shape[1], 
                          fmap_shape1 = trainX.shape[1:], 
                          dense_layers = [128, 64], gpuid = 0) 
clf.fit(trainX, trainY, validX, validY)

# make prediction
testY_pred = clf.predict(testX)
rmse, r2 = clf._performance.evaluate(testX, testY)
print(rmse, r2)
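
The split logic in `Rdsplit` can be checked in isolation. The sketch below mirrors it with the standard library only; `random_split` is a hypothetical name, and because it shuffles with `random.Random` rather than `sklearn.utils.shuffle`, the resulting indices will differ from those produced by the code above. It shows that the test and validation sizes are truncated with `int()` and the training set receives the remainder.

```python
import random


def random_split(n_samples, random_state=888, split_size=(0.8, 0.1, 0.1)):
    """Shuffle indices 0..n_samples-1 and cut them into train/valid/test."""
    indices = list(range(n_samples))
    random.Random(random_state).shuffle(indices)  # seeded, so reproducible
    nb_test = int(n_samples * split_size[2])  # truncated toward zero
    nb_val = int(n_samples * split_size[1])
    test_idx = indices[:nb_test]
    valid_idx = indices[nb_test:nb_test + nb_val]
    train_idx = indices[nb_test + nb_val:]  # everything that is left
    return train_idx, valid_idx, test_idx


train_idx, valid_idx, test_idx = random_split(1005)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Note that `split_size[0]` is never read, here or in `Rdsplit`: the training fraction is implied by the other two.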


Out-of-the-Box Performances


Dataset     Task Metric   MoleculeNet (GCN Best Model)   Chemprop (D-MPNN model)   MolMapNet (MMNB model)
ESOL        RMSE          0.580 (MPNN)                   0.555                     0.575
FreeSolv    RMSE          1.150 (MPNN)                   1.075                     1.155
Lipop       RMSE          0.655 (GC)                     0.555                     0.625
PDBbind-F   RMSE          1.440 (GC)                     1.391                     0.721
PDBbind-C   RMSE          1.920 (GC)                     2.173                     0.931
PDBbind-R   RMSE          1.650 (GC)                     1.486                     0.889
BACE        ROC_AUC       0.806 (Weave)                  N.A.                      0.849
HIV         ROC_AUC       0.763 (GC)                     0.776                     0.777
PCBA        PRC_AUC       0.136 (GC)                     0.335                     0.276
MUV         PRC_AUC       0.109 (Weave)                  0.041                     0.096
ChEMBL      ROC_AUC       N.A.                           0.739                     0.750
Tox21       ROC_AUC       0.829 (GC)                     0.851                     0.845
SIDER       ROC_AUC       0.638 (GC)                     0.676                     0.68
ClinTox     ROC_AUC       0.832 (GC)                     0.864                     0.888
BBBP        ROC_AUC       0.690 (Weave)                  0.738                     0.739
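
For reading the table: RMSE is an error, so lower is better, while ROC_AUC and PRC_AUC are ranking scores, so higher is better. The following sketch shows textbook definitions of RMSE and ROC_AUC in plain Python. It is for illustration only and is not the evaluation code used to produce the numbers above; PRC_AUC is not covered.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


print(rmse([0.0, 0.0], [1.0, -1.0]))
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise form of `roc_auc` is quadratic in the number of samples, which is fine for a small check but too slow for large datasets such as PCBA.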

