MolMap: An Efficient Convolutional Neural Network Based Molecular Deep Learning Tool

License: MIT

MolMap

MolMap is generated by the following steps:

  • Step1: Input structures
  • Step2: Feature extraction
  • Step3: Feature pairwise distance calculation --> cosine, correlation, jaccard
  • Step4: Feature 2D embedding --> umap, tsne, mds
  • Step5: Feature grid arrangement --> grid, scatter
  • Step6: Transform --> minmax, standard
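
The steps above can be sketched with NumPy alone. This is a simplified illustration, not MolMap's implementation: random numbers stand in for Steps 1 and 2 (structures and extracted features), classical MDS stands in for the umap/tsne options of Step 4, and the grid arrangement in Step 5 is a simple sort-based heuristic chosen only to keep the example short. All function names are hypothetical.

```python
import numpy as np

def cosine_distance_matrix(F):
    """Step 3: pairwise cosine distance between the feature columns of F (samples x features)."""
    cols = F.T.astype(float)
    norms = np.linalg.norm(cols, axis=1, keepdims=True)
    unit = cols / np.clip(norms, 1e-12, None)
    return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)

def classical_mds_2d(D):
    """Step 4 (stand-in): embed a distance matrix in 2D with classical MDS."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                   # double-centered squared distances
    vals, vecs = np.linalg.eigh(B)
    top = np.argsort(vals)[::-1][:2]              # two largest eigenvalues
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def assign_to_grid(coords, n_rows, n_cols):
    """Step 5 (heuristic): give each feature a unique (row, col) cell.

    Features are sorted by y into rows, then by x within each row.
    """
    n = coords.shape[0]
    assert n <= n_rows * n_cols, "grid too small for the number of features"
    order_y = np.argsort(coords[:, 1], kind="stable")
    cells = np.empty((n, 2), dtype=int)
    for r in range(n_rows):
        members = order_y[r * n_cols:(r + 1) * n_cols]
        members = members[np.argsort(coords[members, 0], kind="stable")]
        for c, idx in enumerate(members):
            cells[idx] = (r, c)
    return cells

def to_feature_map(x, cells, n_rows, n_cols, lo, hi):
    """Step 6: min-max scale one sample's features and place them on the grid."""
    scaled = (x - lo) / np.where(hi > lo, hi - lo, 1.0)
    fmap = np.zeros((n_rows, n_cols))
    fmap[cells[:, 0], cells[:, 1]] = scaled
    return fmap

# Synthetic stand-in for extracted features: 50 samples x 16 features.
rng = np.random.default_rng(0)
F = rng.random((50, 16))

D = cosine_distance_matrix(F)
coords = classical_mds_2d(D)
cells = assign_to_grid(coords, n_rows=4, n_cols=4)
fmap = to_feature_map(F[0], cells, 4, 4, F.min(axis=0), F.max(axis=0))
print(fmap.shape)
```

The key property is that each feature ends up in exactly one grid cell and that features with similar behaviour across the dataset tend to land near each other, which is what lets a convolutional network exploit locality in the resulting map.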

MolMap Fmaps for compounds

[Figure: MolMap Fmaps for compounds]

Construction of the MolMap Objects

[Figure: construction of the MolMap objects]

The MolMapNet Architecture

[Figure: the MolMapNet architecture]

Installation


  1. Install rdkit and tmap first (create a molmap env):

conda create -c conda-forge -n molmap rdkit python=3.7
conda activate molmap
conda install -c tmap tmap
pip install molmap

  2. ChemBench (optional, if you wish to use the datasets and the split indices in this paper).

  3. If you have gcc problems when you install molmap, please install g++ first:

sudo apt-get install g++
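
After installing, a quick check that the packages are importable from the active environment can save debugging time later. The snippet below is a minimal sketch using only the standard library. It assumes the import names match the package names used in the commands above (`rdkit`, `tmap`, `molmap`), and the helper `missing_packages` is a hypothetical name, not part of MolMap.

```python
import importlib.util

# Assumed import names, taken from the install commands above.
REQUIRED = ("rdkit", "tmap", "molmap")

def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]

missing = missing_packages()
print("missing:", missing if missing else "none")
```

If anything is reported missing, the usual cause is that the `molmap` conda environment is not the active one.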

Out-of-the-Box Usage


import molmap
# Define your molmap
mp_name = './descriptor.mp'
mp = molmap.MolMap(ftype = 'descriptor', fmap_type = 'grid',
                   split_channels = True,   metric='cosine', var_thr=1e-4)
# Fit your molmap
mp.fit(method = 'umap', verbose = 2)
mp.save(mp_name) 
# Visualization of your molmap
mp.plot_scatter()
mp.plot_grid()
# Batch transform 
from molmap import dataset
data = dataset.load_ESOL()
smiles_list = data.x # list of smiles strings
X = mp.batch_transform(smiles_list,  scale = True, 
                       scale_method = 'minmax', n_jobs=8)
Y = data.y 
print(X.shape)
# Train on your data and test on the external test set
from molmap.model import RegressionEstimator
from sklearn.utils import shuffle 
import numpy as np
import pandas as pd
def Rdsplit(df, random_state = 888, split_size = (0.8, 0.1, 0.1)):
    # Random train/valid/test split; returns index arrays in that order
    base_indices = np.arange(len(df)) 
    base_indices = shuffle(base_indices, random_state = random_state) 
    nb_test = int(len(base_indices) * split_size[2]) 
    nb_val = int(len(base_indices) * split_size[1]) 
    test_idx = base_indices[0:nb_test] 
    valid_idx = base_indices[(nb_test):(nb_test+nb_val)] 
    train_idx = base_indices[(nb_test+nb_val):len(base_indices)] 
    print(len(train_idx), len(valid_idx), len(test_idx)) 
    return train_idx, valid_idx, test_idx 
# split your data
train_idx, valid_idx, test_idx = Rdsplit(data.x, random_state = 888)
trainX = X[train_idx]
trainY = Y[train_idx]
validX = X[valid_idx]
validY = Y[valid_idx]
testX = X[test_idx]
testY = Y[test_idx]

# fit your model
clf = RegressionEstimator(n_outputs=trainY.shape[1], 
                          fmap_shape1 = trainX.shape[1:], 
                          dense_layers = [128, 64], gpuid = 0) 
clf.fit(trainX, trainY, validX, validY)

# make prediction
testY_pred = clf.predict(testX)
rmse, r2 = clf._performance.evaluate(testX, testY)
print(rmse, r2)
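
To sanity-check the reported metrics against `testY_pred`, they can be recomputed by hand. The sketch below is independent of MolMap and uses NumPy only. It assumes that `r2` means the coefficient of determination; if the library instead reports a squared Pearson correlation, the two values will differ, so the second function is included for comparison. The function names are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root-mean-square error over all entries."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r2_determination(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def r2_pearson(y_true, y_pred):
    """Squared Pearson correlation between targets and predictions."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    return float(np.corrcoef(y_true, y_pred)[0, 1] ** 2)

# Toy values for illustration only; substitute testY and testY_pred.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])
print(rmse(y_true, y_pred), r2_determination(y_true, y_pred), r2_pearson(y_true, y_pred))
```

For a single-task regression such as ESOL, passing `testY` and `testY_pred` directly should work, since both functions flatten their inputs.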

Out-of-the-Box Performances


| Dataset | Task Metric | MoleculeNet (GCN Best Model) | Chemprop (D-MPNN model) | MolMapNet (MMNB model) |
|---|---|---|---|---|
| ESOL | RMSE | 0.580 (MPNN) | 0.555 | 0.575 |
| FreeSolv | RMSE | 1.150 (MPNN) | 1.075 | 1.155 |
| Lipop | RMSE | 0.655 (GC) | 0.555 | 0.625 |
| PDBbind-F | RMSE | 1.440 (GC) | 1.391 | 0.721 |
| PDBbind-C | RMSE | 1.920 (GC) | 2.173 | 0.931 |
| PDBbind-R | RMSE | 1.650 (GC) | 1.486 | 0.889 |
| BACE | ROC_AUC | 0.806 (Weave) | N.A. | 0.849 |
| HIV | ROC_AUC | 0.763 (GC) | 0.776 | 0.777 |
| PCBA | PRC_AUC | 0.136 (GC) | 0.335 | 0.276 |
| MUV | PRC_AUC | 0.109 (Weave) | 0.041 | 0.096 |
| ChEMBL | ROC_AUC | N.A. | 0.739 | 0.750 |
| Tox21 | ROC_AUC | 0.829 (GC) | 0.851 | 0.845 |
| SIDER | ROC_AUC | 0.638 (GC) | 0.676 | 0.68 |
| ClinTox | ROC_AUC | 0.832 (GC) | 0.864 | 0.888 |
| BBBP | ROC_AUC | 0.690 (Weave) | 0.738 | 0.739 |
