Skip to main content

MolMap: An Efficient Convolutional Neural Network Based Molecular Deep Learning Tool

Project description

License: MIT DOI Codeocean Paper PyPI version Downloads

MolMap

MolMap is generated by the following steps:

  • Step1: Input structures
  • Step2: Feature extraction
  • Step3: Feature pairwise distance calculation --> cosine, correlation, jaccard
  • Step4: Feature 2D embedding --> umap, tsne, mds
  • Step5: Feature grid arrangement --> grid, scatter
  • Step5: Transform --> minmax, standard

MolMap Fmaps for compounds

fmap_dynamicly

Construction of the MolMap Objects


molmap

The MolMapNet Architecture


net

Installation


  1. install rdkit and tamp first(create a molmap env):
micromamba create -c conda-forge -n molmap rdkit python=3.12
micromamba activate molmap


pip install molmap

GPU support: After installation, run pip install "tensorflow[and-cuda]" to enable GPU acceleration.

  1. ChemBench (optional, if you wish to use the dataset and the split induces in this paper).

  2. If you have gcc problems when you install molmap, please installing g++ first:

sudo apt-get install g++

Out-of-the-Box Usage


code

import molmap
# Define your molmap
mp_name = './descriptor.mp'
mp = molmap.MolMap(ftype = 'descriptor', fmap_type = 'grid',
                   split_channels = True,   metric='cosine', var_thr=1e-4)
# Fit your molmap
mp.fit(method = 'umap', verbose = 2)
mp.save(mp_name) 
# Visulization of your molmap
mp.plot_scatter()
mp.plot_grid()
# Batch transform 
from molmap import dataset
data = dataset.load_ESOL()
smiles_list = data.x # list of smiles strings
X = mp.batch_transform(smiles_list,  scale = True, 
                       scale_method = 'minmax', n_jobs=8)
Y = data.y 
print(X.shape)
# Train on your data and test on the external test set
from molmap.model import RegressionEstimator
from sklearn.utils import shuffle 
import numpy as np
import pandas as pd
def Rdsplit(df, random_state = 888, split_size = [0.8, 0.1, 0.1]):
    base_indices = np.arange(len(df)) 
    base_indices = shuffle(base_indices, random_state = random_state) 
    nb_test = int(len(base_indices) * split_size[2]) 
    nb_val = int(len(base_indices) * split_size[1]) 
    test_idx = base_indices[0:nb_test] 
    valid_idx = base_indices[(nb_test):(nb_test+nb_val)] 
    train_idx = base_indices[(nb_test+nb_val):len(base_indices)] 
    print(len(train_idx), len(valid_idx), len(test_idx)) 
    return train_idx, valid_idx, test_idx 
# split your data
train_idx, valid_idx, test_idx = Rdsplit(data.x, random_state = 888)
trainX = X[train_idx]
trainY = Y[train_idx]
validX = X[valid_idx]
validY = Y[valid_idx]
testX = X[test_idx]
testY = Y[test_idx]

# fit your model
clf = RegressionEstimator(n_outputs=trainY.shape[1], 
                          fmap_shape1 = trainX.shape[1:], 
                          dense_layers = [128, 64], gpuid = 0) 
clf.fit(trainX, trainY, validX, validY)

# make prediction
testY_pred = clf.predict(testX)
rmse, r2 = clf._performance.evaluate(testX, testY)
print(rmse, r2)

#Documentation Status #Build Status

Out-of-the-Box Performances


Dataset Task Metric MoleculeNet (GCN Best Model) Chemprop (D-MPNN model) MolMapNet (MMNB model)
ESOL RMSE 0.580 (MPNN) 0.555 0.575
FreeSolv RMSE 1.150 (MPNN) 1.075 1.155
Lipop RMSE 0.655 (GC) 0.555 0.625
PDBbind-F RMSE 1.440 (GC) 1.391 0.721
PDBbind-C RMSE 1.920 (GC) 2.173 0.931
PDBbind-R RMSE 1.650 (GC) 1.486 0.889
BACE ROC_AUC 0.806 (Weave) N.A. 0.849
HIV ROC_AUC 0.763 (GC) 0.776 0.777
PCBA PRC_AUC 0.136 (GC) 0.335 0.276
MUV PRC_AUC 0.109 (Weave) 0.041 0.096
ChEMBL ROC_AUC N.A. 0.739 0.750
Tox21 ROC_AUC 0.829 (GC) 0.851 0.845
SIDER ROC_AUC 0.638 (GC) 0.676 0.68
ClinTox ROC_AUC 0.832 (GC) 0.864 0.888
BBBP ROC_AUC 0.690 (Weave) 0.738 0.739

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

molmap-4.0.tar.gz (8.4 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

molmap-4.0-py3-none-any.whl (1.5 MB view details)

Uploaded Python 3

File details

Details for the file molmap-4.0.tar.gz.

File metadata

  • Download URL: molmap-4.0.tar.gz
  • Upload date:
  • Size: 8.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for molmap-4.0.tar.gz
Algorithm Hash digest
SHA256 3cf1caf00879fac55e912e0132ed1d586f77e5d8da21aae0a58c66bcfe5078f0
MD5 2f0680ac82ff3337e34e1d0cfd0a9aa7
BLAKE2b-256 185493fe8a67e2ae89375d7d6ee0958d8b3c18f2d84086ef509cc362df8127ea

See more details on using hashes here.

File details

Details for the file molmap-4.0-py3-none-any.whl.

File metadata

  • Download URL: molmap-4.0-py3-none-any.whl
  • Upload date:
  • Size: 1.5 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for molmap-4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1d00aa1c69bdd41f809517726250f893c0571f8394477b6031af1e812dc4656c
MD5 722c5bfb74bdb0c9267ffa099f2d13e0
BLAKE2b-256 fd7e9873fdf4065fa50a8ac0c30c9e89c5b50c21f54a9f6bdf36acb8d6b0f173

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page