Neural networks for MolMap generated features

molmapnets

Neural networks for regression and classification for molecular data, using MolMap generated features.

This package implements the neural network architectures originally presented in the MolMap package, with two important differences:

  • The package is written using literate programming: all functionality is implemented, tested, and documented together in Jupyter notebooks. You can find the documentation on the package website.
  • The models are implemented in PyTorch.

Install

First you need to install MolMap and ChemBench (you can find the detailed installation guide here), then run

pip install molmapnets

How to use the package

We need ChemBench for the datasets, MolMap for feature extraction, and finally molmapnets for the neural networks.

from chembench import dataset
from molmap import MolMap
from molmapnets.data import SingleFeatureData, DoubleFeatureData
from molmapnets.models import MolMapRegression

For model training we also need torch

import torch
from torch import nn, optim
from torch.utils.data import Dataset, DataLoader, random_split
torch.set_default_dtype(torch.float64)

Load and process the data. Here we use the ESOL dataset for regression

data = dataset.load_ESOL()
total samples: 1128
descriptor = MolMap(ftype='descriptor', metric='cosine',)
descriptor.fit(verbose=0, method='umap', min_dist=0.1, n_neighbors=15,)
2021-07-23 13:50:53,798 - INFO - [bidd-molmap] - Applying grid feature map(assignment), this may take several minutes(1~30 min)
2021-07-23 13:50:56,904 - INFO - [bidd-molmap] - Finished

Feature extraction

X = descriptor.batch_transform(data.x)
100%|##########| 1128/1128 [06:08<00:00,  2.78it/s]

Prepare data for model training

esol = SingleFeatureData(data.y, X)

Train, validation, and test split

train, val, test = random_split(esol, [904,112,112], generator=torch.Generator().manual_seed(7))
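
The split sizes above are hardcoded for the 1128 samples in ESOL, and they must sum to the dataset length. As a minimal sketch, the same sizes can be derived from held-out fractions instead. The helper name `split_sizes` and the 10% validation and test fractions are assumptions for illustration, not part of molmapnets; they happen to reproduce the sizes used above.

```python
def split_sizes(n_samples, val_frac=0.1, test_frac=0.1):
    """Return [train, val, test] sizes that sum exactly to n_samples.

    Hypothetical helper: validation and test sizes are rounded down,
    and the training set takes whatever remains.
    """
    n_val = int(n_samples * val_frac)
    n_test = int(n_samples * test_frac)
    n_train = n_samples - n_val - n_test
    return [n_train, n_val, n_test]


sizes = split_sizes(1128)
print(sizes)  # [904, 112, 112]
```

The resulting list can be passed to `random_split` in place of the literal `[904,112,112]`, for example `random_split(esol, split_sizes(len(esol)), ...)`, assuming the dataset object supports `len`.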

Batch data loader

train_loader = DataLoader(train, batch_size=8, shuffle=True)
# shuffling is only needed for training; evaluation order does not matter
val_loader = DataLoader(val, batch_size=8, shuffle=False)
test_loader = DataLoader(test, batch_size=8, shuffle=False)

Initialise model

model = MolMapRegression()

epochs = 5
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
optimizer = optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

Train the model. Users are encouraged to tweak the training loop to achieve better performance

for epoch in range(epochs):

    running_loss = 0.0
    for i, (xb, yb) in enumerate(train_loader):

        xb, yb = xb.to(device), yb.to(device)

        # zero gradients
        optimizer.zero_grad()

        # forward propagation
        pred = model(xb)

        # loss calculation
        loss = criterion(pred, yb)
        loss.backward()
        optimizer.step()

        # accumulate the batch loss for the epoch average
        running_loss += loss.item()

    print('[Epoch: %2d] Training loss: %.3f' %
          (epoch + 1, running_loss / (i+1)))

print('Training finished')

[Epoch:  1] Training loss: 4.530
[Epoch:  2] Training loss: 1.803
[Epoch:  3] Training loss: 1.541
[Epoch:  4] Training loss: 1.209
[Epoch:  5] Training loss: 1.092
Training finished
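
The training loop above only reports the training loss, while the validation and test loaders are created but not used. A minimal sketch of an evaluation pass follows. The function name `evaluate` is hypothetical and not part of molmapnets, and the toy linear model and random tensors are stand-ins so the example runs on its own; in practice you would pass `model` and `val_loader` (or `test_loader`) from the steps above.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def evaluate(model, loader, criterion, device):
    """Return the mean per-sample loss of `model` over all batches in `loader`."""
    model.eval()  # put layers such as dropout and batch norm in inference mode
    total_loss, n_samples = 0.0, 0
    with torch.no_grad():  # gradients are not needed for evaluation
        for xb, yb in loader:
            xb, yb = xb.to(device), yb.to(device)
            loss = criterion(model(xb), yb)
            # weight each batch by its size so a short final batch is not over-counted
            total_loss += loss.item() * xb.size(0)
            n_samples += xb.size(0)
    return total_loss / n_samples


# Stand-in data and model (assumptions for illustration only).
torch.manual_seed(0)
toy_data = TensorDataset(torch.randn(20, 4), torch.randn(20, 1))
toy_loader = DataLoader(toy_data, batch_size=8)
toy_model = nn.Linear(4, 1)

val_mse = evaluate(toy_model, toy_loader, nn.MSELoss(), torch.device("cpu"))
print('Validation MSE: %.3f, RMSE: %.3f' % (val_mse, val_mse ** 0.5))
```

Calling `evaluate` on the validation loader at the end of each epoch is one way to check for overfitting; remember to call `model.train()` again before the next training epoch.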

Please refer to the package documentation for more detailed usage.
