RRD: A Reactivity-Related bond/atom-wise Descriptors (RRD) package

CL-SAR

Contrastive learning of structure-activity relationship (SAR) studies

Online triplet contrastive learning enables efficient cliff awareness in molecular activity prediction


About

$$\mathcal{L}_{aca}=\mathcal{L}_{mae}+a\ast\mathcal{L}_{tsm}$$

This study proposes the activity-cliff-awareness (ACA) loss for improving molecular activity prediction by deep learning models. The ACA loss enhances both metric learning in the latent space and task learning in the target space during training, making the network aware of the activity-cliff issue. For more details, please refer to the paper titled "Online triplet contrastive learning enables efficient cliff awareness in molecular activity prediction."
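To make the two terms concrete, here is a minimal pure-Python sketch of a loss with this shape. It is not the package's implementation. The triplet-mining rule (positives within `cliff_lower` of the anchor's label, negatives at least `cliff_upper` away), the label-derived soft margin, the Euclidean distance, and all function names are assumptions made for illustration. Only the form "MAE plus a weighted triplet term" comes from the equation above.

```python
import itertools
import math


def mae(labels, predictions):
    """Mean absolute error between true and predicted activities."""
    return sum(abs(y - p) for y, p in zip(labels, predictions)) / len(labels)


def euclidean(u, v):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_soft_margin(labels, embeddings, cliff_lower=0.2, cliff_upper=1.0):
    """Mean hinge loss over mined (anchor, positive, negative) triplets.

    Assumed mining rule: the positive's label lies within cliff_lower of the
    anchor's label, and the negative's label is at least cliff_upper away.
    Assumed margin: the difference between those two label gaps.
    Returns (mean loss, number of mined triplets).
    """
    losses = []
    for a, p, n in itertools.permutations(range(len(labels)), 3):
        gap_ap = abs(labels[a] - labels[p])
        gap_an = abs(labels[a] - labels[n])
        if gap_ap <= cliff_lower and gap_an >= cliff_upper:
            margin = gap_an - gap_ap
            d_ap = euclidean(embeddings[a], embeddings[p])
            d_an = euclidean(embeddings[a], embeddings[n])
            losses.append(max(0.0, d_ap - d_an + margin))
    if not losses:
        return 0.0, 0
    return sum(losses) / len(losses), len(losses)


def aca_loss_sketch(labels, predictions, embeddings, alpha=0.1,
                    cliff_lower=0.2, cliff_upper=1.0):
    """Regression term plus alpha times the triplet term."""
    tsm, _ = triplet_soft_margin(labels, embeddings, cliff_lower, cliff_upper)
    return mae(labels, predictions) + alpha * tsm


# Three molecules: the first two have similar activity, the third is a cliff.
labels = [5.0, 5.1, 7.0]
predictions = [5.0, 5.1, 7.0]
# Latent vectors where the cliff partner sits between the similar pair.
entangled = [[0.0, 0.0], [1.0, 0.0], [0.5, 0.0]]
# Latent vectors where the cliff partner is far from the similar pair.
separated = [[0.0, 0.0], [0.1, 0.0], [5.0, 0.0]]

print(aca_loss_sketch(labels, predictions, entangled))
print(aca_loss_sketch(labels, predictions, separated))
```

With perfect predictions the MAE term vanishes, so the first value is purely the triplet penalty for anchors that sit closer to the cliff partner than to their similar-activity neighbor. The second value is zero because those latent vectors are already separated by more than the margin.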

Comparison of two models for molecular activity prediction: one without (left) and one with (right) activity cliff awareness (ACA). In the left panel, an activity-cliff triplet (A, P, N) is hard for the model to separate in the latent space. Because A and N are structurally similar, the learned graph representations place A much farther from P than from N, which degrades both training and prediction. In the right panel, the model with ACA optimizes the latent vectors so that A moves closer to P and farther from N. The model with ACA combines metric learning in the latent space with minimizing the regression error, whereas the model without ACA optimizes only the regression loss and may not handle activity cliffs effectively.

Performance

ACA loss vs. MAE loss: performance on the external test set and number of mined triplets during training:

More details on usage and performance can be found here.

ACA loss implementation

ACA loss usage

# PyTorch
from clsar.model.loss import ACALoss

aca_loss = ACALoss(alpha=0.1, cliff_lower=0.2, cliff_upper=1.0, p=1.0, squared=False)
loss = aca_loss(labels, predictions, embeddings)
loss.backward()


# TensorFlow
from clsar.model.loss_tf import ACALoss

Installation

pip install clsar

Run ACANet

from clsar import ACANet

# Xs_train: list of SMILES strings for the training set
# y_train_pIC50: pChEMBL labels for the training set
# Xs_test: list of SMILES strings for the test set

## initialize ACANet
clf = ACANet(gpuid=0, work_dir='./')

## select the loss hyperparameters by cross-validation on the training set
dfp = clf.opt_cliff_by_cv(Xs_train, y_train_pIC50, total_epochs=50, n_repeats=3)
dfa = clf.opt_alpha_by_cv(Xs_train, y_train_pIC50, total_epochs=100, n_repeats=3)


## cross-validation fit
clf.cv_fit(Xs_train, y_train_pIC50, verbose=1)


## predict pIC50 for the test set with the 5-fold cross-validation models
test_pred_pIC50 = clf.cv_predict(Xs_test)
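
The predictions are on the pIC50 scale. If concentrations are needed downstream, the standard definition pIC50 = -log10(IC50 in mol/L) can be inverted directly. The sketch below does that and also scores predictions with the mean absolute error. The helper names and the example numbers are hypothetical and not part of `clsar`. The helpers work on any flat sequence of floats; the source does not state whether `cv_predict` returns one directly.

```python
def pic50_to_ic50_nm(pic50):
    """Invert pIC50 = -log10(IC50 [mol/L]) and express the result in nM."""
    return 10.0 ** (9.0 - pic50)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between measured and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical values standing in for measured labels and model output.
y_test_pIC50 = [6.0, 7.5, 9.0]
test_pred_pIC50 = [6.0, 7.0, 9.0]

print(mean_absolute_error(y_test_pIC50, test_pred_pIC50))
print([pic50_to_ic50_nm(v) for v in test_pred_pIC50])
```

Computing the error on the pIC50 scale keeps it comparable with the MAE term of the loss, while the nM values are easier to read against assay data.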

Citation

Wan Xiang Shen*, Chao Cui*, Xiang Cheng Shi, et al. Online triplet contrastive learning enables efficient cliff awareness in molecular activity prediction [J]. ChemRxiv, 2023-05-29. DOI: 10.26434/chemrxiv-2023-5cz7s.
