RRD: A Reactivity-Related bond/atom-wise Descriptors (RRD) package
CL-SAR
Contrastive learning for structure-activity relationship (SAR) studies
Online triplet contrastive learning enables efficient cliff awareness in molecular activity prediction
About
$$\mathcal{L}_{aca}=\mathcal{L}_{mae}+a\ast\mathcal{L}_{tsm}$$
This study proposes the activity-cliff-awareness (ACA) loss for improving molecular activity prediction by deep learning models. The ACA loss enhances both metric learning in the latent space and task learning in the target space during training, making the network aware of the activity-cliff issue. For more details, please refer to the paper titled "Online triplet contrastive learning enables efficient cliff awareness in molecular activity prediction."
Comparison of two molecular activity prediction models, one without (left) and one with (right) activity cliff awareness (ACA). In the left panel, an activity cliff triplet (A, P, N) is hard for the model to learn in the latent space: because A and N have similar chemical structures, the learned graph representations place A much farther from P than from N, which leads to poor training and prediction results. In the right panel, the model with ACA optimizes the latent vectors so that A moves closer to P and further away from N. The model with ACA combines metric learning in the latent space with minimizing the regression error, whereas the model without ACA optimizes only the regression loss and may not handle activity cliffs effectively.
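To make the loss formula concrete, here is a minimal pure-Python sketch of how a loss of this shape could be computed for a small batch. It is not the clsar implementation. It assumes that the `tsm` term is a triplet soft-margin loss, that triplets are mined from the labels (P within `cliff_lower` of A's label, N at least `cliff_upper` away), and that latent distances are Euclidean. The helper names `mine_triplets` and `aca_loss_sketch` are hypothetical, and the `p` and `squared` options of `ACALoss` are not modeled.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two latent vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mine_triplets(labels, cliff_lower=0.2, cliff_upper=1.0):
    """Assumed mining rule: P is label-close to A, N is label-far from A."""
    n = len(labels)
    triplets = []
    for a in range(n):
        for p in range(n):
            for q in range(n):
                if len({a, p, q}) < 3:
                    continue  # anchor, positive and negative must differ
                close = abs(labels[a] - labels[p]) < cliff_lower
                far = abs(labels[a] - labels[q]) >= cliff_upper
                if close and far:
                    triplets.append((a, p, q))
    return triplets


def aca_loss_sketch(labels, predictions, embeddings,
                    alpha=0.1, cliff_lower=0.2, cliff_upper=1.0):
    """Return (MAE + alpha * soft-margin triplet term, number of mined triplets)."""
    mae = sum(abs(y - yhat) for y, yhat in zip(labels, predictions)) / len(labels)
    triplets = mine_triplets(labels, cliff_lower, cliff_upper)
    if not triplets:
        return mae, 0  # no cliff triplets in this batch: plain MAE
    tsm = sum(
        softplus(euclidean(embeddings[a], embeddings[p])
                 - euclidean(embeddings[a], embeddings[q]))
        for a, p, q in triplets
    ) / len(triplets)
    return mae + alpha * tsm, len(triplets)


if __name__ == "__main__":
    labels = [5.0, 5.1, 7.0]
    preds = [5.0, 5.1, 7.0]
    cliff_aware = [[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]]    # A near P, far from N
    cliff_unaware = [[0.0, 0.0], [3.0, 0.0], [0.1, 0.0]]  # A near N, far from P
    print(aca_loss_sketch(labels, preds, cliff_aware))
    print(aca_loss_sketch(labels, preds, cliff_unaware))
```

With identical (perfect) predictions, the second embedding layout yields the larger loss, because the anchor sits next to the label-distant molecule. This is the situation in the left panel of the figure, which the metric-learning term penalizes.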
Performance
ACA loss vs. MAE loss: performance on the external test set and number of mined triplets during training:
More details on usage and performance can be found here.
ACA loss implementation
ACA loss usage
# PyTorch
from clsar.model.loss import ACALoss
aca_loss = ACALoss(alpha=0.1, cliff_lower=0.2, cliff_upper=1.0, p=1., squared=False)
loss = aca_loss(labels, predictions, embeddings)
loss.backward()
# TensorFlow
from clsar.model.loss_tf import ACALoss
Installation
pip install clsar
Run ACANet
from clsar import ACANet
# Xs_train: list of SMILES strings for the training set
# y_train_pIC50: the pChEMBL labels of the training set
# Xs_test: list of SMILES strings for the test set
## init ACANet
clf = ACANet(gpuid=0, work_dir='./')
## get the loss hyperparameters from the training set
dfp = clf.opt_cliff_by_cv(Xs_train, y_train_pIC50, total_epochs=50, n_repeats=3)
dfa = clf.opt_alpha_by_cv(Xs_train, y_train_pIC50, total_epochs=100, n_repeats=3)
## cross-validation fit
clf.cv_fit(Xs_train, y_train_pIC50, verbose=1)
## 5-fold CV (5FCV) predict, then convert pIC50 to y
test_pred_pIC50 = clf.cv_predict(Xs_test)
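The last comment mentions converting pIC50 back to y, but the snippet stops at the pIC50 predictions. Assuming y is the raw IC50 and the labels follow the usual definition pIC50 = -log10(IC50 in mol/L), the conversion could look like the sketch below. The helper `pic50_to_ic50_nm` is hypothetical and not part of clsar; pChEMBL values derived from other endpoints (Ki, EC50, ...) would need the matching interpretation.

```python
def pic50_to_ic50_nm(pic50):
    """Convert pIC50 (-log10 of IC50 in mol/L) to IC50 in nanomolar."""
    # IC50 [M] = 10 ** (-pIC50); multiplying by 1e9 gives nM.
    return 10.0 ** (9.0 - pic50)


if __name__ == "__main__":
    # Stand-in values; in practice these would come from clf.cv_predict(Xs_test).
    example_pred_pIC50 = [9.0, 6.0, 7.5]
    print([pic50_to_ic50_nm(v) for v in example_pred_pIC50])
```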
Citation
Wan Xiang Shen*, Chao Cui*, Xiang Cheng Shi, et al. Online triplet contrastive learning enables efficient cliff awareness in molecular activity prediction [J]. ChemRxiv, 2023-05-29. DOI: 10.26434/chemrxiv-2023-5cz7s.