
Activity-cliff awareness (ACA) loss and ACANet

Project description

Activity Cliff Awareness



Code repository for the activity-cliff-awareness (ACA) loss and the graph-based ACANet model


About

1) ACALoss

This study proposes the activity-cliff-awareness (ACA) loss for improving molecular activity prediction by deep learning models. The ACA loss enhances both metric learning in the latent space and task learning in the target space during training, making the network aware of the activity-cliff issue. For more details, please refer to the paper titled "Online triplet contrastive learning enables efficient cliff awareness in molecular activity prediction."

**Comparison of models for molecular activity prediction, one without (left) and one with (right) activity cliff awareness (ACA).** The left panel depicts a model without ACA, where an activity-cliff triplet (A, P, N) makes learning in the latent space difficult: because A and N have similar chemical structures, the model learns graph representations in which the A–P distance is far greater than the A–N distance, leading to poor training and prediction results. The right panel shows a model with ACA, which optimizes the latent vectors so that A moves closer to P and further away from N. The model with ACA combines metric learning in the latent space with minimizing the regression error, whereas the model without ACA optimizes only the regression loss and may not handle activity cliffs effectively.
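The recipe described above (a regression loss plus an online-mined triplet term weighted by a factor alpha) can be illustrated with a few lines of PyTorch. This is only a minimal sketch of the idea, not the package's actual implementation: the mining rule here is deliberately simplified, the function name toy_aca_loss is made up for illustration, and the real ACALoss in clsar.model.loss additionally uses the cliff_lower/cliff_upper thresholds shown in the usage section below.

import torch
import torch.nn.functional as F

def toy_aca_loss(labels, predictions, embeddings, alpha=0.1, margin=1.0):
    # Task loss in the target space (MAE regression error)
    reg_loss = F.l1_loss(predictions, labels)

    # Pairwise distances in label space and latent space
    label_diff = (labels.unsqueeze(0) - labels.unsqueeze(1)).abs()   # (B, B)
    latent_dist = torch.cdist(embeddings, embeddings, p=2)           # (B, B)

    # For each anchor, take the sample with the most similar label as positive
    # and the most dissimilar one as negative -- a crude stand-in for the
    # cliff-threshold-based online triplet mining used by the real ACALoss
    eye = torch.eye(labels.numel(), dtype=torch.bool, device=labels.device)
    pos_idx = label_diff.masked_fill(eye, float('inf')).argmin(dim=1)
    neg_idx = label_diff.argmax(dim=1)

    idx = torch.arange(labels.numel(), device=labels.device)
    d_ap = latent_dist[idx, pos_idx]
    d_an = latent_dist[idx, neg_idx]
    triplet_loss = F.relu(d_ap - d_an + margin).mean()

    # Cliff awareness: metric learning in the latent space + regression learning
    return reg_loss + alpha * triplet_loss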

2) ACANet

ACANet is a deep learning model built on the proposed ACALoss and a graph neural network. It tunes the hyperparameters of the ACALoss automatically and provides a high-level interface for training and testing (users can use it just like a scikit-learn estimator).

Model performance with and without AC-awareness

ACA loss vs. MAE loss on the external test set, and the number of mined triplets during training:

More details on usage and performance can be found here.

ACA loss implementation

ACA loss usage

# PyTorch
from clsar.model.loss import ACALoss

aca_loss = ACALoss(alpha=0.1, cliff_lower=0.2, cliff_upper=1.0, p=1., squared=False)
loss = aca_loss(labels, predictions, embeddings)  # batch labels, model predictions, and latent embeddings
loss.backward()
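As a quick sanity check of how the call above can be wired up, the following uses dummy tensors; the exact shapes (1D labels and predictions, 2D embeddings) are an assumption based on the usage shown here rather than documented API.

# Illustrative only: dummy batch to exercise the loss defined above
import torch
batch_size, latent_dim = 32, 128
labels = torch.randn(batch_size)                                      # target activities (e.g. pChEMBL values)
predictions = torch.randn(batch_size, requires_grad=True)             # model outputs for the batch
embeddings = torch.randn(batch_size, latent_dim, requires_grad=True)  # latent vectors from the encoder
loss = aca_loss(labels, predictions, embeddings)
loss.backward()                                                       # gradients reach both predictions and embeddings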


# TensorFlow
from clsar.model.loss_tf import ACALoss

Installation

pip install clsar

Run ACANet

from clsar import ACANet
# Xs_train: SMILES strings of the training set (1D array)
# y_train_pIC50: pChEMBL labels of the training set (1D array)

## init ACANet
clf = ACANet(gpuid=0, work_dir='./')

## get the loss hyperparameters (cliff_lower, cliff_upper, and alpha) from the training set
dfp = clf.opt_cliff_by_cv(Xs_train, y_train_pIC50, total_epochs=50, n_repeats=3)
dfa = clf.opt_alpha_by_cv(Xs_train, y_train_pIC50, total_epochs=100, n_repeats=3)

## fit the model using 5-fold cross-validation
clf.cv_fit(Xs_train, y_train_pIC50, verbose=1)

## make predictions with the 5 sub-models; the output is their average
test_pred_pIC50 = clf.cv_predict(Xs_test)
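The averaged predictions can then be scored like the output of any scikit-learn regressor. The snippet below is a generic evaluation sketch; y_test_pIC50 is assumed to be the held-out labels paired with Xs_test and is not returned by the code above.

import numpy as np
from sklearn.metrics import mean_absolute_error

## evaluate the averaged cross-validation predictions on the held-out set
mae = mean_absolute_error(y_test_pIC50, test_pred_pIC50)
rmse = np.sqrt(np.mean((np.asarray(y_test_pIC50) - np.asarray(test_pred_pIC50)) ** 2))
print(f'External test MAE: {mae:.3f}, RMSE: {rmse:.3f}')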

Citation

Shen W, Cui C, Su X, Zhang Z, Velez-Arce A, Wang J, et al. Activity Cliff-Informed Contrastive Learning for Molecular Property Prediction. ChemRxiv. 2024. doi:10.26434/chemrxiv-2023-5cz7s-v2.

