
PyTorch implementation of toxicity prediction models from SMILES.


Build Status License: MIT Code style: black Gradio demo

Chemical Representation Learning for Toxicity Prediction

PyTorch implementation accompanying the paper Chemical Representation Learning for Toxicity Prediction (Born et al., 2023, Digital Discovery).

Inference

We released pretrained models for the Tox21, ClinTox and SIDER datasets.

Demo with UI

🤗 A Gradio demo with a simple UI is available on HuggingFace Spaces.

Python API

The pretrained models are available via GT4SD, the Generative Toolkit for Scientific Discovery. See the paper here. We recommend using GT4SD for inference. Once you have installed that library, use it as follows:

from gt4sd.properties import PropertyPredictorRegistry
tox21 = PropertyPredictorRegistry.get_property_predictor('tox21', {'algorithm_version': 'v0'})
tox21('CCO')

The other models are the SIDER model and the ClinTox model from the MoleculeNet benchmark:

from gt4sd.properties import PropertyPredictorRegistry
sider = PropertyPredictorRegistry.get_property_predictor('sider', {'algorithm_version': 'v0'})
clintox = PropertyPredictorRegistry.get_property_predictor('clintox', {'algorithm_version': 'v0'})
print(f"Side effect predictions: {sider('CCO')}")
print(f"Clinical toxicity predictions: {clintox('CCO')}")
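
The predictors above are called with one SMILES string at a time. To score several molecules, a small wrapper can loop over them. The following is a minimal sketch: `predict_many` is a hypothetical helper, not part of GT4SD or toxsmi, and it only assumes that the predictor is a callable taking a single SMILES string, as in the snippets above.

```python
from typing import Callable, Dict, Iterable


def predict_many(
    predictor: Callable[[str], object], smiles: Iterable[str]
) -> Dict[str, object]:
    """Apply a single-molecule predictor to several SMILES strings.

    Hypothetical helper: `predictor` is any callable that accepts one
    SMILES string, e.g. the `tox21` object created above.
    """
    results = {}
    for smi in smiles:
        # One call per molecule, mirroring the tox21('CCO') usage.
        results[smi] = predictor(smi)
    return results


# Usage with GT4SD installed (same registry call as above):
# from gt4sd.properties import PropertyPredictorRegistry
# tox21 = PropertyPredictorRegistry.get_property_predictor(
#     'tox21', {'algorithm_version': 'v0'}
# )
# print(predict_many(tox21, ['CCO', 'c1ccccc1']))
```

The result maps each input SMILES to whatever the predictor returns for it, so the output format depends on the chosen model.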

Training your own model

Setup

The library itself has few dependencies (see setup.py) with loose requirements.

pip install -e .

Start a training

The scripts directory contains a training script, train_tox.

Download sample data from the Tox21 database and store it in a folder called data here.

(toxsmi) $ python3 scripts/train_tox \
--train data/tox21_train.csv \
--test data/tox21_score.csv \
--smi data/tox21.smi \
--params params/mca.json \
--model path_to_model_folder \
--name debug

Features:

  • Set --finetune to the path to a .pt file to start from a pretrained model
  • Set --embedding_path to the path of pretrained embeddings

Type python scripts/train_tox -h for further help.
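
If you launch training from Python, for example to sweep over parameter files, the command above can be assembled programmatically. This is a sketch under stated assumptions: `build_train_command` is a hypothetical helper, and it uses only the flags shown in this README (`--train`, `--test`, `--smi`, `--params`, `--model`, `--name`, `--finetune`, `--embedding_path`). Check `-h` for the authoritative list.

```python
import sys
from typing import List, Optional


def build_train_command(
    train: str,
    test: str,
    smi: str,
    params: str,
    model_dir: str,
    name: str,
    finetune: Optional[str] = None,
    embedding_path: Optional[str] = None,
) -> List[str]:
    """Assemble the train_tox command line as an argument list.

    Hypothetical helper; flag names are copied from the README example.
    """
    cmd = [
        sys.executable, "scripts/train_tox",
        "--train", train,
        "--test", test,
        "--smi", smi,
        "--params", params,
        "--model", model_dir,
        "--name", name,
    ]
    # Optional features: only added when a path is given.
    if finetune is not None:
        cmd += ["--finetune", finetune]
    if embedding_path is not None:
        cmd += ["--embedding_path", embedding_path]
    return cmd


# To actually run it from the repository root:
# import subprocess
# subprocess.run(build_train_command(
#     "data/tox21_train.csv", "data/tox21_score.csv", "data/tox21.smi",
#     "params/mca.json", "path_to_model_folder", "debug"), check=True)
```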

Evaluate a model

The scripts directory contains an evaluation script, eval_tox.py. Assuming you have a trained model, use it as follows:

(toxsmi) $ python3 scripts/eval_tox.py \
-model path_to_model_folder \
-smi data/tox21.smi \
-labels data/tox21_test.csv \
-checkpoint RMSE

where -checkpoint specifies which .pt file to pick for the evaluation (based on substring matching).
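
To preview which checkpoint a given substring would select, you can list the matching .pt files yourself. The sketch below is an illustration, not the logic of eval_tox.py: `find_checkpoints` is a hypothetical helper, and searching subfolders recursively and ignoring case are both assumptions.

```python
from pathlib import Path
from typing import List


def find_checkpoints(model_dir: str, pattern: str) -> List[Path]:
    """Return all .pt files under model_dir whose name contains pattern.

    Hypothetical helper. Assumptions: subfolders are searched
    recursively and the comparison is case-insensitive.
    """
    needle = pattern.lower()
    return sorted(
        path
        for path in Path(model_dir).rglob("*.pt")
        if needle in path.name.lower()
    )


# Example: list candidates for `-checkpoint RMSE`
# for path in find_checkpoints("path_to_model_folder", "RMSE"):
#     print(path)
```

If more than one file matches, a more specific substring avoids ambiguity.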

Attention visualization

The model uses a self-attention mechanism that can highlight the chemical motifs used for the predictions. The notebook notebooks/toxicity_attention_plot.ipynb is a tutorial on how to create such plots.

Citation

If you use this code in your projects, please cite the following:

@article{born2023chemical,
    author = {Born, Jannis and Markert, Greta and Janakarajan, Nikita and Kimber, Talia B. and Volkamer, Andrea and Martínez, María Rodríguez and Manica, Matteo},
    title = {Chemical representation learning for toxicity prediction},
    journal = {Digital Discovery},
    year = {2023},
    pages = {-},
    publisher = {RSC},
    doi = {10.1039/D2DD00099G},
    url = {http://dx.doi.org/10.1039/D2DD00099G}
}
