PyTorch implementation of toxicity prediction models from SMILES.
Chemical Representation Learning for Toxicity Prediction
PyTorch implementation related to the paper Chemical Representation Learning for Toxicity Prediction (Born et al., 2023, Digital Discovery).
Inference
We released pretrained models for the Tox21, ClinTox and SIDER datasets.
Demo with UI
🤗 A Gradio demo with a simple UI is available on Hugging Face Spaces.
Python API
The pretrained models are available via GT4SD, the Generative Toolkit for Scientific Discovery. See the paper here. We recommend using GT4SD for inference. Once you have installed that library, use it as follows:
from gt4sd.properties import PropertyPredictorRegistry
tox21 = PropertyPredictorRegistry.get_property_predictor('tox21', {'algorithm_version': 'v0'})
tox21('CCO')
The other models are the SIDER model and the ClinTox model from the MoleculeNet benchmark:
from gt4sd.properties import PropertyPredictorRegistry
sider = PropertyPredictorRegistry.get_property_predictor('sider', {'algorithm_version': 'v0'})
clintox = PropertyPredictorRegistry.get_property_predictor('clintox', {'algorithm_version': 'v0'})
print(f"Side effect predictions: {sider('CCO')}")
print(f"Clinical toxicity predictions: {clintox('CCO')}")
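To screen several molecules at once, the predictor calls above can be wrapped in a small helper. The sketch below assumes only what the snippets above show, namely that a predictor is a callable taking one SMILES string. `predict_many` and the stand-in `fake_predictor` are hypothetical names used for illustration and are not part of GT4SD. Which exceptions a real predictor raises on invalid input is not specified here, so the sketch catches `Exception` broadly and records the failure instead of aborting the run.

```python
from typing import Any, Callable, Dict, Iterable


def predict_many(
    predictor: Callable[[str], Any], smiles_list: Iterable[str]
) -> Dict[str, Any]:
    """Apply a predictor to each SMILES string and collect results by molecule.

    `predictor` is any callable that maps one SMILES string to a prediction,
    e.g. the `tox21`, `sider` or `clintox` objects created above.
    """
    results: Dict[str, Any] = {}
    for smiles in smiles_list:
        try:
            results[smiles] = predictor(smiles)
        except Exception as error:  # keep going if one molecule fails
            results[smiles] = error
    return results


def fake_predictor(smiles: str) -> float:
    """Hypothetical stand-in so the sketch runs without GT4SD installed."""
    if not smiles:
        raise ValueError("empty SMILES string")
    return float(len(smiles))  # placeholder value, not a toxicity score


if __name__ == "__main__":
    outcomes = predict_many(fake_predictor, ["CCO", "c1ccccc1", ""])
    for smiles, outcome in outcomes.items():
        print(f"{smiles!r}: {outcome!r}")
```

With GT4SD installed, `fake_predictor` would be replaced by one of the predictor objects, for example `predict_many(tox21, ["CCO", "c1ccccc1"])`.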
Training your own model
Setup
The library itself has few dependencies (see setup.py) with loose requirements.
pip install -e .
Start a training
The scripts directory contains a training script, train_tox.
Download sample data from the Tox21 database here and store it in a folder called data.
(toxsmi) $ python3 scripts/train_tox \
--train data/tox21_train.csv \
--test data/tox21_score.csv \
--smi data/tox21.smi \
--params params/mca.json \
--model path_to_model_folder \
--name debug
Features:
- Set --finetune to the path of a .pt file to start from a pretrained model
- Set --embedding_path to the path of pretrained embeddings
Type python scripts/train_tox -h for further help.
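The training call can also be launched from Python, for example when sweeping over several parameter files. The sketch below only assembles the command shown above and checks that the input files exist before running it. `build_train_command` and `missing_inputs` are hypothetical helpers, not part of the toxsmi package, and the paths are the placeholder values from the example.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def build_train_command(
    train: str, test: str, smi: str, params: str, model: str, name: str
) -> List[str]:
    """Assemble the train_tox call from the example above as an argument list."""
    return [
        sys.executable, "scripts/train_tox",
        "--train", train,
        "--test", test,
        "--smi", smi,
        "--params", params,
        "--model", model,
        "--name", name,
    ]


def missing_inputs(*paths: str) -> List[str]:
    """Return the input paths that do not exist on disk."""
    return [p for p in paths if not Path(p).is_file()]


if __name__ == "__main__":
    inputs = ("data/tox21_train.csv", "data/tox21_score.csv",
              "data/tox21.smi", "params/mca.json")
    command = build_train_command(
        *inputs, model="path_to_model_folder", name="debug"
    )
    missing = missing_inputs(*inputs)
    if missing:
        print("Missing input files:", ", ".join(missing))
    else:
        # Raises CalledProcessError if the training script exits with an error.
        subprocess.run(command, check=True)
```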
Evaluate a model
The scripts directory contains an evaluation script, eval_tox.py.
Assuming you have a trained model, use it as follows:
(toxsmi) $ python3 scripts/eval_tox.py \
-model path_to_model_folder \
-smi data/tox21.smi \
-labels data/tox21_test.csv \
-checkpoint RMSE
where -checkpoint specifies which .pt file to pick for the evaluation (based on substring matching).
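The substring matching can be pictured as follows. This is an illustrative sketch, not the code used by eval_tox.py: it assumes the .pt files sit somewhere below the model folder, and `find_checkpoints` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def find_checkpoints(model_folder: str, substring: str) -> List[Path]:
    """Return all .pt files below `model_folder` whose name contains `substring`."""
    root = Path(model_folder)
    if not root.is_dir():
        return []
    return sorted(
        path for path in root.rglob("*.pt") if substring in path.name
    )


if __name__ == "__main__":
    matches = find_checkpoints("path_to_model_folder", "RMSE")
    if not matches:
        print("No checkpoint matches 'RMSE'")
    for path in matches:
        print(path)
```

If more than one file matches, a more specific substring narrows the selection to a single checkpoint.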
Attention visualization
The model uses a self-attention mechanism that can highlight chemical motifs used for the predictions. In notebooks/toxicity_attention_plot.ipynb we share a tutorial on how to create such plots.
Citation
If you use this code in your projects, please cite the following:
@article{born2023chemical,
author = {Born, Jannis and Markert, Greta and Janakarajan, Nikita and Kimber, Talia B. and Volkamer, Andrea and Martínez, María Rodríguez and Manica, Matteo},
title = {Chemical representation learning for toxicity prediction},
journal = {Digital Discovery},
year = {2023},
pages = {-},
publisher = {RSC},
doi = {10.1039/D2DD00099G},
url = {http://dx.doi.org/10.1039/D2DD00099G}
}
File details
Details for the file toxsmi-1.0.0.tar.gz.
File metadata
- Download URL: toxsmi-1.0.0.tar.gz
- Upload date:
- Size: 24.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 907a045c08311c15f49bc800b8bd62a8a1ab5d9b56d28e41a59409af67f0a9e7
MD5 | a2d45923f6ea0f69adf14a88cc9893fc
BLAKE2b-256 | 5b966e34d2798b284873a38c6cf610301697cc67f1237babe382a084539d4eb4
File details
Details for the file toxsmi-1.0.0-py3-none-any.whl.
File metadata
- Download URL: toxsmi-1.0.0-py3-none-any.whl
- Upload date:
- Size: 30.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.0 CPython/3.12.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 77718e55b0e5577df3ea884a4694099c385d31f2fea4319730a122269ca12707
MD5 | 3f105a3ddca6ea54f38a353832a2df12
BLAKE2b-256 | 54333238673c82145d77f7006fdd9a9c50e06226de3006144847edf611a7159a
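A downloaded file can be checked against the digests listed above with Python's standard hashlib module. The sketch assumes the source archive has been saved to the current directory under its original name. `file_digest` is a hypothetical helper, and the label "blake2b-256" is a convention of this sketch that maps to `hashlib.blake2b` with a 32-byte digest.

```python
import hashlib
from pathlib import Path

# SHA256 digest of toxsmi-1.0.0.tar.gz as listed in the table above.
EXPECTED_SHA256 = "907a045c08311c15f49bc800b8bd62a8a1ab5d9b56d28e41a59409af67f0a9e7"


def file_digest(path: str, algorithm: str = "sha256") -> str:
    """Hash a file in 1 MiB chunks and return the hex digest."""
    if algorithm == "blake2b-256":
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


if __name__ == "__main__":
    archive = Path("toxsmi-1.0.0.tar.gz")
    if archive.is_file():
        matches = file_digest(str(archive)) == EXPECTED_SHA256
        print("SHA256 matches" if matches else "SHA256 mismatch")
    else:
        print(f"{archive} not found")
```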