
Molecular fingerprinting using pre-trained deep nets


Minimol architecture

A parameter-efficient molecular featuriser that generalises well to biological tasks thanks to effective pre-training on biological and quantum mechanical datasets.

The model was introduced in the paper MiniMol: A Parameter-Efficient Foundation Model for Molecular Learning, presented at the ICML 2024 workshop on Accessible and Efficient Foundation Models for Biological Discovery.

Usage

Embeddings can be generated in four lines of code:

from minimol import Minimol
model = Minimol()
smiles = [
    'COc1ccc2cc(C(=O)NC3(C(=O)N[C@H](Cc4ccccc4)C(=O)NCC4CCN(CC5CCOCC5)CC4)CCCC3)sc2c1',
    'Nc1nc(=O)c2c([nH]1)NCC(CNc1ccc(C(=O)NC(CCC(=O)O)C(=O)O)cc1)N2C=O',
    'O=C1CCCN1CCCCN1CCN(c2cc(C(F)(F)F)ccn2)CC1',
    'c1ccc(-c2cccnc2)cc1',
]
embeddings = model(smiles)
# embeddings is a list of 4 tensors, each of shape (512,)
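
For larger SMILES lists it can help to featurise in chunks so that memory use stays bounded. The helper below is a minimal sketch and is not part of the minimol package: `featurise_in_batches`, `batch_size` and `toy_featuriser` are hypothetical names. It assumes only what the usage above shows, namely that the model is a callable mapping a list of SMILES strings to a list with one embedding per molecule, in the same order.

```python
from typing import Callable, List, Sequence


def featurise_in_batches(
    featuriser: Callable[[List[str]], Sequence],
    smiles: Sequence[str],
    batch_size: int = 256,
) -> list:
    """Call `featuriser` on consecutive chunks of `smiles` and concatenate the results.

    `featuriser` is assumed to behave like `model` in the usage example:
    a list of SMILES in, one embedding per molecule out, in the same order.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")

    embeddings = []
    for start in range(0, len(smiles), batch_size):
        chunk = list(smiles[start:start + batch_size])
        out = featuriser(chunk)
        # Guard against a featuriser that drops or adds molecules.
        if len(out) != len(chunk):
            raise RuntimeError("featuriser returned an unexpected number of embeddings")
        embeddings.extend(out)
    return embeddings


# Stand-in featuriser so the sketch runs without minimol installed;
# it maps each SMILES string to a one-element vector holding its length.
def toy_featuriser(batch):
    return [[float(len(s))] for s in batch]


demo = featurise_in_batches(toy_featuriser, ["CCO", "c1ccccc1", "O"], batch_size=2)
```

With minimol installed, passing `Minimol()` in place of `toy_featuriser` should give the same kind of output as calling the model directly, provided the model treats each molecule independently.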

A Colab notebook shows how to use MiniMol's fingerprints to achieve state-of-the-art (SoTA) results on a downstream task: Open In Colab

Installation

Pip

When using CUDA, run nvcc --version to check which CUDA toolkit version is installed on your machine, then select the matching wheel (cuXXX, e.g. cu124 for CUDA 12.4):

pip install torch-sparse torch-cluster torch-scatter -f https://pytorch-geometric.com/whl/torch-2.3.0+cu124.html
pip install minimol
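
The wheel URL in the command above combines a torch version with a CUDA tag. As a rough illustration of how the tag relates to the nvcc output, the sketch below parses a "release X.Y" field and rebuilds the URL. The helper names (`cuda_tag_from_nvcc`, `wheel_index_url`) are hypothetical and the sample nvcc line is illustrative rather than captured from a real machine. Not every torch/CUDA combination necessarily has a wheel page, so check that the resulting URL exists before using it.

```python
import re

# Base of the wheel index used in the pip command above.
WHEEL_INDEX = "https://pytorch-geometric.com/whl"


def cuda_tag_from_nvcc(nvcc_output: str) -> str:
    """Turn the 'release X.Y' field printed by `nvcc --version` into a tag like 'cu124'."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        raise ValueError("could not find a 'release X.Y' field in the nvcc output")
    major, minor = match.groups()
    return f"cu{major}{minor}"


def wheel_index_url(torch_version: str, cuda_tag: str) -> str:
    """Build the URL passed to `pip install ... -f` for a torch version and CUDA tag."""
    return f"{WHEEL_INDEX}/torch-{torch_version}+{cuda_tag}.html"


# Illustrative nvcc output line (an assumption about a typical CUDA 12.4 install).
sample = "Cuda compilation tools, release 12.4, V12.4.131"
tag = cuda_tag_from_nvcc(sample)
url = wheel_index_url("2.3.0", tag)
```

Here `url` reproduces the address used in the pip command above.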

Local

git clone git@github.com:graphcore-research/minimol.git 
cd minimol
mamba env create -f env.yml -n minimol_venv
mamba activate minimol_venv

To install mamba, see the official documentation.

Performance

The model has been evaluated on 22 benchmarks from the ADMET group of Therapeutics Data Commons (TDC). The table below compares MiniMol with MolE and with the top 5 models on the TDC leaderboard (as of June 2024). Lower is better for MAE; higher is better for AUROC, AUPRC and Spearman.

| Group | TDC dataset | Size | Metric | TDC leaderboard SoTA result | MolE result | MolE rank | MiniMol (GINE) result | MiniMol (GINE) rank |
|---|---|---|---|---|---|---|---|---|
| Absorption | Caco2 Wang | 906 | MAE | 0.276 ± 0.005 | 0.310 ± 0.010 | 6 | 0.350 ± 0.018 | 7 |
| Absorption | Bioavailability Ma | 640 | AUROC | 0.748 ± 0.033 | 0.654 ± 0.028 | 7 | 0.689 ± 0.020 | 5 |
| Absorption | Lipophilicity AZ | 4,200 | MAE | 0.467 ± 0.006 | 0.469 ± 0.009 | 3 | 0.456 ± 0.008 | 1 |
| Absorption | Solubility AqSolDB | 9,982 | MAE | 0.761 ± 0.025 | 0.792 ± 0.005 | 5 | 0.741 ± 0.013 | 1 |
| Absorption | HIA Hou | 578 | AUROC | 0.989 ± 0.001 | 0.963 ± 0.019 | 7 | 0.993 ± 0.005 | 1 |
| Absorption | Pgp Broccatelli | 1,212 | AUROC | 0.938 ± 0.002 | 0.915 ± 0.005 | 7 | 0.942 ± 0.002 | 1 |
| Distribution | BBB Martins | 1,975 | AUROC | 0.916 ± 0.001 | 0.903 ± 0.005 | 7 | 0.924 ± 0.003 | 1 |
| Distribution | PPBR AZ | 1,797 | MAE | 7.526 ± 0.106 | 8.073 ± 0.335 | 6 | 7.696 ± 0.125 | 4 |
| Distribution | VDss Lombardo | 1,130 | Spearman | 0.713 ± 0.007 | 0.654 ± 0.031 | 3 | 0.535 ± 0.027 | 7 |
| Metabolism | CYP2C9 Veith | 12,092 | AUPRC | 0.859 ± 0.001 | 0.801 ± 0.003 | 5 | 0.823 ± 0.006 | 4 |
| Metabolism | CYP2D6 Veith | 13,130 | AUPRC | 0.790 ± 0.001 | 0.682 ± 0.008 | 6 | 0.719 ± 0.004 | 5 |
| Metabolism | CYP3A4 Veith | 12,328 | AUPRC | 0.916 ± 0.000 | 0.867 ± 0.003 | 7 | 0.877 ± 0.001 | 4 |
| Metabolism | CYP2C9 Substrate | 666 | AUPRC | 0.441 ± 0.033 | 0.446 ± 0.062 | 2 | 0.474 ± 0.025 | 1 |
| Metabolism | CYP2D6 Substrate | 664 | AUPRC | 0.736 ± 0.024 | 0.699 ± 0.018 | 7 | 0.695 ± 0.032 | 6 |
| Metabolism | CYP3A4 Substrate | 667 | AUROC | 0.662 ± 0.031 | 0.670 ± 0.018 | 1 | 0.663 ± 0.008 | 2 |
| Excretion | Half Life Obach | 667 | Spearman | 0.562 ± 0.008 | 0.549 ± 0.024 | 4 | 0.495 ± 0.042 | 6 |
| Excretion | Clearance Hepatocyte | 1,102 | Spearman | 0.498 ± 0.009 | 0.381 ± 0.038 | 7 | 0.446 ± 0.029 | 3 |
| Excretion | Clearance Microsome | 1,020 | Spearman | 0.630 ± 0.010 | 0.607 ± 0.027 | 6 | 0.628 ± 0.005 | 2 |
| Toxicity | LD50 Zhu | 7,385 | MAE | 0.552 ± 0.009 | 0.823 ± 0.019 | 7 | 0.585 ± 0.005 | 2 |
| Toxicity | hERG | 648 | AUROC | 0.880 ± 0.002 | 0.813 ± 0.009 | 7 | 0.846 ± 0.016 | 4 |
| Toxicity | Ames | 7,255 | AUROC | 0.871 ± 0.002 | 0.883 ± 0.005 | 1 | 0.849 ± 0.004 | 5 |
| Toxicity | DILI | 475 | AUROC | 0.925 ± 0.005 | 0.577 ± 0.021 | 7 | 0.956 ± 0.006 | 1 |
| Mean rank | | | | | | 5.2 | | 3.3 |
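
The mean rank in the last row is the plain average of the per-dataset ranks. As a small worked check, the sketch below recomputes it for the MiniMol (GINE) column; the rank values are copied from the table above, and the variable names are illustrative only.

```python
from statistics import mean

# MiniMol (GINE) ranks copied from the table above, in row order.
minimol_ranks = [
    7, 5, 1, 1, 1, 1,      # Absorption
    1, 4, 7,               # Distribution
    4, 5, 4, 1, 6, 2,      # Metabolism
    6, 3, 2,               # Excretion
    2, 4, 5, 1,            # Toxicity
]

# Average rank over all 22 benchmarks (lower is better).
mean_rank = mean(minimol_ranks)

# Number of benchmarks on which MiniMol is ranked first.
first_places = minimol_ranks.count(1)

print(f"benchmarks: {len(minimol_ranks)}, mean rank: {mean_rank:.1f}, rank-1 results: {first_places}")
```

Rounded to one decimal place, the computed mean agrees with the 3.3 reported in the table.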

License

Copyright (c) 2024 Graphcore Ltd. The included code is released under the MIT license (see the license for details).
