
Counterfactual generation with STONED SELFIES


Explaining why that molecule

License: MIT

exmol is a package to explain black-box predictions of molecules. The package uses model agnostic explanations to help users understand why a molecule is predicted to have a property.

Install

pip install exmol

Counterfactual Generation

Our package implements the Model Agnostic Counterfactual Compounds with STONED method to generate counterfactuals. A counterfactual explains a prediction by showing what would have to change in the molecule to change its predicted class. Here is an everyday example of a counterfactual:

This package is not popular. If the package had a logo, it would be popular.

In addition to having a changed prediction, a molecular counterfactual must be as similar as possible to its base molecule. Here is an example of a molecular counterfactual:

[Figure: counterfactual demo]

The counterfactual shows that if the carboxylic acid were an ester, the molecule would be active. It is up to the user to translate this set of structures into a meaningful sentence.

Usage

Let's assume you have a deep learning model my_model(s) that takes in one SMILES string and outputs a predicted binary class. We first expand chemical space around the prediction of interest:

import exmol

# mol of interest
base = 'Cc1onc(-c2ccccc2Cl)c1C(=O)NC1C(=O)N2C1SC(C)(C)C2C(=O)O'

samples = exmol.sample_space(base, my_model, batched=False)
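
For illustration only, here is a minimal sketch of what a compatible my_model could look like. The substring rule is a made-up placeholder, not chemistry and not a trained model, and the helper one_at_a_time is a hypothetical name introduced here to show how a list-only model might be adapted to the one-SMILES-at-a-time form.

```python
from typing import Callable, Sequence


def my_model(smiles: str) -> int:
    """Toy placeholder classifier (an assumption, not a real model).

    Returns 0 when the SMILES text contains the substring 'C(=O)O' and
    1 otherwise. This is a plain string test, not a chemical check.
    """
    return 0 if "C(=O)O" in smiles else 1


def one_at_a_time(
    batch_model: Callable[[Sequence[str]], Sequence[int]]
) -> Callable[[str], int]:
    """Adapt a model that only accepts a list of SMILES to take one string."""

    def single(smiles: str) -> int:
        # Wrap the string in a list, then unwrap the single prediction.
        return int(batch_model([smiles])[0])

    return single


base = "Cc1onc(-c2ccccc2Cl)c1C(=O)NC1C(=O)N2C1SC(C)(C)C2C(=O)O"
print(my_model(base))
```

Either function takes exactly one SMILES string and returns one class label, which is the shape expected when batched=False is used.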

Our model (my_model) should be a function that takes in one SMILES string. We use batched=False to indicate that my_model cannot handle a batch of SMILES, just one at a time. If your model takes SELFIES, just pass use_selfies=True to sample_space. Now we select counterfactuals from that space and plot them:

cfs = exmol.cf_explain(samples)
exmol.plot_cf(cfs)
[Figure: set of counterfactuals]

We can also plot the space around the counterfactual. This is computed via PCA of the affinity matrix, that is, the similarity with the base molecule. Due to how similarity is calculated, the base is going to be the farthest from all other molecules. Thus your base should fall on the left (or right) extreme of your plot.

cfs = exmol.cf_explain(samples)
exmol.plot_space(samples, cfs)
[Figure: chemical space]

Each counterfactual is a Python dataclass with information allowing it to be used in your own analysis:

print(cfs[1])
{
'smiles': 'Cc1onc(-c2ccccc2Cl)c1C(=O)NC1C(=O)N2C1SC(C)(C)C2C',
'selfies': '[C][C][O][N][=C][Branch1_1][Branch2_3][C][=C][C][=C][C][=C][Ring1][Branch1_2][Cl][C]
            [Expl=Ring1][N][C][Branch1_2][C][=O][N][C][C][Branch1_2][C][=O][N][C][Ring1][Branch1_1][S][C]
            [Branch1_1][C][C][Branch1_1][C][C][C][Ring1][Branch1_3][C]',
'similarity': 0.8,
'yhat': 1,
'index': 1813,
'position': array([-7.8032394 ,  0.51781263]),
'is_origin': False,
'cluster': -1,
'label': 'Counterfactual 1'
}
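
Because each example carries smiles, similarity, and yhat, you can run your own filtering on these fields. The sketch below is a simplified illustration of the idea "changed prediction, as similar as possible"; it is not a description of how cf_explain selects counterfactuals. Sample is a hypothetical stand-in dataclass with only a few of the fields shown above, closest_flips is a name invented here, and the molecules and similarity values are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """Hypothetical stand-in holding a subset of the fields printed above."""

    smiles: str
    similarity: float
    yhat: int
    is_origin: bool = False


def closest_flips(samples: List[Sample], top_k: int = 3) -> List[Sample]:
    """Return the top_k most similar samples whose prediction differs from the base."""
    origin = next(s for s in samples if s.is_origin)
    flipped = [s for s in samples if not s.is_origin and s.yhat != origin.yhat]
    # Highest similarity first, so the smallest structural change leads.
    return sorted(flipped, key=lambda s: s.similarity, reverse=True)[:top_k]


# Invented values, for illustration only.
samples = [
    Sample("CC(=O)O", 1.0, 0, is_origin=True),
    Sample("CC(=O)OC", 0.6, 1),
    Sample("CC(N)=O", 0.7, 1),
    Sample("CCC(=O)O", 0.8, 0),
]
picked = closest_flips(samples, top_k=1)
print(picked[0].smiles)
```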

Chemical Space

When calling exmol.sample_space you can pass preset=<preset>, which can be one of the following:

  • 'narrow': Only one change to molecular structure, reduced set of possible bonds/elements
  • 'medium': Default. One or two changes to molecular structure, reduced set of possible bonds/elements
  • 'wide': One through five changes to molecular structure, large set of possible bonds/elements
  • 'chemed': A restrictive set where only PubChem molecules are considered.
  • 'custom': A restrictive set where only molecules provided by the "data" key are considered.

You can also pass num_samples as a "request" for the number of samples. You will typically end up with fewer due to degenerate molecules. See the API for a complete description.
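
As a rough sketch of how these options might be combined, the helper below validates the preset name against the list above before forwarding preset, num_samples, and quiet to sample_space. The names PRESETS, sampling_options, and expand are hypothetical and introduced only for this example. The 'custom' preset also needs the molecules to be supplied, which this sketch does not cover.

```python
PRESETS = ("narrow", "medium", "wide", "chemed", "custom")


def sampling_options(preset="medium", num_samples=None, quiet=False):
    """Build keyword arguments for sample_space, rejecting unknown presets."""
    if preset not in PRESETS:
        raise ValueError(f"unknown preset {preset!r}; expected one of {PRESETS}")
    options = {"preset": preset, "quiet": quiet}
    if num_samples is not None:
        # num_samples is a request; fewer samples may come back.
        options["num_samples"] = num_samples
    return options


def expand(base, model, **options):
    """Sample chemical space around base using a one-SMILES-at-a-time model."""
    import exmol  # imported lazily so the helper above works without exmol

    return exmol.sample_space(
        base, model, batched=False, **sampling_options(**options)
    )


print(sampling_options("narrow", num_samples=500))
```

With exmol installed, a call such as expand(base, my_model, preset="narrow", num_samples=500, quiet=True) would request a small, silent sampling run.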

SVG

Molecules are drawn as PNGs by default. If you would like to have them drawn as SVGs, call insert_svg after calling plot_space or plot_cf:

import exmol
import skunk

# cfs is the list returned by exmol.cf_explain(samples)
exmol.plot_cf(cfs)
svg = exmol.insert_svg(cfs, mol_fontsize=16)

# for Jupyter Notebook
skunk.display(svg)

# To save to file
with open('myplot.svg', 'w') as f:
    f.write(svg)

This is done with the skunk🦨 library.

Quiet Progress Bars

If exmol is being too loud, pass quiet=True to sample_space.

API and Docs

Read the API documentation here. You should also read the paper (see Citation below) for a more exact description of the methods and implementation.

Developing

This repo uses pre-commit, so after cloning run pip install -r requirements.txt and pre-commit install prior to committing.

Citation

Please cite Wellawatte et al.

@Article{wellawatte_seshadri_white_2021,
author ="Wellawatte, Geemi P. and Seshadri, Aditi and White, Andrew D.",
title  ="Model agnostic generation of counterfactual explanations for molecules",
journal  ="Chem. Sci.",
year  ="2022",
pages  ="-",
publisher  ="The Royal Society of Chemistry",
doi  ="10.1039/D1SC05259D",
url  ="http://dx.doi.org/10.1039/D1SC05259D",
}

