Measures of roughness for molecular property landscapes

This package implements the roughness index (ROGI) presented in "Roughness of Molecular Property Landscapes and Its Impact on Modellability", as well as the SARI, MODI, and RMODI indices.

Installation

rogi can be installed with pip:

pip install rogi

Note that rdkit is a dependency but needs to be installed separately with conda.

Requirements

  • numpy
  • scipy>=1.4
  • fastcluster
  • pandas
  • scikit-learn>=1
  • rdkit>=2021 (install with conda)

Usage

Note that ROGI (RoughnessIndex) and SARI are classes, while MODI and RMODI are functions.

ROGI

If SMILES are used as input, Morgan fingerprints (length 2048, radius 2) are computed and a distance matrix calculated with the Tanimoto metric:

from rogi import RoughnessIndex

ri = RoughnessIndex(Y=Y, smiles=smiles)
ri.compute_index()
>>> 0.42
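For intuition about the distance used by default, the Tanimoto distance between binary fingerprints can be sketched in plain Python. This is an illustrative sketch only, not the package's implementation: fingerprints are represented here as sets of on-bit indices, and the helper names tanimoto_distance and tanimoto_distance_matrix are hypothetical.

```python
from typing import List, Set


def tanimoto_distance(a: Set[int], b: Set[int]) -> float:
    """Return one minus (intersection size / union size) of two on-bit sets."""
    union = len(a | b)
    if union == 0:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / union


def tanimoto_distance_matrix(fps: List[Set[int]]) -> List[List[float]]:
    """Build a symmetric square distance matrix with a zero diagonal."""
    n = len(fps)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = tanimoto_distance(fps[i], fps[j])
    return dist


# Toy fingerprints (hypothetical on-bit indices, not real Morgan bits).
toy_fps = [{1, 5, 9}, {1, 5, 20}, {30, 31}]
toy_dist = tanimoto_distance_matrix(toy_fps)
```

In this toy example the first two fingerprints share two of four distinct bits, so their distance is 0.5, while the third shares none with the others and sits at distance 1.0.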

With precomputed fingerprints:

ri = RoughnessIndex(Y=Y, fps=fingerprints)
ri.compute_index()

With descriptors, you can pass a 2D array or a pandas.DataFrame in which each row is a molecule and each column is a descriptor:

ri = RoughnessIndex(Y=Y, X=descriptors, metric='euclidean')
ri.compute_index()

You can also precompute a distance matrix using any chosen representation and metric, and pass it as X:

ri = RoughnessIndex(Y=Y, X=distance_matrix, metric='precomputed')
ri.compute_index()

SARI

You can provide SMILES as input, and compute the SARI score without considering a reference set of datasets as follows:

from rogi import SARI

sari = SARI(pKi=pKi, smiles=smiles, fingerprints='maccs')
sari.compute_sari()
>>> 0.42

To standardize the raw continuous and discontinuous scores based on a reference set of datasets, you can compute the raw scores first and then provide SARI with their average and standard deviation:

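import numpy as np
from rogi import SARI

# raw continuity and discontinuity scores of the reference datasets
raw_conts = []
raw_discs = []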

for smiles, pKi in zip(datasets, affinities):
    sari = SARI(pKi=pKi, smiles=smiles, fingerprints='maccs')
    raw_cont, raw_disc = sari.compute_raw_scores()
    raw_conts.append(raw_cont)
    raw_discs.append(raw_disc)

mean_raw_cont = np.mean(raw_conts)
std_raw_cont = np.std(raw_conts)
mean_raw_disc = np.mean(raw_discs)
std_raw_disc = np.std(raw_discs)
                         
sari = SARI(pKi=my_pKi, smiles=my_smiles, fingerprints='maccs')
sari.compute_sari(mean_raw_cont=mean_raw_cont, std_raw_cont=std_raw_cont,
                  mean_raw_disc=mean_raw_disc, std_raw_disc=std_raw_disc)
>>> 0.42
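The role of the mean and standard deviation can be illustrated with a small sketch. It follows the normalization described in the SARI paper as understood here (z-score, then the cumulative probability of a standard normal distribution, then an average of the continuity score and one minus the discontinuity score); whether rogi implements exactly these steps is an assumption, and the function names normal_cdf, standardize_score, and combine_sari are hypothetical.

```python
import math


def normal_cdf(z: float) -> float:
    """Cumulative probability of a standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def standardize_score(raw: float, mean: float, std: float) -> float:
    """Map a raw score to [0, 1] via its z-score and the normal CDF."""
    if std == 0:
        raise ValueError("std must be non-zero")
    return normal_cdf((raw - mean) / std)


def combine_sari(score_cont: float, score_disc: float) -> float:
    """Assumed combination rule: 0.5 * (score_cont + (1 - score_disc))."""
    return 0.5 * (score_cont + (1.0 - score_disc))


# A raw score equal to the reference mean maps to 0.5 after standardization.
example_score = standardize_score(raw=1.0, mean=1.0, std=2.0)
```

Under these assumptions, a dataset whose raw scores both equal the reference means would receive a combined value of 0.5.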

You can also pass a precomputed similarity matrix:

sari = SARI(pKi=pKi, sim_matrix=precomputed_similarity_matrix)

RMODI

RMODI is a function that takes a distance matrix in square form and a list of floats as input.

from rogi import RMODI
RMODI(Dx=square_dist_matrix, Y=Y)
>>> 0.42
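If you only have a descriptor table, a square-form distance matrix such as square_dist_matrix can be built first. The following is a minimal pure-Python sketch using Euclidean distances; the helper name square_euclidean_matrix and the toy descriptor values are hypothetical.

```python
import math
from typing import List, Sequence


def square_euclidean_matrix(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return an n x n symmetric matrix of pairwise Euclidean distances."""
    n = len(rows)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # math.dist requires Python 3.8 or later
            dist[i][j] = dist[j][i] = math.dist(rows[i], rows[j])
    return dist


# Toy descriptors: three molecules, two descriptors each (made-up values).
toy_descriptors = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
toy_square = square_euclidean_matrix(toy_descriptors)
```

For larger datasets, scipy.spatial.distance.pdist followed by squareform produces the same kind of matrix much more efficiently.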

The default delta value is 0.625; it can be changed with the delta argument:

from rogi import RMODI
RMODI(Dx=square_dist_matrix, Y=Y, delta=0.5)
>>> 0.21

MODI

MODI is a function that takes a distance matrix in square form and a list of binary labels (0 and 1) as input.

from rogi import MODI
MODI(Dx=square_dist_matrix, Y=Y)
>>> 0.42
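To show what the index measures, here is a sketch of MODI as defined in the cited Golbraikh et al. paper: for each class, take the fraction of compounds whose nearest neighbor belongs to the same class, then average over classes. This is a reading of that definition, not necessarily how rogi computes it; the function name modi_sketch and the tie-breaking rule (lowest index wins) are assumptions.

```python
from typing import Sequence


def modi_sketch(dist: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Average, over classes, of the fraction of same-class nearest neighbors."""
    n = len(labels)
    if n < 2:
        raise ValueError("at least two compounds are required")
    classes = sorted(set(labels))
    same = {c: 0 for c in classes}
    total = {c: 0 for c in classes}
    for i in range(n):
        # Nearest neighbor excluding the compound itself; ties go to the
        # lowest index (an arbitrary choice made for this sketch).
        nn = min((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        total[labels[i]] += 1
        if labels[nn] == labels[i]:
            same[labels[i]] += 1
    return sum(same[c] / total[c] for c in classes) / len(classes)


# Toy data: four compounds placed on a line, distances are absolute differences.
positions = [0.0, 1.0, 2.5, 10.0]
toy_dist = [[abs(a - b) for b in positions] for a in positions]
toy_modi = modi_sketch(toy_dist, [0, 0, 1, 1])
```

In the toy data, both class-0 compounds have a class-0 nearest neighbor, but only one of the two class-1 compounds does, so the sketch returns (1.0 + 0.5) / 2 = 0.75.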

Citation

If you make use of the rogi package in scientific publications, please cite the following article:

@misc{rogi,
      title={Roughness of molecular property landscapes and its impact on modellability}, 
      author={Matteo Aldeghi and David E. Graff and Nathan Frey and Joseph A. Morrone and 
              Edward O. Pyzer-Knapp and Kirk E. Jordan and Connor W. Coley},
      year={2022},
      eprint={2207.09250},
      archivePrefix={arXiv},
      primaryClass={q-bio.QM}
      }

If you use SARI, please also cite:

@article{sari,
         title={SAR Index: Quantifying the Nature of Structure--Activity Relationships},
         author={Peltason, Lisa and Bajorath, J\"urgen},
         journal={J. Med. Chem.},
         publisher={American Chemical Society},
         volume={50},
         number={23},
         pages={5571--5578},
         year={2007}
         }

If you use MODI, please also cite:

@article{modi,
         title={Data Set Modelability by QSAR},
         author={Golbraikh, Alexander and Muratov, Eugene and Fourches, Denis and
                 Tropsha, Alexander},
         journal={J. Chem. Inf. Model.},
         publisher={American Chemical Society},
         volume={54},
         number={1},
         pages={1--4},
         year={2014}
         }

If you use RMODI, please also cite:

@article{rmodi,
         title={Regression Modelability Index: A New Index for Prediction of the
                Modelability of Data Sets in the Development of QSAR
                Regression Models},
         author={Luque Ruiz, Irene and G\'omez-Nieto, Miguel \'Angel},
         journal={J. Chem. Inf. Model.},
         publisher={American Chemical Society},
         volume={58},
         number={10},
         pages={2069--2084},
         year={2018}
         }
