
Applicability domains for cheminformatics.


MLChemAD

Applicability domain definitions for cheminformatics modelling.

Getting Started

Install

pip install mlchemad

Example Usage

  • With molecular fingerprints, prefer KNNApplicabilityDomain with k=1, scaling=None, hard_threshold=0.3, and dist='jaccard'.
  • Otherwise, TopKatApplicabilityDomain is recommended.

from mlchemad import TopKatApplicabilityDomain, KNNApplicabilityDomain, data

# Create the applicability domain using TopKat's definition
app_domain = TopKatApplicabilityDomain()
# Fit it to the training set
app_domain.fit(data.mekenyan1993.training)

# Determine outliers from multiple samples (rows) ...
print(app_domain.contains(data.mekenyan1993.test))

# ... or a single sample
sample = data.mekenyan1993.test.iloc[5]  # Row at position 5 (the sixth row, zero-based) as a pandas.Series
print(app_domain.contains(sample))

# Now with Morgan fingerprints
app_domain = KNNApplicabilityDomain(k=1, scaling=None, hard_threshold=0.3, dist='jaccard')
app_domain.fit(data.broccatelli2011.training.drop(columns='Activity'))
print(app_domain.contains(data.broccatelli2011.test.drop(columns='Activity')))

Depending on the definition of the applicability domain, some samples of the training set might be outliers themselves.
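To see why training samples can fall outside a domain fitted on them, consider a percentile-based threshold. The following is a minimal sketch using only NumPy and synthetic data; it is not MLChemAD's implementation. The distance-to-centroid rule, the 95th-percentile cutoff, and the helper name `contains` are assumptions chosen for illustration.

```python
import numpy as np

# Synthetic "descriptors" for illustration only (not a real dataset).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 3))

# Assumed domain definition: a sample is inside if its Euclidean distance
# to the training centroid is at most the 95th percentile of the
# training distances.
centroid = X_train.mean(axis=0)
train_dist = np.linalg.norm(X_train - centroid, axis=1)
threshold = np.percentile(train_dist, 95)


def contains(X):
    """Return a boolean array: True where a row lies inside the domain."""
    X = np.atleast_2d(np.asarray(X, dtype=float))
    return np.linalg.norm(X - centroid, axis=1) <= threshold


inside = contains(X_train)
n_outliers = int((~inside).sum())
print(f"{n_outliers} of {len(X_train)} training samples fall outside the domain")
```

Because the threshold is a percentile of the training distances, a fraction of the training set is excluded by construction. Definitions based on hard bounds of the training data, such as a bounding box, do not behave this way.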

Applicability domains

MLChemAD defines the following applicability domains:

  • Bounding Box
  • PCA Bounding Box
  • Convex Hull
    (does not scale well)
  • TOPKAT's Optimum Prediction Space
    (recommended with molecular descriptors)
  • Leverage
  • Hotelling T²
  • Distance to Centroids
  • k-Nearest Neighbors
    (recommended with molecular fingerprints; for ECFP fingerprints, use dist='rogerstanimoto', scaling=None and hard_threshold=0.75)
  • Isolation Forests
  • Non-parametric Kernel Densities
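
As an illustration of the simplest definition in the list above, a bounding box keeps the per-feature minimum and maximum of the training set and treats a sample as inside when every feature lies within those bounds. The sketch below uses only NumPy; the class name `BoundingBoxSketch` is hypothetical and the code is not MLChemAD's implementation, although it mirrors the `fit`/`contains` pattern used in the example usage.

```python
import numpy as np


class BoundingBoxSketch:
    """Hypothetical, simplified bounding-box applicability domain."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        # Record the per-feature range spanned by the training set.
        self.min_ = X.min(axis=0)
        self.max_ = X.max(axis=0)
        return self

    def contains(self, X):
        X = np.atleast_2d(np.asarray(X, dtype=float))
        # A sample is inside only if all of its features are within range.
        return np.all((X >= self.min_) & (X <= self.max_), axis=1)


train = np.array([[0.0, 1.0], [2.0, 3.0], [1.0, 2.0]])
test = np.array([[1.0, 1.5], [3.0, 2.0]])

box = BoundingBoxSketch().fit(train)
result = box.contains(test)
print(result)  # first sample inside, second outside (3.0 exceeds the max of 2.0)
```

With this definition every training sample is inside the domain, since the bounds are taken from the training set itself.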
