Applicability domains for cheminformatics.

Project description

MLChemAD

Applicability domain definitions for cheminformatics modelling.

Getting Started

Install

pip install mlchemad

Example Usage

  • With molecular fingerprints, prefer KNNApplicabilityDomain with k=1, scaling=None, hard_threshold=0.3, and dist='jaccard'.
  • Otherwise, TopKatApplicabilityDomain is recommended.

from mlchemad import TopKatApplicabilityDomain, KNNApplicabilityDomain, data

# Create the applicability domain using TopKat's definition
app_domain = TopKatApplicabilityDomain()
# Fit it to the training set
app_domain.fit(data.mekenyan1993.training)

# Determine outliers from multiple samples (rows) ...
print(app_domain.contains(data.mekenyan1993.test))

# ... or a single sample
sample = data.mekenyan1993.test.iloc[5]  # Row at position 5 (the 6th row) as a pandas.Series object
print(app_domain.contains(sample))

# Now with Morgan fingerprints
app_domain = KNNApplicabilityDomain(k=1, scaling=None, hard_threshold=0.3, dist='jaccard')
app_domain.fit(data.broccatelli2011.training.drop(columns='Activity'))
print(app_domain.contains(data.broccatelli2011.test.drop(columns='Activity')))
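
To make the fingerprint recommendation more concrete, here is a minimal pure-Python sketch of a 1-nearest-neighbour check with the Jaccard distance and a fixed threshold. It is not MLChemAD's implementation: the helper names jaccard_distance and in_domain are hypothetical, fingerprints are represented as sets of on-bit indices, and the rule "inside the domain if the nearest training sample is at distance at most the threshold" is an assumption about what k=1 with hard_threshold=0.3 means.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two fingerprints given as sets of on-bit indices."""
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints are treated as identical
    return 1.0 - len(a & b) / len(union)


def in_domain(sample, training, threshold=0.3):
    """Assumed rule: inside if the nearest training fingerprint is within `threshold`."""
    nearest = min(jaccard_distance(sample, ref) for ref in training)
    return nearest <= threshold


# Toy fingerprints (hypothetical on-bit indices, not real Morgan bits)
training = [{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}, {20, 21, 22, 23}]
close = {1, 2, 3, 4, 5, 6, 7, 8, 9}  # shares 9 of 10 bits with the first training sample
far = {1, 2, 50, 51, 52}             # shares few bits with any training sample

print(in_domain(close, training))
print(in_domain(far, training))
```

With k=1 only the single closest training compound matters, so a query is accepted as soon as one sufficiently similar training compound exists.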

Depending on the definition of the applicability domain, some samples of the training set might be outliers themselves.
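
The following toy sketch illustrates how this can happen. It assumes a hypothetical distance-to-centroid rule whose threshold is a percentile of the training distances. This is an illustration only, not the rule MLChemAD applies, and the data are made up.

```python
import math


def centroid(rows):
    """Column-wise mean of a list of equally sized numeric rows."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


# Made-up training set with one sample far from the others
training = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [10.0, 10.0]]
center = centroid(training)

# Euclidean distance of every training sample to the centroid
distances = [math.dist(row, center) for row in training]

# Hypothetical rule: the threshold is the 80th percentile (nearest rank) of the
# training distances, i.e. the 4th smallest of the 5 values
threshold = sorted(distances)[3]

inside = [d <= threshold for d in distances]
print(inside)  # the last training sample falls outside its own domain
```

Any definition whose boundary is fitted to cover only part of the training distribution behaves this way: the most extreme training samples end up outside the domain.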

Applicability domains

MLChemAD provides the following applicability domain definitions:

  • Bounding Box
  • PCA Bounding Box
  • Convex Hull
    (does not scale well)
  • TOPKAT's Optimum Prediction Space
    (recommended with molecular descriptors)
  • Leverage
  • Hotelling T²
  • Distance to Centroids
  • k-Nearest Neighbors
    (recommended with molecular fingerprints; for ECFP fingerprints, use dist='rogerstanimoto', scaling=None and hard_threshold=0.75)
  • Isolation Forests
  • Non-parametric Kernel Densities
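
As an illustration of the simplest definition in the list, a bounding box can be sketched in a few lines of pure Python: a sample is inside the domain when each of its descriptors lies within the minimum and maximum seen in the training set. The function names fit_bounding_box and contains and the descriptor values are hypothetical; this is a sketch of the general idea, not MLChemAD's code.

```python
def fit_bounding_box(training):
    """Return per-descriptor (minimums, maximums) over the training rows."""
    columns = list(zip(*training))
    return [min(c) for c in columns], [max(c) for c in columns]


def contains(box, sample):
    """True if every descriptor of `sample` lies within the training range."""
    lows, highs = box
    return all(lo <= x <= hi for lo, x, hi in zip(lows, sample, highs))


# Made-up training descriptors: two values per compound
training = [[120.0, 1.5], [300.0, 3.2], [210.0, -0.4]]
box = fit_bounding_box(training)

print(contains(box, [250.0, 2.0]))  # both descriptors within the training ranges
print(contains(box, [450.0, 2.0]))  # first descriptor exceeds the training maximum
```

A bounding box ignores correlations between descriptors, which is one reason the other definitions in the list exist.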

Download files

Source Distribution

mlchemad-1.5.2.tar.gz (286.2 kB)

Built Distribution

mlchemad-1.5.2-py3-none-any.whl (314.4 kB)

File details

Details for the file mlchemad-1.5.2.tar.gz.

File metadata

  • Download URL: mlchemad-1.5.2.tar.gz
  • Upload date:
  • Size: 286.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for mlchemad-1.5.2.tar.gz
Algorithm    Hash digest
SHA256       b0f2c6d6b8c639e0c873f14af0364693ad7b3e9641705441464d5e1816168a41
MD5          ab322f828ea6da5ec56dd08389f9b7cd
BLAKE2b-256  581d2452236c0e6cfcaf451f56dbbfe5bcc91354411e983d99cdc7a62af3314a

File details

Details for the file mlchemad-1.5.2-py3-none-any.whl.

File metadata

  • Download URL: mlchemad-1.5.2-py3-none-any.whl
  • Upload date:
  • Size: 314.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for mlchemad-1.5.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       b578ceca58139578c84843aa851aa14430e3748bcf339a7b9a12ad94dbddfa3e
MD5          832ea93fe074abdb10e12d89d0d52a22
BLAKE2b-256  984937b077b10c2bd780bf16a79cdf15d6f1dd0cf49ea51a0162ee0477989a4f
