Simple package that performs basic molecular structural sanity checks

chembl_gen_check

chembl_gen_check is a Python library for rapidly assessing how reasonable a (generated) molecule is. It ships with precomputed databases for both ChEMBL (medicinal chemistry from the scientific literature) and SureChEMBL (chemical structures extracted from patent literature), so no downloads or preprocessing are required: just pick one when loading your checker.

Using lightweight MolBloom filters, it checks whether a compound's Murcko scaffold, generic Murcko scaffold (the scaffold with atom and bond types abstracted away) and ring systems already exist in the selected database. It can also flag uncommon bonds via the LACAN algorithm and count structural alerts. Taken together, these checks give a fast read on whether a molecule's scaffolds and ring systems are plausible and whether the atom environments on either side of each bond have precedent.
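A Bloom filter trades a small false-positive rate for very compact storage and fast lookups. As a rough illustration of that data structure only (this is not MolBloom's code, hashing scheme or API), the sketch below implements a minimal Bloom filter over SMILES strings: an answer of "absent" is always correct, while "present" may occasionally be a false positive. The `BloomFilter` class and the scaffold strings used with it are hypothetical.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Minimal Bloom filter: each item sets k bit positions in an m-bit array."""

    def __init__(self, m_bits: int = 1024, k_hashes: int = 3) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k bit positions by salting a SHA-256 hash with the hash index.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical usage: index a few scaffold SMILES, then query them.
bf = BloomFilter()
bf.add("c1ccccc1")
bf.add("c1ccc(C2CC2)cc1")
print("c1ccc(C2CC2)cc1" in bf)
```

Applied to scaffolds, a False from such a filter means the scaffold is definitely not in the indexed set, whereas True should be read as "very probably seen before".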

Installation

pip install chembl-gen-check

Usage example

from chembl_gen_check import Checker

# Load the ChEMBL-derived database; use "surechembl" for the patent-derived one
checker = Checker("chembl")
# checker = Checker("surechembl")

smiles = "CCN(CC)C(=O)C[C@H]1C[C@@H]1c1ccccc1"
checker.load_smiles(smiles)

# Murcko scaffold found in the loaded database (True/False)
print(checker.check_scaffold())

# Generic Murcko scaffold found in the loaded database (True/False)
print(checker.check_skeleton())

# All of the molecule's ring systems found in the loaded database (True/False)
print(checker.check_ring_systems())

# Number of structural alerts, using the ChEMBL alert set in RDKit (integer)
print(checker.check_structural_alerts())

# LACAN hard pass/fail filter (default mode="threshold"): fail if any bond's
# PMI is below the threshold t (default 0.05). Returns 1.0 (pass) or 0.0 (fail).
print(checker.check_lacan())
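To screen many generated molecules, the calls in the usage example can be wrapped in a loop. The sketch below uses only the methods shown there and assumes that each `load_smiles` call replaces the previously loaded molecule; how the library handles unparsable SMILES is not covered. The helper names `screen` and `passes`, and the acceptance rule in `passes`, are hypothetical choices rather than recommendations from the package.

```python
from typing import Any, Dict, Iterable, List


def screen(smiles_list: Iterable[str], checker: Any) -> List[Dict[str, Any]]:
    """Run every check from the usage example on each SMILES and collect the results."""
    rows = []
    for smi in smiles_list:
        checker.load_smiles(smi)  # assumption: replaces the previously loaded molecule
        rows.append({
            "smiles": smi,
            "scaffold": checker.check_scaffold(),
            "skeleton": checker.check_skeleton(),
            "ring_systems": checker.check_ring_systems(),
            "alerts": checker.check_structural_alerts(),
            "lacan": checker.check_lacan(),
        })
    return rows


def passes(row: Dict[str, Any], max_alerts: int = 0) -> bool:
    """Hypothetical rule: known scaffold and ring systems, few alerts, LACAN pass."""
    return (
        bool(row["scaffold"])
        and bool(row["ring_systems"])
        and row["alerts"] <= max_alerts
        and row["lacan"] == 1.0
    )
```

With the package installed, this would be called as `screen(["CCO", "c1ccccc1O"], Checker("chembl"))`, and `passes` can then be used to filter the returned rows.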

How LACAN Works

Reference: Dehaen, W. LACAN. ChemRxiv preprint.

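LACAN scores a molecule one bond at a time. Each bond is split into the two atom environments it joins, and a PMI ratio is computed from the reference database. Pointwise mutual information (PMI) is conventionally the logarithm of this ratio; here the ratio itself is used, so the neutral value is 1 rather than 0: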

PMI = P(env_a, env_b) / (P(env_a) * P(env_b))

that is, how often the two environments are actually bonded to each other compared with how often they would be if bonds paired environments at random. PMI > 1 means the pairing is more common than chance, PMI ≈ 1 means it occurs about as often as chance predicts, and PMI ≈ 0 flags a junction that is essentially never seen in the database (a likely artifact).
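To make the formula concrete, the sketch below estimates the ratio from a list of observed bonds, each written as a pair of environment labels. It assumes each bond is an ordered (left, right) pair and estimates P(env_a) and P(env_b) from how often each label appears on the left and right side respectively. LACAN's actual environment definition, bond orientation and any smoothing may differ, and labels such as "A" and "X" are placeholders rather than real atom environments.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple


def pmi_table(bonds: Iterable[Tuple[str, str]]) -> Callable[[str, str], float]:
    """Return a function giving the (un-logged) PMI ratio for an ordered environment pair."""
    bonds = list(bonds)
    n = len(bonds)
    pair_counts = Counter(bonds)
    left_counts = Counter(a for a, _ in bonds)
    right_counts = Counter(b for _, b in bonds)

    def pmi(env_a: str, env_b: str) -> float:
        # P(a,b) / (P(a) * P(b)) with P(a,b) = c(a,b)/n, P(a) = c_left(a)/n, P(b) = c_right(b)/n,
        # which simplifies to c(a,b) * n / (c_left(a) * c_right(b)).
        denom = left_counts[env_a] * right_counts[env_b]
        if denom == 0:
            return 0.0  # environment never seen on that side: treat as no precedent
        return pair_counts[(env_a, env_b)] * n / denom

    return pmi


bonds = [("A", "X")] * 3 + [("B", "Y")] * 2 + [("B", "X")]
pmi = pmi_table(bonds)
print(pmi("A", "X"), pmi("B", "X"), pmi("A", "Y"))
```

For this toy data, `pmi("A", "X")` is 1.5 (more common than chance), `pmi("B", "X")` is 0.5, and the never-observed pair `pmi("A", "Y")` is 0.0.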

A molecule is summarized by its weakest bond, min_PMI:

  • mode="threshold" (default): a single bond is enough to fail the whole molecule; if any bond has PMI < t (default 0.05), it returns 0.0 (fail), otherwise 1.0 (pass).
  • mode="score": returns min_PMI / (1 + min_PMI), a value in [0, 1). Higher is more reasonable; 0.5 corresponds to min_PMI = 1.
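The two summary modes can be written down directly from their description. This sketch takes a list of per-bond PMI ratios, however they were computed, and is not the package's implementation. What the library does for a molecule with no bonds is not stated, so the sketch simply raises an error in that case. The function name `lacan_summary` is hypothetical.

```python
from typing import Sequence


def lacan_summary(bond_pmis: Sequence[float], mode: str = "threshold", t: float = 0.05) -> float:
    """Collapse per-bond PMI ratios into one molecule-level value via the weakest bond."""
    if not bond_pmis:
        raise ValueError("need at least one bond")
    min_pmi = min(bond_pmis)
    if mode == "threshold":
        # Hard filter: one bond strictly below t fails the whole molecule.
        return 0.0 if min_pmi < t else 1.0
    if mode == "score":
        # Squash [0, inf) into [0, 1); min_pmi = 1 maps to 0.5.
        return min_pmi / (1.0 + min_pmi)
    raise ValueError(f"unknown mode: {mode!r}")


print(lacan_summary([1.5, 0.03]), lacan_summary([1.0, 3.0], mode="score"))
```

Because the threshold comparison is strict, a weakest bond of exactly t = 0.05 passes; in score mode, min_PMI = 0 maps to 0.0 and min_PMI = 1 maps to 0.5.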

Code to extract ring systems adapted from: W. Patrick Walters. useful_rdkit_utils
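For context on what `check_ring_systems` compares, a ring system is usually a set of rings joined by shared atoms. The sketch below shows one common way to build ring systems from ring atom-index tuples, such as those returned by RDKit's `mol.GetRingInfo().AtomRings()`: rings sharing two or more atoms (fused or bridged) are merged, and spiro rings sharing a single atom are merged only if requested. Whether chembl_gen_check or useful_rdkit_utils treat spiro-linked rings as one system is left open here via the hypothetical `include_spiro` flag.

```python
from typing import Iterable, List, Set, Tuple


def ring_systems(atom_rings: Iterable[Tuple[int, ...]], include_spiro: bool = False) -> List[Set[int]]:
    """Merge rings (given as atom-index tuples) that share atoms into ring systems."""
    systems: List[Set[int]] = []
    for ring in atom_rings:
        merged = set(ring)
        remaining = []
        for system in systems:
            shared = len(merged & system)
            # Fused/bridged rings share >= 2 atoms; spiro rings share exactly 1.
            if shared > 1 or (include_spiro and shared == 1):
                merged |= system
            else:
                remaining.append(system)
        remaining.append(merged)
        systems = remaining
    return systems


print(ring_systems([(0, 1, 2, 3, 4, 5), (4, 5, 6, 7, 8, 9)]))
```

For example, two rings (0, 1, 2, 3, 4, 5) and (4, 5, 6, 7, 8, 9) that share atoms 4 and 5, as in naphthalene, collapse into a single ten-atom system.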

Code to calculate LACAN scores adapted from: Dehaen, W. LACAN. https://github.com/dehaenw/lacan/


Download files

Source distribution

chembl_gen_check-0.1.4.tar.gz (46.1 MB)

Built distribution (Python 3)

chembl_gen_check-0.1.4-py3-none-any.whl (46.3 MB)

File details

Details for the file chembl_gen_check-0.1.4.tar.gz.

File metadata

  • Download URL: chembl_gen_check-0.1.4.tar.gz
  • Size: 46.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for chembl_gen_check-0.1.4.tar.gz:

  • SHA256: 70e755c7fdb5ebb143325e42b15dd482605e1516dce257cdb7eafb6dd06ae96b
  • MD5: a5b5d3f6e3808fdee15f9c98bd152699
  • BLAKE2b-256: 765e116de689dba23eee05c31ad33bf5779abb9f310af99e91c8ddfaa41182e3

Provenance

The following attestation bundles were made for chembl_gen_check-0.1.4.tar.gz:

Publisher: ci.yml on eloyfelix/chembl_gen_check

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file chembl_gen_check-0.1.4-py3-none-any.whl.

File hashes

Hashes for chembl_gen_check-0.1.4-py3-none-any.whl:

  • SHA256: 2a8eaf69d3fba2553dfd0edc3cca94cc2209b7b4073dc6a016bb2e1b5a2fd434
  • MD5: 383703cd0635b3cbcf0a931ab0a3da76
  • BLAKE2b-256: b682bac51cb68917ef235b27d0bd2e6d86abf7cd7830cf47ec3992a9a76947bf

Provenance

The following attestation bundles were made for chembl_gen_check-0.1.4-py3-none-any.whl:

Publisher: ci.yml on eloyfelix/chembl_gen_check

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
