Simple package that performs basic molecular structural sanity checks
chembl_gen_check
chembl_gen_check is a Python library for rapidly assessing how reasonable a (generated) molecule is. It ships with precomputed databases for both ChEMBL (medicinal chemistry from the scientific literature) and SureChEMBL (chemical structures extracted from patent literature), so no downloads or preprocessing are required: just pick one when loading your checker.
Using lightweight MolBloom filters, it verifies whether a compound's scaffolds, generic scaffolds or ring systems already exist in the selected database. It can also flag uncommon bonds via the LACAN algorithm and report structural alerts. Taken together, these checks give a fast read on the plausibility of a molecule's ring systems and scaffolds, and ensure that its atom and bond environments have precedent.
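To illustrate why a Bloom filter suits this kind of lookup, here is a minimal pure-Python sketch. It is not the MolBloom implementation and not part of chembl_gen_check: the class name `TinyBloomFilter`, its parameters, and the three example scaffolds are all made up for illustration. It shows the general technique only, where membership tests never give false negatives but can occasionally give false positives.

```python
import hashlib


class TinyBloomFilter:
    """Toy Bloom filter (hypothetical, for illustration only)."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        # One byte per bit keeps the sketch readable; real filters pack bits.
        self.bits = bytearray(num_bits)

    def _positions(self, item: str):
        # Derive num_hashes positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        # True if every position is set: "possibly present".
        # False if any position is unset: "definitely absent".
        return all(self.bits[pos] for pos in self._positions(item))


# Made-up reference set of scaffold SMILES strings.
known_scaffolds = TinyBloomFilter()
for scaffold in ["c1ccccc1", "c1ccc(C2CC2)cc1", "c1ccncc1"]:
    known_scaffolds.add(scaffold)

print("c1ccccc1" in known_scaffolds)
```

Because the filter compares strings, a lookup like this only makes sense if the query and the stored entries are written in the same canonical form.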
Installation
pip install chembl-gen-check
Usage example
from chembl_gen_check import Checker
checker = Checker("chembl")
#checker = Checker("surechembl")
smiles = "CCN(CC)C(=O)C[C@H]1C[C@@H]1c1ccccc1"
checker.load_smiles(smiles)
# Murcko scaffold found in the loaded database (True/False)
checker.check_scaffold()
# Generic Murcko scaffold found in loaded database (True/False)
checker.check_skeleton()
# All molecule ring systems found in loaded database (True/False)
checker.check_ring_systems()
# Number of structural alerts using the ChEMBL set in RDKit (integer)
checker.check_structural_alerts()
# LACAN hard pass/fail filter (default mode="threshold"): reject if any bond's
# PMI is below the threshold t (default 0.05). Returns 1.0 (pass) or 0.0 (fail).
checker.check_lacan()
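When screening many generated molecules it can be convenient to gather all five results in one place. The helper below is a small sketch, not part of the library: `run_all_checks` is a hypothetical name, and it relies only on the `load_smiles` and `check_*` methods shown in the usage example above.

```python
def run_all_checks(checker, smiles: str) -> dict:
    """Load one SMILES into an existing checker and collect every check result.

    `checker` is expected to expose the methods from the usage example;
    this function itself is a hypothetical convenience wrapper.
    """
    checker.load_smiles(smiles)
    return {
        "scaffold": checker.check_scaffold(),
        "skeleton": checker.check_skeleton(),
        "ring_systems": checker.check_ring_systems(),
        "structural_alerts": checker.check_structural_alerts(),
        "lacan": checker.check_lacan(),
    }


# Intended use (requires chembl_gen_check to be installed):
#   from chembl_gen_check import Checker
#   checker = Checker("chembl")
#   results = run_all_checks(checker, "CCN(CC)C(=O)C[C@H]1C[C@@H]1c1ccccc1")
```

Reusing one `Checker` instance across molecules avoids reloading the database for every SMILES.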
How LACAN Works
Reference: Dehaen, W. LACAN. ChemRxiv preprint.
LACAN scores a molecule one bond at a time. Each bond is split into its two atom environments, and a PMI ratio (pointwise mutual information) is computed from the reference database:
PMI = P(env_a, env_b) / (P(env_a) * P(env_b))
that is, how often the two halves are actually bonded together versus how often
they would be by pure chance. PMI > 1 means the bond is more common than
chance, PMI ≈ 1 is as expected, and PMI ≈ 0 flags a junction that is
essentially never seen in the database (a likely artifact).
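As a worked example of the ratio above, the sketch below computes it from raw counts. The function name `pmi_ratio`, its arguments, and the assumption that each probability is simply a count divided by the total number of bonds in the reference set are all illustrative; the actual LACAN code may estimate these probabilities differently.

```python
def pmi_ratio(pair_count: int, env_a_count: int, env_b_count: int,
              total_bonds: int) -> float:
    """PMI ratio P(a, b) / (P(a) * P(b)) from raw counts (illustrative).

    Assumes P(a, b) = pair_count / total_bonds and
    P(x) = env_x_count / total_bonds, so the ratio simplifies to
    pair_count * total_bonds / (env_a_count * env_b_count).
    """
    if env_a_count <= 0 or env_b_count <= 0 or total_bonds <= 0:
        raise ValueError("counts must be positive")
    return pair_count * total_bonds / (env_a_count * env_b_count)


# Made-up counts: 1000 bonds in total, env_a seen 100 times, env_b 50 times.
print(pmi_ratio(20, 100, 50, 1000))  # bonded together more often than chance
print(pmi_ratio(5, 100, 50, 1000))   # exactly as often as chance predicts
print(pmi_ratio(0, 100, 50, 1000))   # never seen bonded together
```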
A molecule is summarized by its weakest bond, min_PMI:
- mode="threshold" (default): a single bond is enough to fail the whole molecule. If any bond has PMI < t (default 0.05) it returns 0.0 (fail), otherwise 1.0 (pass).
- mode="score": returns min_PMI / (1 + min_PMI), a value in [0, 1). Higher is more reasonable; 0.5 corresponds to min_PMI = 1.
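The two modes can be sketched in a few lines, given a list of per-bond PMI values. This is a simplified illustration of the rules described above, not the library's code: `lacan_summary` is a hypothetical name, and the per-bond values are assumed to have been computed already.

```python
def lacan_summary(bond_pmis, mode: str = "threshold", t: float = 0.05) -> float:
    """Summarize a molecule by its weakest bond (illustrative sketch)."""
    if not bond_pmis:
        # The behavior for a molecule without bonds is not specified here.
        raise ValueError("need at least one bond PMI value")
    min_pmi = min(bond_pmis)
    if mode == "threshold":
        # One bond below t is enough to fail the whole molecule.
        return 0.0 if min_pmi < t else 1.0
    if mode == "score":
        # Maps [0, inf) onto [0, 1); min_pmi = 1 gives 0.5.
        return min_pmi / (1.0 + min_pmi)
    raise ValueError(f"unknown mode: {mode!r}")


# Made-up per-bond values for two molecules.
print(lacan_summary([4.0, 1.2, 0.01]))             # one rare bond fails it
print(lacan_summary([4.0, 1.0], mode="score"))     # weakest bond at chance level
```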
Code to extract ring systems adapted from: W Patrick Walters. useful_rdkit_utils
Code to calculate LACAN scores adapted from: Dehaen, W. LACAN. https://github.com/dehaenw/lacan/