
molecular filter for adjacent fragments


LACAN: Leveraging Adjacent Co-occurrence of Atomic Neighborhoods

LACAN is a cheminformatics toolkit for scoring, mutating, and generating drug-like molecules using a statistical model of chemical bond environments learned from ChEMBL. It is designed as a library for generative chemistry pipelines and includes an adaptive genetic algorithm that can optimise molecules toward any user-defined scoring function.

"All sorts of things in this world behave like mirrors." — Jacques Lacan

📖 Full documentation: https://lacan.readthedocs.io/en/latest/


How it works

For every bond in a molecule, LACAN computes a pair of ECFP2-like atom environment identifiers, one per endpoint. Each identifier encodes atomic number, degree, hydrogen count, formal charge, and ring type (none / non-aromatic / aromatic), hashed to a 32-bit integer. The two hashes form a bond pair.
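The per-atom hashing step can be sketched in plain Python. This is a minimal illustration that assumes the five invariants listed above are already known for each atom. The `AtomInvariants` container, the `env_hash` and `bond_pair` helpers, the use of CRC32 as the 32-bit hash, and the sorting of the two endpoint hashes are all hypothetical choices. LACAN's actual hashing scheme is not described here and may differ.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomInvariants:
    """Hypothetical container for the five per-atom features listed above."""
    atomic_num: int
    degree: int
    num_hs: int
    formal_charge: int
    ring_type: str  # "none", "non-aromatic" or "aromatic"


def env_hash(atom: AtomInvariants) -> int:
    """Hash the invariants to an unsigned 32-bit integer.

    CRC32 is an assumed stand-in; unlike Python's built-in hash() on strings,
    it returns the same value on every run.
    """
    key = f"{atom.atomic_num}|{atom.degree}|{atom.num_hs}|{atom.formal_charge}|{atom.ring_type}"
    return zlib.crc32(key.encode("utf-8"))


def bond_pair(a1: AtomInvariants, a2: AtomInvariants) -> tuple[int, int]:
    """Return the two endpoint hashes in sorted order (sorting is an assumption),
    so the pair does not depend on which end of the bond is listed first."""
    h1, h2 = env_hash(a1), env_hash(a2)
    return (h1, h2) if h1 <= h2 else (h2, h1)


# The C-N bond of methylamine (CH3-NH2), counting heavy-atom neighbours only
carbon = AtomInvariants(atomic_num=6, degree=1, num_hs=3, formal_charge=0, ring_type="none")
nitrogen = AtomInvariants(atomic_num=7, degree=1, num_hs=2, formal_charge=0, ring_type="none")
print(bond_pair(carbon, nitrogen) == bond_pair(nitrogen, carbon))  # True
```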

A profile (e.g. chembl.pickle) stores, for a large training corpus:

  • idx: how often each atom environment appears
  • pairs: how often each bond pair (env1, env2) co-occurs
  • setsize: total number of bonds seen
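The counting step can be illustrated with a toy example, assuming the corpus has already been reduced to a list of (env1, env2) bond pairs. The `build_profile` function and the plain-dict layout are hypothetical; they mirror the three fields above, but the real chembl.pickle may be structured differently.

```python
from collections import Counter


def build_profile(bond_pairs):
    """Accumulate idx, pairs and setsize from an iterable of (env1, env2) tuples."""
    idx, pairs = Counter(), Counter()
    setsize = 0
    for env1, env2 in bond_pairs:
        idx[env1] += 1  # each bond contributes two endpoint environments...
        idx[env2] += 1
        pairs[(env1, env2)] += 1
        setsize += 1  # ...so sum(idx.values()) == 2 * setsize
    return {"idx": idx, "pairs": pairs, "setsize": setsize}


# Three bonds over environments labelled "a", "b" and "c"
profile = build_profile([("a", "b"), ("a", "b"), ("a", "c")])
print(profile["idx"]["a"], profile["pairs"][("a", "b")], profile["setsize"])  # 3 2 3
```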

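The pointwise mutual information (PMI) of a single bond is taken as the ratio of its observed to its expected co-occurrence frequency (the ratio itself, not its logarithm as in the usual definition of PMI). The factors of 2 correspond to each bond contributing two endpoint environments: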

observed  = pairs[(env1, env2)] / setsize
expected  = (idx[env1] / setsize / 2) × (idx[env2] / setsize / 2)
bond_PMI  = observed / expected
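The formula translates directly into code. In this sketch `bond_pmi` is a hypothetical helper and the counts are plain `Counter` objects, so an unseen pair counts as 0. Returning 0.0 when an environment never occurs in the profile is an assumption about how unseen environments are treated. Values above 1 mean the pair occurs more often than independence would predict; values below 1 mean it occurs less often.

```python
from collections import Counter


def bond_pmi(env1, env2, idx, pairs, setsize):
    """Observed/expected ratio for one bond, following the formula above."""
    observed = pairs[(env1, env2)] / setsize
    expected = (idx[env1] / setsize / 2) * (idx[env2] / setsize / 2)
    if expected == 0:
        return 0.0  # assumption: an environment absent from the profile is maximally unusual
    return observed / expected


# Toy counts: three bonds, two of them a-b and one a-c
idx = Counter({"a": 3, "b": 2, "c": 1})
pairs = Counter({("a", "b"): 2, ("a", "c"): 1})
print(bond_pmi("a", "b", idx, pairs, setsize=3))  # 4.0
print(bond_pmi("b", "c", idx, pairs, setsize=3))  # 0.0 (pair never observed)
```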

The molecule-level score is built from the lowest per-bond PMI in the molecule, min_PMI:

score = min_PMI / (1 + min_PMI)
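As a worked example of this mapping: min_PMI = 1 (the weakest bond occurs exactly as often as chance predicts) gives a score of 0.5, min_PMI = 4 gives 0.8, and the default bad-bond threshold of PMI 0.05 corresponds to a score of about 0.048. Because x / (1 + x) increases monotonically, ranking molecules by score is the same as ranking them by min_PMI. The helper below is a hypothetical sketch; how LACAN handles a molecule with no bonds is not covered here, so the sketch simply raises an error in that case.

```python
def molecule_score(bond_pmis):
    """Map the lowest per-bond PMI onto [0, 1) via min_PMI / (1 + min_PMI)."""
    if not bond_pmis:
        raise ValueError("molecule has no bonds to score")  # assumed behaviour
    min_pmi = min(bond_pmis)
    return min_pmi / (1 + min_pmi)


print(molecule_score([4.0, 1.0, 9.0]))  # 0.5, limited by the weakest bond
print(molecule_score([4.0]))            # 0.8
```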

A score near 0 means at least one bond is chemically unusual; a score near 1.0 means every bond pair occurs much more often than chance would predict, i.e. all bond environments are well represented in the training data. Bonds whose PMI falls below a threshold (default 0.05) are reported as bad_bonds.
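Flagging bad bonds then amounts to a filter over the per-bond values. This sketch returns bond indices; the `find_bad_bonds` name and the index-based output are assumptions, and the actual contents of `info['bad_bonds']` may use a different representation.

```python
def find_bad_bonds(bond_pmis, threshold=0.05):
    """Return the indices of bonds whose PMI is strictly below the threshold."""
    return [i for i, pmi in enumerate(bond_pmis) if pmi < threshold]


print(find_bad_bonds([2.3, 0.01, 0.8, 0.049]))          # [1, 3]
print(find_bad_bonds([2.3, 0.01, 0.8], threshold=1.0))  # [1, 2]
```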


Quick start

from rdkit import Chem
from lacan import lacan, gen

profile = lacan.load_profile("chembl")

# Score a molecule
mol = Chem.MolFromSmiles("CCCc1nn(C)c2c(=O)[nH]c(-c3ccccc3)nc12")
score, info = lacan.score_mol(mol, profile)
print(f"Score: {score:.3f}  bad bonds: {info['bad_bonds']}")

# Generate drug-like molecules
mols = gen.generate_filtered_molecules(profile, n_molecules=100, n_jobs=-1)

# Optimise toward any scoring function
def my_score(mols):
    return [lacan.score_mol(m, profile)[0] for m in mols]

winners = gen.generate_optimized_molecules(my_score, profile,
                                            startN=50, generations=20)
# Returns: list of (smiles, score) sorted best-first
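Judging from `my_score` above, the scoring function passed to the GA takes a list of molecules and returns one number per molecule, so several objectives can be blended with a small wrapper. `combine_scores` is a hypothetical helper, not part of LACAN, and plain strings stand in for RDKit molecules so the sketch runs on its own.

```python
from typing import Callable, List, Sequence

ScoreFn = Callable[[Sequence[object]], List[float]]


def combine_scores(fns: Sequence[ScoreFn], weights: Sequence[float]) -> ScoreFn:
    """Return a scoring function that computes a weighted sum of several objectives."""
    if len(fns) != len(weights):
        raise ValueError("need exactly one weight per scoring function")

    def combined(mols):
        per_fn = [fn(mols) for fn in fns]  # one list of per-molecule scores per objective
        return [sum(w * scores[i] for w, scores in zip(weights, per_fn)) for i in range(len(mols))]

    return combined


# Dummy objectives operating on strings that stand in for molecules
def constant(mols):
    return [1.0 for _ in mols]


def length(mols):
    return [float(len(m)) for m in mols]


blended = combine_scores([constant, length], [0.5, 0.25])
print(blended(["CC", "CCO"]))  # [1.0, 1.25]
```

With LACAN, the blended function could be passed in place of `my_score`, e.g. `gen.generate_optimized_molecules(combine_scores([my_score, other_score], [0.7, 0.3]), profile, startN=50, generations=20)`, where `other_score` is a hypothetical second objective.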

Build a custom profile

from rdkit import Chem
from lacan.lacan import get_profile_for_mols

suppl = Chem.SmilesMolSupplier("my_molecules.smi", titleLine=False)
profile = get_profile_for_mols(suppl, profile_name="my_profile", n_jobs=-1)
# Saved to lacan/data/my_profile.pickle
# Reload with: lacan.load_profile("my_profile")
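With `titleLine=False` and its other defaults, RDKit's `SmilesMolSupplier` reads one molecule per line, whitespace-delimited, with the SMILES in the first column and a name in the second. A corpus held in Python could be written in that layout as sketched below; `write_smi`, the example records, and the temporary output location are hypothetical.

```python
import tempfile
from pathlib import Path


def write_smi(records, path):
    """Write (smiles, name) pairs as headerless, space-delimited lines."""
    records = list(records)
    for smiles, name in records:
        # A space or tab inside a field would shift the supplier's column parsing
        if any(ch.isspace() for ch in smiles + name):
            raise ValueError(f"field contains whitespace: {smiles!r}, {name!r}")
    path = Path(path)
    path.write_text("".join(f"{smiles} {name}\n" for smiles, name in records), encoding="utf-8")
    return path


records = [
    ("CCCc1nn(C)c2c(=O)[nH]c(-c3ccccc3)nc12", "example_1"),
    ("c1ccccc1O", "phenol"),
]
out = write_smi(records, Path(tempfile.gettempdir()) / "my_molecules.smi")
print(out.read_text(encoding="utf-8").splitlines()[1])  # c1ccccc1O phenol
```

In practice the file would be written as my_molecules.smi and passed to `Chem.SmilesMolSupplier(..., titleLine=False)` as in the snippet above.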

For the full module reference, GA parameters, corpus biasing, and protection API, see the documentation.


Example notebooks

Worked examples are in lacan/example_notebooks/:

  • generate_molecules.ipynb: random generation, corpus biasing, running the GA
  • optimize_from_mol.ipynb: lead optimisation from a seed molecule, pharmacophore protection, mol_cleaner
  • mutating_molecules.ipynb: atom-level mutations and score filtering
  • evaluate_bonds.ipynb: per-bond PMI scoring and visualisation
  • median_molecules.ipynb: molecular crossover
  • shape_optimize_vortioxetine.ipynb: 3D shape-guided scaffold hopping with pharmacophore locking

Installation

pip install lacan

LACAN is installed with pip and requires Python ≥ 3.9 and RDKit.
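Both requirements can be checked before installing with a short standard-library script such as the one below. It only tests the interpreter version and whether an `rdkit` package can be found on the path; it says nothing about which RDKit versions LACAN supports, and `check_requirements` is a hypothetical helper.

```python
import importlib.util
import sys


def check_requirements():
    """Report whether the interpreter and RDKit requirements appear to be met."""
    python_ok = sys.version_info >= (3, 9)
    rdkit_found = importlib.util.find_spec("rdkit") is not None
    return {"python>=3.9": python_ok, "rdkit found": rdkit_found}


for requirement, ok in check_requirements().items():
    print(f"{requirement}: {'yes' if ok else 'NO'}")
```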

To install from source:

git clone https://github.com/wdehaen/lacan.git
cd lacan
pip install .

Running the tests

pip install pytest
pytest                              # full suite (~155 tests)
pytest tests/test_protect.py -v    # single module

Citation

Preprint coming soon.

If you use LACAN in your research, please cite:

Dehaen W. (2026). LACAN: Leveraging Adjacent Co-occurrence of Atomic Neighborhoods
for molecular scoring and generation. [Preprint]



Download files


Source Distribution

lacan-1.0.1.tar.gz (3.2 MB)

Built Distribution


lacan-1.0.1-py3-none-any.whl (3.3 MB)

File details

Details for the file lacan-1.0.1.tar.gz.

File metadata

  • Download URL: lacan-1.0.1.tar.gz
  • Size: 3.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.17

File hashes

Hashes for lacan-1.0.1.tar.gz:

  • SHA256: 81f4643336cb735dac1ac8538340c1a62a01625e35bce226d01928c554c092e2
  • MD5: 81c42fae7ebbb0930aca0527f189f4ec
  • BLAKE2b-256: 8c81f56cbfe0e368543bb4a227748845e46e5ae9e8263dd0cad999715d094911


File details

Details for the file lacan-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: lacan-1.0.1-py3-none-any.whl
  • Size: 3.3 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.17

File hashes

Hashes for lacan-1.0.1-py3-none-any.whl:

  • SHA256: 54e96ca493c1d7fa304c2eba18d27b2dec7833ac3132ecef1826468038576201
  • MD5: b809a934461b7e115bc9cf70e315d1c5
  • BLAKE2b-256: 319ff3567c2e329af3afed6bb6128b33605280abab760dc5995ff28fc7d5c30a
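The published digests can be used to check a downloaded file before installing it. The sketch below streams the file through `hashlib.sha256` and compares the result with the SHA256 value listed above for the wheel; `sha256_of` and `verify_download` are hypothetical helper names, and the wheel is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for lacan-1.0.1-py3-none-any.whl
WHEEL_SHA256 = "54e96ca493c1d7fa304c2eba18d27b2dec7833ac3132ecef1826468038576201"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True if the file's SHA256 matches the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


wheel = Path("lacan-1.0.1-py3-none-any.whl")
if wheel.exists():
    print("hash OK" if verify_download(wheel, WHEEL_SHA256) else "hash MISMATCH")
```

pip can also enforce the check itself: a requirements-file line such as `lacan==1.0.1 --hash=sha256:<digest>`, installed with `pip install --require-hashes -r requirements.txt`, makes pip refuse any file whose digest does not match.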

