molecular filter for adjacent fragments
LACAN: Leveraging Adjacent Co-occurrence of Atomic Neighborhoods
LACAN is a cheminformatics toolkit for scoring, mutating, and generating drug-like molecules using a statistical model of chemical bond environments learned from ChEMBL. It is designed as a library for generative chemistry pipelines and includes an adaptive genetic algorithm that can optimise molecules toward any user-defined scoring function.
"All sorts of things in this world behave like mirrors." — Jacques Lacan
📖 Full documentation: https://lacan.readthedocs.io/en/latest/
How it works
For every bond in a molecule, LACAN computes a pair of ECFP2-like atom environment identifiers, one per endpoint. Each identifier encodes atomic number, degree, hydrogen count, formal charge, and ring type (none / non-aromatic / aromatic), hashed to a 32-bit integer. The two hashes form a bond pair.
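The identifier scheme described above can be sketched in plain Python. This is an illustration only: the `AtomEnv` container, the `env_id` and `bond_pair` helpers, the use of CRC32 as the 32-bit hash, and the sorted ordering of the pair are all assumptions made for the example, not LACAN's actual implementation. In practice the five invariants would be read from RDKit atom objects.

```python
import zlib
from typing import NamedTuple, Tuple

# Ring-type codes for the three categories named in the text (values are hypothetical).
RING_NONE, RING_NON_AROMATIC, RING_AROMATIC = 0, 1, 2


class AtomEnv(NamedTuple):
    """The five atom invariants listed above (hypothetical container)."""
    atomic_num: int
    degree: int
    h_count: int
    formal_charge: int
    ring_type: int


def env_id(env: AtomEnv) -> int:
    """Hash the invariants to an unsigned 32-bit integer (CRC32 is an assumption)."""
    return zlib.crc32(repr(tuple(env)).encode("ascii")) & 0xFFFFFFFF


def bond_pair(a: AtomEnv, b: AtomEnv) -> Tuple[int, int]:
    """Return the two endpoint identifiers in sorted order (ordering is an assumption)."""
    ha, hb = env_id(a), env_id(b)
    return (ha, hb) if ha <= hb else (hb, ha)


# An aromatic carbon bonded to a methyl carbon, as in toluene.
aromatic_c = AtomEnv(6, 3, 0, 0, RING_AROMATIC)
methyl_c = AtomEnv(6, 1, 3, 0, RING_NONE)
print(bond_pair(aromatic_c, methyl_c))
```

Sorting the two hashes makes the pair independent of the direction in which the bond is traversed, which is one simple way to get a stable dictionary key.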
A profile (e.g. chembl.pickle) stores, for a large training corpus:
- idx: how often each atom environment appears
- pairs: how often each bond pair co-occurs
- setsize: total number of bonds seen
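To make the three counters concrete, here is a minimal sketch that builds them from a list of bonds, each given as a pair of environment identifiers. The function name `build_toy_profile`, the plain-dict layout, and the sorted pair keys are assumptions for illustration; the on-disk format of a real LACAN profile may differ.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def build_toy_profile(bonds: Iterable[Tuple[int, int]]) -> Dict[str, object]:
    """Count environments, bond pairs, and bonds (hypothetical profile layout)."""
    idx = Counter()    # occurrences of each atom environment
    pairs = Counter()  # occurrences of each (sorted) bond pair
    setsize = 0        # total number of bonds seen
    for env1, env2 in bonds:
        idx[env1] += 1
        idx[env2] += 1
        pairs[tuple(sorted((env1, env2)))] += 1
        setsize += 1
    return {"idx": idx, "pairs": pairs, "setsize": setsize}


# Three bonds over two made-up environment identifiers, 1 and 2.
toy = build_toy_profile([(1, 2), (2, 1), (1, 1)])
print(toy["idx"][1], toy["pairs"][(1, 2)], toy["setsize"])  # 4 2 3
```

Because every bond contributes two endpoints, the idx counts sum to twice setsize.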
The pointwise mutual information (PMI) for a single bond:
observed = pairs[(env1, env2)] / setsize
expected = (idx[env1] / setsize / 2) × (idx[env2] / setsize / 2)
bond_PMI = observed / expected
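The three lines above translate directly into a small function. The sketch below runs on a toy profile held in a plain dict; the `bond_pmi` name, the dict layout, the sorted pair keys, and returning 0.0 for environments never seen in training are assumptions, not LACAN's API.

```python
from typing import Dict


def bond_pmi(profile: Dict[str, object], env1: int, env2: int) -> float:
    """Observed / expected ratio for one bond, following the formula above."""
    setsize = profile["setsize"]
    key = tuple(sorted((env1, env2)))  # assumed key convention
    observed = profile["pairs"].get(key, 0) / setsize
    expected = (profile["idx"].get(env1, 0) / setsize / 2) * (
        profile["idx"].get(env2, 0) / setsize / 2
    )
    if expected == 0:
        return 0.0  # assumption: an unseen environment is treated as maximally unusual
    return observed / expected


# Toy counts: environment 1 appears 4 times, environment 2 twice, over 3 bonds.
toy_profile = {"idx": {1: 4, 2: 2}, "pairs": {(1, 2): 2, (1, 1): 1}, "setsize": 3}

print(round(bond_pmi(toy_profile, 1, 2), 3))  # 3.0
print(round(bond_pmi(toy_profile, 1, 1), 3))  # 0.75
print(bond_pmi(toy_profile, 2, 2))            # 0.0
```

Note that the value is a ratio rather than a logarithm, so a bond whose environments co-occur exactly as often as chance predicts has a PMI of 1, and a pair never seen in training has a PMI of 0.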
The molecule-level score uses the minimum per-bond PMI:
score = min_PMI / (1 + min_PMI)
A score near 0 means at least one bond is chemically unusual; near 1.0 means all bond environments are well-represented in the training data. Bonds below a threshold (default PMI < 0.05) are reported as bad_bonds.
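As a worked example of the squashing step and the bad-bond threshold, the sketch below takes per-bond PMI values keyed by bond index. The function name `score_from_bond_pmis` and the choice to raise on a molecule with no bonds are assumptions; the source does not say how LACAN handles that case.

```python
from typing import Dict, List, Tuple


def score_from_bond_pmis(
    bond_pmis: Dict[int, float], threshold: float = 0.05
) -> Tuple[float, List[int]]:
    """Map the minimum per-bond PMI into [0, 1) and list bonds below the threshold."""
    if not bond_pmis:
        raise ValueError("no bonds to score")  # assumption: undefined for bond-free input
    min_pmi = min(bond_pmis.values())
    score = min_pmi / (1.0 + min_pmi)
    bad_bonds = sorted(i for i, pmi in bond_pmis.items() if pmi < threshold)
    return score, bad_bonds


# Bond 2 is rare in the (made-up) training data and drags the whole score down.
score, bad = score_from_bond_pmis({0: 3.0, 1: 0.75, 2: 0.01})
print(f"{score:.4f}", bad)  # 0.0099 [2]

# Without the rare bond, the weakest bond has PMI 0.75.
score, bad = score_from_bond_pmis({0: 3.0, 1: 0.75})
print(f"{score:.4f}", bad)  # 0.4286 []
```

Under this formula a minimum PMI of exactly 1 (co-occurrence at chance level) gives a score of 0.5, and the score approaches 1.0 only as the weakest bond becomes strongly over-represented.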
Quick start
from rdkit import Chem
from lacan import lacan, gen
profile = lacan.load_profile("chembl")
# Score a molecule
mol = Chem.MolFromSmiles("CCCc1nn(C)c2c(=O)[nH]c(-c3ccccc3)nc12")
score, info = lacan.score_mol(mol, profile)
print(f"Score: {score:.3f} bad bonds: {info['bad_bonds']}")
# Generate drug-like molecules
mols = gen.generate_filtered_molecules(profile, n_molecules=100, n_jobs=-1)
# Optimise toward any scoring function
def my_score(mols):
    return [lacan.score_mol(m, profile)[0] for m in mols]
winners = gen.generate_optimized_molecules(my_score, profile,
                                           startN=50, generations=20)
# Returns: list of (smiles, score) sorted best-first
Build a custom profile
from rdkit import Chem
from lacan.lacan import get_profile_for_mols
suppl = Chem.SmilesMolSupplier("my_molecules.smi", titleLine=False)
profile = get_profile_for_mols(suppl, profile_name="my_profile", n_jobs=-1)
# Saved to lacan/data/my_profile.pickle
# Reload with: lacan.load_profile("my_profile")
For the full module reference, GA parameters, corpus biasing, and the protection API, see the documentation.
Example notebooks
Worked examples are in lacan/example_notebooks/:
| Notebook | Contents |
|---|---|
| generate_molecules.ipynb | Random generation, corpus biasing, running the GA |
| optimize_from_mol.ipynb | Lead optimisation from a seed molecule, pharmacophore protection, mol_cleaner |
| mutating_molecules.ipynb | Atom-level mutations and score filtering |
| evaluate_bonds.ipynb | Per-bond PMI scoring and visualisation |
| median_molecules.ipynb | Molecular crossover |
| shape_optimize_vortioxetine.ipynb | 3D shape-guided scaffold hopping with pharmacophore locking |
Installation
pip install lacan
LACAN installs from PyPI with pip. It requires Python ≥ 3.9 and RDKit.
To install from source:
git clone https://github.com/wdehaen/lacan.git
cd lacan
pip install .
Running the tests
pip install pytest
pytest # full suite (~155 tests)
pytest tests/test_protect.py -v # single module
Citation
Preprint coming soon.
If you use LACAN in your research, please cite:
Dehaen W. (2026). LACAN: Leveraging Adjacent Co-occurrence of Atomic Neighborhoods
for molecular scoring and generation. [Preprint]
File details
Details for the file lacan-1.0.1.tar.gz.
File metadata
- Download URL: lacan-1.0.1.tar.gz
- Upload date:
- Size: 3.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 81f4643336cb735dac1ac8538340c1a62a01625e35bce226d01928c554c092e2 |
| MD5 | 81c42fae7ebbb0930aca0527f189f4ec |
| BLAKE2b-256 | 8c81f56cbfe0e368543bb4a227748845e46e5ae9e8263dd0cad999715d094911 |
File details
Details for the file lacan-1.0.1-py3-none-any.whl.
File metadata
- Download URL: lacan-1.0.1-py3-none-any.whl
- Upload date:
- Size: 3.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 54e96ca493c1d7fa304c2eba18d27b2dec7833ac3132ecef1826468038576201 |
| MD5 | b809a934461b7e115bc9cf70e315d1c5 |
| BLAKE2b-256 | 319ff3567c2e329af3afed6bb6128b33605280abab760dc5995ff28fc7d5c30a |