Python implementation of Dompé's 'Molecular Anatomy'
scaffound is a Python library for extracting multiple types of molecular scaffolds, frameworks, and wireframes.
It is an extended implementation of Dompé's Molecular Anatomy [1] and provides a hierarchical approach to molecular decomposition, allowing for a more detailed analysis of chemical structures than the traditional Bemis-Murcko scaffold alone.
Core concepts ⚛️
The library extracts three main types of scaffolds:
- Basic Scaffold: The core ring systems and their linkers.
- Decorated Scaffold: The basic scaffold plus all heteroatoms directly attached to it by unsaturated bonds.
- Augmented Scaffold: The decorated scaffold plus all atoms belonging to the longest carbon chain (including substituents and side chains).
These scaffolds can be further abstracted into:
- Frameworks: Scaffolds made generic by replacing all heteroatoms with carbons.
- Wireframes: Scaffolds made both saturated (all bonds replaced with single bonds) and generic.
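The two abstractions above can be illustrated on a toy graph. This is a minimal sketch, not scaffound's implementation: the molecule is represented as a plain list of element symbols and a list of `(atom_i, atom_j, bond_order)` tuples, and the helper names (`make_generic`, `make_saturated`, `to_framework`, `to_wireframe`) are hypothetical. Aromaticity and charges are ignored for simplicity.

```python
# Toy representation (an assumption for illustration only):
#   atoms: list of element symbols
#   bonds: list of (atom_i, atom_j, bond_order) tuples

def make_generic(atoms, bonds):
    """Replace every atom with carbon; bonds are left unchanged."""
    return ["C"] * len(atoms), list(bonds)

def make_saturated(atoms, bonds):
    """Replace every bond with a single bond; atoms are left unchanged."""
    return list(atoms), [(i, j, 1) for i, j, _ in bonds]

def to_framework(atoms, bonds):
    """Framework = generic version of a scaffold."""
    return make_generic(atoms, bonds)

def to_wireframe(atoms, bonds):
    """Wireframe = generic and saturated version of a scaffold."""
    return make_saturated(*make_generic(atoms, bonds))

# Hypothetical scaffold: a four-membered ring containing one nitrogen,
# with an oxygen attached to ring atom 1 by a double bond.
atoms = ["N", "C", "C", "C", "O"]
bonds = [(0, 1, 1), (1, 2, 1), (2, 3, 1), (3, 0, 1), (1, 4, 2)]

framework = to_framework(atoms, bonds)
wireframe = to_wireframe(atoms, bonds)

print(framework)  # all atoms are carbon, the double bond is kept
print(wireframe)  # all atoms are carbon, all bonds are single
```

In this sketch the framework keeps the bond orders of the scaffold, while the wireframe discards both element and bond-order information, leaving only the connectivity.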
A note on augmented scaffolds ⚙️
The original algorithm for determining the augmented scaffold relies on identifying the longest path within the molecular graph.
However, the method as originally described does not define a unique path when several paths share the same maximum length: it simply retains the first one identified, as stated in the source publication [1]:
"[...] the longest atom chain, considering also substitutions, is retained but all terminal non-carbon atoms, belonging to side chains, are iteratively pruned (Augmented Scaffold)."
"[...] three paths can be identified two of them are the longest with the same length and the first identified is retained"
This means that multiple valid paths could be chosen for the same molecule, each resulting in a different augmented scaffold.
Consequently, while scaffound strictly adheres to the published logic, its implementation may identify different (yet equally valid) longest paths than those in the original work.
This can lead to variations in the resulting augmented scaffolds compared to the examples in the source publication.
To address this ambiguity, scaffound also implements its own canonical longest path algorithm, which yields a single deterministic outcome for a given molecule.
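The ambiguity can be reproduced on a small toy graph. The sketch below is not scaffound's algorithm; it assumes a hypothetical tie-breaking rule (orient each path so that it reads in its lexicographically smaller direction, then keep the smallest path) purely to show how a tie between equally long paths can be resolved deterministically. The function names and the adjacency-dict representation are illustrative assumptions.

```python
from itertools import combinations

def simple_paths(adj, start, end, path=None):
    """Yield every simple path from start to end (depth-first search)."""
    path = (path or []) + [start]
    if start == end:
        yield path
        return
    for nxt in adj[start]:
        if nxt not in path:
            yield from simple_paths(adj, nxt, end, path)

def longest_paths(adj):
    """Return all simple paths that share the maximum number of nodes."""
    best, best_len = [], 0
    for a, b in combinations(sorted(adj), 2):
        for p in simple_paths(adj, a, b):
            if len(p) > best_len:
                best, best_len = [p], len(p)
            elif len(p) == best_len:
                best.append(p)
    return best

def canonical_longest_path(adj):
    """Hypothetical tie-break: lexicographically smallest oriented path."""
    candidates = [min(p, p[::-1]) for p in longest_paths(adj)]
    return min(candidates)

# Toy acyclic graph: a central node 0 with one arm of two nodes (1-2)
# and two arms of one node each (3 and 4).
adj = {0: [1, 3, 4], 1: [0, 2], 2: [1], 3: [0], 4: [0]}

print(longest_paths(adj))           # two tied paths of four nodes each
print(canonical_longest_path(adj))  # one path, independent of traversal order
```

Exhaustive enumeration is only practical for small graphs; it is used here to keep the example short and to make every tied path visible.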
Installation 🪄
pip install scaffound
Getting started 🚀
# A simple usage example
from rdkit import Chem
from scaffound import MolecularAnatomy
# Create an RDKit molecule object
mol = Chem.MolFromSmiles('O=C(c1ccccc1)N1CCN(c2ccccc2)CC1')
# Analyze the molecule
anatomy = MolecularAnatomy(mol)
# Access different scaffold/framework/wireframe types
basic_scaffold = anatomy.basic_scaffold
decorated_framework = anatomy.decorated_framework
augmented_wireframe = anatomy.augmented_wireframe
# You can now work with these new molecule objects
print(Chem.MolToSmiles(basic_scaffold))
# Output: c1ccc(C2CCN(c3ccccc3)CC2)cc1
The MolecularAnatomy object also decomposes the molecule's generic and saturated graphs into all the scaffolds, frameworks and wireframes mentioned above.
An entire decomposition can be accessed using its to_dict() or to_pandas() methods.
Note that some decompositions of the original, generic, and saturated graphs are identical (see decomposition_equivalence.ipynb).
For instance:
- the basic framework of the original graph is the same as the basic scaffold of the generic graph,
- the basic wireframe of the original graph is the same as the basic wireframe of the generic graph,
- the basic framework of the saturated graph is the same as the decorated framework of the saturated graph.
Advanced usage 💪
MolecularAnatomy decomposes a molecule ahead of time into all possible scaffolds, frameworks, and wireframes. If performance matters, use the standalone functions below to compute only the type you need.
from scaffound import (
    get_generic_graph,    # All heteroatoms replaced by carbons
    get_saturated_graph,  # All bonds replaced by single bonds
    # Scaffold types
    get_basic_scaffold, get_decorated_scaffold, get_augmented_scaffold,
    # Framework types
    get_basic_framework, get_decorated_framework, get_augmented_framework,
    # Wireframe types
    get_basic_wireframe, get_decorated_wireframe, get_augmented_wireframe,
)
Furthermore, one can deactivate scaffound's deterministic longest path algorithm and revert to the original with the following:
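from scaffound import (MolecularAnatomy, MinMaxShortestPathOptions,
                       get_augmented_scaffold, get_augmented_framework,
                       get_augmented_wireframe)
# Revert to the longest path selection of the original publication
opts = MinMaxShortestPathOptions(original_algorithm=True)
# Pass the options to the full decomposition or to individual functions
MolecularAnatomy(mol, opts=opts)
get_augmented_scaffold(mol, opts=opts)
get_augmented_framework(mol, opts=opts)
get_augmented_wireframe(mol, opts=opts)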
Validation ✅
This library has been tested against the example file provided with the scientific article that introduced these concepts [1].
The reference data has been corrected within this repository so that it fully agrees with the paper's detailed algorithm and figures (see tests/MODIFICATIONS.txt).
An adapted version of this reference data is also provided (tests/cox2_816_inhibitors_adapted_lsp.txt) to reflect the results of scaffound's deterministic longest path algorithm, which is also described in tests/MODIFICATIONS.txt.
References 📜
- [1] Manelfi, C., Gemei, M., Talarico, C. et al.
“Molecular Anatomy”: a new multi-dimensional hierarchical scaffold analysis tool.
J Cheminform 13, 54 (2021). https://doi.org/10.1186/s13321-021-00526-y