Python implementation of Dompé's 'Molecular Anatomy'


A Python library for extracting multiple types of molecular scaffolds, frameworks, and wireframes.

scaffound is an extended implementation of Dompé's Molecular Anatomy [1]. It provides a hierarchical approach to molecular decomposition, allowing a more detailed analysis of chemical structures than the traditional Bemis-Murcko scaffold alone.

License: MIT

Core concepts ⚛️

The library extracts three main types of scaffolds:

  • Basic Scaffold: The core ring systems and their linkers.
  • Decorated Scaffold: The basic scaffold plus all heteroatoms directly attached to it by unsaturated bonds.
  • Augmented Scaffold: The decorated scaffold plus all atoms belonging to the longest carbon chain (including substituents and side chains).

These scaffolds can be further abstracted into:

  • Frameworks: Scaffolds made generic by replacing all heteroatoms with carbons.
  • Wireframes: Scaffolds made both saturated (all bonds replaced with single bonds) and generic.
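
The two abstractions can be illustrated on a toy molecular graph without any cheminformatics toolkit. The sketch below is not scaffound's implementation: the ToyMol class and the make_generic and make_saturated helpers are hypothetical names introduced only for illustration, and the sketch ignores hydrogens, charges, and aromaticity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyMol:
    """Hypothetical toy molecular graph (not a scaffound or RDKit type)."""
    atoms: tuple  # element symbol of each heavy atom, indexed by position
    bonds: tuple  # (atom_i, atom_j, bond_order) triples


def make_generic(mol):
    """Replace every heteroatom with carbon; bond orders are kept."""
    return ToyMol(tuple("C" for _ in mol.atoms), mol.bonds)


def make_saturated(mol):
    """Replace every bond with a single bond; elements are kept."""
    return ToyMol(mol.atoms, tuple((i, j, 1) for i, j, _ in mol.bonds))


# Toy scaffold: a pyridine-like ring written with alternating bond orders.
scaffold = ToyMol(
    atoms=("N", "C", "C", "C", "C", "C"),
    bonds=((0, 1, 2), (1, 2, 1), (2, 3, 2), (3, 4, 1), (4, 5, 2), (5, 0, 1)),
)

# Framework: generic only. Wireframe: generic and saturated.
framework = make_generic(scaffold)
wireframe = make_saturated(make_generic(scaffold))

print(framework)
print(wireframe)
```

In this toy model the two operations commute: applying make_generic and then make_saturated gives the same graph as applying them in the opposite order.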

[Figure: scaffound hierarchy]

⚙️ A Note on Augmented Scaffolds

The original algorithm for determining the augmented scaffold relies on identifying the longest path within the molecular graph.
However, the method as originally described does not define a unique path when several paths share the same maximum length: it simply retains the first one identified.

"[...] the longest atom chain, considering also substitutions, is retained but all terminal non-carbon atoms, belonging to side chains, are iteratively pruned (Augmented Scaffold)."

"[...] three paths can be identified two of them are the longest with the same length and the first identified is retained"

This means that multiple valid paths could be chosen for the same molecule, each resulting in a different augmented scaffold.

Consequently, while scaffound strictly adheres to the published logic, its implementation may identify different (yet equally valid) longest paths than those in the original work.
This can lead to variations in the resulting augmented scaffolds compared to the examples in the source publication.

To address this ambiguity, scaffound also implements its own deterministic, canonical longest path algorithm, so that each molecule yields a single, reproducible augmented scaffold.
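
The ambiguity can be reproduced on a small graph. The sketch below is purely illustrative and assumes a toy adjacency-list representation. The longest_simple_paths helper and the lexicographic tie-break are hypothetical and are not scaffound's canonical algorithm. It enumerates every longest simple path in a graph with three equally long arms, then picks one of the tied paths by a fixed rule.

```python
def longest_simple_paths(adjacency):
    """Return all simple paths with the maximum number of atoms, each listed once."""
    best_len, best = 0, set()

    def extend(path, visited):
        nonlocal best_len, best
        extended = False
        for neighbor in adjacency[path[-1]]:
            if neighbor not in visited:
                extended = True
                extend(path + [neighbor], visited | {neighbor})
        if not extended:
            # Store each path in a single orientation so a path and its
            # reverse are not counted twice.
            key = min(tuple(path), tuple(reversed(path)))
            if len(key) > best_len:
                best_len, best = len(key), {key}
            elif len(key) == best_len:
                best.add(key)

    for start in adjacency:
        extend([start], {start})
    return sorted(best)


# Toy graph: atom 0 carries three arms of two atoms each (1-2, 3-4, 5-6).
adjacency = {
    0: [1, 3, 5],
    1: [0, 2], 2: [1],
    3: [0, 4], 4: [3],
    5: [0, 6], 6: [5],
}

tied_paths = longest_simple_paths(adjacency)
print(tied_paths)  # three tied paths, each retaining a different set of atoms

# Illustrative tie-break only: take the lexicographically smallest path.
canonical_path = tied_paths[0]
print(canonical_path)
```

A "first identified" rule would return whichever tied path the traversal reaches first, which depends on atom ordering. Any fixed tie-break that does not depend on traversal order removes that dependence.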

Installation 🪄

pip install scaffound

Getting started 🚀

# A simple usage example
from rdkit import Chem
from scaffound import MolecularAnatomy

# Create an RDKit molecule object
mol = Chem.MolFromSmiles('O=C(c1ccccc1)N1CCN(c2ccccc2)CC1')


# Analyze the molecule
anatomy = MolecularAnatomy(mol)

# Access different scaffold/framework/wireframe types
basic_scaffold = anatomy.basic_scaffold
decorated_framework = anatomy.decorated_framework
augmented_wireframe = anatomy.augmented_wireframe

# You can now work with these new molecule objects
print(Chem.MolToSmiles(basic_scaffold))
# Output: c1ccc(C2CCN(c3ccccc3)CC2)cc1

The MolecularAnatomy object also decomposes the molecule's generic and saturated graphs into all the scaffolds, frameworks and wireframes mentioned above.

The entire decomposition can be accessed through the object's to_dict() or to_pandas() methods.

Note that some decompositions of the original, generic, and saturated graphs are identical (see decomposition_equivalence.ipynb).
For instance:

  • the basic framework of the original graph is the same as the basic scaffold of the generic graph,
  • the basic wireframe of the original graph is the same as the basic wireframe of the generic graph,
  • the basic framework of the saturated graph is the same as the decorated framework of the saturated graph.

Advanced usage 💪

If performance matters, standalone functions can be used to compute only the scaffold, framework, or wireframe that is needed, since a MolecularAnatomy object decomposes the molecule into all possible scaffolds ahead of time.

from scaffound import (get_generic_graph, # All heteroatoms replaced by carbons
                       get_saturated_graph, # All bonds replaced by single bonds
                       # Scaffold types
                       get_basic_scaffold, get_decorated_scaffold, get_augmented_scaffold,
                       # Framework types
                       get_basic_framework, get_decorated_framework, get_augmented_framework,
                       # Wireframe types
                       get_basic_wireframe, get_decorated_wireframe, get_augmented_wireframe)

Furthermore, one can deactivate scaffound's deterministic longest path algorithm and revert to the original with the following:

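from rdkit import Chem
from scaffound import (MolecularAnatomy, MinMaxShortestPathOptions,
                       get_augmented_scaffold, get_augmented_framework,
                       get_augmented_wireframe)

mol = Chem.MolFromSmiles('O=C(c1ccccc1)N1CCN(c2ccccc2)CC1')

# Revert to the longest path selection described in the original publication
opts = MinMaxShortestPathOptions(original_algorithm=True)

# Pass the same options object to the class or to any augmented_* function
MolecularAnatomy(mol, opts=opts)
get_augmented_scaffold(mol, opts=opts)
get_augmented_framework(mol, opts=opts)
get_augmented_wireframe(mol, opts=opts)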

Validation ✅

This library has been tested against the example file from the original scientific article that introduced these concepts.
The reference data has been corrected within this repository so that it fully matches the paper's detailed algorithm and figures (see tests/MODIFICATIONS.txt).
An adapted version of this reference data is also provided (tests/cox2_816_inhibitors_adapted_lsp.txt) to reflect the results of scaffound's deterministic longest path algorithm, which is also described in tests/MODIFICATIONS.txt.

References 📜
