
Shape-constrained molecule generation via Equivariant Diffusion and GCN

ML Conformer Generator

ML Conformer Generator is a tool for shape-constrained molecule generation using an Equivariant Diffusion Model (EDM) and a Graph Convolutional Network (GCN). It is designed to generate 3D molecular conformations that are both chemically valid and spatially similar to a reference shape.

Supported features

  • Shape-guided molecular generation

    Generate novel molecules that conform to arbitrary 3D shapes—such as protein binding pockets or custom-defined spatial regions.

  • Reference-based conformer similarity

    Create molecules whose conformations closely resemble a reference structure, supporting scaffold-hopping and ligand-based design workflows.

  • Fragment-based inpainting

    Fix specific substructures or fragments within a molecule and complete or grow the rest in a geometrically consistent manner.


Installation

  1. Install the package:

pip install mlconfgen

  2. Download the weights from Hugging Face:

https://huggingface.co/Membrizard/ml_conformer_generator

edm_moi_chembl_15_39.pt

adj_mat_seer_chembl_15_39.pt
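
The download can also be scripted. The sketch below assumes the `huggingface_hub` package is installed and that both weight files sit at the root of the repository linked above. `download_weights` and its `fetch` parameter are hypothetical helper names, not part of mlconfgen.

```python
from pathlib import Path

REPO_ID = "Membrizard/ml_conformer_generator"
WEIGHT_FILES = ("edm_moi_chembl_15_39.pt", "adj_mat_seer_chembl_15_39.pt")


def download_weights(target_dir=".", fetch=None):
    """Fetch both weight files into target_dir and return their local paths."""
    if fetch is None:
        # Imported lazily so the helper can also be exercised with a stub downloader.
        from huggingface_hub import hf_hub_download
        fetch = hf_hub_download
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    return [
        Path(fetch(repo_id=REPO_ID, filename=name, local_dir=str(target)))
        for name in WEIGHT_FILES
    ]
```

Calling `download_weights(".")` should place the files in the working directory, which is where the Python API example expects them.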


🐍 Python API

See interactive examples: ./python_api_demo.ipynb

from rdkit import Chem
from mlconfgen import MLConformerGenerator, evaluate_samples

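# Load the PyTorch model from the downloaded weight files
model = MLConformerGenerator(
    edm_weights="./edm_moi_chembl_15_39.pt",
    adj_mat_seer_weights="./adj_mat_seer_chembl_15_39.pt",
    diffusion_steps=100,
)

# Reference conformer whose shape constrains the generation
reference = Chem.MolFromMolFile('./assets/demo_files/ceyyag.mol')

# Request 20 samples for the reference shape
samples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)

# Align the samples to the reference and evaluate their similarity to it
aligned_reference, std_samples = evaluate_samples(reference, samples)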
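
To keep the generated structures, they can be written to an SDF file. This is a minimal sketch that assumes `generate_conformers` returns a list of RDKit `Mol` objects, which is not confirmed here. `save_samples` is a hypothetical helper, not part of mlconfgen.

```python
def save_samples(mols, path):
    """Write every non-empty molecule to an SDF file and return the count."""
    # RDKit is imported lazily so the helper can be defined without it installed.
    from rdkit import Chem

    writer = Chem.SDWriter(str(path))
    written = 0
    try:
        for mol in mols:
            if mol is None:  # skip entries that hold no molecule
                continue
            writer.write(mol)
            written += 1
    finally:
        writer.close()  # flush the file even if a write fails
    return written
```

With the example above, `save_samples(samples, "samples.sdf")` would store all generated molecules in one file.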

🚀 Overview

This solution employs:

  • Equivariant Diffusion Model (EDM) [1]: For generating atom coordinates and types under a shape constraint.
  • Graph Convolutional Network (GCN) [2]: For predicting atom adjacency matrices.
  • Deterministic Standardization Pipeline: For refining and validating generated molecules.

🧠 Model Training

  • Trained on 1.6 million compounds from the ChEMBL database.
  • Filtered to molecules with 15–39 heavy atoms.
  • Supported elements: H, C, N, O, F, P, S, Cl, Br.
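
A reference molecule can be checked against these training limits before generation. The sketch below works on a plain list of element symbols. `check_training_domain` is a hypothetical helper, and whether the package enforces these limits itself is not stated here.

```python
SUPPORTED_ELEMENTS = {"H", "C", "N", "O", "F", "P", "S", "Cl", "Br"}
MIN_HEAVY_ATOMS, MAX_HEAVY_ATOMS = 15, 39


def check_training_domain(symbols):
    """Return a list of problems; an empty list means the molecule fits."""
    problems = []
    unsupported = sorted(set(symbols) - SUPPORTED_ELEMENTS)
    if unsupported:
        problems.append("unsupported elements: " + ", ".join(unsupported))
    # Heavy atoms are all atoms other than hydrogen.
    heavy = sum(1 for s in symbols if s != "H")
    if not MIN_HEAVY_ATOMS <= heavy <= MAX_HEAVY_ATOMS:
        problems.append(
            f"{heavy} heavy atoms, expected {MIN_HEAVY_ATOMS} to {MAX_HEAVY_ATOMS}"
        )
    return problems
```

For an RDKit molecule, the symbols can be collected with `[atom.GetSymbol() for atom in mol.GetAtoms()]`.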

🧪 Standardization Pipeline

The generated molecules are post-processed through the following steps:

  • Largest Fragment picker
  • Valence check
  • Kekulization
  • RDKit sanitization
  • Constrained Geometry optimization via MMFF94 Molecular Dynamics
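
The steps above can be approximated with plain RDKit calls. This is only a sketch, not the package's own pipeline: `standardize` is a hypothetical name, and an unconstrained MMFF94 minimisation stands in for the constrained optimization step.

```python
def standardize(mol):
    """Return a cleaned molecule, or None if a chemistry check fails."""
    # RDKit is imported lazily so the sketch can be defined without it installed.
    from rdkit import Chem
    from rdkit.Chem import AllChem

    # 1. Keep only the largest fragment (by atom count).
    fragments = Chem.GetMolFrags(mol, asMols=True, sanitizeFrags=False)
    if not fragments:
        return None
    mol = max(fragments, key=lambda frag: frag.GetNumAtoms())
    try:
        # 2-4. Sanitization covers the valence check; kekulize explicitly after it.
        Chem.SanitizeMol(mol)
        Chem.Kekulize(mol, clearAromaticFlags=False)
    except ValueError:  # RDKit sanitization errors derive from ValueError
        return None
    # 5. Relax the geometry if 3D coordinates and MMFF parameters are available.
    if mol.GetNumConformers() and AllChem.MMFFHasAllMoleculeParams(mol):
        AllChem.MMFFOptimizeMolecule(mol, mmffVariant="MMFF94", maxIters=200)
    return mol
```

Molecules that return `None` would be counted as invalid in a workflow built on this sketch.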

📏 Evaluation Pipeline

Aligns each generated molecule to the reference and evaluates their shape similarity using Shape Tanimoto Similarity [3], computed from Gaussian Molecular Volume overlap.

Hydrogens are ignored in both reference and generated samples for this metric.
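
The metric can be illustrated with a simplified calculation. The sketch below uses a first-order Gaussian overlap (pairwise terms only) on heavy-atom coordinates that are assumed to be aligned already. The amplitude `P` and the single atom radius `RADIUS` are assumed values, and this is not the package's implementation.

```python
import math

P = 2.7       # Gaussian amplitude, an assumed commonly used value
RADIUS = 1.7  # one radius in angstroms for every heavy atom, assumed
# Width chosen so that a single Gaussian integrates to the hard-sphere volume.
ALPHA = math.pi * (3.0 * P / (4.0 * math.pi)) ** (2.0 / 3.0) / RADIUS ** 2


def overlap_volume(coords_a, coords_b):
    """First-order Gaussian overlap: sum of all pairwise atom overlaps."""
    prefactor = P * P * (math.pi / (2.0 * ALPHA)) ** 1.5
    total = 0.0
    for a in coords_a:
        for b in coords_b:
            d2 = sum((x - y) ** 2 for x, y in zip(a, b))
            total += prefactor * math.exp(-ALPHA * d2 / 2.0)
    return total


def shape_tanimoto(coords_a, coords_b):
    """Tanimoto ratio of overlap volume to the union of both volumes."""
    v_ab = overlap_volume(coords_a, coords_b)
    v_aa = overlap_volume(coords_a, coords_a)
    v_bb = overlap_volume(coords_b, coords_b)
    return v_ab / (v_aa + v_bb - v_ab)
```

Identical coordinate sets give a score of 1, and the score falls towards 0 as the two shapes stop overlapping.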


📊 Performance (100 Denoising Steps)

Tested on 100,000 samples using 1,000 CCDC Virtual Screening [4] reference compounds.

  • Avg time to generate 50 valid samples: 11.46 sec (NVIDIA H100)
  • ⚡️ Generation speed: 4.18 valid molecules/sec
  • 💾 GPU memory (per generation thread): Up to 4.0 GB
  • 📐 Avg Shape Tanimoto Similarity: 53.32%
  • 🎯 Max Shape Tanimoto Similarity: 99.69%
  • 🔬 Avg Chemical Tanimoto Similarity (2-hop 2048-bit Morgan Fingerprints): 10.87%
  • 🧬 % Chemically novel (vs. training set): 99.84%
  • ✔️ % Valid molecules (post-standardization): 48%
  • 🔁 % Unique molecules in generated set: 99.94%
  • 📎 Fréchet Fingerprint Distance (2-hop 2048-bit Morgan Fingerprints):
    • To ChEMBL: 4.13
    • To PubChem: 2.64
    • To ZINC (250k): 4.95
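
The 48% validity rate implies that roughly two raw generations are needed per valid molecule. The helper below is a back-of-the-envelope estimate that assumes this rate carries over to other references. `raw_samples_needed` is a hypothetical name, and whether `n_samples` counts raw or valid molecules is not stated here.

```python
import math

VALID_FRACTION = 0.48  # share of valid molecules after standardization, from the list above


def raw_samples_needed(n_valid, valid_fraction=VALID_FRACTION):
    """Expected number of raw generations needed to obtain n_valid valid molecules."""
    if not 0.0 < valid_fraction <= 1.0:
        raise ValueError("valid_fraction must be in (0, 1]")
    return math.ceil(n_valid / valid_fraction)
```

For example, `raw_samples_needed(50)` returns 105.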

Generation Examples

[Images: four generation examples (ex1, ex2, ex3, ex4)]


💾 Access & Licensing

The Python package and inference code are available on GitHub under the Apache 2.0 License:

https://github.com/Membrizard/ml_conformer_generator

The trained model weights are available at:

https://huggingface.co/Membrizard/ml_conformer_generator

The weights are licensed under CC BY-NC-ND 4.0.

Use of the trained weights for any profit-generating activity is restricted.

For commercial licensing and inference-as-a-service, contact: Denis Sapegin


ONNX Inference:

For PyTorch-free inference, an ONNX version of the model is provided.

Weights of the model in ONNX format are available at:

https://huggingface.co/Membrizard/ml_conformer_generator

egnn_chembl_15_39.onnx

adj_mat_seer_chembl_15_39.onnx

from mlconfgen import MLConformerGeneratorONNX
from rdkit import Chem

# Load the ONNX model from the downloaded .onnx files
model = MLConformerGeneratorONNX(
    egnn_onnx="./egnn_chembl_15_39.onnx",
    adj_mat_seer_onnx="./adj_mat_seer_chembl_15_39.onnx",
    diffusion_steps=100,
)

# Reference conformer whose shape constrains the generation
reference = Chem.MolFromMolFile('./assets/demo_files/yibfeu.mol')
samples = model.generate_conformers(reference_conformer=reference, n_samples=20, variance=2)

Install ONNX GPU runtime (if needed): pip install onnxruntime-gpu
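
Whether the GPU runtime is visible to ONNX Runtime can be checked before loading the model. The sketch relies on `onnxruntime.get_available_providers()`. `gpu_runtime_available` is a hypothetical helper, and how `MLConformerGeneratorONNX` selects its execution provider is not stated here.

```python
def gpu_runtime_available(available=None):
    """Return True if ONNX Runtime reports the CUDA execution provider."""
    if available is None:
        # Requires onnxruntime or onnxruntime-gpu to be installed.
        import onnxruntime as ort
        available = ort.get_available_providers()
    return "CUDAExecutionProvider" in available
```

If this returns `False` on a machine with an NVIDIA GPU, installing `onnxruntime-gpu` as described above is the likely next step.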


Export to ONNX

The model can also be exported to ONNX. The export requires onnxscript==0.2.2:

pip install onnxscript==0.2.2

from mlconfgen import MLConformerGenerator
from onnx_export import export_to_onnx

model = MLConformerGenerator()
export_to_onnx(model)

This compiles the model and saves the ONNX files to the current directory (./).

Streamlit App

Running

  • Move the trained PyTorch weights into ./streamlit_app

./streamlit_app/edm_moi_chembl_15_39.pt

./streamlit_app/adj_mat_seer_chembl_15_39.pt

  • Install the dependencies: pip install -r ./streamlit_app/requirements.txt

  • Bring the app UI up:

    cd ./streamlit_app
    streamlit run app.py
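
Before starting the app, it can help to confirm that both weight files are in place. The sketch below only checks the two paths listed above. `missing_weights` is a hypothetical helper and not part of the app.

```python
from pathlib import Path

REQUIRED_WEIGHTS = ("edm_moi_chembl_15_39.pt", "adj_mat_seer_chembl_15_39.pt")


def missing_weights(app_dir="./streamlit_app"):
    """Return the weight files that are not yet present in app_dir."""
    app_path = Path(app_dir)
    return [name for name in REQUIRED_WEIGHTS if not (app_path / name).is_file()]
```

An empty list means both files were found in the app directory.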
    

Streamlit App Development

  1. To enable development mode for the 3D viewer (stspeck), set _RELEASE = False in ./streamlit/stspeck/__init__.py.

  2. Navigate to the 3D viewer frontend and start the development server:

    cd ./frontend/speck/frontend
    npm run start
    

    This will launch the dev server at http://localhost:3001

  3. In a separate terminal, run the Streamlit app from the root frontend directory:

    cd ./streamlit_app
    streamlit run app.py
    
  4. To build the production version of the 3D viewer, run:

    cd ./streamlit_app/stspeck/frontend
    npm run build
    
