
MolCraftDiffusion

A unified generative AI framework for 3D molecular generation using diffusion models, designed to streamline the entire workflow from model training to deployment in data-driven computational chemistry pipelines.

MolCraftDiffusion enables researchers to train 3D molecular diffusion models, develop predictive models, and perform guided molecular generation for applications such as catalyst discovery, drug design, and exploration of chemical space.

[Figure: MolCraftDiffusion workflow overview]

Key Features

MolCraftDiffusion provides a complete pipeline for training/fine-tuning diffusion models, building predictive property models, and applying them to data-driven molecular generation tasks within a unified framework.

  • End-to-End 3D Molecular Generation Workflow: Train diffusion and predictive models, then apply them to a range of molecular generation tasks, all within a unified framework.
  • Curriculum Learning: An efficient strategy for training and fine-tuning 3D molecular diffusion models.
  • Guidance Tools: MolCraftDiffusion includes several guidance mechanisms that enable the generation of molecules with desired structural or physicochemical properties.
    • Property-Targeted Generation: Generate molecules with target physicochemical or electronic properties (e.g., excitation energy, dipole moment).
    • Inpainting: Systematically explore structural variants around reference molecules.
    • Outpainting: Extend a molecule by generating new parts.
  • Command-Line Interface: A simple, flexible CLI lets users run training, generation, prediction, and analysis tasks directly from the command line.
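
Property-targeted generation in diffusion models is commonly implemented with classifier-free guidance (the gen_cfg inference mode listed later in this document suggests as much). The sketch below illustrates only the generic blending step between unconditional and conditional noise predictions; it is not MolCraftDiffusion's actual implementation:

```python
def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance: blend the unconditional and conditional
    noise predictions. guidance_scale = 1 recovers the purely conditional
    prediction; values > 1 push samples harder toward the condition."""
    return [u + guidance_scale * (c - u)
            for u, c in zip(eps_uncond, eps_cond)]

# Toy 1-D example: a scale of 2 overshoots past the conditional prediction
print(cfg_combine([0.0], [1.0], 2.0))  # -> [2.0]
```

In practice the two predictions come from the same network, evaluated with and without the conditioning signal at every denoising step.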

Try our interactive demo for molecular generation: MolCraftDiffusion-demo

Installation

# 1. Create environment
conda create -n molcraft python=3.11 -y
conda activate molcraft

# 2. Install conda-only tools (xtb, openbabel)
conda install -c conda-forge xtb==6.7.1 openbabel -y

# 3. Install MolCraftDiffusion with PyTorch + PyG + sparse extensions
pip install molcraftdiffusion[gpu] \
    --find-links https://data.pyg.org/whl/torch-2.6.0+cu124.html

# or CPU-only:
pip install molcraftdiffusion[cpu] \
    --extra-index-url https://download.pytorch.org/whl/cpu \
    --find-links https://data.pyg.org/whl/torch-2.6.0+cpu.html

# Optional: symmetry metrics (requires numpy==1.24.*)
pip install cosymlib

Development / editable install

git clone https://github.com/pregHosh/MolCraftDiffusion
cd MolCraftDiffusion
pip install -e .[gpu] \
    --find-links https://data.pyg.org/whl/torch-2.6.0+cu124.html

Usage

Pre-trained Models

Pre-trained diffusion models are available on Hugging Face or in the models/edm_pretrained/ directory. We suggest starting from these models for downstream applications.

There are two ways to run experiments: using the MolCraftDiff command-line tool (recommended) or by executing the Python scripts directly.

1. MolCraftDiff CLI (Recommended)

Make sure you have installed the package in editable mode as described above, and that you run the commands from the root of the project directory.

Commands:

  • train: Run a training job.
  • generate: Run a molecule generation job.
  • predict: Run prediction with a trained model.
  • eval_predict: Evaluate predictions.
  • analyze: Perform analysis and post-processing on generated molecules.
  • data: Data processing utilities (preparation, augmentation, and dataset operations).

Command Syntax:

MolCraftDiff [COMMAND] [CONFIG_NAME/ARGUMENTS]
  • [COMMAND]: One of train, generate, predict, eval_predict, analyze, or data.
  • [CONFIG_NAME]: The name of the configuration file from the configs/ directory (e.g., train, example_diffusion_config).
  • [ARGUMENTS]: Additional command-line arguments to override configuration settings.
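
The config names above map to YAML files under configs/. The subdirectory layout shown later (data, tasks, trainer, logger, hydra, interference) suggests Hydra-style config composition; the fragment below is purely illustrative and does not reflect the package's actual schema or option names:

```yaml
# configs/example_diffusion_config.yaml (illustrative only)
defaults:
  - data: qm9          # hypothetical dataset config
  - tasks: diffusion
  - trainer: default
  - logger: wandb

seed: 42
```

Under this kind of layout, a CLI argument such as seed=123 would override the corresponding key at launch time.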

Examples:

# Train a model using the 'example_diffusion_config.yaml' configuration
MolCraftDiff train example_diffusion_config

# Generate molecules using the 'my_generation_config.yaml' configuration
MolCraftDiff generate my_generation_config

# Predict properties using a trained model
MolCraftDiff predict my_prediction_config

# Compile molecular data into an ASE database
MolCraftDiff data prepare compile -s data_dir/ -d dataset.db

Getting Help:

To see the main help message and a list of all commands:

MolCraftDiff --help

To get help for a specific command:

MolCraftDiff train --help

2. Direct Script Execution

You can also execute the scripts in the scripts/ directory directly.

Training:

python scripts/train.py tasks=[TASK]

where TASK is one of the following: diffusion, guidance, regression.

Generation:

python scripts/generate.py interference=[INTERFERENCE]

where INTERFERENCE is one of the following: gen_cfg, gen_cfggg, gen_conditional, gen.

Prediction:

python scripts/predict.py

3. Analysis & Post-processing

The analyze command provides a suite of tools for processing and evaluating generated molecules.

Subcommands:

  • optimize: Optimize molecular geometries using GFN-xTB.
  • metrics: Compute validity and connectivity metrics.
  • compare: Calculate RMSD, energy differences, and geometric properties (bonds/angles) between generated and reference structures.
  • xyz2mol: Convert XYZ files to SMILES and extract fingerprints/scaffolds.
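
The compare subcommand reports RMSD between generated and reference structures. The standard way to compute RMSD after optimal superposition is the Kabsch algorithm; the NumPy sketch below illustrates the general method and is not the package's own code:

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """Least-squares RMSD between two conformations P and Q
    (N x 3 arrays, same atom ordering) after optimal superposition."""
    P = P - P.mean(axis=0)                 # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                            # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(U @ Vt))     # guard against reflections
    R = U @ np.diag([1.0, 1.0, d]) @ Vt    # proper rotation (det = +1)
    diff = P @ R - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```

Because translation and rotation are removed before the distance is taken, two conformations that differ only by a rigid-body motion give an RMSD of zero.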

Examples:

# Optimize geometries in a directory
MolCraftDiff analyze optimize -i generated_molecules/

# Compute validity metrics
MolCraftDiff analyze metrics -i generated_molecules/

# Compare generated structures with ground truth (requires optimized counterparts)
MolCraftDiff analyze compare generated_molecules/ --bonds

# Convert XYZ to SMILES
MolCraftDiff analyze xyz2mol -x generated_molecules/
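
Tools like xyz2mol typically build on RDKit for the bond-perception step, but parsing the XYZ files themselves needs only the standard library. A minimal sketch of the file format handled here (a hypothetical helper, not part of the package):

```python
def read_xyz(path):
    """Parse a standard XYZ file: line 1 = atom count, line 2 = comment,
    then one 'symbol x y z' record per atom."""
    with open(path) as fh:
        lines = fh.read().splitlines()
    natoms = int(lines[0])
    symbols, coords = [], []
    for line in lines[2:2 + natoms]:
        sym, x, y, z = line.split()[:4]
        symbols.append(sym)
        coords.append((float(x), float(y), float(z)))
    return symbols, coords
```

A directory of generated molecules can then be processed by calling this on each .xyz file before handing the geometries to downstream analysis.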

Visualization

Generated 3D molecules and their properties can be visualized using the 3DMolViewer package.

We also recommend V, our in-house, lightweight X11 molecular viewer.

Tutorials

Tutorials are now hosted in the docs site: https://preghosh.github.io/MolCraftDiffusion/

The local tutorials/ directory is deprecated and will be removed in a future release.

Project Structure

├── .project-root
├── justfile
├── pyproject.toml
├── README.md
├── setup.py
└── src
    └── MolecularDiffusion
       ├── __init__.py
       ├── _version.py
       ├── molcraftdiff.py
       ├── callbacks
       │   ├── __init__.py
       │   └── train_helper.py
       ├── cli
       │   ├── __init__.py
       │   ├── analyze.py
       │   ├── eval_predict.py
       │   ├── generate.py
       │   ├── main.py
       │   ├── predict.py
       │   └── train.py
       ├── configs
       │   ├── data
       │   ├── hydra
       │   ├── interference
       │   ├── logger
       │   ├── tasks
       │   └── trainer
       ├── core
       │   ├── __init__.py
       │   ├── core.py
       │   ├── engine.py
       │   ├── logger.py
       │   └── meter.py
       ├── data
       │   ├── __init__.py
       │   ├── dataloader.py
       │   ├── dataset.py
       │   └── component
       ├── modules
       │   ├── __init__.py
       │   ├── layers
       │   ├── models
       │   └── tasks
       ├── runmodes
       │   ├── __init__.py
       │   ├── analyze
       │   │   ├── __init__.py
       │   │   ├── compute_energy_rmsd.py
       │   │   ├── compute_metrics.py
       │   │   ├── compute_pair_geometry.py
       │   │   ├── xtb_optimization.py
       │   │   └── xyz2mol.py
       │   ├── generate
       │   └── train
       └── utils
           ├── __init__.py
           ├── comm.py
           ├── diffusion_utils.py
           ├── file.py
           ├── geom_analyzer.py
           ├── geom_constant.py
           ├── geom_constraint.py
           ├── geom_metrics.py
           ├── geom_utils.py
           ├── io.py
           ├── molgraph_utils.py
           ├── plot_function.py
           ├── pretty.py
           ├── sascore.py
           ├── smilify.py
           └── torch.py

License

This project is licensed under the MIT License.

Citation

If you use MolCraftDiffusion in your research, please cite the following:

ChemRxiv: MolecularDiffusion: A Unified Generative-AI Framework for 3D Molecular Design
